Streaming elements in a list #49
Hi @jxnl, the reason you're seeing this behaviour is that the data from the iterator is first being buffered (up to the system default buffer size, which for me is 8096). If your data is longer than the buffer size, you will see it being processed in chunks. This is a side-effect of the iterable wrapping; I'm not really sure if there's a way around that, but I'm thinking about it. If you use an unbuffered stream, for example a network socket stream with buffering=0 as below, then you will not see this behaviour:

```python
import asyncio
import socket

import json_stream

json_string = b'{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'


async def handle_client(_, writer):
    # Drip-feed the JSON document to the client, 15 bytes per second.
    for i in range(0, len(json_string), 15):
        message = json_string[i:i + 15]
        print(f"Sending: {message!r}")
        writer.write(message)
        await writer.drain()
        await asyncio.sleep(1)
    print("Closing connection")
    writer.close()
    await writer.wait_closed()


def test():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('127.0.0.1', 8888))
    # buffering=0 gives an unbuffered binary file object over the socket.
    f = sock.makefile(mode='b', buffering=0)
    data = json_stream.load(f)
    for task in data["tasks"]:
        print(task)


async def main():
    server = await asyncio.start_server(handle_client, '127.0.0.1', 8888)
    addrs = ', '.join(str(sock.getsockname()) for sock in server.sockets)
    print(f'Serving on {addrs}')
    async with server:
        # Run the blocking client in a thread while the server serves it.
        await asyncio.to_thread(test)


asyncio.run(main())
```
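As an aside, if you want to check the default buffer size on your own system, the standard library exposes it as `io.DEFAULT_BUFFER_SIZE` (a quick illustrative check, nothing json-stream specific):

```python
import io

# Block size used by Python's buffered binary I/O; data read through a
# buffered stream is accumulated in blocks of roughly this size.
print(io.DEFAULT_BUFFER_SIZE)
```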
Here is an example where the data doesn't all fit inside an 8096-byte buffer:

```python
import json

import json_stream


def read_json_in_chunks(json_string, chunk_size=100):
    # Yield the encoded document 100 bytes at a time, printing a dot per
    # chunk so you can see how many chunks are pulled before output appears.
    for i in range(0, len(json_string), chunk_size):
        ss = json_string[i:i + chunk_size]
        yield ss
        print('.', end='', flush=True)


data = {
    'tasks': [{
        'id': i,
        'title': f'task{i}'
    } for i in range(1000)],
}

data = json_stream.load(read_json_in_chunks(json.dumps(data).encode()))
for task in data["tasks"]:
    print(task)
```
Ah ha! I have found the issue. There was an unnecessary buffering step wrapped around the iterable. Removing this means that each iterator chunk is passed directly out to the tokenizer, unbuffered. New version 2.3.2 on its way soon.
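Once 2.3.2 is out, a minimal sketch like the one below (the chunk generator and its prints are illustrative, not part of json-stream) should show chunks being consumed interleaved with the parsed tasks, rather than all up front; the exact interleaving depends on how far ahead the tokenizer reads:

```python
import json_stream


def chunks():
    # Each chunk is logged just before it is handed to the tokenizer; with
    # the buffering removed, the second chunk should only be requested once
    # task1 has already been parsed from the first.
    for part in (b'{"tasks":[{"id":1,"title":"task1"},',
                 b'{"id":2,"title":"task2"}]}'):
        print(f"chunk requested: {part!r}")
        yield part


data = json_stream.load(chunks())
for task in data["tasks"]:
    print(task)
```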
Is there any way to stream the data within a list one by one?

Currently it streams from `[` to `]` before returning `data["tasks"][0]`, and with a `for` loop over the list it streams everything first.
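For later readers, here is a minimal sketch of the reported behaviour (the chunk generator is illustrative): with the pre-2.3.2 buffering, every chunk prints before `data["tasks"][0]` comes back.

```python
import json_stream


def chunked():
    # Pre-2.3.2, both chunks were pulled into an internal buffer before the
    # first element of the list was returned.
    for part in (b'{"tasks":[{"id":1,"title":"task1"}',
                 b',{"id":2,"title":"task2"}]}'):
        print(f"consumed chunk: {part!r}")
        yield part


data = json_stream.load(chunked())
first = data["tasks"][0]  # reported: every chunk above prints before this returns
print(first)
```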