Streaming elements in a list #49

Closed
jxnl opened this issue Jun 29, 2023 · 3 comments

jxnl commented Jun 29, 2023

Is there any way to stream the data within a list one by one?

# This is your existing generator that yields chunks of JSON string
def read_json_in_chunks(json_string):
    for i in range(0, len(json_string), 5):  # replace 5 with the chunk size you want
        ss = json_string[i : i + 5]
        yield ss
        print(f"generated {ss}")


import json_stream

json_string = b'{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'
data = json_stream.load(read_json_in_chunks(json_string), persistent=True)

print(data["tasks"][0])

Currently it streams the whole list from [ to ] before returning data["tasks"][0].

The same happens with:

for task in data["tasks"]:
    ...

It streams everything first.

daggaz (Owner) commented Jun 29, 2023

Hi @jxnl ,

The reason you're seeing this behaviour is that the data from the iterator is first being buffered (up to the system default buffer size, which for me is 8096).

If your data is longer than this buffer size, then you would see it being processed in chunks.

This is a side-effect of the iterable wrapping; I'm not really sure if there's a way around that, but I'm thinking about it.
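
To see where the read-ahead comes from (a minimal sketch, not json_stream's internals): io.BufferedReader fills its whole internal buffer on the first read, even when the caller asks for a single byte. LoggingRaw below is a purely hypothetical helper that only exists to report how much data each raw read requests.

import io


class LoggingRaw(io.RawIOBase):
    """Hypothetical raw stream that reports the size of each raw read request."""
    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        n = self._inner.readinto(b)
        print(f"raw read asked for {len(b)} bytes, got {n}")
        return n


reader = io.BufferedReader(LoggingRaw(b"x" * 20000))
reader.read(1)  # a one-byte read still triggers a full buffer fill
print(io.DEFAULT_BUFFER_SIZE)  # the default read-ahead size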

If you use an unbuffered stream, for example a network socket stream with buffering=0 as below, then you will not see this behaviour:

import asyncio
import socket

import json_stream

json_string = b'{"tasks":[{"id":1,"title":"task1"},{"id":2,"title":"task2"},{"id":3,"title":"task3"}]}'


async def handle_client(_, writer):
    for i in range(0, len(json_string), 15):
        message = json_string[i:i + 15]
        print(f"Sending: {message!r}")
        writer.write(message)
        await writer.drain()
        await asyncio.sleep(1)

    print("Closing connection")
    writer.close()
    await writer.wait_closed()


def test():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('127.0.0.1', 8888))
    f = sock.makefile(mode='b', buffering=0)
    data = json_stream.load(f)

    for task in data["tasks"]:
        print(task)


async def main():
    server = await asyncio.start_server(handle_client, '127.0.0.1', 8888)
    addrs = ', '.join(str(sock.getsockname()) for sock in server.sockets)
    print(f'Serving on {addrs}')
    async with server:
        await asyncio.to_thread(test)


asyncio.run(main())

daggaz (Owner) commented Jun 29, 2023

Here is an example where all the data doesn't fit inside an 8096-byte buffer:

import json

import json_stream


def read_json_in_chunks(json_string, chunk_size=100):
    for i in range(0, len(json_string), chunk_size):
        ss = json_string[i:i+chunk_size]
        yield ss
        print('.', end='', flush=True)


data = {
    'tasks': [{
        'id': i,
        'title': f'task{i}'
    } for i in range(1000)],
}
data = json_stream.load(read_json_in_chunks(json.dumps(data).encode()))

for task in data["tasks"]:
    print(task)

daggaz (Owner) commented Jun 30, 2023

Ah ha!

I have found the issue. There was an unnecessary io.BufferedReader wrapping the IterableStream that wraps the iterable.

Removing this means that each iterator chunk is passed directly out to the tokenizer, unbuffered.
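
For context, here's a rough sketch (not the actual json_stream source) of what an unbuffered iterable wrapper can look like: a raw stream that serves at most one iterator chunk per read, so nothing accumulates in a read-ahead buffer. IterableRawSketch is a hypothetical stand-in for the real IterableStream.

import io


class IterableRawSketch(io.RawIOBase):
    """Hypothetical stand-in for IterableStream: serves at most one chunk per read."""
    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, b):
        if not self._pending:
            # an empty chunk (or an exhausted iterator) is treated as end-of-stream
            self._pending = next(self._iterator, b"")
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n

Reading from a raw object like this, rather than through io.BufferedReader, means each chunk the generator yields becomes visible to the parser immediately.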

New version 2.3.2 is on its way soon.
