Proposal: make it possible to observe chunk delimiters.

Motivation: urllib3's `stream` method lets users ask to receive each chunk as it arrives. This is very difficult to do with h11 as it currently stands: while h11 will try to emit one `Data` event per chunk, if the buffer contains only a partial chunk h11 prefers to emit a `Data` event for that partial chunk and empty the buffer rather than sit on it.

This is an entirely defensible design decision: while urllib3's users seem to want to be able to receive the chunks as they come in, chunk delimiters are not supposed to be *semantic*. However, for better or worse there are some use-cases where it is very helpful to know where chunk delimiters are.

There are three ways I can see of doing this:

1. Change h11's behaviour to emit `NEED_DATA` when a partial chunk is in the buffer, rather than a `Data` event for that partial chunk. This is probably inefficient in the case where people don't care about the chunk sizes, and also allows for pernicious behaviour where the user just keeps shoving data into h11's buffer without h11 ever being able to emit it. 

    (I should note that this is basically what h2 does with DATA frames: it emits one DataReceived event per frame. This is less problematic for h2 because of SETTINGS_MAX_FRAME_SIZE, which limits the total memory cost of buffering an entire frame.)

2. Add a flag to swap between the current mode and the mode described in (1), which defaults to the current mode. I think this is a bad idea, but I did want to bring it up for completeness' sake. This has all the downsides of (1) plus an extra bit of interface complexity and testing surface to go with it. Not recommended.

3. Add a flag to `Data` events that signals whether they mark the end of a complete chunk; otherwise keep the current behaviour the same. This would allow tools like urllib3 that care about where the chunk boundaries are to basically just do a tight loop on `recv()` until they see a `Data` event with `end_chunk=True`. Because of h11's current semantics, any prior `Data` events that don't have that flag set are part of the same chunk as the one that does, and any subsequent `Data` events are part of a new chunk.

    This has the advantage of being the smallest logical change, it is likely pretty easy and performant to implement, and it is extremely unobtrusive to users that don't care about this concept. Altogether I think this is the best of the three possibilities in terms of giving tools that care about this (and, to be clear: as much as possible tools should try *not* to care about this) the ability to get what they need, while keeping that unusual use-case as far away from affecting other users as possible.
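
To make option (3) a bit more concrete, here is a rough sketch of what the consumer side might look like. Everything in it is an assumption for illustration: `Data` is a small stand-in class rather than h11's real event, `end_chunk` is the flag proposed above and not an existing h11 attribute, and `iter_chunks` is a hypothetical helper name. The list of events stands in for whatever a tight loop on `recv()` would produce.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Data:
    """Stand-in for a Data event (not h11's real class)."""
    data: bytes
    # Proposed flag from option (3); hypothetical, defaults to "not the end".
    end_chunk: bool = False


def iter_chunks(events: Iterable[Data]) -> Iterator[bytes]:
    """Yield one bytes object per complete chunk (hypothetical helper)."""
    pending: List[bytes] = []
    for event in events:
        pending.append(event.data)
        if event.end_chunk:
            # Every event since the last flagged one belongs to this chunk.
            yield b"".join(pending)
            pending = []
    # Whatever is left in `pending` is a partial chunk and is not yielded.


# Events as they might arrive: "hello" split across two reads, "world" in
# one read, then the start of a chunk that has not completed yet.
events = [
    Data(b"hel"),
    Data(b"lo", end_chunk=True),
    Data(b"world", end_chunk=True),
    Data(b"par"),
]

chunks = list(iter_chunks(events))
```

Users who don't care about chunk boundaries would simply never look at the flag and keep consuming `event.data` exactly as they do today.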

Thoughts?

Proposal: make it possible to observe chunk delimiters. #19

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: make it possible to observe chunk delimiters. #19

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions