Skip to content

proto: streaming or scatter/gather API for wire encoding/decoding #912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dsnet opened this issue Jul 26, 2019 · 9 comments
Closed

proto: streaming or scatter/gather API for wire encoding/decoding #912

dsnet opened this issue Jul 26, 2019 · 9 comments
Assignees

Comments

@dsnet
Copy link
Member

dsnet commented Jul 26, 2019

No description provided.

@dsnet
Copy link
Member Author

dsnet commented Jul 26, 2019

This has impact on runtime/protoiface

@dsnet dsnet added this to the v2 release milestone Aug 21, 2019
@neild
Copy link
Contributor

neild commented Jan 22, 2020

We still would like a fast streaming marshal/unmarshal API.

I believe that CL/215719 abstracts the fast-path API enough that we can add streaming at a later time without needing to add new methods to the protoiface.Methods struct. If I'm wrong, well, adding new methods isn't the end of the world; the fast-path API is designed for extension.

@neild neild removed the blocks-v2 label Jan 22, 2020
@neild
Copy link
Contributor

neild commented Jan 22, 2020

There are two incompatible approaches we can take to a streaming marshal/unmarshal API, which each come with their own tradeoffs.

One is a streaming API, which does not require that all of the full encoded message be held in memory at once. This would allow you to, for example, marshal a gigabyte-large message to an io.Writer without the need to first allocate a gigabyte of output buffers.

The other is a scatter-gather API, which requires that the full encoded message be held in memory at once but does not require that it all be held in a single buffer. Essentially, this allows you to marshal to or from a [][]byte.

Scatter-gather is simpler to implement efficiently and allows for some optimizations that are impossible in a streaming API. For example, a scatter-gather marshal implementation could avoid the need for a per-message size cache field by encoding messages back-to-front.

There are obvious use cases for the full streaming API. (Marshaling that gigabyte-large message.) There are other cases where streaming is less useful than it might seem; for example, an RPC system which checks a signature on each received message before unmarshaling it probably needs to hold the entire encoded message in memory already.

I am inclined to say that scatter/gather is the better tradeoff; more memory consumption (but no worse than we require today) is a fair trade for simpler code and better performance.

@puellanivis
Copy link
Collaborator

I think real world usage of encoding/json can provide some tangential information. I have seen a lot of mistaken use of the streaming API from that package, even when handling just one message per reader/writer, and where most of the messages are extremely small. Most of these uses are not aware of the specific caveats of the streaming API, and this tends to lead people astray, rather than towards a simple correct solution. (e.g. Servers and Clients that will happily ignore pure garbage if it appears after a single validly encoded value.)

I think a full streaming API provides a very specific use case and purpose, which often gets in the way when people are just looking for the simple way to do something.

@neild neild changed the title APIv2: should there be a streaming API for the wire fast-path proto: streaming or scatter/gather API for wire encoding/decoding Mar 3, 2020
@dsnet dsnet removed this from the v2 release milestone Mar 4, 2020
@tamird
Copy link
Contributor

tamird commented Apr 18, 2020

It is worth noting that a streaming API also allows the integrator to implement scatter-gather using io.MultiReader and io.MultiWriter.

@neild
Copy link
Contributor

neild commented Apr 18, 2020

Unfortunately, an io.MultiReader or io.MultiWriter does not allow us to take advantage of efficiencies available when marshaling to or from a [][]byte.

The tradeoff is that a streaming API is more flexibile (as you point out, you can trivially convert a streaming operation to a scatter/gather one), but a more limited scatter/gather API is simpler to implement and enables performance optimizations that are difficult in the more general-purpose streaming API.

@tamird
Copy link
Contributor

tamird commented Apr 18, 2020

Can you help me understand what kinds of efficiencies you're referring to? An example of an optimization that is possible with [][]byte but isn't with io.{Reader,Writer} would be very helpful.

@dsnet
Copy link
Member Author

dsnet commented Apr 19, 2020

An example of an optimization that is possible with [][]byte but isn't with io.{Reader,Writer} would be very helpful.

For one, a [][]byte provides random access, while a io.{Reader,Writer} does not. This difference is significant since the protobuf wire format requires computing the size before serializing the payload. In Go, we made this efficient using a size cache. Java, on the other hand, takes the approach of serializing messages backwards (not possible with an io.Writer).

@dsnet
Copy link
Member Author

dsnet commented Jun 9, 2020

Was triaging the issue list and discovered that this is a duplicate of #609. Closing in favor of the older one.

@dsnet dsnet closed this as completed Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants