proto: streaming or scatter/gather API for wire encoding/decoding #912

dsnet · 2019-07-26T20:27:51Z

No description provided.

dsnet · 2019-07-26T20:30:19Z

This has an impact on runtime/protoiface.

neild · 2020-01-22T16:29:50Z

We still would like a fast streaming marshal/unmarshal API.

I believe that CL/215719 abstracts the fast-path API enough that we can add streaming at a later time without needing to add new methods to the protoiface.Methods struct. If I'm wrong, well, adding new methods isn't the end of the world; the fast-path API is designed for extension.

neild · 2020-01-22T16:49:49Z

There are two incompatible approaches we can take to a streaming marshal/unmarshal API, each with its own tradeoffs.

One is a streaming API, which does not require the full encoded message to be held in memory at once. This would allow you to, for example, marshal a gigabyte-large message to an io.Writer without first allocating a gigabyte of output buffers.

The other is a scatter-gather API, which requires that the full encoded message be held in memory at once but does not require that it all be held in a single buffer. Essentially, this allows you to marshal to or from a [][]byte.
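As a sketch of the scatter/gather shape, assuming a hypothetical helper `appendBytesField` (not a protobuf API): the output is a [][]byte, here the standard library's net.Buffers, whose WriteTo method can use a batched write such as writev on some connections. Because the output need not be contiguous, a large bytes field can be referenced in place rather than copied; only the small tag-and-length header is allocated.

```go
package main

import (
	"encoding/binary"
	"net"
)

// appendBytesField appends a length-delimited field (wire type 2) to a
// scatter/gather list. The tag and length go into a small new slice; the
// payload slice itself is appended as-is, so its bytes are not copied.
func appendBytesField(bufs net.Buffers, num uint64, payload []byte) net.Buffers {
	hdr := binary.AppendUvarint(nil, num<<3|2)
	hdr = binary.AppendUvarint(hdr, uint64(len(payload)))
	return append(bufs, hdr, payload)
}
```

Writing the result with `bufs.WriteTo(conn)` passes every segment to the connection without first concatenating them into one buffer.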

Scatter-gather is simpler to implement efficiently and allows for some optimizations that are impossible in a streaming API. For example, a scatter-gather marshal implementation could avoid the need for a per-message size cache field by encoding messages back-to-front.
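The back-to-front technique can be sketched as follows, assuming hypothetical `Outer`/`Inner` message types and a hypothetical `revBuf` type (none of these come from the Go protobuf implementation). Fields are written in reverse order into a buffer that grows toward the front, so when a nested message is finished its length is simply the number of bytes written since it started; the length prefix and tag are then prepended. No size is computed ahead of time, so no size cache field is needed.

```go
package main

import "encoding/binary"

// revBuf is a byte buffer that grows toward the front: valid data is buf[off:].
type revBuf struct {
	buf []byte
	off int
}

func (r *revBuf) Len() int      { return len(r.buf) - r.off }
func (r *revBuf) Bytes() []byte { return r.buf[r.off:] }

// prepend writes p immediately before the existing contents.
func (r *revBuf) prepend(p []byte) {
	if len(p) > r.off {
		// Grow: move the existing data to the tail of a larger buffer.
		n := r.Len()
		size := 2*len(r.buf) + len(p)
		nb := make([]byte, size)
		copy(nb[size-n:], r.buf[r.off:])
		r.buf, r.off = nb, size-n
	}
	r.off -= len(p)
	copy(r.buf[r.off:], p)
}

func (r *revBuf) prependUvarint(v uint64) {
	var tmp [binary.MaxVarintLen64]byte
	r.prepend(tmp[:binary.PutUvarint(tmp[:], v)])
}

// Hypothetical messages: Inner{name string = 1}, Outer{id uint64 = 1; child Inner = 2}.
type Inner struct{ Name string }

type Outer struct {
	ID    uint64
	Child *Inner
}

func (m *Inner) marshalBackward(r *revBuf) {
	if m.Name != "" {
		r.prepend([]byte(m.Name))
		r.prependUvarint(uint64(len(m.Name)))
		r.prependUvarint(1<<3 | 2) // field 1, length-delimited
	}
}

func (m *Outer) marshalBackward(r *revBuf) {
	// Last field first, so the bytes end up in field order.
	if m.Child != nil {
		start := r.Len()
		m.Child.marshalBackward(r)
		r.prependUvarint(uint64(r.Len() - start)) // the size is known only now
		r.prependUvarint(2<<3 | 2)              // field 2, length-delimited
	}
	if m.ID != 0 {
		r.prependUvarint(m.ID)
		r.prependUvarint(1<<3 | 0) // field 1, varint
	}
}

func MarshalBackward(m *Outer) []byte {
	var r revBuf
	m.marshalBackward(&r)
	return r.Bytes()
}
```

Prepending requires random access to the output buffer, which is why this technique is not available when writing to an io.Writer.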

There are obvious use cases for the full streaming API. (Marshaling that gigabyte-large message.) There are other cases where streaming is less useful than it might seem; for example, an RPC system which checks a signature on each received message before unmarshaling it probably needs to hold the entire encoded message in memory already.

I am inclined to say that scatter/gather is the better tradeoff; higher memory consumption (but no worse than we require today) is a fair trade for simpler code and better performance.

puellanivis · 2020-01-22T17:25:05Z

I think real-world usage of encoding/json can provide some tangential information. I have seen a lot of mistaken use of that package's streaming API, even when only one message is handled per reader/writer and most of the messages are extremely small. Most of these callers are not aware of the specific caveats of the streaming API, and it tends to lead people astray rather than toward a simple, correct solution (e.g. servers and clients that will happily ignore pure garbage if it appears after a single validly encoded value).

I think a full streaming API provides a very specific use case and purpose, which often gets in the way when people are just looking for the simple way to do something.

tamird · 2020-04-18T15:57:43Z

It is worth noting that a streaming API also allows the integrator to implement scatter-gather using io.MultiReader and io.MultiWriter.

neild · 2020-04-18T20:46:28Z

Unfortunately, an io.MultiReader or io.MultiWriter does not allow us to take advantage of efficiencies available when marshaling to or from a [][]byte.

The tradeoff is that a streaming API is more flexible (as you point out, you can trivially convert a streaming operation to a scatter/gather one), but a more limited scatter/gather API is simpler to implement and enables performance optimizations that are difficult in the more general-purpose streaming API.

tamird · 2020-04-18T22:33:54Z

Can you help me understand what kinds of efficiencies you're referring to? An example of an optimization that is possible with [][]byte but isn't with io.{Reader,Writer} would be very helpful.

dsnet · 2020-04-19T01:08:53Z

> An example of an optimization that is possible with [][]byte but isn't with io.{Reader,Writer} would be very helpful.

For one, a [][]byte provides random access, while an io.{Reader,Writer} does not. This difference is significant because the protobuf wire format requires the size of a length-delimited field, such as a nested message, to be written before its payload. In Go, we made this efficient using a size cache. Java, on the other hand, takes the approach of serializing messages backwards (not possible with an io.Writer).
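A minimal sketch of the size-cache approach, using hypothetical `Outer`/`Inner` types with an unexported `sizeCache` field (the generated Go code differs in detail). A first pass computes each message's encoded size bottom-up and caches it; a second pass writes front to back, reading each nested message's length prefix from the cache instead of recomputing it. Without the cache, writing each length prefix would recompute the sizes of everything nested beneath it.

```go
package main

import "encoding/binary"

// Hypothetical messages: Inner{name string = 1}, Outer{id uint64 = 1; child Inner = 2}.
// Each carries a sizeCache field filled in by size() and read by appendTo().
type Inner struct {
	Name      string
	sizeCache int
}

type Outer struct {
	ID        uint64
	Child     *Inner
	sizeCache int // would be read by a parent if Outer were itself nested
}

// varintLen returns the number of bytes in the varint encoding of v.
func varintLen(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

func (m *Inner) size() int {
	n := 0
	if m.Name != "" {
		n = 1 + varintLen(uint64(len(m.Name))) + len(m.Name)
	}
	m.sizeCache = n
	return n
}

func (m *Outer) size() int {
	n := 0
	if m.ID != 0 {
		n += 1 + varintLen(m.ID)
	}
	if m.Child != nil {
		cs := m.Child.size() // computed once, cached in m.Child.sizeCache
		n += 1 + varintLen(uint64(cs)) + cs
	}
	m.sizeCache = n
	return n
}

func (m *Inner) appendTo(b []byte) []byte {
	if m.Name != "" {
		b = append(b, 1<<3|2) // field 1, length-delimited (one-byte tag)
		b = binary.AppendUvarint(b, uint64(len(m.Name)))
		b = append(b, m.Name...)
	}
	return b
}

func (m *Outer) appendTo(b []byte) []byte {
	if m.ID != 0 {
		b = append(b, 1<<3|0) // field 1, varint
		b = binary.AppendUvarint(b, m.ID)
	}
	if m.Child != nil {
		b = append(b, 2<<3|2) // field 2, length-delimited
		// The length prefix comes from the cache; no second size pass.
		b = binary.AppendUvarint(b, uint64(m.Child.sizeCache))
		b = m.Child.appendTo(b)
	}
	return b
}

// Marshal runs both passes: size (bottom-up, caching), then encode (front to back).
func Marshal(m *Outer) []byte {
	n := m.size()
	return m.appendTo(make([]byte, 0, n))
}
```

Because the total size is known after the first pass, the output buffer is allocated once at exactly the right capacity; the cost is one extra field per message.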

dsnet · 2020-06-09T21:44:30Z

Was triaging the issue list and discovered that this is a duplicate of #609. Closing in favor of the older one.

dsnet assigned neild Jul 26, 2019

dsnet added the blocks-v2 label Jul 30, 2019

dsnet added this to the v2 release milestone Aug 21, 2019

neild removed the blocks-v2 label Jan 22, 2020

neild changed the title from "APIv2: should there be a streaming API for the wire fast-path" to "proto: streaming or scatter/gather API for wire encoding/decoding" Mar 3, 2020

dsnet removed this from the v2 release milestone Mar 4, 2020

neild mentioned this issue in "proto: add streaming APIs for Unmarshal/Marshal" #507 (Closed) Mar 16, 2020

dsnet closed this as completed Jun 9, 2020
