avoid async overhead in ReadNextLineAsync when possible #23210
Conversation
@@ -26,7 +26,16 @@ public ChunkedEncodingReadStream(HttpConnection connection)
     Debug.Assert(_chunkBytesRemaining == 0);

     // Start of chunk, read chunk size.
-    ulong chunkSize = ParseHexSize(await _connection.ReadNextLineAsync(cancellationToken).ConfigureAwait(false));
+    ArraySegment<byte> line;
+    while (!_connection.TryReadNextLine(out line))
I know it'll mess up the pattern here, but I'm curious if you see any throughput benefits by changing the signature to return the `ArraySegment<byte>` with an `out bool`, or if you return a `(bool, ArraySegment<byte>)`. I'm specifically wondering about the write barriers that may be incurred from writing to the out array (we can check the asm to verify there is one).
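For illustration, the tuple-returning shape might look like this sketch (the buffer fields and scanning logic are hypothetical stand-ins, not the PR's code):

```csharp
// Hypothetical sketch: return the line and the success flag together so the
// result can flow back through the return value instead of being written
// through an out parameter (avoiding a potential write barrier on the array
// reference). _readBuffer/_readOffset/_readLength are illustrative fields.
private (bool success, ArraySegment<byte> line) TryReadNextLine()
{
    int lfIndex = Array.IndexOf(_readBuffer, (byte)'\n', _readOffset, _readLength - _readOffset);
    if (lfIndex < 0)
    {
        // No full line buffered yet; the caller must FillAsync and retry.
        return (false, default(ArraySegment<byte>));
    }

    // CR trimming omitted for brevity.
    var line = new ArraySegment<byte>(_readBuffer, _readOffset, lfIndex - _readOffset);
    _readOffset = lfIndex + 1;
    return (true, line);
}
```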
I'll give it a try. The tuple return syntax is kinda cool anyway :)
+    {
+        if (!await _connection.FillAsync(cancellationToken).ConfigureAwait(false))
+        {
+            throw new IOException(SR.net_http_invalid_response);
Might be worth moving this throw into a helper.
Or, if every time we call FillAsync we expect it to successfully get more or else it's an error, the exception can just be moved into FillAsync.
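A sketch of that variant, assuming FillAsync itself throws on EOF (the read call and fields here are illustrative, not the PR's code):

```csharp
// Hypothetical: FillAsync throws on EOF instead of returning false, so the
// caller's retry loop no longer needs its own error branch.
private async Task FillAsync(CancellationToken cancellationToken)
{
    int bytesRead = await _stream.ReadAsync(
        _readBuffer, _readLength, _readBuffer.Length - _readLength,
        cancellationToken).ConfigureAwait(false);
    if (bytesRead == 0)
    {
        throw new IOException(SR.net_http_invalid_response);
    }
    _readLength += bytesRead;
}

// The caller side would then collapse to:
//     while (!_connection.TryReadNextLine(out line))
//     {
//         await _connection.FillAsync(cancellationToken).ConfigureAwait(false);
//     }
```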
@@ -771,7 +780,7 @@ private Task<bool> FillAsync(CancellationToken cancellationToken)
     int bytesRead = t.GetAwaiter().GetResult();
     if (NetEventSource.IsEnabled) Trace($"Received {bytesRead} bytes.");
     _readLength += bytesRead;
-    return Task.CompletedTask;
+    return Task.FromResult(bytesRead > 0);
FromResult doesn't return cached tasks (at least not currently). If we keep this returning `Task<bool>` rather than throwing on EOF per my earlier question, we should cache this task:
private static readonly Task<bool> s_trueTask = Task.FromResult(true);
private static readonly Task<bool> s_falseTask = Task.FromResult(false);
...
return bytesRead > 0 ? s_trueTask : s_falseTask;
There's an active discussion about whether FromResult should be changed to pull from the same task cache that async methods do for synchronously completing methods, and if that changes this could be undone, but for now, let's manually cache.
Is it worth instead having a method that returns ValueTask, avoiding the need to call fill yourself? Then the optimisation is hidden as an implementation detail from the user rather than being a new pattern?
I thought if you can do the operation sync then you could just return sync with a value task, and if it needs async then defer to an async method. If not, my mistake.
If a task-returning async method completes synchronously, it'll try to use a cached task, and in the case of `Task<bool>` it always can: the runtime caches tasks for both true and false.
I think you guys are talking about different things. @Drawaes I think you are suggesting something like this:
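A sketch of that shape, with illustrative names (sync fast path over `TryReadNextLine`, deferring to an async helper only when a fill is actually needed):

```csharp
// The common case completes synchronously via ValueTask with no Task
// allocation and no state machine; only the fill path pays for async.
private ValueTask<ArraySegment<byte>> ReadNextLineAsync(CancellationToken cancellationToken)
{
    ArraySegment<byte> line;
    if (TryReadNextLine(out line))
    {
        return new ValueTask<ArraySegment<byte>>(line);
    }

    return new ValueTask<ArraySegment<byte>>(ReadNextLineSlowAsync(cancellationToken));
}

private async Task<ArraySegment<byte>> ReadNextLineSlowAsync(CancellationToken cancellationToken)
{
    ArraySegment<byte> line;
    while (!TryReadNextLine(out line))
    {
        if (!await FillAsync(cancellationToken).ConfigureAwait(false))
        {
            throw new IOException(SR.net_http_invalid_response);
        }
    }
    return line;
}
```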
Am I understanding? We do use this pattern elsewhere; I should probably use it here. (edited to fix the async method function name)
Yeah, that is exactly what I was thinking. Keeping the sync/async part internal to the implementation rather than leaking it to the consumer.
Ah, I misunderstood the question then. As long as doing so doesn't negatively impact the throughput, sounds good. Note that there is still a downside to this, though, that the current/manually inlined version addresses: if the await inside ReadNextLineAsync yields, all of the machinery associated with yielding in an async method will occur for that method, which means another task/delegate/state machine/etc. allocated. All of that is amortized away when it's manually inlined into the async call site (e.g. it shares the same state machine, the same delegate, etc.)
Is that any better? Or does it make no difference?
If FillAsync may complete synchronously, it's likely worse.
Can we track porting this to netfx, please?
Track porting what?
TryReadNextLine? Hang on, let me double check.
None of this code exists on netfx. This whole ManagedHandler component is new.
I'm sorry, never mind. I saw what I wanted to see. I keep wanting sync APIs.
😄
I should try out pipelines before even asking for that. Perhaps that will provide a cleaner solution.
You shouldn't need this in core 2.0 anymore, at least for Socket and NetworkStream. The async methods will complete immediately if the operation can be completed synchronously. Not sure about higher level APIs -- but if they don't complete synchronously when the underlying NetworkStream/Socket does, then that's something we should look at fixing.
I'll still need to choose between peppering even single-char appends with `await` or writing buffer-aware code, right?
If you're using NetworkStream or Socket directly (or any Stream, really), then don't do single-char WriteAsync. NetworkStream does not do buffering; every time you call Write[Async], it's a send on the socket. Either use BufferedStream or do buffering manually. The managed HttpClient here does buffering manually.
Even with pipelines you are doing the same thing: do you write then flush (write to socket/file etc.), or write and write and write (buffered writes)? The issue you have with pipelines is... it's not supported officially yet, and there's a lack of adapters (unless it's changed, there is only a read adapter, not a file write, for instance). Other than that you are good :)
@geoffkizer Assuming a buffer, I still need to choose between peppering even single-char appends with await or writing buffer-aware code (code coupled to a certain buffer size), right? Would it lower throughput to have a system where I can write synchronously to an expandable buffer which is being relieved via async IO, and do many fewer awaits in between?
The awaits aren't really the issue. When you're using NetworkStream or Sockets, you should not do lots of small writes/sends. Each one will result in (a) a kernel call to do the send and (b) a separate packet on the wire. Use a buffer to construct what you want to send, then send it. If it's bigger than a reasonably sized buffer, then send when the buffer is full and continue.
@geoffkizer Exactly. But like I said, assuming I am in fact using a buffer and am constructing text a piece at a time due to the nature of the algorithm producing the data, I still have to sprinkle await everywhere through the algorithm just in case any of the writes ends up flushing the buffer. Most of the time none of them will flush, but any of them could. I don't like writing code that writes each tiny piece asynchronously to a buffer, knowing that each write flushes if necessary, unless this system is actually the pinnacle of throughput. I don't like one alternative, which is to couple the algorithm producing the data to a certain buffer size. The only alternative that remains is to have an expandable buffer, make all writes (to the buffer) synchronous, and only flush at certain points. Even better, the synchronous buffer writes could choose to begin an async flush operation which would not be awaited until the next time you call the (asynchronous) buffer flush method. You could even have a method which lets you await any flush operations already started by the synchronous write methods but doesn't cause an additional flush if the buffer isn't full yet. You could also flush if no writes have occurred in a certain period of time, on the assumption that it's nicer to get the data out and there will be less to send once the following data becomes available. Is there a reason this would be worse for throughput than awaiting every tiny append to the buffer in case a flush is necessary? Even though this isn't multithreaded, it feels similar in spirit to ASP.NET Core's IO, where your writes complete synchronously using a buffer unless you get too far ahead of the thread relieving the buffer.
@jnm2 Ah, I understand better now. There's no reason you can't have a Task-returning API that writes to your buffer and only does an async write to the underlying Stream when the buffer is full. You can call this and await it however you like. In fact, we do exactly that in the managed HttpClient; see e.g.: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/Managed/HttpConnection.cs#L647 For maximum performance, you want to use the pattern we use here: if you need to return `Task<T>` instead of `Task`, use `ValueTask<T>` to avoid allocating a Task in the synchronous case. The key issues are avoiding lots of small writes to the underlying stream and avoiding allocation when a write completes synchronously. Hope that helps...
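For illustration, a minimal sketch of that buffer-then-flush pattern, assuming hypothetical `_writeBuffer`, `_writeOffset`, and `_stream` fields (the linked HttpConnection source is the real reference):

```csharp
// Copies into a fixed write buffer; only touches the underlying stream when
// the buffer fills. The common case returns a cached completed task.
private Task WriteAsync(byte[] source, int offset, int count)
{
    if (count <= _writeBuffer.Length - _writeOffset)
    {
        // Fast path: fits in the buffer, completes synchronously.
        Buffer.BlockCopy(source, offset, _writeBuffer, _writeOffset, count);
        _writeOffset += count;
        return Task.CompletedTask;
    }

    return WriteSlowAsync(source, offset, count);
}

private async Task WriteSlowAsync(byte[] source, int offset, int count)
{
    while (count > 0)
    {
        int remaining = _writeBuffer.Length - _writeOffset;
        if (remaining == 0)
        {
            await FlushBufferAsync().ConfigureAwait(false);
            remaining = _writeBuffer.Length;
        }

        int toCopy = Math.Min(count, remaining);
        Buffer.BlockCopy(source, offset, _writeBuffer, _writeOffset, toCopy);
        _writeOffset += toCopy;
        offset += toCopy;
        count -= toCopy;
    }
}

private async Task FlushBufferAsync()
{
    await _stream.WriteAsync(_writeBuffer, 0, _writeOffset).ConfigureAwait(false);
    _writeOffset = 0;
}
```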
I'm going to close this PR out for now and will resubmit later.
I felt like we were talking past each other, so I wrote up a proof of concept. The point is that you use synchronous writes into an expandable buffer and only await an explicit flush. So now that I've described better what I'm talking about, how would the throughput compare to the conventional code (which uses a normal fixed-size buffer and an await on every write)?
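Roughly, a hypothetical sketch of such a proof of concept (illustrative class and names, single-threaded use assumed as in the discussion):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical reconstruction: synchronous Write calls append to an unbounded
// queue of buffers; FlushAsync chains onto any in-flight flush and drains the
// queue to the underlying stream. Not thread-safe; intended for one logical
// flow of async calls.
public sealed class ExpandableWriteBuffer
{
    private readonly Stream _stream;
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();
    private Task _flushTask = Task.CompletedTask;

    public ExpandableWriteBuffer(Stream stream)
    {
        _stream = stream;
    }

    // Synchronous append: never awaits, never blocks on I/O; the data is
    // queued and sent by the next flush.
    public void Write(byte[] data)
    {
        _pending.Enqueue(data);
    }

    // Awaitable flush: also serves as the backpressure point, since awaiting
    // it waits for everything queued so far to reach the stream.
    public Task FlushAsync()
    {
        _flushTask = DrainAsync(_flushTask);
        return _flushTask;
    }

    private async Task DrainAsync(Task previous)
    {
        await previous.ConfigureAwait(false);
        while (_pending.Count > 0)
        {
            byte[] buffer = _pending.Dequeue();
            await _stream.WriteAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
        }
    }
}
```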
Thanks, I think I understand better now. The approach in this code will be fine from a throughput point of view. You're doing reasonably large writes to the underlying stream. The problem is you have no backpressure. So if the caller keeps writing, you'll keep allocating buffers and holding on to them until the underlying stream is ready to deal with them. If you don't mind paying the price for allocating and holding all these buffers, then it's fine. On the other hand, if you do want to limit the amount of queued data in some way, then you need a mechanism for backpressure, and once you do that, having your own queue of buffers is probably counterproductive.
@geoffkizer Rather than forcibly limiting the buffer, I'm letting the writer await the outstanding flushes whenever it wants to apply backpressure itself. If this type of API was exposed everywhere, how much need would there be for strict backpressure?
@jnm2 Okay, that seems like a reasonable way to do backpressure, as long as the code using this is well-behaved.
That's true, but what's wrong with just awaiting every tiny append? The cost of awaiting a completed task is small. Is this a perf concern or a usability concern?
I don't think we'd expose it everywhere. I think we'd surface it similar to what you've done in your code, as a Stream that can wrap an underlying Stream. Sort of a BufferedStream without a size limit.
So far my concern has been usability, especially when you conscientiously use ConfigureAwait, but potentially perf as well? I guess it's worth applying this to a scenario and doing perf tests.
Yes, from a usability point of view, it's nice to be able to just call Write a bunch of times. The tradeoff is that you need to be careful so you don't explode your buffers.
A cached completed task should be super quick.
The other reason backpressure is important is that if the other side disconnects or goes away (maybe not an issue in the file case, but the socket case for sure), you want to stop the work as soon as possible rather than just building up data in buffers. Remember that upstream you might have an expensive database cursor or some other resource that is good to release early if no one wants the result.
Somewhat related: For this sort of write buffer stream, it's useful to be able to access the buffer(s) directly. For example, you want to be able to encode strings into UTF8 or format numbers/dates directly into the buffer, assuming there is space. We have something like this in the managed HttpClient, though it's a limited buffer size, not unlimited as above. I've been wondering if it might be useful to generalize this so other folks could use it, e.g. SslStream. @Drawaes does that seem like it would be useful for SslStream?
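For illustration, that direct access might look like this hypothetical helper (same assumed `_writeBuffer`/`_writeOffset` fields as in the sketch above):

```csharp
// Hypothetical: encode a string straight into the write buffer when it fits,
// avoiding a temporary byte[]; on false, the caller flushes and retries (or
// falls back to the chunked write path).
public bool TryWriteUtf8(string value)
{
    int byteCount = System.Text.Encoding.UTF8.GetByteCount(value);
    if (byteCount > _writeBuffer.Length - _writeOffset)
    {
        return false;
    }

    _writeOffset += System.Text.Encoding.UTF8.GetBytes(
        value, 0, value.Length, _writeBuffer, _writeOffset);
    return true;
}
```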
Possibly... however I fear we are remaking pipelines and duplicating work. Also, SslStream is very fixed-buffer, as SChannel doesn't like fragmented buffers (although OpenSSL has no issue).
ReadNextLineAsync typically completes synchronously. So, change it to be TryReadNextLine and change callers to call FillAsync themselves when necessary. This avoids the overhead of an async method invocation unless it's actually needed.
This shows about a 4% improvement on the HttpClientPerf GET test.
@stephentoub