This repository was archived by the owner on Jan 23, 2023. It is now read-only.

avoid async overhead in ReadNextLineAsync when possible #23210

Closed
wants to merge 1 commit into from

Conversation

geoffkizer

@geoffkizer geoffkizer commented Aug 14, 2017

ReadNextLineAsync typically completes synchronously. So, change it to be TryReadNextLine and change callers to call FillAsync themselves when necessary. This avoids the overhead of an async method invocation unless it's actually needed.

This shows about a 4% improvement on the HttpClientPerf GET test.

@stephentoub

@@ -26,7 +26,16 @@ public ChunkedEncodingReadStream(HttpConnection connection)
            Debug.Assert(_chunkBytesRemaining == 0);

            // Start of chunk, read chunk size.
-           ulong chunkSize = ParseHexSize(await _connection.ReadNextLineAsync(cancellationToken).ConfigureAwait(false));
+           ArraySegment<byte> line;
+           while (!_connection.TryReadNextLine(out line))
Member

I know it'll mess up the pattern here, but I'm curious if you see any throughput benefits by changing the signature to return the ArraySegment<byte> with an out bool, or if you return a (bool, ArraySegment<byte>). I'm specifically wondering about the write barriers that may be incurred from writing to the out array (we can check the asm to verify there is one).

Author

I'll give it a try. The tuple return syntax is kinda cool anyway :)

+           {
+               if (!await _connection.FillAsync(cancellationToken).ConfigureAwait(false))
+               {
+                   throw new IOException(SR.net_http_invalid_response);
Member

Might be worth moving this throw into a helper.

Member

Or, if every time we call FillAsync we expect it to successfully get more or else it's an error, the exception can just be moved into FillAsync.

@@ -771,7 +780,7 @@ private Task<bool> FillAsync(CancellationToken cancellationToken)
            int bytesRead = t.GetAwaiter().GetResult();
            if (NetEventSource.IsEnabled) Trace($"Received {bytesRead} bytes.");
            _readLength += bytesRead;
-           return Task.CompletedTask;
+           return Task.FromResult(bytesRead > 0);
Member

FromResult doesn't return cached tasks (at least not currently). If we keep this returning Task<bool> rather than throwing on EOF per my earlier question, we should cache this task:

private static readonly Task<bool> s_trueTask = Task.FromResult(true);
private static readonly Task<bool> s_falseTask = Task.FromResult(false);
...
return bytesRead > 0 ? s_trueTask : s_falseTask;

There's an active discussion about whether FromResult should be changed to pull from the same task cache that async methods do for synchronously completing methods, and if that changes this could be undone, but for now, let's manually cache.

@Drawaes

Drawaes commented Aug 14, 2017

Is it worth instead having a method that returns ValueTask and avoiding the need to call fill yourself? Then the optimisation is hidden as an implementation detail from the user rather than being a new pattern.

@stephentoub
Member

Is it worth instead having a method that returns ValueTask and avoiding the need to call fill yourself?

ValueTask<bool> currently provides no benefit over Task or Task<bool>; there are actually ways in which it's worse.

@Drawaes

Drawaes commented Aug 14, 2017

I thought if you can do the operation sync then you could just return sync with a value task and if it needs async then defer to an async method. If not my mistake.

@stephentoub
Member

I thought if you can do the operation sync then you could just return sync with a value task and if it needs async then defer to an async method. If not my mistake.

If a task-returning async method completes synchronously, it'll try to use a cached task, and in the case of Task and Task<bool>, it'll always succeed in using a cached task.

@geoffkizer
Author

geoffkizer commented Aug 14, 2017

I think you guys are talking about different things.

@Drawaes I think you are suggesting something like this:

public ValueTask<ArraySegment<byte>> ReadNextLineAsync()
{
  if (/* can read line */) return new ValueTask<ArraySegment<byte>>(line);
  else return ReadNextLineAsyncSlow();
}

public async ValueTask<ArraySegment<byte>> ReadNextLineAsyncSlow()
{
  if (!await FillAsync()) throw ...;
  line = /* read line */;
  return line;
}

Am I understanding correctly? We do use this pattern elsewhere; I should probably use it here.

(edited to fix the async method function name)

@Drawaes

Drawaes commented Aug 14, 2017

Yeah, that is exactly what I was thinking. Keeping the sync/async part internal to the implementation rather than leaking it to the consumer.

@stephentoub
Member

stephentoub commented Aug 14, 2017

I think you guys are talking about different things.

Ah, I misunderstood the question then. As long as doing so doesn't negatively impact the throughput, sounds good.

Note that there is still a downside to this, though, that the current/manually inlined version addresses: if the await inside ReadNextLineAsync yields, all of the machinery associated with yielding in an async method will occur for that method, which means another task/delegate/state machine/etc. allocated. All of that is amortized away when it's manually inlined into the async call site (e.g. it shares the same state machine, the same delegate, etc.)
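To make the tradeoff concrete, here is a hedged sketch of the two shapes being compared; the names mirror the thread, but the bodies are illustrative and not the actual corefx code:

```csharp
// Helper-method shape: if the await in here yields, this method allocates
// its own state machine, boxed task, and continuation delegate on top of
// whatever the caller already allocates.
private async ValueTask<ArraySegment<byte>> ReadNextLineAsyncSlow(CancellationToken ct)
{
    if (!await FillAsync(ct).ConfigureAwait(false))
        throw new IOException(SR.net_http_invalid_response);
    TryReadNextLine(out ArraySegment<byte> line);
    return line;
}

// Manually inlined shape: the loop sits directly in the caller's async
// method, so a yield inside FillAsync is amortized into the caller's
// single state machine, delegate, and task.
ArraySegment<byte> line;
while (!TryReadNextLine(out line))
{
    if (!await FillAsync(ct).ConfigureAwait(false))
        throw new IOException(SR.net_http_invalid_response);
}
```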

@Drawaes

Drawaes commented Aug 14, 2017

else return FillAsync().ContinueWith(_ => {
    if(_.Result)
    {
        line = /* read line */;
        return line;
    }
    else
    {
       throw bla bla;
    }
});

Is that any better? or it makes no difference?

@stephentoub
Member

Is that any better? or it makes no difference?

If FillAsync may complete synchronously, it's likely worse.
If you need to capture any state into the continuation, it's likely worse.
If you would need a second continuation (e.g. the equivalent of a second await in the async method), it's worse.
Otherwise, assuming you made the continuation ExecuteSynchronously, it could be better; currently ContinueWith incurs two allocations whereas the first await to yield in an async method incurs four... but that'll be reduced by dotnet/coreclr#13105 to the point where it likely won't be better, or at least not significantly better.

@jnm2

jnm2 commented Aug 14, 2017

Can we track porting this to netfx please?

@stephentoub
Member

Can we track porting this to netfx please?

Track porting what?

@jnm2

jnm2 commented Aug 14, 2017

TryReadNextLine? Hang on, let me double check.

@stephentoub
Member

stephentoub commented Aug 14, 2017

TryReadNextLine

None of this code exists on netfx. This whole ManagedHandler component is new.

@jnm2

jnm2 commented Aug 14, 2017

I'm sorry, never mind. I saw what I wanted to see. I keep wanting sync Try methods for IO so I can implement async fallbacks only when necessary, mainly on Stream, TextReader/Writer and DbDataReader.

@stephentoub
Member

I saw what I wanted to see.

😄

@jnm2

jnm2 commented Aug 14, 2017

I should try out pipelines before asking for that even. Perhaps that will provide a cleaner solution.

@geoffkizer
Author

I keep wanting sync Try methods for IO so I can implement async fallbacks only when necessary

You shouldn't need this in core 2.0 anymore, at least for Socket and NetworkStream. The async methods will complete immediately if the operation can be completed synchronously.

Not sure about higher level APIs -- but if they don't complete synchronously when the underlying NetworkStream/Socket does, then that's something we should look at fixing.

@jnm2

jnm2 commented Aug 14, 2017

I'll still need to choose between peppering even single-char appends with await or handling buffering manually. I'm not sure what the best practice is for greatest throughput.

@geoffkizer
Author

If you're using NetworkStream or Socket directly (or any Stream, really), then don't do single char WriteAsync. NetworkStream does not do buffering; every time you call Write[Async], it's a send on the socket.

Either use BufferedStream or do buffering manually.

The managed HttpClient here does buffering manually.
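For example, a BufferedStream wrapper coalesces many tiny writes into a single write on the inner stream. This self-contained sketch uses a MemoryStream as a stand-in for a NetworkStream, since the effect is the same:

```csharp
using System;
using System.IO;

class BufferingDemo
{
    static void Main()
    {
        var inner = new MemoryStream(); // stand-in for a NetworkStream
        using (var buffered = new BufferedStream(inner, bufferSize: 4096))
        {
            for (int i = 0; i < 100; i++)
                buffered.WriteByte((byte)'a'); // stays in the 4K buffer; no "send" yet
            Console.WriteLine(inner.Length);   // 0: inner stream untouched so far
            buffered.Flush();                  // one write to the inner stream
            Console.WriteLine(inner.Length);   // 100
        }
    }
}
```

With a real NetworkStream, each Flush (or buffer overflow) becomes one socket send instead of one send per WriteByte.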

@Drawaes

Drawaes commented Aug 14, 2017

Even with pipelines you are doing the same thing: do you write then flush (write to the socket/file etc.), or write and write and write (buffered writes)? The issue you have with pipelines is that it's not supported officially yet, and there's a lack of adapters (unless it's changed, there is only a file read, not a file write, for instance). Other than that you are good :)

@jnm2

jnm2 commented Aug 14, 2017

@geoffkizer Assuming a buffer, I still need to choose between peppering even single-char appends with await or writing buffer-aware code (code coupled to a certain buffer size), right?

Would it lower throughput to have a system where I can write synchronously to an expandable buffer which is being relieved via async IO and do many fewer awaits in between?

@geoffkizer
Author

The awaits aren't really the issue. When you're using NetworkStream or Sockets, you should not do lots of small writes/sends. Each one will result in (a) a kernel call to do the send and (b) a separate packet on the wire. Use a buffer to construct what you want to send, then send it. If it's bigger than a reasonably sized buffer, then send when the buffer is full and then continue.

@jnm2

jnm2 commented Aug 14, 2017

@geoffkizer exactly. But like I said, assuming I am in fact using a buffer and am constructing text a piece at a time due to the nature of the algorithm constructing the data, I still have to sprinkle await everywhere through the algorithm just in case any of the writes ends up flushing the buffer. Most of the time none of them will flush, but any of them could.

I don't like writing code like that, which writes each tiny piece asynchronously to a buffer, knowing that each write flushes if necessary, unless this system is actually the pinnacle of throughput. I don't like the other alternative either, which is to couple the algorithm producing the data to a certain buffer size. The only alternative that remains is to have an expandable buffer, make all writes (to the buffer) synchronous, and only flush at certain points.

Even better, the synchronous buffer writes could choose to begin an async flush operation which would not be awaited until the next time you call the (asynchronous) buffer flush method. You could even have a method which lets you await any flush operations already started by the synchronous write methods but doesn't cause an additional flush if the buffer isn't full yet. You could also have flushing if no writes have occurred in a certain period of time, on the assumption that it's nicer to get the data out and it'll be less to send once the following data becomes available.

Is there a reason this would be worse for throughput than awaiting every tiny append to the buffer in case a flush is necessary? Even though this isn't multithreaded, it feels similar in spirit to ASP.NET Core's IO where your writes complete synchronously using a buffer unless you get too far ahead of the thread relieving the buffer.
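As a hedged sketch, the surface being described might look something like this; all names and signatures here are invented for illustration:

```csharp
// Hypothetical API shape: synchronous appends to a growable buffer, with
// flushes to the underlying stream started in the background and only
// observed (awaited) at explicit points chosen by the caller.
public interface IExpandableBufferWriter
{
    // Always completes synchronously; may kick off a background flush and
    // rethrows errors from flushes started by earlier calls.
    void Write(byte[] buffer, int offset, int count);

    // Awaits any flush already in flight without forcing a new one, giving
    // the caller a backpressure point, e.g. once per loop iteration.
    Task WaitForCurrentFlush();

    // Drains everything: waits for in-flight writes, writes out the
    // partially filled buffer, and flushes the underlying stream.
    Task FlushAsync();
}
```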

@geoffkizer
Author

geoffkizer commented Aug 14, 2017

@jnm2 Ah, I understand better now.

There's no reason you can't have a Task-returning API that writes to your buffer, and only does an async write to the underlying Stream when the buffer is full. You can call this and await it however you like. In fact, we do exactly that in the managed HttpClient, see e.g.: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/Managed/HttpConnection.cs#L647

For maximum performance, you want to use the pattern we use here:
(1) Have a Task-returning but not async method, like WriteByteAsync
(2) If there's space in the buffer, do the write, and return Task.CompletedTask
(3) If not, call a separate async method to do the async write of your buffer to the underlying stream and, after awaiting this, write whatever you originally had into the buffer. (This is the WriteByteAsyncSlow method)

If you need to return Task<T> instead of Task, use ValueTask<T> to avoid allocating a Task<T> in the synchronous case.
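The three numbered steps can be sketched as follows; the field names and the FlushBufferAsync helper are assumptions for illustration, not the actual HttpConnection code:

```csharp
// (1) Task-returning but NOT async: no state machine is set up on the fast path.
private Task WriteByteAsync(byte b)
{
    if (_writeOffset < _writeBuffer.Length)
    {
        // (2) Room in the buffer: write synchronously and return the
        // cached completed task, so nothing is allocated.
        _writeBuffer[_writeOffset++] = b;
        return Task.CompletedTask;
    }
    // (3) Buffer full: only now pay for an async method invocation.
    return WriteByteAsyncSlow(b);
}

private async Task WriteByteAsyncSlow(byte b)
{
    await FlushBufferAsync().ConfigureAwait(false); // drain the buffer to the stream
    _writeBuffer[_writeOffset++] = b;               // then buffer the original byte
}
```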

The key issues here are:
(1) Invoking an async method has some setup overhead, so avoid it if the method often returns synchronously
(2) Doing 'await' on a completed task has very little overhead, so it's fine to do this even for tasks that are typically completed synchronously.

Hope that helps...

@geoffkizer
Author

I'm going to close this PR out for now and will resubmit later.

@geoffkizer geoffkizer closed this Aug 15, 2017
@jnm2

jnm2 commented Aug 15, 2017

I felt like we were talking past each other, so I wrote up a proof of concept:
https://gist.github.com/jnm2/3e2fc8531ecd41c9c092fd2c3c6be886
(Warning: this has had only minimal testing, and I have a known race condition between calling WriteChunk/FlushDestination and setting the return value to backgroundTask, whether or not Volatile is used. I'll look at it later.)

The point is that you use Write instead of WriteAsync so that you can layer things like BinaryWriter, StreamWriter, JsonTextWriter etc on top without having to await each Write call and so that you're still able to work even if you don't have async support in each higher layer. Write starts background writes and flushes without waiting for them. It reports errors from faulted tasks started by previous calls to Write.
You await WaitForCurrentFlush whenever it makes sense so that you can handle errors and so that the buffer doesn't get too far ahead of the IO. At the end, you call FlushAsync or Flush which will wait for all writes, write out the incomplete buffer, and flush the underlying stream.
(I didn't add the option to flush an unfilled buffer after a given ms delay, but it would be trivial.)

So now that I've described better what I'm talking about, how would the throughput compare to the conventional code (which uses a normal fixed-size buffer and an await on every tiny append in case that append happens to be the append that causes a flush)?

@geoffkizer
Author

Thanks, I think I understand better now.

The approach in this code will be fine from a throughput point of view. You're doing reasonably large writes to the underlying stream.

The problem is you have no backpressure. So if the caller keeps writing, you'll keep allocating buffers and holding on to them until the underlying stream is ready to deal with them. If you don't mind paying the price for allocating and holding all these buffers, then it's fine.

On the other hand, if you do want to limit the amount of queued data in some way, then you need a mechanism for backpressure, and once you do that, having your own queue of buffers is probably counterproductive.

@jnm2

jnm2 commented Aug 16, 2017

@geoffkizer Rather than forcibly limiting the buffer, I'm letting the writer await WaitForCurrentFlush which doesn't return until the buffer is empty. There's often a loop in situations like this where I'm doing twenty or thirty appends per iteration. This allows me to avoid a whole await per tiny append etc, but I can still do a single await once per iteration (or once per n iterations) to make sure the allocation doesn't get out of hand.

If this type of API was exposed everywhere, how much need would there be for strict backpressure?

@geoffkizer
Author

@jnm2 Okay, that seems like a reasonable way to do backpressure, as long as the code using this is well-behaved.

This allows me to avoid a whole await per tiny append etc, but I can still do a single await once per iteration (or once per n iterations) to make sure the allocation doesn't get out of hand.

That's true, but what's wrong with just awaiting every tiny append? The cost of awaiting a completed task is small. Is this a perf concern or a usability concern?

If this type of API was exposed everywhere, how much of a need would there be a need for strict backpressure?

I don't think we'd expose it everywhere. I think we'd surface it similar to what you've done in your code, as a Stream that can wrap an underlying Stream. Sort of a BufferedStream without a size limit.

@jnm2

jnm2 commented Aug 16, 2017

So far my concern has been usability, especially when you conscientiously use ConfigureAwait, but potentially perf as well? I guess it's worth applying this to a scenario and doing perf tests.

@geoffkizer
Author

Yes, from a usability point of view, it's nice to be able to just call Write a bunch of times. The tradeoff is that you need to be careful so you don't explode your buffers.

@Drawaes

Drawaes commented Aug 16, 2017

A cached completed task should be super quick.

@Drawaes

Drawaes commented Aug 16, 2017

The other reason back pressure is important is that if the other side disconnects or goes away (maybe not an issue in the file case but the socket case for sure) you want to stop the work as soon as possible rather than just building up data in buffers. Remember that upstream you might have an expensive database cursor or some other resource that is good to release early if no one wants the result.

@geoffkizer
Author

Somewhat related:

For this sort of write buffer stream, it's useful to be able to access the buffer(s) directly. For example, you want to be able to encode strings into UTF8 or format numbers/dates directly into the buffer, assuming there is space.

We have something like this in the managed HttpClient, though it's a limited buffer size, not unlimited as above. I've been wondering if it might be useful to generalize this so other folks could use it, e.g. SslStream. @Drawaes does that seem like it would be useful for SslStream?

@Drawaes

Drawaes commented Aug 16, 2017

Possibly... however I fear we are recreating pipelines and duplicating work. Also SslStream is very fixed-buffer, as Schannel doesn't like fragmented buffers (although OpenSSL has no issue).
