Skip to content

Network errors when refreshing an app repeatedly #11560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
javiercn opened this issue Jun 25, 2019 · 33 comments · Fixed by #11601
Closed

Network errors when refreshing an app repeatedly #11560

javiercn opened this issue Jun 25, 2019 · 33 comments · Fixed by #11601
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions bug This issue describes a behavior which is not expected - a bug.

Comments

@javiercn
Copy link
Member

I created a blazorserverside app using the latest sdk from core-sdk

dotnet new blazorserverside
dotnet run
Open chrome
Press F5 repeatedly at some point a request will fail and the UI will change.

A8mjiYKM8d

@analogrelay
Copy link
Contributor

Can you confirm the runtime version you were on? (ye olde dotnet --info output)

@jkotalik jkotalik added the triage-focus Add this label to flag the issue for focus at triage label Jun 25, 2019
@javiercn
Copy link
Member Author

This seems to be on the latest sdk

sdk\3.0.100-preview7-012605\ Host (useful for support): Version: 3.0.0-preview7-27825-01 Commit: afd0301944

@Tratcher
Copy link
Member

F5 aborts requests that are in progress so some networking errors are expected. What makes you think there is a bug here?

Do you have server logs to go with this?

@analogrelay
Copy link
Contributor

Should there be an ERR_SPDY_PROTOCOL_ERROR when F5 aborts a request though?

@analogrelay
Copy link
Contributor

Server error:

Microsoft.AspNetCore.Connections.ConnectionResetException: An established connection was aborted by the software in your host machine.
 ---> System.Net.Sockets.SocketException (10053): An established connection was aborted by the software in your host machine.
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketAwaitableEventArgs.<GetResult>g__ThrowSocketException|7_0(SocketError e)
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketAwaitableEventArgs.GetResult()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessSends()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoSend()
   --- End of inner exception stack trace ---
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
   at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.IO.Pipelines.StreamPipeReader.ReadAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)

@analogrelay
Copy link
Contributor

I don't think it's the abort. Once I get a repro, I can press F5 once and it repros (with no second F5 to interrupt/abort requests)

@analogrelay
Copy link
Contributor

analogrelay commented Jun 25, 2019

Seems to be HTTPS only Of course it is, it's an HTTP/2 error 🙄. I still blame @davidfowl ;).

@analogrelay
Copy link
Contributor

@analogrelay
Copy link
Contributor

Let's try bisecting out the HTTPS changes and the Bedrock changes and see if this repros there. @halter73 can you do that and see if we can either fix it or revert the right thing and try again in preview 8.

cc @davidfowl

@analogrelay analogrelay added this to the 3.0.0-preview7 milestone Jun 25, 2019
@analogrelay analogrelay added PRI: 0 - Critical and removed triage-focus Add this label to flag the issue for focus at triage labels Jun 25, 2019
@halter73
Copy link
Member

What tool do you use to create the gif @javiercn? Is there something built in to Windows yet?

@halter73
Copy link
Member

I ran this with Kestrel in debug and saw the following assertion failure?/NullReferenceException:

Process terminated. Assertion failed.
trceSystem.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Pipelines.StreamPipeWriter.AllocateSegment(Int32 sizeHint)
   at System.IO.Pipelines.StreamPipeWriter.AllocateMemory(Int32 sizeHint)
   at System.IO.Pipelines.StreamPipeWriter.GetSpan(Int32 sizeHint)
   at System.Buffers.BuffersExtensions.WriteMultiSegment[T](IBufferWriter`1 writer, ReadOnlySpan`1& source, Span`1 destination)
   at System.Buffers.BuffersExtensions.Write[T](IBufferWriter`1 writer, ReadOnlySpan`1 value)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteDataUnsynchronized(Int32 streamId, ReadOnlySequence`1 data, Int64 dataLength, Boolean endStream) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 290
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteDataAsync(Int32 streamId, StreamOutputFlowControl flowControl, ReadOnlySequence`1 data, Boolean endStream) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 260
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.ProcessDataWrites() in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2OutputProducer.cs:line 337
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.ProcessDataWrites() in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2OutputProducer.cs:line 337
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.IO.Pipelines.Pipe.FlushAsync(CancellationToken cancellationToken)
   at System.IO.Pipelines.Pipe.DefaultPipeWriter.FlushAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.TimingPipeFlusher.TimeFlushAsync(MinDataRate minRate, Int64 count, IHttpOutputAborter outputAborter, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Infrastructure\TimingPipeFlusher.cs:line 79
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.TimingPipeFlusher.FlushAsync(MinDataRate minRate, Int64 count, IHttpOutputAborter outputAborter, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Infrastructure\TimingPipeFlusher.cs:line 66
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.TimingPipeFlusher.FlushAsync(IHttpOutputAborter outputAborter, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Infrastructure\TimingPipeFlusher.cs:line 50
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.WriteDataToPipeAsync(ReadOnlySpan`1 data, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2OutputProducer.cs:line 272
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.WritePipeAsync(ReadOnlyMemory`1 data, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http\HttpProtocol.cs:line 1450
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpResponsePipeWriter.WriteAsync(ReadOnlyMemory`1 source, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http\HttpResponsePipeWriter.cs:line 68
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpResponseStream.WriteAsyncInternal(ReadOnlyMemory`1 source, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http\HttpResponseStream.cs:line 138
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpResponseStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http\HttpResponseStream.cs:line 128
   at Microsoft.AspNetCore.Http.Extensions.StreamCopyOperation.CopyToAsync(Stream source, Stream destination, Nullable`1 count, Int32 bufferSize, CancellationToken cancel) in C:\dev\aspnet\AspNetCore\src\Http\Http.Extensions\src\StreamCopyOperation.cs:line 78
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch(): Microsoft.AspNetCore.Server.Kestrel[37]
      Connection id "0HLNPPA6TUSTI" sending DATA frame for stream ID 519 with length 16384 and flags NONE

We also saw AuthenticationExceptions:

System.Security.Authentication.AuthenticationException: Authentication failed, see inner exception.
 ---> System.ComponentModel.Win32Exception (0x80090327): An unknown error occurred while processing the certificate.
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, ExceptionDispatchInfo exception)
   at System.Net.Security.SslStream.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.PartialFrameCallback(AsyncProtocolRequest asyncRequest)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Security.SslStream.ThrowIfExceptional()
   at System.Net.Security.SslStream.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslStream.EndProcessAuthentication(IAsyncResult result)
   at System.Net.Security.SslStream.EndAuthenticateAsServer(IAsyncResult asyncResult)
   at System.Net.Security.SslStream.<>c.<AuthenticateAsServerAsync>b__69_1(IAsyncResult iar)

@halter73
Copy link
Member

halter73 commented Jun 26, 2019

@halter73
Copy link
Member

So @jkotalik and I think we might have figured out at least part of what's going on here. With "real" pipes, it's OK to call PipeWriter.Complete() immediately after PipeWriter.FlushAsync() returns a ValueTask, even if the ValueTask returned by FlushAsync() hasn't completed by that point. This comes into play when an HTTP/2 connection is "aborted" which happens after a timeout or any time the client sends a TCP FIN.

In this case, Http2Connection.InputOrOutputCompleted() calls Http2FrameWriter.Abort() which acquires the _writeLock and completes the connection-level PipeWriter which was recently made to be a StreamPipeWriter. Http2FrameWriter doesn't not ensure that any previous calls to FlushAsync() have completed, it just ensures no PipeWriter invocations are made in the future for that invocation. StreamPipeWriter, however, assumes that any calls to FlushAsync() have fully completed.

Ultimately we need to fix StreamPipeWriter to allow calls to Complete() prior to FlushAsync() completing in order to be as compatible as possible with the "real" PipeWriter.

Here's another seemingly-related error that I saw which isn't as easily attributable to an async operation continuing after the PipeWriterCompletion. This time it's an ODE from DiagnosticPoolBlock indicating a use-after-free error (which you would never see in a production app as the name implies). What's weird in this case is that StreamPipeWriter.GetSpan() did not throw an InvalidOperationException for being already completed. Despite this, by the time the ODE was caught, the StreamPipeWriter was completed. On the other hand, StreamPipeWriter.GetSpan() must have completed before the StreamPipeWriter was completed because Http2FrameWriter.WriteDataAsync() was upstack during the StreamPipeWriter.GetSpan() call, and WriteDataAsync must have acquired the _writeLock and checked Http2FrameWriter._completed was false.

System.ObjectDisposedException: 'Cannot access a disposed object.
Object name: 'MemoryPoolBlock'.'
   at System.Buffers.MemoryPoolThrowHelper.ThrowObjectDisposedException(ExceptionArgument argument) in C:\dev\aspnet\AspNetCore\src\Shared\Buffers.MemoryPool\MemoryPoolThrowHelper.cs:line 94
   at System.Buffers.DiagnosticPoolBlock.GetSpan() in C:\dev\aspnet\AspNetCore\src\Shared\Buffers.MemoryPool\DiagnosticPoolBlock.cs:line 107
   at System.IO.Pipelines.StreamPipeWriter.GetSpan(Int32 sizeHint)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteHeader(Http2Frame frame, PipeWriter output) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 581
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteHeaderUnsynchronized() in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 562
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteDataUnsynchronized(Int32 streamId, ReadOnlySequence`1 data, Int64 dataLength, Boolean endStream) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 291
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2FrameWriter.WriteDataAsync(Int32 streamId, StreamOutputFlowControl flowControl, ReadOnlySequence`1 data, Boolean endStream) in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2FrameWriter.cs:line 265
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2OutputProducer.<ProcessDataWrites>d__32.MoveNext() in C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\Core\src\Internal\Http2\Http2OutputProducer.cs:line 337

image

      Connection id "0HLNPQ836ERBP" sending DATA frame for stream ID 81 with length 16384 and flags NONE
trce: Microsoft.AspNetCore.Server.Kestrel[37]
      Connection id "0HLNPQ836ERBP" sending DATA frame for stream ID 81 with length 16384 and flags NONE
trce: Microsoft.AspNetCore.Server.Kestrel[37]
      Connection id "0HLNPQ836ERBP" sending DATA frame for stream ID 81 with length 16384 and flags NONE
trce: Microsoft.AspNetCore.Server.Kestrel[37]
      Connection id "0HLNPQ836ERBP" sending DATA frame for stream ID 81 with length 16384 and flags NONE
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HLNPQ836ERBP" sending FIN because: "The client closed the connection."
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[19]
      Connection id "0HLNPQ836ERBP" reset.
info: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[2]
      Sending file. Request path: '/_framework/blazor.server.js'. Physical path: 'C:\dev\aspnet\AspNetCore\src\Servers\Kestrel\samples\Http2SampleApp\wwwroot\_framework\blazor.server.js'

I only saw the ODE once so far, and it wasn't possible to correlate the Kestrel debug logs that included the connection id with the StaticFileMiddleware logs which don't because the console logger doesn't include scopes by default.

@davidfowl @anurse dotnet/extensions#1883

@halter73
Copy link
Member

The short-term fix is probably to bring back the AdaptedPipeline or something similar.

@halter73
Copy link
Member

This also doesn't explain why clients would send a GOAWAY frame indicating an HTTP/2 protocol violation. I haven't seen that in my testing. That would mean that there had to be an issue prior to the client sending a FIN which I haven't seen yet. Maybe fixing the other issues will either fix or unmask it.

@halter73
Copy link
Member

My current thinking with the GOAWAY errors and the DiagnosticMemoryPool ODEs, though I haven't proven it yet, is that blocks are being returned twice. First by StreamPipeWriter.Complete(), and second by the continuation StreamPipeWriter.FlushAsync(). This then allows the same block to subsequently be leased out twice concurrently to two different pipes leading to corruption.

@davidfowl
Copy link
Member

Can’t we just not complete the connection level pipe like what the http1connection does? That seems like a sane mitigation. This issue of completing the pipe actually came up when we were discussing the dispose/complete behavior of the StreamPipeWriter but we left it as is unstop further notice.

@halter73
Copy link
Member

halter73 commented Jun 26, 2019

That would almost work except for that StreamPipeWriter doesn't return unsent blocks to the pool when Stream.Write/FlushAsync throws even in the case where it's expected. Normally those unreturned blocks are picked up by Kestrel later calling StreamPipeWriter.Complete().

Your comment has me wondering whether Http1Connection now properly returns blocks for canceled HTTPS writes. I suspect not. I don't think we should leave it up to the block finalizers to handle something not-at-all-unusual like this.

Furthermore, I don't think we should be using or exposing StreamPipeWriter at all yet if it can fail in a way where it returns the same block twice. That could lead to all sorts of data corruption and information disclosure issues.

@davidfowl
Copy link
Member

That would almost work except for that StreamPipeWriter doesn't return unsent blocks to the pool when Stream.Write/FlushAsync throws even in the case where it's expected. Normally those unreturned blocks are picked up by Kestrel later calling StreamPipeWriter.Complete().

Complete is called at the right time in the pipeline, it just shouldn't be called by Kestrel. Instead it should just await the flush and yield.

Your comment has me wondering whether Http1Connection now properly returns blocks for canceled HTTPS writes. I suspect not. I don't think we should leave it up to the block finalizers to handle something not-at-all-unusual like this.

Why wouldn't it? I'm not following can you clarify.

Furthermore, I don't think we should be using or exposing StreamPipeWriter at all yet if it can fail in a way where it returns the same block twice. That could lead to all sorts of data corruption and information disclosure issues.

Exposing it is how we find real issues (that's what the previews and our tests are for). If blocks can be returned twice then the issue should just but fixed, no need to overreact and not use the type because we found an issue.

@halter73
Copy link
Member

Complete is called at the right time in the pipeline, it just shouldn't be called by Kestrel. Instead it should just await the flush and yield.

Not exactly. ConnectionContext.DisposeAsync() is called at the right time, and that induces both of our transports to complete the application pipes, but those are the raw pipes that feed into SslStream, not StreamPipeWriter.

Exposing it is how we find real issues (that's what the previews and our tests are for). If blocks can be returned twice then the issue should just but fixed, no need to overreact and not use the type because we found an issue.

I agree we should expose it, but we should fix this issue first. Hopefully we can get it fixed for preview7. StreamPipeWriter is the thing we're exposing to the user, and it provides a super subtle easy way to double-return memory pool blocks. Use-after-free bugs are super hard to diagnose, and I don't want to have to do it more than once because of this issue.

@davidfowl
Copy link
Member

Not exactly. ConnectionContext.DisposeAsync() is called at the right time, and that induces both of our transports to complete the application pipes, but those are the raw pipes that feed into SslStream, not StreamPipeWriter.

How would this issue happen if callers are properly awaiting flushes before calling complete?

I agree we should expose it, but we should fix this issue first. Hopefully we can get it fixed for preview7. StreamPipeWriter is the thing we're exposing to the user, and it provides a super subtle easy way to double-return memory pool blocks. Use-after-free bugs are super hard to diagnose, and I don't want to have to do it more than once because of this issue.

Don't forget it's already exposed on both the HttpContext and in the lesser used Kestrel middleware pipeline. We need to fix the issue, not panic. Reverting the change in the HttpContext layer would be expensive and the wrong fix at this point.

PS: Nobody wants use after free bugs, especially in managed code.

@halter73
Copy link
Member

How would this issue happen if callers are properly awaiting flushes before calling complete?

I'm still describing the hypothetical world you suggested where we change Http2Connection not to call complete like Http1Connection.

The problem is that Http1Connection just leaves it to ConnectionDispatcher.Execute() to call ConnectionContext.DisposeAsync() and hopes that cleans everything up. In the case where you don't adapt the ConnectionContext with StreamPipeWriter, it does clean up all the default pipes. The problem is that Socket/LibuvConnection.DisposeAsync() (i.e. ConnectionContext.DisposeAsync) complete the non-adapted "Transport" pipes, so the StreamPipeWriter never gets completed if we do what you suggest in your hypothetical.

Your comment has me wondering whether Http1Connection now properly returns blocks for canceled HTTPS writes. I suspect not. I don't think we should leave it up to the block finalizers to handle something not-at-all-unusual like this.

Why wouldn't it? I'm not following can you clarify.

See above.

@davidfowl
Copy link
Member

Except that;'s not how it works, when middleware adapts the transport it is responsbile for completing the pipes it creates. This is how the HttpsConnectionMiddleware works. The chain looks like this:

dispatcher -> httpsmiddleware -> httpmiddleware

httpmiddleware yields from it's delegate then the httpsmiddleware which cleans up the SslStream and disposes the SslDuplexPipe (which completes the StreamPipeReader and StreamPipeWriter). It looks like this (SslStream.DisposeAsync -> DuplexPipeStreamAdapter.DisposeAsync -> Input.Complete, Output.Complete) . The httpsmiddleware then restores the original transport pipe on the connection and yields control to the dispatcher which calls ConnectionContext.DisposeAsync which properly cleans up the transport pipes.

https://github.com/aspnet/AspNetCore/blob/9f52639909df27efa01017e777ca9703dc77a2ab/src/Servers/Kestrel/Core/src/Middleware/HttpsConnectionMiddleware.cs#L223-L238

https://github.com/aspnet/AspNetCore/blob/9f52639909df27efa01017e777ca9703dc77a2ab/src/Servers/Kestrel/Core/src/Middleware/Internal/DuplexPipeStreamAdapter.cs#L42-L47

@halter73
Copy link
Member

Thanks for the detailed call-chain. That actually helps a lot. My brain is still a bit stuck in the old ConnectionAdapter ways of last week 😆, but this definitely the way things should be now that middleware directly adapts pipes instead of streams.

I now completely agree that we don't need to be calling PipeWriter.Complete() in Http2FrameWriter anymore. And I think Http1Connection is fine too.

There can still be issues with app code that doesn't properly await writes/flushes, but that shouldn't be a huge concern since we can consider that user error. If we really want to make it safe, we could wait the TimingPipeFlusher task ourselves before letting the HTTP middleware complete.

Either way, we should fix StreamPipeWriter.

@halter73
Copy link
Member

I'm not sure the double-return is a real issue anymore either. Since BufferSegment.ResetMemory() clears its _memoryOwner field, a second call to ResetMemory() should only be a problem if the BufferSegment itself was leased again and its _memoryOwner field was set to a non-null value before the extraneous call to ResetMemory(). But if the StreamPipeWriter is already completed, there shouldn't be any subsequent attempts to lease a BufferSegment.

@davidfowl
Copy link
Member

My guess is that there's overlapping operations going on that ends up causing corruption in the StreamPipeWriter (it has no locks). Here's a trace from when things go bad:

[debug]: [0HLNQ7FUTVB8B]: GetSpan(0)
[debug]: [0HLNQ7FUTVB8B]: Begin Advance(4060)
[debug]: [0HLNQ7FUTVB8B]: End Advance(4060)
[debug]: [0HLNQ7FUTVB8B]: GetSpan(0)
[debug]: [0HLNQ7FUTVB8B]: Begin Advance(36)
[debug]: [0HLNQ7FUTVB8B]: End Advance(36)
[debug]: [0HLNQ7FUTVB8B]: Begin FlushAsyncInternal()
[debug]: [0HLNQ7FUTVB8B]: GetSpan(9)
[debug]: [0HLNQ7FUTVB8B]: Begin Advance(9)
[debug]: [0HLNQ7FUTVB8B]: End Advance(9)
[debug]: [0HLNQ7FUTVB8B]: GetSpan(9)
[debug]: [0HLNQ7FUTVB8B]: End FlushAsyncInternal()
[debug]: [0HLNQ7FUTVB8B]: Begin Advance(9)

This is the same connection and you can see we end up interleaving GetSpan/Advance and FlushAsync on the same connection.

@analogrelay
Copy link
Contributor

So, what's the plan for preview 7? Is there a tactical quick-fix we can do or is this going to require more in-depth work?

@jkotalik
Copy link
Contributor

Yeah, the quick fix is to bring back the adapted pipeline here: 25d5688#diff-1892f694d558e36b8367aad39ce40046L92 to start using pipes again. I need to sync with these two today to check on the status of this as it seems @davidfowl PR didn't work.

@davidfowl
Copy link
Member

@halter73
Copy link
Member

halter73 commented Jun 26, 2019

I agree that the StreamPipeWriter should track whether there's a currently-ongoing flush and reject any other API calls until the flush completes.

While this will probably break existing PipeWriter consumers when we switch from DefaultPipeWriter to StreamPipeWriter, at least the error should be easily observed and fixed.

In the case of Http2FrameWriter, it looks like we're going to need to use a task queue or something similar for all PipeWriter API calls, not just the FlushAsync calls.

@halter73
Copy link
Member

The TimingPipeFlusher used by Http2FrameWriter and a few other types in order to support unawaited calls to Stream.WriteAsync() on top of a PipeWriter is completely busted if it wraps anything other than the DefaultPipeWriter.

While the TimingPipeFlusher prevents overlapping calls to FlushAsync, it does not prevent any other PipeWriter API calls (e.g. GetSpan() and Advance()) from being called in parallel. That's because after the TimingPipeFlusher sees the last flush wasn't awaited, it releases Http2FrameWriter's writeLock. When the previous flush completes, the TimingPipeFlusher will then acquire its own lock to start the next flush. This prevents any other flushes from happening simultaneously, but it doesn't prevent the Http2FrameWriter from re-acquiring its own lock and calling GetSpan/Advance/etc during a call to flush.

Of course, would could just put all PipeWriter API calls and their associated arguments into a queue, so that API calls that were initiated from a bunch of different parallel HTTP/2 streams, always happen in a valid order.

If we want to do that though, likely the easiest and most efficient way to queue all the PipeWriter API calls is to just have Http2FrameWriter put a DefaultPipeWriter in front of the ConnectionContext's PipeWriter. We could probably optimize the case where the ConnectionContext's PipeWriter is a DefaultPipeWriter.

@davidfowl
Copy link
Member

I’m going to look into making StreamPipeWriter work with the overlapping calls. It shouldn’t be too hard actually (we have a bunch of this logic in pipe today but it might be easier since we don’t have to mess with as much state). We just need to make sure the linked list pointers are updated appropriately.

@davidfowl davidfowl assigned davidfowl and unassigned halter73 Jun 27, 2019
@davidfowl davidfowl added the bug This issue describes a behavior which is not expected - a bug. label Jun 27, 2019
@halter73
Copy link
Member

I’m going to look into making StreamPipeWriter work with the overlapping calls.

This helps if the ConnectionContext's PipeWriter is a StreamPipeWriter, but not if it's any other custom PipeWriter implementation. We should be careful to only use user-replaceable PipeWriter's in the normal way. That means awaiting FlushAsync calls before making any other PipeWriter call other than CancelPendingFlush().

@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2019
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions bug This issue describes a behavior which is not expected - a bug.
Projects
None yet
7 participants