This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Flaky test: MaxRequestBufferSizeTests.LargeUpload #2225

Closed
Tratcher opened this issue Dec 19, 2017 · 21 comments

@Tratcher
Member

This has failed 50% of the time for me on recent local builds, always with the same actual result.

[xUnit.net 00:00:44.2552397]     LargeUpload(maxRequestBufferSize: 5242880, connectionAdapter: True, expectPause: True) [FAIL]
[xUnit.net 00:00:44.2588137]       Assert.InRange() Failure
[xUnit.net 00:00:44.2591272]       Range:  (5238785 - 20971519)
[xUnit.net 00:00:44.2592656]       Actual: 20971520
[xUnit.net 00:00:44.2619018]       Stack Trace:
[xUnit.net 00:00:44.2652259]         D:\github\AspNet\KestrelHttpServer\test\Kestrel.FunctionalTests\MaxRequestBufferSizeTests.cs(160,0): at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.<LargeUpload>d__6.MoveNext()
[xUnit.net 00:00:44.2657057]         --- End of stack trace from previous location where exception was thrown ---
[xUnit.net 00:00:44.2658986]         --- End of stack trace from previous location where exception was thrown ---
[xUnit.net 00:00:44.2660844]         --- End of stack trace from previous location where exception was thrown ---
[xUnit.net 00:00:44.2682372]       Output:
[xUnit.net 00:00:44.2684638]         | Microsoft.AspNetCore.Hosting.Internal.WebHost Information: Request starting HTTP/1.0 POST http:///  20971520
Failed   LargeUpload(maxRequestBufferSize: 5242880, connectionAdapter: True, expectPause: True)
Error Message:
 Assert.InRange() Failure
Range:  (5238785 - 20971519)
Actual: 20971520
Stack Trace:
   at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.<LargeUpload>d__6.MoveNext() in D:\github\AspNet\KestrelHttpServer\test\Kestrel.FunctionalTests\MaxRequestBufferSizeTests.cs:line 160
@Tratcher
Member Author

Tratcher commented Jan 9, 2018

Still failing, same error.

@muratg
Contributor

muratg commented Jan 11, 2018

cc @halter73

@halter73
Member

@Tratcher What OS? Have you noticed it fail on the 2.0 or 2.1 runtime? Both?

@ryanbrandenburg
Contributor

You can see the history of this test failure here.

@ryanbrandenburg
Contributor

Another failure, but a different error this time:

System.IO.IOException : Unable to read data from the transport connection: Invalid argument.
---- System.Net.Sockets.SocketException : Invalid argument
   at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.<>c__DisplayClass6_2.<<LargeUpload>b__0>d.MoveNext() in /_/test/Kestrel.FunctionalTests/MaxRequestBufferSizeTests.cs:line 120
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.LargeUpload(Nullable`1 maxRequestBufferSize, Boolean connectionAdapter, Boolean expectPause) in /_/test/Kestrel.FunctionalTests/MaxRequestBufferSizeTests.cs:line 163
--- End of stack trace from previous location where exception was thrown ---
----- Inner Stack Trace -----

------- Stdout: -------
| [2018-04-10T13:17:35] Microsoft.AspNetCore.Hosting.Internal.WebHost Information: Request starting HTTP/1.0 POST http:///  20971520
| [2018-04-10T13:17:39] Microsoft.AspNetCore.Server.Kestrel Error: Connection id "0HLCUSLM70C7C", Request id "0HLCUSLM70C7C:00000001": An unhandled exception was thrown by the application.
|    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ValidateState(CancellationToken cancellationToken) in /_/src/Kestrel.Core/Internal/Http/HttpRequestStream.cs:line 216
|    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /_/src/Kestrel.Core/Internal/Http/HttpRequestStream.cs:line 110
|    at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.<>c__DisplayClass8_0.<<StartWebHost>b__3>d.MoveNext() in /_/test/Kestrel.FunctionalTests/MaxRequestBufferSizeTests.cs:line 298
| --- End of stack trace from previous location where exception was thrown ---
|    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application) in /_/src/Kestrel.Core/Internal/Http/HttpProtocol.cs:line 536
| [2018-04-10T13:17:39] Microsoft.AspNetCore.Hosting.Internal.WebHost Information: Request finished in 3976.3743ms 0

@muratg muratg added this to the 2.1.0-rc1 milestone Apr 10, 2018
@muratg muratg added flaky test and removed task labels Apr 10, 2018
@muratg
Contributor

muratg commented Apr 12, 2018

@Tratcher did you get a chance to look into this yet?

@Tratcher
Member Author

No, I should have some time to look at tests tomorrow.

@Tratcher
Member Author

Tratcher commented Apr 13, 2018

@muratg muratg modified the milestones: 2.1.0-rc1, 2.2.0-mq Apr 13, 2018
@Tratcher
Member Author

Tratcher commented Apr 16, 2018

Regarding the InRange failure: it does not appear to be TFM-specific, and it reproduces on both Sockets and Libuv, though more readily on Sockets. Only the connectionAdapter: True variation fails with the above error.

@benaadams
Contributor

Re: System.IO.IOException : Unable to read data from the transport connection: Invalid argument.

Invalid argument is now likely to be reported as OperationAborted; see dotnet/corefx#29091.

@Tratcher
Member Author

Mitigation for the original issue has been checked into dev for 2.2. If we see it again in 2.1 we can consider backporting it. If we see any more reports of the other errors we can open a separate issue.

@Tratcher
Member Author

Never mind, it's still failing.

@Tratcher Tratcher reopened this Apr 17, 2018
@ryanbrandenburg
Contributor

This fails most frequently on our Win10 build, on which it has failed 4 times today.

@muratg
Contributor

muratg commented Apr 30, 2018

@Tratcher did you look into this one yet?

@Tratcher
Member Author

Tratcher commented May 1, 2018

I wasn't able to get a consistent repro so I moved on to tests with more failures. I'll revisit it when I get that far down the list.

@mikeharder
Contributor

Just failed on my dev machine, same error as the first post in this issue:

  [xUnit.net 00:00:17.1104598]     LargeUpload(maxRequestBufferSize: 5242880, connectionAdapter: True, expectPause: True) [FAIL]
  Failed   LargeUpload(maxRequestBufferSize: 5242880, connectionAdapter: True, expectPause: True)
RUNDOTNET : error Message:  [C:\Users\mharder\.dotnet\buildtools\korebuild\2.2.0-preview1-17051\KoreBuild.proj]
   Assert.InRange() Failure
  Range:  (5238785 - 20971519)
  Actual: 20971520
  Stack Trace:
     at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.LargeUpload(Nullable`1 maxRequestBufferSize, Boolean connectionAdapter, Boolean expectPause) in D:\Git\KestrelHttpServer\test\Kestrel.FunctionalTests\MaxRequestBufferSizeTests.cs:line 149
  --- End of stack trace from previous location where exception was thrown ---

It's possible the client and/or server buffers on Win10 have been increased, so we may need to change the test parameters:

// Larger than default, but still significantly lower than data, so client should be paused.
// On Windows, the client is usually paused around (MaxRequestBufferSize + 700,000).
// On Linux, the client is usually paused around (MaxRequestBufferSize + 10,000,000).
Tuple.Create((long?)5 * 1024 * 1024, true),

// The maximum is harder to determine, since there can be OS-level buffers in both the client
// and server, which allow the client to send more than maxRequestBufferSize before getting
// paused. We assume the combined buffers are smaller than the difference between
// data.Length and maxRequestBufferSize.

Specifically, we may need to increase the data sent from 20MB to something larger until the test is reliable on all platforms.
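To make the margin concrete, here is a rough back-of-the-envelope sketch in Python (not part of the test suite). It takes the approximate pause offsets from the test comments above at face value and computes how much headroom remains before the client would have sent everything. The 40 MB figure is only one candidate for "something larger", and the names `headroom`, `WINDOWS_OVERHEAD`, and `LINUX_OVERHEAD` are made up for this illustration.

```python
# Figures taken from the test comments quoted above; the overheads are the
# approximate "usually paused around" offsets, not measured values.
MAX_REQUEST_BUFFER_SIZE = 5 * 1024 * 1024   # 5,242,880 bytes
WINDOWS_OVERHEAD = 700_000                   # approx. extra bytes sent on Windows
LINUX_OVERHEAD = 10_000_000                  # approx. extra bytes sent on Linux


def headroom(data_length, overhead):
    """Bytes left unsent when the client is expected to pause.

    A value close to zero (or negative) means the client may finish the
    upload before ever being paused, which is what the InRange failure
    with Actual == data length reports.
    """
    return data_length - (MAX_REQUEST_BUFFER_SIZE + overhead)


for data_mb in (20, 40):  # 20 MB is the current size; 40 MB is a hypothetical candidate
    data_length = data_mb * 1024 * 1024
    print(f"{data_mb} MB upload:",
          f"Windows headroom = {headroom(data_length, WINDOWS_OVERHEAD):,}",
          f"Linux headroom = {headroom(data_length, LINUX_OVERHEAD):,}")
```

If the OS buffers on newer Win10 builds are several megabytes larger than the comment assumes, the Windows headroom shrinks accordingly, so a larger upload buys more tolerance without changing what the test asserts.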

@mikeharder mikeharder assigned mikeharder and unassigned Tratcher May 16, 2018
@mikeharder
Contributor

I can take this issue since I wrote these tests.

@mikeharder
Contributor

mikeharder commented May 16, 2018

On my dev machine, the failure was in Sockets.FunctionalTests (rather than Libuv.FunctionalTests). This might explain why the test was initially reliable (when Libuv was the only transport), and it only became flaky once the Sockets transport was introduced.

The next step is to determine if the flakiness is caused by test code (e.g. the 15MB difference between upload size and max request buffer is not large enough) or product code (e.g. sometimes the server fails to apply backpressure, which would be a product bug).
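One way to get a feel for the test-code side of that question is to measure how much data the OS alone will absorb when the receiver never reads. The Python sketch below is only a loopback analogue, not Kestrel and not the actual test: it connects a client to a listener that accepts the connection but never reads, then counts how many bytes the client can send before the socket stays unwritable. The function name, the 0.5 second idle timeout, and the chunk size are all assumptions chosen for illustration.

```python
import select
import socket


def bytes_accepted_before_pause(total_bytes, chunk_size=64 * 1024, idle_timeout=0.5):
    """Send to a peer that never reads; return how many bytes the OS accepted."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()  # accepted, but deliberately never read

    client.setblocking(False)
    chunk = b"a" * chunk_size
    sent = 0
    try:
        while sent < total_bytes:
            # Wait until the socket is writable; give up after idle_timeout seconds.
            _, writable, _ = select.select([], [client], [], idle_timeout)
            if not writable:
                break  # no buffer space freed up: the client is effectively paused
            try:
                sent += client.send(chunk[: total_bytes - sent])
            except BlockingIOError:
                continue  # buffer filled between select() and send(); wait again
    finally:
        client.close()
        server_side.close()
        listener.close()
    return sent
```

Calling `bytes_accepted_before_pause(256 * 1024 * 1024)` on a given machine gives a rough estimate of the combined client and server OS buffering. If that number plus maxRequestBufferSize approaches the upload size, the flakiness is plausibly a test-parameter problem rather than missing backpressure in the server.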

@mikeharder
Contributor

I've been running the LargeUpload tests in a loop on my dev machine:

for /l %n in () do dotnet test --no-build -f netcoreapp2.2 --filter "DisplayName~LargeUpload"

The following test (and only this specific test) fails about 25-30% of the time:

LargeUpload(maxRequestBufferSize: 5242880, connectionAdapter: True, expectPause: True)

It may be noteworthy that the failing test has connectionAdapter: True; the variant with connectionAdapter: False never fails.

I also tried running only the two tests with maxRequestBufferSize: 5242880 in a loop:

for /l %n in () do dotnet test --no-build -f netcoreapp2.2 --filter "DisplayName~LargeUpload & DisplayName~5242880"

However, when running this subset I was unable to repro any failures.
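For anyone who wants counts rather than eyeballing an endless loop, a small wrapper along these lines can tally results. This is a sketch under the assumption that the test command exits with a non-zero code when any test fails; `measure_flake_rate` is a made-up helper name, and the `dotnet test` arguments in the comment are the ones used in the loops above.

```python
import subprocess
import sys
from collections import Counter


def measure_flake_rate(command, runs):
    """Run `command` `runs` times and tally passes and failures by exit code."""
    tally = Counter()
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,  # discard test output; only the exit code matters here
            stderr=subprocess.DEVNULL,
        )
        tally["pass" if result.returncode == 0 else "fail"] += 1
    return tally


# Example invocation (requires the dotnet CLI and a built test project):
#   cmd = ["dotnet", "test", "--no-build", "-f", "netcoreapp2.2",
#          "--filter", "DisplayName~LargeUpload"]
#   tally = measure_flake_rate(cmd, runs=100)
#   print(f"fail rate: {tally['fail'] / sum(tally.values()):.1%}")
```

Note that this only counts whole runs; it does not break failures down by test variation, which would require parsing the test output or the .trx results file.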

@mikeharder
Contributor

mikeharder commented May 17, 2018

I was also able to reproduce this in a Win10 Hyper-V VM on my dev machine. It fails significantly less often than on my physical machine, but I was able to repro some failures with connectionAdapter: False:

| Result | Count | Percentage |
|---|---|---|
| Pass | 3930 | 95.8% |
| Fail (connectionAdapter: True) | 167 | 4.0% |
| Fail (connectionAdapter: False) | 6 | 0.2% |

mikeharder added a commit that referenced this issue May 17, 2018
- Increase _dataLength from 20MB to 40MB to improve test reliability when using Sockets transport on Windows
- Addresses #2225
@pakrym
Contributor

pakrym commented May 23, 2018

New occurrence: https://ci3.dot.net/job/aspnet_KestrelHttpServer/job/dev/job/linux-Configuration_Release_prtest/307/

15:41:29   [xUnit.net 00:00:14.0401462]     LargeUpload(maxRequestBufferSize: 26, connectionAdapter: False, expectPause: True) [FAIL]
15:41:29   Failed   LargeUpload(maxRequestBufferSize: 26, connectionAdapter: False, expectPause: True)
15:41:29 RUNDOTNET : error Message:  [/home/dotnet-bot/.dotnet/buildtools/korebuild/2.2.0-preview1-17060/KoreBuild.proj]
15:41:29    Assert.Contains() Failure
15:41:29   Not found: bytesRead: 41943040
15:41:29   In value:  
15:41:29   Stack Trace:
15:41:29      at Microsoft.AspNetCore.Server.Kestrel.FunctionalTests.MaxRequestBufferSizeTests.LargeUpload(Nullable`1 maxRequestBufferSize, Boolean connectionAdapter, Boolean expectPause)
15:41:29   --- End of stack trace from previous location where exception was thrown ---
15:42:50   Results File: /mnt/j/workspace/aspnet_KestrelHttpServer/dev/linux-Configuration_Release_prtest/artifacts/logs/UnitTests-netcoreapp2.2-307.trx
15:42:50   
