Skip to content

Better BuildServerTestFixtureBase dispose timeouts #29544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

halter73
Copy link
Member

Two consecutive builds of our main branch failed yesterday because of BuildServerTestFixtureBase.Dispose() timeouts, but it wasn't immediately obvious this was the cause.

https://dev.azure.com/dnceng/public/_build/results?buildId=960451&view=results
https://dev.azure.com/dnceng/public/_build/results?buildId=960598&view=results

I still don't know exactly which test(s) lead to this timeout. Is this something that can happen for any test using BuildServerTestFixtureBase or only certain tests? There are many concurrent tests at the time of the crash so it's hard to determine.

Furthermore, having test fail in this way makes our CI test failure reporting inaccurate and make it seem as if tests relying on BuildServerTestFixtureBase are less flaky than they really are.

After this change, we should see an actual test failure when there's a timeout shutting down the build server instead of having to download artifacts to see the real exception thrown from a Timer thread.

Previous error in build output:

VSTest : error : Tests failed: F:\workspace\_work\1\s\artifacts\TestResults\Release\Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.html [net6.0|x64] [F:\workspace\_work\1\s\src\Razor\Microsoft.NET.Sdk.Razor\integrationtests\Microsoft.NET.Sdk.Razor.IntegrationTests.csproj]
##[error]VSTest(0,0): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Tests failed: F:\workspace\_work\1\s\artifacts\TestResults\Release\Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.html [net6.0|x64]

Previous Windows_Test_Logs/Release/Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.log:

The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.AggregateException: One or more errors occurred. (Shutting down the build server at pipe 2843ffa6-94b9-4e65-8b9f-a9518bd1fccf took longer than expected.
Output: .)
 ---> System.TimeoutException: Shutting down the build server at pipe 2843ffa6-94b9-4e65-8b9f-a9518bd1fccf took longer than expected.
Output: .
   at Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildServerTestFixtureBase.<>c__DisplayClass5_0.<Dispose>b__0() in /_/src/Razor/Microsoft.NET.Sdk.Razor/integrationtests/BuildServerTestFixtureBase.cs:line 36
   at System.Threading.CancellationToken.<>c.<Register>b__12_0(Object obj)
   at System.Threading.CancellationTokenSource.Invoke(Delegate d, Object state, CancellationTokenSource source)
   at System.Threading.CancellationTokenSource.CallbackNode.<>c.<ExecuteCallback>b__9_0(Object s)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.CancellationTokenSource.CallbackNode.ExecuteCallback()
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   --- End of inner exception stack trace ---
   at System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(Boolean throwOnFirstException)
   at System.Threading.CancellationTokenSource.NotifyCancellation(Boolean throwOnFirstException)
   at System.Threading.CancellationTokenSource.<>c.<.cctor>b__58_0(Object obj)
   at System.Threading.TimerQueueTimer.CallCallback(Boolean isThreadPool)
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
   at System.Threading.TimerQueue.FireNextTimers()


Attachments:
  F:\workspace\_work\1\s\artifacts\TestResults\Release\f5410663-ef01-49db-9237-9f929fa5f471\Sequence_96994782dcf34a2e932bf2d4bee6e499.xml
Test Run Aborted.
Total tests: Unknown
     Passed: 42
 Total time: 3.3076 Minutes
Results File: F:\workspace\_work\1\s\artifacts\TestResults\Release\Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.trx
Html test results file : F:\workspace\_work\1\s\artifacts\TestResults\Release\Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.html

The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
The test running when the crash occurred: 
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildPerformanceTest.BuildMvcAppWithComponents
Microsoft.AspNetCore.Razor.Design.IntegrationTests.PackIntegrationTest.Pack_Incremental_DoesNotRegenerateCacheAndPropsFiles
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildIncrementalismTest.IncrementalBuild_WithP2P_WorksWhenBuildProjectReferencesIsDisabled
Microsoft.AspNetCore.Razor.Design.IntegrationTests.ScopedCssIntegrationTest.Build_RemovingScopedCssAndBuilding_UpdatesGeneratedCodeAndBundle
Microsoft.AspNetCore.Razor.Design.IntegrationTests.ApplicationPartDiscoveryIntegrationTest.BuildIncrementalism_WhenApplicationPartAttributeIsGenerated
Microsoft.AspNetCore.Razor.Design.IntegrationTests.PublishIntegrationTest.Publish_IncludeCshtmlAndRefAssemblies_CopiesFiles
Microsoft.AspNetCore.Razor.Design.IntegrationTests.MvcBuildIntegrationTest21.Building_Project
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildServerIntegrationTest.ManualServerShutdown_NoPipeName_ShutsDownServer
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildIntegrationTest.Build_ErrorInGeneratedCode_ReportsMSBuildError
Microsoft.AspNetCore.Razor.Design.IntegrationTests.RazorGenerateIntegrationTest.RazorGenerate_WorksWithLinkedFiles
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildIntrospectionTest.GetCopyToOutputDirectoryItems_WhenNoFileIsPresent_ReturnsEmptySequence
Microsoft.AspNetCore.Razor.Design.IntegrationTests.MvcBuildIntegrationTest31.Publish_Project
Microsoft.AspNetCore.Razor.Design.IntegrationTests.StaticWebAssetsIntegrationTest.Publish_CopiesStaticWebAssetsToDestinationFolder_PublishSingleFile
Microsoft.AspNetCore.Razor.Design.IntegrationTests.MvcBuildIntegrationTest22.Publish_Project
Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildWithComponentsIntegrationTest.Build_DoesNotProduceMvcArtifacts_IfProjectDoesNotContainRazorGenerateItems


This test may, or may not be the source of the crash.
=== COMMAND LINE ===
"F:\workspace\_work\1\s\.dotnet\dotnet.exe" test F:\workspace\_work\1\s\artifacts\bin\Microsoft.NET.Sdk.Razor.IntegrationTests\Release\net6.0\Microsoft.NET.Sdk.Razor.IntegrationTests.dll --logger:"console;verbosity=normal" --logger:"trx;LogFileName=Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.trx" --logger:"html;LogFileName=Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.html" "--ResultsDirectory:F:\workspace\_work\1\s\artifacts\TestResults\Release" "--Framework:.NETCoreApp,Version=v6.0" --filter Quarantined!=true  --blame "CollectHangDump
TestTimeout=6m" > "F:\workspace\_work\1\s\artifacts\log\Release\Microsoft.NET.Sdk.Razor.IntegrationTests_net6.0_x64.log" 2>&1

New error in build output:

[xUnit.net 00:00:09.63]     [Test Class Cleanup Failure (Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildIncrementalismTest)] System.TimeoutException
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.
  Failed Build_TouchesUpToDateMarkerFile [1 ms]
  Error Message:
   [Test Class Cleanup Failure (Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildIncrementalismTest)]: System.TimeoutException : Shutting down the build server at pipe 42ed7c30-36b9-4d06-84c3-f8adb2bc5fe3 took longer than expected.
Output: .
  Stack Trace:
     at Microsoft.AspNetCore.Razor.Design.IntegrationTests.BuildServerTestFixtureBase.DisposeAsync() in C:\dev\dotnet\aspnetcore\src\Razor\Microsoft.NET.Sdk.Razor\integrationtests\BuildServerTestFixtureBase.cs:line 74

Failed!  - Failed:     1, Passed:     1, Skipped:     0, Total:     2, Duration: 8 s - Microsoft.NET.Sdk.Razor.IntegrationTests.dll (net6.0)

@halter73 halter73 added the reliability Issues affecting reliability (crashes, memory leaks, stress issues) label Jan 22, 2021
@halter73 halter73 force-pushed the halter73/better-timeouts branch from ad2ff4b to 1881e05 Compare January 22, 2021 19:58
@halter73 halter73 force-pushed the halter73/better-timeouts branch from 1881e05 to bc95b12 Compare January 22, 2021 20:01
@captainsafia
Copy link
Member

@halter73 Since we moved the Razor SDK into the SDK repo, we'll actually be removing this code from our repo shortly (working on a branch for that now). These integration tests are now in the SDK repo on a different stack.

else
{
var output = writer.ToString();
throw new TimeoutException($"Shutting down the build server at pipe {PipeName} took longer than expected.{Environment.NewLine}Output: {output}.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if instead the code should just output a warning e.g. using an ITestOutputHelper. Unfortunately that would require changing all uses of derived fixtures. Thoughts i.e. is this case actually an error worth crashing the test host about❔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, the tests fail rather than crash the testhost. I don't know if a warning would be better, but it sounds like this isn't going to be around for much longer. This is a simple fix that seems to me to be strictly better than what we have now.

@halter73
Copy link
Member Author

@captainsafia How shortly is shortly? The failure rate on this seems really high. It's hard to say exactly how high because how the failure crashes the testhost, but I was looking at multiple consecutive failures and it doesn't seem to just be some transient infrastructure flakiness. If these tests aren't being removed today, I think we should merge this so we don't waste time for the next person on build-ops.

@halter73 halter73 closed this Jan 22, 2021
@captainsafia
Copy link
Member

@halter73 Expected to merge today. I've got a PR out at #29548 but waiting on a newer build from the SDK repo with some fixes before merging.

@ghost
Copy link

ghost commented Jan 22, 2021

Hi @captainsafia. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@halter73 halter73 deleted the halter73/better-timeouts branch January 25, 2021 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reliability Issues affecting reliability (crashes, memory leaks, stress issues)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants