Skip to content

Coverage causes hangs #511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
clairernovotny opened this issue Aug 6, 2019 · 24 comments
Closed

Coverage causes hangs #511

clairernovotny opened this issue Aug 6, 2019 · 24 comments
Labels
stale tenet-performance Performance related issue

Comments

@clairernovotny
Copy link
Contributor

clairernovotny commented Aug 6, 2019

Hi,

When adding coverage for .NET Framework, it's causing hangs. It works fine on .NET Core:
https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2203&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9

This commit has the changes:
novotnyllc/bc-csharp@5ddb175

Here's the command that fails, when on .NET Framework: novotnyllc/bc-csharp@5ddb175

Seems like it's happening with .NET Core too:
https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2204&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=bfbec40a-1b5e-5690-b870-859627cad0c0&l=20

Should be able to repro with dotnet test -f netcoreapp2.1 ...

Repro steps:

Clone: https://github.com/onovotny/bc-csharp
Checkout commit: 4a401ed22df13ef61b2bb7a306ee8a083167f1c6
Go to crypto\test
Run dotnet test -f netcoreapp2.1 and see that it passes. About 1.5 min on my machine
Run dotnet test -f netcoreapp2.1 -s ..\..\CodeCoverage.runsettings

Tests are taking much longer to run (never complete) and fail in some cases.

I think it's struggling on this test: https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/math/ec/test/ECPointPerformanceTest.cs#L176

Perhaps with some of the random number generators in use there?

@clairernovotny clairernovotny changed the title Coverage causes hang on .NET framework Coverage causes hangs Aug 6, 2019
@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

Hi Oren,
I did some check and it's related to instrumentation, that test seem a performance test comparison https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/math/ec/test/ECPointPerformanceTest.cs#L21 that lead to a lot of call to instrumentation code

image

I did a debug with msbuild but nothing different core is the same

dotnet test /p:CollectCoverage=true /p:Exclude=\"[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*\" /p:ExcludeByAttribute=\"Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute\" /p:SingleHit=true /p:CoverletToolsPath=C:\git\coverlet\src\coverlet.msbuild.tasks\bin\Debug\netstandard2.0\

With SingleHit=true should be better...but maybe you should exclude that test from "coverage check".
In my test I get result after some minutes

C:\git\coverletissue\bc-csharp\crypto\test (HEAD detached at 4a401ed -> origin)                                                                                                                                                               
λ dotnet test /p:CollectCoverage=true /p:Exclude=\"[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*\" /p:ExcludeByAttribute=\"Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute\" /p:SingleHit=true /p:CoverletToolsPath=C:\git\coverlet\s
rc\coverlet.msbuild.tasks\bin\Debug\netstandard2.0\                                                                                                                                                                                           
...                                                                                                                                                             
                                                                                                                                                                                                                                              
Avvio dell'esecuzione dei test in corso. Attendere...                                                                                                                                                                                         
  ! TestFunction                                                                                                                                                                                                                              
  ! TestSumOfMultipliesComplete [1ms]                                                                                                                                                                                                         
  ! TestSumOfTwoMultipliesComplete                                                                                                                                                                                                            
  ! TestMultiply                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
L'esecuzione dei test è riuscita.                                                                                                                                                                                                             
Total tests: 995                                                                                                                                                                                                                              
     Passed: 991                                                                                                                                                                                                                              
Tempo di esecuzione dei test: 24,4505 Minuti                                                                                                                                                                                                  
                                                                                                                                                                                                                                              
Calculating coverage result...                                                                                                                                                                                                                
  Generating report 'C:\git\coverletissue\bc-csharp\crypto\test\coverage.json'                                                                                                                                                                
                                                                                                                                                                                                                                              
+---------------------+--------+--------+--------+                                                                                                                                                                                            
| Module              | Line   | Branch | Method |                                                                                                                                                                                            
+---------------------+--------+--------+--------+                                                                                                                                                                                            
| BouncyCastle.Crypto | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                            
+---------------------+--------+--------+--------+                                                                                                                                                                                            
                                                                                                                                                                                                                                              
+---------+--------+--------+--------+                                                                                                                                                                                                        
|         | Line   | Branch | Method |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        
| Total   | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        
| Average | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        
                                                                                                                                                                                                                                              

@jkotas do you think that aggressive inlining could help here(I think that tiering cannot help to remove the call)

@MarcoRossignoli MarcoRossignoli added the tenet-performance Performance related issue label Aug 7, 2019
@clairernovotny
Copy link
Contributor Author

clairernovotny commented Aug 7, 2019

I think there may be other things going too as it's sometimes affectint the test itself:

  X TestCrypto [1s 462ms]                                                                                                 Error Message:                                                                                                           AES Monte Carlo Test 13: failed - expected 77ba00ed5412dff27c8ed91f3c376172 got 094d55f0268b6f471afeaeef0d591424       Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AllTests.TestCrypto() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AllTests.cs:line 43


  X TestFunction [1s 457ms]                                                                                               Error Message:                                                                                                             Expected string length 9 but was 40. Strings differ at index 3.
  Expected: "AES: Okay"
  But was:  "AES Monte Carlo Test 13: failed reversal"
  --------------^

  Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AesTest.TestFunction() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AESTest.cs:line 175

Not every run, but more than once. If you run the tests several times, it'll eventually fail with that.

Or this

  X TestECKeyAgree [150ms]                                                                                                Error Message:                                                                                                           System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
     at Org.BouncyCastle.Crypto.Generators.ECKeyPairGenerator.GenerateKeyPair() in D:\dev\BouncyCastle-PCL\crypto\src\crypto\generators\ECKeyPairGenerator.cs:line 100
   at Org.BouncyCastle.Cms.Tests.CmsTestUtil.MakeECDsaKeyPair() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\CMSTestUtil.cs:line 253
   at Org.BouncyCastle.Cms.Tests.EnvelopedDataStreamTest.get_OrigECKP() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\EnvelopedDataStreamTest.cs:line 63
   at Org.BouncyCastle.Cms.Tests.EnvelopedDataStreamTest.TestECKeyAgree() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\EnvelopedDataStreamTest.cs:line 492

@clairernovotny
Copy link
Contributor Author

Those errors shouldn't be happening based on those code paths, it seems like Coverlet may be interfering with the normal code flow.

@MarcoRossignoli
Copy link
Collaborator

If you run the tests several times, it'll eventually fail with that.

You mean more than one time in same CI loop(to undestand if could be related to instrumenting more than once problem)?
I don't know what tests does...could be related to IL "instrumentations"?Coverlet injects a static class per module(ModuleTrackerTemplate) and a static call RecordSingleHit/RecordHit to it for every sequence/branch point.

@clairernovotny
Copy link
Contributor Author

clairernovotny commented Aug 7, 2019

I mean running it from the command line different times. It's non deterministic, and only happens with Coverlet. Has never happened without.

@MarcoRossignoli
Copy link
Collaborator

Can you try to "cleanup"(git -fdx) between tests and check if it happens?

@clairernovotny
Copy link
Contributor Author

Will try. In the interim, even with the ExcludeFromCodeCoverage attribute and SingleHit=true, TestSumOfTwoMultipliesComplete appears to take a very long time and isn't completing.

@clairernovotny
Copy link
Contributor Author

Hmm....running twice without a clean does appear to trigger the errors. Seems like it's over-instrumenting or breaking something that way?

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

IIRW we had some issue where libs were instrumented more than once leading to weird results.

@jkotas
Copy link

jkotas commented Aug 7, 2019

@jkotas do you think that aggressive inlining could help here(I think that tiering cannot help to remove the call)

This is small-enough method to be inlined by JIT without any hints. I do not expect that aggressive inlining would help anything here.

@MarcoRossignoli
Copy link
Collaborator

Hmm....running twice without a clean does appear to trigger the errors. Seems like it's over-instrumenting or breaking something that way?

At the moment we don't check if module is already instrumented(maybe we could try to check if module tracker is present)

@clairernovotny
Copy link
Contributor Author

clairernovotny commented Aug 7, 2019

Spoke too soon, even after a clean, I got an error:

 PS> git clean -xdf                                                                                                                                                                                                                                netstandard ~2    D:\dev\BouncyCastle-PCL                                                                           
 PS> dotnet test .\crypto\test\crypto.test.csproj -s .\CodeCoverage.runsettings                                         src\asn1\test\LinkedCertificateTest.cs(51,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\test\NamedCurveTest.cs(254,17): warning CS0162: Unreachable code detected [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\crypto\test\SCryptTest.cs(68,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\test\NamedCurveTest.cs(254,17): warning CS0162: Unreachable code detected [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\asn1\test\LinkedCertificateTest.cs(51,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\crypto\test\SCryptTest.cs(68,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
Test run for D:\dev\BouncyCastle-PCL\crypto\test\bin\Debug\netcoreapp2.1\crypto.test.dll(.NETCoreApp,Version=v2.1)
Microsoft (R) Test Execution Command Line Tool Version 16.1.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
                                                                                                                          ! TestFunction                                                                                                                                                                                                                                  X TestFunction [1s 367ms]                                                                                               Error Message:                                                                                                             Expected string length 9 but was 40. Strings differ at index 3.
  Expected: "AES: Okay"
  But was:  "AES Monte Carlo Test 11: failed reversal"
  --------------^

  Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AesTest.TestFunction() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AESTest.cs:line 175

                                                                                                                          X TestCrypto [1s 377ms]                                                                                                 Error Message:                                                                                                           AES Monte Carlo Test 11: failed - expected acc863637868e3e068d2fd6e3508454a got a2164f3e337eb7cfdc3876ab22d9760e       Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AllTests.TestCrypto() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AllTests.cs:line 43

                                                                                                                          ! TestMultiply                                                                                                                                                                                                                                  ! TestSumOfMultipliesComplete [1ms]                                                                                                                                                                                                             ! TestSumOfTwoMultipliesComplete                                                                                      Test run in progress.                                                                                                                        .Attempting to cancel the build...
Attempting to cancel the build...

   1    netstandard ~2    D:\dev\BouncyCastle-PCL                                   

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

Tell me where you put ExcludeFromCodeCoverage I would try to repro on my local(or if you've branch/commit to clone with above setup).

@clairernovotny
Copy link
Contributor Author

I tried it on ECAlgorithmsTest.cs on the TestSumOfTwoMultipliesComplete method. It's in crypto\test\src\math\ec\test

@clairernovotny
Copy link
Contributor Author

Looks like it takes 8.2 minutes on my machine to complete (with SingleHit = true and no ExcludeFromCodeCoverage attributes set). Still a significant issue given that it takes 1.5 minutes without any coverage.

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

I'll try to understand what's going on, thank's for quick infos!
I'm not sure if we can improve perf...it depends on how many "new call" are injected.
cc: @tonerdo

@clairernovotny
Copy link
Contributor Author

Perf isn't the only issue. I'm concerned about the random failures it injects.

image

Locally, after 8.3 min, it passed one platform but something failed on another. Started with a clean repo.

Same thing on Azure Pipelines happened to pass completely this time. https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2207&view=results

@MarcoRossignoli
Copy link
Collaborator

Perf isn't the only issue. I'm concerned about the random failures it injects.

I'll try to repro asap

@clairernovotny
Copy link
Contributor Author

In CmsTestUtil, the property getters call Init() on the objects they create, so that by the time GenerateKeyPair() is called, the local parameters variable should always be set. Something appears to be interfering in there.

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

Oren can you try to add -m:1 only to exclude another know issue
(No repro using msbuild task more times for the moment on my local, now try collector)

@clairernovotny
Copy link
Contributor Author

clairernovotny commented Aug 7, 2019

It repro's with -m:1:

It's not every time, it took 3-4 runs to make it happen, each starting with a clean repo

image

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 7, 2019

I'll try to reproduce and also compare instrumented vs non instrumented IL

It's first time I see an issue like that, maybe instrumentation change execution order that generates some bug (race etc...)

@MarcoRossignoli
Copy link
Collaborator

MarcoRossignoli commented Aug 8, 2019

@onovotny I did a lot of test on 2 different machine yesterday evening 4/5 time x machine, and no luck...I cannot repro, my command was

C:\git\coverletissue\bc-csharp> dotnet test .\crypto\test\crypto.test.csproj -s .\CodeCoverage.runsettings 

runsettings

<?xml version="1.0" encoding="utf-8"?>
<!-- File name extension must be .runsettings -->
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>      
       <DataCollector friendlyName="XPlat code coverage">
        <Configuration>
          <Format>cobertura</Format>
          <Exclude>[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*</Exclude> <!-- [Assembly-Filter]Type-Filter -->
          <ExcludeByAttribute>Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute</ExcludeByAttribute>
          <SingleHit>true</SingleHit>
        </Configuration>
      </DataCollector>    
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>

Can you check that my config it's the same as yours?

This is the instrumentation of Init() on RsaKeyPairGenerator
image

I've one suspect for #511 (comment) ...that for some reason tests are running in parallel and two test ask for static access to https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/cms/test/CMSTestUtil.cs#L84-L91 and the first init kpg so null check above return non initilized instance for another thread that could lead to null ref. Maybe could happen because instrumented code are slower than non instrumented or is no more inlined so the randomic nature of failure.
Can you try to force tests serialization?

@Bertk
Copy link
Collaborator

Bertk commented Aug 31, 2024

This issue is outdated (before coverlet version 6.0.2) and not related to #1192 regression.

@Bertk Bertk closed this as completed Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

4 participants