When running two assemblies in parallel in our tests, one of the results is intermittently lost, causing a failure of our package tests. We don't currently have a test that runs a larger number of assemblies in parallel, but I think we can assume that the problem would occur more frequently in that case. I would add such a test as part of the resolution of this issue.
Originally, I thought that this bug only appeared if one of the assemblies targeted .NET Framework, but it turns out that was only a coincidence. It also happens when we run, for example, .NET 6.0 and .NET 8.0 tests together and probably also if we run two .NET 8.0 tests.
As a temporary measure, I changed our CI tests to run with --agents=1, which forces the assemblies to run sequentially. There seems to be some underlying race condition when multiple agents run in parallel.
I originally thought the problem would be in AggregatingTestRunner, which combines the results from multiple agents, but that doesn't seem to be the case, at least as far as I can see. I believe it has to do with our TCP server implementation in the engine. A separate proxy is used for each agent and all of them are waiting for results at the same time. I think we actually need to have a single place where agent results are received and then distributed to each proxy. Alternatively, a single proxy could be used, distributing the responses to each ProcessRunner.
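To make the "single place where results are received" idea concrete, here is a rough sketch in Python rather than in the engine's own code. It is not our TCP server implementation: the names ResultDispatcher, register, deliver and run_demo are all hypothetical, and an in-memory queue stands in for the socket. The only point is the shape of the design: one receiver reads every incoming result and routes it by agent id to a per-proxy queue, so proxies never compete for the same incoming message.

```python
import queue
import threading


class ResultDispatcher:
    """Hypothetical single receiving point that routes results to per-agent queues."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}  # agent id -> queue of results for that agent's proxy

    def register(self, agent_id):
        # Create the queue on first use, whether the proxy or a result gets here first.
        with self._lock:
            return self._queues.setdefault(agent_id, queue.Queue())

    def deliver(self, agent_id, result):
        # Called only by the single receiver; a result arriving before the
        # proxy has registered is still queued rather than dropped.
        self.register(agent_id).put(result)


def run_demo(agent_ids, results_per_agent=1):
    dispatcher = ResultDispatcher()
    incoming = queue.Queue()  # stand-in for the one shared receive channel
    received = {}

    def receiver():
        # The only consumer of the shared channel.
        while True:
            message = incoming.get()
            if message is None:  # sentinel: no more results
                break
            agent_id, result = message
            dispatcher.deliver(agent_id, result)

    def proxy(agent_id):
        # Each proxy waits only on its own queue.
        own_queue = dispatcher.register(agent_id)
        received[agent_id] = [own_queue.get(timeout=5) for _ in range(results_per_agent)]

    def agent(agent_id):
        for i in range(results_per_agent):
            incoming.put((agent_id, f"result-{agent_id}-{i}"))

    receiver_thread = threading.Thread(target=receiver)
    proxies = [threading.Thread(target=proxy, args=(a,)) for a in agent_ids]
    agents = [threading.Thread(target=agent, args=(a,)) for a in agent_ids]

    receiver_thread.start()
    for t in proxies + agents:
        t.start()
    for t in agents + proxies:
        t.join()
    incoming.put(None)
    receiver_thread.join()
    return received
```

Calling run_demo(["net6", "net8"], results_per_agent=2) returns a dictionary with one list of results per agent id. Whether the real fix should look like this, or like the single-proxy alternative, is still open.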
My own development setup uses a virtual Windows machine, and I'm unable to force this error to occur locally. Possibly, I don't have enough understanding of KVM to set it up properly. :-) Anyway, I need someone else to work on this with me, who is able to replicate the problem on a Windows machine with multiple processors. @manfred-brands Does your setup allow you to replicate this issue? It should show up when running build -t Package multiple times. You can use --nob to make subsequent runs skip the build itself.
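Since the failure is intermittent, a small helper that repeats the run and counts non-zero exit codes may save some manual effort. This is only a sketch: repeat_command and count_failures are hypothetical names, and how the build script is actually launched on your machine (including whether shell=True is needed to resolve it) is an assumption you may have to adjust.

```python
import subprocess


def repeat_command(command, runs, shell=False):
    """Run `command` `runs` times and return the exit code of each run."""
    exit_codes = []
    for _ in range(runs):
        completed = subprocess.run(command, shell=shell)
        exit_codes.append(completed.returncode)
    return exit_codes


def count_failures(exit_codes):
    """Count the runs that ended with a non-zero exit code."""
    return sum(1 for code in exit_codes if code != 0)


# Example (assumes one normal build has already been done, so --nob can skip it):
#   codes = repeat_command(["build", "-t", "Package", "--nob"], runs=20, shell=True)
#   print(count_failures(codes), "of", len(codes), "runs failed")
```

A non-zero exit code only says that some package test failed, so the output of a failing run still needs to be checked to confirm it is the lost-result case.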