Remove hard exit hack #1280
Conversation
Fixes #1278.

Copy the `_loopback` code from `package:multi_server_socket`, but model it as a `List<ServerSocket>` instead of a single server. Remove the unnecessary arguments for handling anything other than port 0.
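For context, here is a minimal sketch (with hypothetical names, not the exact code copied in this PR) of what that loopback helper can look like when modeled as a plain list of servers bound to an ephemeral loopback port:

```dart
import 'dart:io';

/// Binds loopback servers on IPv4 and, when available, IPv6, all on the same
/// ephemeral port, and returns them as a plain list instead of wrapping them
/// in a multi-server abstraction.
Future<List<ServerSocket>> loopback() async {
  // Bind IPv4 first so we learn which ephemeral port the OS picked.
  final v4Server = await ServerSocket.bind(InternetAddress.loopbackIPv4, 0);
  final servers = [v4Server];
  try {
    // Reuse the same port for IPv6; this can fail if IPv6 is unavailable.
    servers.add(
        await ServerSocket.bind(InternetAddress.loopbackIPv6, v4Server.port));
  } on SocketException {
    // Fall back to IPv4 only.
  }
  return servers;
}
```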
See what fails...
Note: this fixes flakiness, so you may need to run Travis a few times.
I'm wondering if the hack covered up the problem in #1278 (comment). I'm also wondering if the fix could give us some clue as to why we have needed this hack. The bug that I fixed with f355958 is an unhandled async error, but since that error never surfaces I'm not sure where it's disappearing to, or whether the error is somehow preventing the VM from shutting down. I'm not 100% sure that the unhandled async error I fixed there makes a real difference, though; it could be something else going on.
If there is no exception and we never close the servers, they can hold the process open. I'm not sure if this fully explains the flakiness on Travis...
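As a small illustration of that point (a standalone sketch, not code from this repo): an open `ServerSocket` keeps the Dart event loop alive, so the process only exits once every server has been closed.

```dart
import 'dart:io';

Future<void> main() async {
  final server = await ServerSocket.bind(InternetAddress.loopbackIPv4, 0);
  print('listening on port ${server.port}');
  // Remove this close and the process never exits, even though main has
  // returned: the open server keeps the event loop alive.
  await server.close();
}
```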
I wonder if there were multiple things that could trigger this. It seems to me like the Node socket servers not getting shut down should cause the VM to stay open, but the fact that this is the happy case and not the error case makes me wonder why it doesn't always cause a problem.

test/pkgs/test/lib/src/runner/node/platform.dart, lines 127 to 130 in c2f712c
We seem to be able to reliably trigger this behavior with a bug in the Node platform which causes an unhandled async error. That error would be effectively swallowed at

test/pkgs/test_api/lib/src/backend/invoker.dart, lines 355 to 357 in 58383da
But I don't know why having that error surface there specifically would be a problem.
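Roughly what "effectively swallowed" means here (a minimal sketch using `runZonedGuarded`, not the actual invoker code): the zone's error handler observes the error, but nothing rethrows it outside the zone, so the rest of the program never sees it.

```dart
import 'dart:async';

void main() {
  runZonedGuarded(() {
    // An async error with no listener attached.
    Future<void>.error(StateError('unhandled in the test body'));
  }, (error, stack) {
    // The handler only records the error; it never propagates further.
    print('caught by zone: $error');
  });
}
```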
Ugh, fixing the new Node issue does not fix things overall; we can still hit flakes.
Skipping the test. There was one run with a failure that looks really similar: in this case the test runner running as a subprocess did emit the correct output, but then never exited, so when the test timed out it was killed. This doesn't explain the outer timeouts that need to be killed by Travis. https://travis-ci.org/github/dart-lang/test/jobs/700382066

If I run this test locally I don't ever see a failure or a case where the test runner hangs; so far 25 successes in a row. I might verify that restoring the test restores the flakiness, then see if I can narrow it down to any single test case.
After these hacks there is no `Invoker` retained in hello world test cases.

- Don't pass through executable arguments. This isn't safe for things like `--enable-vm-service`.
- Eagerly resolve some static futures.
- Add an extra delay for paranoia, and print when observatory should be "settled".
- Don't use an AsyncMemo for closing the runner. Refactor away from async/await to old style future chaining. This prevents the `_awaiter` field on the Future from holding on to the entire closure context and preventing GC.
- Avoid reusing a static final future. Change to a getter which reads from a variable if it has already been resolved (see the sketch after this list).
- Fix some bugs around double newlines in observatory message.
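A minimal sketch (hypothetical names, not the actual runner code) of that getter pattern, replacing a reused `static final Future` with a getter that hands out the already-resolved value once it's available:

```dart
class Example {
  Future<String>? _pending;
  String? _resolved;

  // Callers that arrive after the first completion get a fresh,
  // already-completed future instead of retaining the original one.
  Future<String> get value {
    final resolved = _resolved;
    if (resolved != null) return Future.value(resolved);
    return _pending ??= _compute().then((result) {
      _resolved = result;
      _pending = null;
      return result;
    });
  }

  Future<String> _compute() async => 'expensive result';
}
```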
Hack around bugs in the VM service protocol with uncaught errors.
Hmm, I had hoped it was those
…ctions, will this restore the problem?
…github actions, will this restore the problem?" This reverts commit 735ec36.
When I was last looking at this I had found some receive ports that needed to be closed, and confirmed that closing them resolves an existing issue. The other failure that was showing up even after the fix hasn't reproed for me after 3 runs so far on GitHub Actions. I don't know if it was that migration, some other fix in the test runner since I last checked, or whether it just hasn't shown up as a flake yet. I think it makes sense to merge this for now, and if flakes show up again we can restore the hack pending further investigation.
The `IsolateChannel` should close the receive port when we call `outerChannel.sink.close`, which is also added to the `cleanupCallbacks`. This was originally added in #1280 to try to fix cases where the test runner could hang after running some tests, specifically on the test runner CI when running the test as a subprocess. That PR went through changes closing a number of ports speculatively, and this one was likely not necessary.
The `IsolateChannel` should close the receive port when we call `outerChannel.sink.close`, which is also added to the `cleanupCallbacks`. This was originally added in #1280 to try to fix cases where the test runner could hang after running some tests, specifically on the test runner CI when running the test as a subprocess. It looks like when it was introduced there was no call to `outerChannel.sink.close()`. A cleanup callback to close the sink was added in #1941, which made this unnecessary.
Closes #599
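A minimal sketch of the wiring being described (hypothetical names, assuming `package:stream_channel` is available): the channel is built on top of the `ReceivePort`, so per the comments above, closing the channel's sink tears the port down as well, and a separate `receivePort.close()` cleanup callback is redundant.

```dart
import 'dart:isolate';

import 'package:stream_channel/isolate_channel.dart';

void example() {
  final receivePort = ReceivePort();
  final channel = IsolateChannel<Object?>.connectReceive(receivePort);
  // ... hand receivePort.sendPort to the spawned isolate and exchange
  // messages over channel.stream / channel.sink ...

  // Per the description above, closing the sink shuts the channel down,
  // including its receive port, so no explicit receivePort.close() is
  // needed afterwards.
  channel.sink.close();
}
```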