Remove hard exit hack #1280
Conversation
Fixes #1278.

Copy the `_loopback` code from `package:multi_server_socket`, but model it as a `List<ServerSocket>` instead of a single server. Remove the unnecessary arguments for handling anything other than port 0.
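For context, here is a minimal sketch (with hypothetical names, not the exact code copied in this PR) of what that loopback helper can look like when modeled as a plain list of servers bound to an ephemeral loopback port:

```dart
import 'dart:io';

/// Binds loopback servers on IPv4 and, when available, IPv6, all on the same
/// ephemeral port, and returns them as a plain list instead of wrapping them
/// in a multi-server abstraction.
Future<List<ServerSocket>> loopback() async {
  // Bind IPv4 first so we learn which ephemeral port the OS picked.
  final v4Server = await ServerSocket.bind(InternetAddress.loopbackIPv4, 0);
  final servers = [v4Server];
  try {
    // Reuse the same port for IPv6; this can fail if IPv6 is unavailable.
    servers.add(
        await ServerSocket.bind(InternetAddress.loopbackIPv6, v4Server.port));
  } on SocketException {
    // Fall back to IPv4 only.
  }
  return servers;
}
```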
See what fails...
Note: this fixes flakiness, so you may need to run Travis a few times.
I'm wondering if the hack covered up the problem in #1278 (comment). I'm also wondering if the fix could give us some clue as to why we have needed this hack. The bug that I fixed with f355958 is an unhandled async error, but since that error never surfaces I'm not sure where it's disappearing to, or whether the error is somehow preventing the VM from shutting down. I'm not 100% sure that the unhandled async error I fixed there makes a real difference, though; it could be something else going on.
If there is no exception and we never close the servers, they can hold the process open. I'm not sure if this fully explains the flakiness on Travis...
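As a small illustration of that point (a standalone sketch, not code from this repo): an open `ServerSocket` keeps the Dart event loop alive, so the process only exits once every server has been closed.

```dart
import 'dart:io';

Future<void> main() async {
  final server = await ServerSocket.bind(InternetAddress.loopbackIPv4, 0);
  print('listening on port ${server.port}');
  // Remove this close and the process never exits, even though main has
  // returned: the open server keeps the event loop alive.
  await server.close();
}
```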
I wonder if there were multiple things that could trigger this. It seems to me like the Node socket servers not getting shut down should cause the VM to stay open, but the fact that this is the happy case and not the error case makes me wonder why it doesn't always cause a problem.

test/pkgs/test/lib/src/runner/node/platform.dart, lines 127 to 130 in c2f712c
We seem to be able to reliably trigger this behavior with a bug in the Node platform which causes an unhandled async error. That error would be effectively swallowed at

test/pkgs/test_api/lib/src/backend/invoker.dart, lines 355 to 357 in 58383da
But I don't know why having that error surface there specifically would be a problem.
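Roughly what "effectively swallowed" means here (a minimal sketch using `runZonedGuarded`, not the actual invoker code): the zone's error handler observes the error, but nothing rethrows it outside the zone, so the rest of the program never sees it.

```dart
import 'dart:async';

void main() {
  runZonedGuarded(() {
    // An async error with no listener attached.
    Future<void>.error(StateError('unhandled in the test body'));
  }, (error, stack) {
    // The handler only records the error; it never propagates further.
    print('caught by zone: $error');
  });
}
```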
Ugh, fixing the new Node issue does not fix things overall; we can still hit flakes.
Skipping the test. There was one run with a failure that looks really similar: in this case the test runner running as a subprocess did emit the correct output, but then never exited, so when the test timed out it was killed. This doesn't explain the outer timeouts that need to be killed by Travis. https://travis-ci.org/github/dart-lang/test/jobs/700382066

If I run this test locally I don't ever see a failure or a case where the test runner hangs; so far 25 successes in a row. I might verify that restoring the test restores the flakiness, then see if I can narrow it down to any single test case.
After these hacks there is no `Invoker` retained in hello world test cases.

- Don't pass through executable arguments. This isn't safe for things like `--enable-vm-service`.
- Eagerly resolve some static futures.
- Add an extra delay for paranoia, and print when observatory should be "settled".
- Don't use an AsyncMemo for closing the runner. Refactor away from async/await to old style future chaining. This prevents the `_awaiter` field on the Future from holding on to the entire closure context and preventing GC.
- Avoid reusing a static final future. Change to a getter which reads from a variable if it has already been resolved (see the sketch after this list).
- Fix some bugs around double newlines in observatory message.
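A minimal sketch (hypothetical names, not the actual runner code) of that getter pattern, replacing a reused `static final Future` with a getter that hands out the already-resolved value once it's available:

```dart
class Example {
  Future<String>? _pending;
  String? _resolved;

  // Callers that arrive after the first completion get a fresh,
  // already-completed future instead of retaining the original one.
  Future<String> get value {
    final resolved = _resolved;
    if (resolved != null) return Future.value(resolved);
    return _pending ??= _compute().then((result) {
      _resolved = result;
      _pending = null;
      return result;
    });
  }

  Future<String> _compute() async => 'expensive result';
}
```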
Hack around bugs in the VM service protocol with uncaught errors.
Hmm, I had hoped it was those
…ctions, will this restore the problem?
…github actions, will this restore the problem?" This reverts commit 735ec36.
When I was last looking at this I had found some receive ports that needed to be closed, and confirmed that closing them resolves an existing issue. The other failure that was showing up even after the fix hasn't reproed for me after 3 runs so far on GitHub Actions. I don't know if it was that migration, some other fix in the test runner since I last checked, or whether it just hasn't shown up as a flake yet. I think it makes sense to merge this for now, and if flakes show up again we can restore the hack pending further investigation.
The `IsolateChannel` should close the receive port when we call `outerChannel.sink.close`, which is also added to the `cleanupCallbacks`. This was originally added in #1280 to try to fix cases where the test runner could hang after running some tests, specifically on the test runner CI when running the test as a subprocess. That PR went through changes closing a number of ports speculatively, and this one was likely not necessary.
The `IsolateChannel` should close the receive port when we call `outerChannel.sink.close`, which is also added to the `cleanupCallbacks`. This was originally added in #1280 to try to fix cases where the test runner could hang after running some tests, specifically on the test runner CI when running the test as a subprocess. It looks like when it was introduced there was no call to `outerChannel.sink.close()`. A cleanup callback to close the sink was added in #1941, which made this unnecessary.
Closes #599
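A minimal sketch of the wiring being described (hypothetical names, assuming `package:stream_channel` is available): the channel is built on top of the `ReceivePort`, so per the comments above, closing the channel's sink tears the port down as well, and a separate `receivePort.close()` cleanup callback is redundant.

```dart
import 'dart:isolate';

import 'package:stream_channel/isolate_channel.dart';

void example() {
  final receivePort = ReceivePort();
  final channel = IsolateChannel<Object?>.connectReceive(receivePort);
  // ... hand receivePort.sendPort to the spawned isolate and exchange
  // messages over channel.stream / channel.sink ...

  // Per the description above, closing the sink shuts the channel down,
  // including its receive port, so no explicit receivePort.close() is
  // needed afterwards.
  channel.sink.close();
}
```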