
Workaround ava worker deadlocks #7330

Merged
mergify[bot] merged 4 commits into master from mhofman/fix-ava-tests on Apr 7, 2023

Conversation

@mhofman
Member

@mhofman mhofman commented Apr 4, 2023

We've been hit by a few test failures where the process pegs the CPU at 100% and never exits. I am convinced this is a race condition somewhere in Node.js, related to message sends from the worker thread combined with a forced exit from that thread.

I implemented a workaround in Ava to let the worker exit on its own, with a fallback of forced termination from the parent if it doesn't, which seems to consistently allow the tests to exit.
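
For illustration, a minimal sketch of that graceful-exit-with-fallback pattern using Node's worker_threads. The helper name, the shutdown message protocol, and the grace period are assumptions made up for this example; this is not the actual Ava patch:

```js
import { Worker } from 'node:worker_threads';

const GRACE_PERIOD_MS = 5_000; // hypothetical grace period before force-terminating

const stopWorker = async worker => {
  const exited = new Promise(resolve => worker.once('exit', resolve));
  // Ask the worker to exit on its own; the worker side is assumed to listen
  // for this message and call process.exit() once its pending sends have flushed.
  worker.postMessage({ type: 'shutdown' });
  const timer = setTimeout(() => {
    // Fallback: the worker didn't exit in time; terminate it from the parent.
    void worker.terminate();
  }, GRACE_PERIOD_MS);
  timer.unref(); // don't let the fallback timer itself keep the process alive
  const exitCode = await exited;
  clearTimeout(timer);
  return exitCode;
};

// Usage: const code = await stopWorker(new Worker('./test-worker.js'));
```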

While working on this, I also noticed that some tests do not shut down the controller and/or close the DB, which may cause worker hangs.
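
As an example of the kind of cleanup involved, here is a hedged sketch of an Ava teardown hook; the controller and db context fields are hypothetical names, not taken from a specific test file:

```js
import test from 'ava';

test.after.always(async t => {
  // Hypothetical context fields, assumed to be populated in test.before().
  const { controller, db } = t.context;
  // Shut down the controller and close the DB so the worker has no live
  // handles keeping it from exiting on its own.
  if (controller) await controller.shutdown();
  if (db) db.close();
});
```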

@mhofman mhofman requested review from michaelfig and turadg April 4, 2023 21:49
Member

@turadg turadg left a comment


I implemented a workaround in Ava to let the worker exit on its own, with a fallback of forced termination from the parent if it doesn't, which seems to consistently allow the tests to exit.

How can we be sure?

If this is the solution, we should try to get it into Ava main so we don't have to patch. Will you file an issue?

While working on this, I also noticed that the bootstrap-based tests do not close the DB, which may cause issues.

That change LGTM. I'll hold off on Approve until I better understand the first one, though I invite Mfig to approve before I do.

@mhofman
Member Author

mhofman commented Apr 4, 2023

How can we be sure?

Given I cannot consistently reproduce this race, it's hard to tell.

I am running into other test failures now, investigating.

If this is the solution, we should try to get it into Ava main so we don't have to patch. Will you file an issue?

Definitely, once I've hammered out the test failures, I'll open an upstream PR.

@mhofman
Member Author

mhofman commented Apr 5, 2023

So apparently the failures were a combination of 2 things:

  • The change in Ava resulted in forced/unclean exits being reported as a general test failure. Since this would be a breaking change, I have an upcoming fix restoring the previous behavior.
  • This change in Ava highlighted a bunch of tests that didn't properly clean up after themselves. I have fixed all the ones I could find locally and am waiting on CI to confirm.

Once CI confirms no more tests fail because of incomplete cleanup, I'll push the fixup to Ava to restore the non-failing forced-exit behavior. I am wondering if we should somehow report this case so that new tests that don't clean up properly get flagged. Thoughts?

@mhofman mhofman force-pushed the mhofman/fix-ava-tests branch 2 times, most recently from cbb1363 to ccb6852 Compare April 5, 2023 13:12
@mhofman
Member Author

mhofman commented Apr 5, 2023

With the merge conflict and the fixup! commits from addressing test failures, I went ahead and force-pushed, assuming reviews hadn't started yet.

Edit: The full clean rebase is fae2f48..3289d32

@mhofman mhofman force-pushed the mhofman/fix-ava-tests branch 2 times, most recently from fae2f48 to 6849c32 Compare April 5, 2023 13:22
Member

@michaelfig michaelfig left a comment


Looks good overall! I had just a few questions.

Also, I'm not sure I understand the AVA patch's different exit/failure states. I would prefer the behaviour you were asking about: having a different reported state if the worker doesn't clean up properly.

@mhofman mhofman force-pushed the mhofman/fix-ava-tests branch from 912b059 to c5aabc8 Compare April 6, 2023 21:10
@mhofman
Member Author

mhofman commented Apr 6, 2023

I would prefer the behaviour you were asking about: having a different reported state if the worker doesn't clean up properly.

So the biggest problem right now is that I'm giving the worker 5 seconds to exit cleanly, which seems to be insufficient in CI for large workers like test-message-patterns.js. Maybe I should increase the fallback timeout?
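
One way to avoid hard-coding a larger value would be to make the grace period tunable per environment, e.g. via an environment variable. A sketch; AVA_WORKER_EXIT_TIMEOUT is a made-up name, not an existing Ava option:

```js
// Hypothetical knob: fall back to 5 seconds when the variable is unset.
const GRACE_PERIOD_MS = Number(process.env.AVA_WORKER_EXIT_TIMEOUT ?? 5_000);
```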

@mhofman mhofman added the automerge:rebase (Automatically rebase updates, then merge) label Apr 7, 2023
@mhofman mhofman force-pushed the mhofman/fix-ava-tests branch from 54f3285 to 70324f6 Compare April 7, 2023 23:15
@mergify mergify bot merged commit 4e72872 into master Apr 7, 2023
@mergify mergify bot deleted the mhofman/fix-ava-tests branch April 7, 2023 23:56
mergify bot added a commit that referenced this pull request Jan 22, 2026
closes: #9083

## Description

Breaking changes in v6: https://github.com/avajs/ava/releases/tag/v6.0.0

Relevant subset:
- When tests finish, worker threads or child processes are no longer exited through process.exit(). If your test file does not exit on its own, the test run will time out. avajs/ava#3260

- Failed assertions now throw, meaning that any subsequent code is not executed. This also impacts the type definitions. avajs/ava#3246

- [Only native errors](https://nodejs.org/api/util.html#utiltypesisnativeerrorvalue) are now considered errors by the t.throws() and t.throwsAsync() assertions; Object.create(Error.prototype) is not a native error. avajs/ava#3229 (see the sketch after this list)
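
A small sketch of how the last two semantic changes play out in test code (an assumed example, not taken from this repo):

```js
import test from 'ava';

test('ava v6 assertion semantics', t => {
  // Failed assertions now throw: if this t.is() failed, nothing below it
  // would run, whereas v5 recorded the failure and kept executing.
  t.is(1 + 1, 2);

  // t.throws() now only matches native errors.
  t.throws(() => {
    throw new TypeError('boom'); // matched: a native error
    // throw Object.create(Error.prototype); // would no longer be matched in v6
  });
});
```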

This removes our local patches of Ava
- #7619
- #7330

### Security Considerations

none, no runtime change

### Scaling Considerations

none, no runtime change

### Documentation Considerations

none

### Testing Considerations

CI

### Upgrade Considerations

none, no runtime change
