
Conversation

@ijjk commented Jan 10, 2019

Here's the start of my implementation for migrating forking to a pooled worker design. It has a SingleProcessTestPool, a SharedForkTestPool, and a ForkTestPool, which share a common interface that abstracts away the method of execution.

Currently, it picks a TestPool to use based on the factors set out in the issue (see the sketch after this list):

  • For <N tests (where N is the number of processors), use SingleProcessTestPool; otherwise, use ForkTestPool
  • When --no-fork or equivalent flag is provided, force SingleProcessTestPool
  • When --inspect or --inspect-brk is specified, force SingleProcessTestPool
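
For illustration, the selection rules above might look roughly like this (a sketch only; the option and pool names here are hypothetical, not the PR's actual internals):

```js
const os = require('os');

// Pick a test pool per the rules above: debugging or --no-fork forces the
// single-process pool, as does having fewer test files than processors.
function choosePool(testFiles, options) {
	if (options.noFork || options.inspect || options.inspectBrk) {
		return 'SingleProcessTestPool';
	}
	if (testFiles.length < os.cpus().length) {
		return 'SingleProcessTestPool';
	}
	return 'ForkTestPool';
}
```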

In the thread, the ForkTestPool was described as distributing tests to the different forks as they become idle, but I thought it might simplify execution to divide the tests beforehand and then create the forks to run them. Doing it this way allowed me to reuse the SingleProcessTestPool implementation inside the forked process.
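
Dividing the files beforehand could be as simple as a round-robin split (an illustrative sketch, not the PR's code):

```js
// Partition test files among `count` forks up front; each fork then runs
// its chunk using the SingleProcessTestPool implementation.
function partitionTests(testFiles, count) {
	const chunks = Array.from({length: count}, () => []);
	testFiles.forEach((file, index) => {
		chunks[index % count].push(file);
	});
	return chunks.filter(chunk => chunk.length > 0);
}
```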

To run the tests inside a single process, I updated the runner import to use a map with the test file as the key and the runner as the value. I also utilized vm to provide isolation between tests running in the same thread. Fixes: #1428 Fixes: #1332
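
The runner lookup described here could look roughly like this (a hypothetical sketch; the PR's actual module layout differs):

```js
// One runner per test file, keyed by file path, so several test files can
// execute within the same process without sharing a runner.
const runners = new Map();

function registerRunner(testFile, runner) {
	runners.set(testFile, runner);
}

function getRunner(testFile) {
	return runners.get(testFile);
}
```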

After review by @novemberborn, there are now no breaking changes and use of the new test pools is opt-in. Also, the SingleProcessTestPool does not run inside a sandbox (e.g. vm), so as to avoid having to maintain a sandboxing library, since vm2 does not work in this context. You can opt in to a test pool with the flags/config options below:

  • SingleProcessTestPool using the --single-process flag or singleProcess config
  • SharedForkTestPool using the --share-forks flag or shareForks config

You can also make use of worker_threads on supported Node.js versions (>= 11.8, or 10.x with the --experimental-worker node flag) using the --worker-threads flag or workerThreads config.
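
Because worker_threads is unavailable on older versions (and gated behind a flag on 10.x), support has to be probed at runtime. A minimal check might look like this (a sketch; not necessarily how the PR detects it):

```js
// require('worker_threads') throws when the API is unavailable, so probe
// for it defensively and fall back to child_process.fork otherwise.
let workerThreadsAvailable = false;
try {
	require('worker_threads');
	workerThreadsAvailable = true;
} catch (error) {
	// Unsupported Node.js version, or 10.x without --experimental-worker
}
```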

TODO:

  • Add note for --no-fork flag and fork config in docs
  • Clean up timeouts
  • Fix failing test cases


@ijjk force-pushed the migrate-forked-worker branch 3 times, most recently from 1774235 to 64cc378 on January 11, 2019 at 22:51
@novemberborn (Member)
Hey @ijjk, thanks for picking this up! There are a lot of changes here, so I may not be able to have a proper look until next weekend. Going by the commit messages, it looks like you're really far along already!

@ijjk force-pushed the migrate-forked-worker branch from d2088f5 to aa270a2 on January 14, 2019 at 07:39
@ijjk (Author) commented Jan 14, 2019

@novemberborn okay, thanks for the reply! Yeah, the changes started to add up. I'm still sorting out some test cases, and then hopefully it will be good to go.

@lo1tuma (Contributor) commented Jan 23, 2019

I've tried to run this against the test suite of one of our projects, with 1608 tests in total. While ava v1 takes roughly 37 seconds, with these changes it seems to be extremely fast (my gut feeling: < 5 seconds with the ForkTestPool). The problem is that after 1222 tests have completed, the runner completely hangs and I have to manually abort the process. I see some warnings in the console:

 MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 25 unhandledRejection listeners added. Use emitter.setMaxListeners() to increase limit

The same happens when I use the --no-fork option.

Apart from that, the reporting seems to behave strangely. Does it show a spinner for every process (I can see 8 for fork and 2 for no-fork)? Here is a screenshot:
[screenshot: 2019-01-23 16:53:18]

As you can see in the screenshot, there is also a warning from React which complains about a missing requestAnimationFrame. I don't get this warning with ava v1. I would guess this is related to sandboxing. In some test files I set global.window to an instance of jsdom.

As a follow-up feature, it would also be nice to provide an option which disables sandboxing in favor of performance.

@ijjk (Author) commented Jan 23, 2019

@lo1tuma thanks for trying out the changes! Yeah, there are still some kinks to work out. I actually just pushed up a change where I removed the use of vm2 since I had to use require in the host context which pretty much eliminated the sandboxing.

I think one of the main reasons it was so much faster when you tried it, and then stalled, is that modules are being cached in memory in require.cache between tests. This caching seems to be causing problems, so the current solution I've found is to wipe require.cache. Wiping require.cache has a pretty heavy performance cost, so if you have any ideas on more optimized ways of doing this, I'd appreciate them.
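
For reference, the naive wipe looks something like this (a sketch of the approach described above), which is exactly why it's costly: every module gets re-read and re-evaluated for each test file:

```js
// Clearing the whole module cache restores isolation between test files,
// but forces every subsequent require() to re-load from disk.
function wipeRequireCache() {
	for (const key of Object.keys(require.cache)) {
		delete require.cache[key];
	}
}
```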

I briefly tried using import-fresh, since it seems like a better idea than wiping the entire cache at the start of every test file. I still need to investigate why it was failing when I implemented it, though. Also, is the project with the test suite you ran it against public, by any chance?

Edit: never mind on the require.cache question; I ended up implementing a custom vm setup.

@ijjk force-pushed the migrate-forked-worker branch from 90a7ff2 to d13df63 on January 23, 2019 at 23:54
@novemberborn (Member) left a comment

Wow, there's a lot of work here @ijjk! I've tried my best in a first-pass review. There are a lot of comments, some probably repetitive as I figure out what's going on.

I actually just pushed up a change where I removed the use of vm2 since I had to use require in the host context which pretty much eliminated the sandboxing.

I'd rather use an existing module for sandboxing than have to maintain one ourselves. What do you mean by having to use `require` "in the host context"?

@ijjk (Author) commented Jan 27, 2019

@novemberborn

What do you mean by having to use `require` "in the host context"?

vm2's NodeVM has the option to require in the sandbox or in the host context. When using the sandbox require mode, modules that need to modify module.constructor._extensions, like @babel/register, fail since vm2 doesn't expose a full module instance.

You can use the host require mode and it works, but this breaks pretty much any sandboxing, since any changes made globally or to the module on require are done in the host and not the sandbox.

So for example, in host require mode, if you require a module (let's call it browser-setup.js) with the contents below, the changes will be made in the host context and not the sandbox, pretty much breaking the isolation.

// browser-setup.js: these assignments mutate globals, so in vm2's host
// require mode they land on the host context instead of the sandbox
const { JSDOM } = require('jsdom')
const { window, document } = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`)
global.window = window
global.document = document
global.fetch = require('node-fetch')

I couldn't find any existing modules that handle this.

  • vm2 has the above problem

  • isolated-vm runs in a fork and, according to its README, mainly works with node >= 8.11.2

  • napajs runs on separate threads and doesn't support all built-in node modules

This is why I started working out a bare minimum custom implementation of vm.

@novemberborn (Member)

I see.

What do you think of https://nodejs.org/api/worker_threads.html? That should solve the sandboxing and the performance issues. I still see a need for properly forked processes, so the pool switching will be necessary. Perhaps we could support non-sandboxed single process as an opt-in which only works for some use cases.

implementation, added ForkTestPool which is default with current behavior, and added SharedForkTestPool which uses a pool of forks and distributes tests to them as they become available
@ijjk force-pushed the migrate-forked-worker branch from b9fd625 to db3099c on January 29, 2019 at 22:28
@ijjk (Author) commented Jan 29, 2019

@novemberborn WorkerThreads look interesting, although still experimental. I reworked my branch to support a non-isolated single-process mode, the current mode of creating a new fork for each test, and a shared-fork mode which reuses the forks and distributes tests to them as they become available.

The single-process and shared-fork modes are currently opt-in. I could try out a WorkerThread implementation that's opt-in too, since it's still in the experimental stage. Was this rework what you had in mind with the previous comment?

Edit: Not sure why one test fails on the mini reporter and one on verbose, only on Windows in Travis CI. If you have an idea why, I'd appreciate it.

@novemberborn (Member)

I could try out a WorkerThread implementation that's opt-in too, since it's still in the experimental stage. Was this rework what you had in mind with the previous comment?

Yea. It no longer requires a flag in the recent Node.js 11 release(s), so it's definitely an option for us. It'll be in Node.js 12 soon enough!

@ijjk (Author) commented Jan 30, 2019

@novemberborn I added the option to use Worker Threads instead of forking. The API was pretty similar to child_process.fork, so I was able to implement it using ./lib/fork.

It's also set up so that Worker Threads can be shared in the same way that forks can with the --share-forks flag, since creating new Worker Threads can be expensive, similar to creating new forks.
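
The similarity is roughly this (an illustrative sketch assuming a message-based protocol and a Node.js version with worker_threads available; not the PR's actual ./lib/fork code):

```js
const {fork} = require('child_process');
const {Worker} = require('worker_threads');

// Both APIs spawn a worker from a file and emit 'message' events, so the
// surrounding pool code can stay largely the same. Sending differs
// slightly: subprocess.send() versus worker.postMessage().
function spawnWorker(workerFile, useThreads, onMessage) {
	const worker = useThreads ? new Worker(workerFile) : fork(workerFile);
	worker.on('message', onMessage);
	return worker;
}
```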

On one project I found on GitHub that uses AVA, with around 632 tests across 59 files, the suite had the run times below. Note: by "no cache" I mean the cache under node_modules/.cache has been cleared for the project.

| Mode | Run duration |
| --- | --- |
| ForkTestPool, no cache | 48.67s |
| ForkTestPool, with cache | 38.34s |
| SharedForkTestPool, no cache | 22.09s |
| SharedForkTestPool, with cache | 14.91s |
| WorkerThreads, no cache | 48.43s |
| WorkerThreads, with cache | 35.76s |
| SharedWorkerThreads, no cache | 21.51s |
| SharedWorkerThreads, with cache | 15.60s |
| SingleProcessTestPool, no cache | 50.44s (suite can't run in single process) |
| SingleProcessTestPool, with cache | 41.13s |

I also ran this on styled-jsx's test suite, which has around 174 tests across 16 files, and got the run times below.

| Mode | Run duration |
| --- | --- |
| ForkTestPool, no cache | 10.73s |
| ForkTestPool, with cache | 6.34s |
| SharedForkTestPool, no cache | 8.80s |
| SharedForkTestPool, with cache | 6.30s |
| WorkerThreads, no cache | 10.84s |
| WorkerThreads, with cache | 6.96s |
| SharedWorkerThreads, no cache | 9.17s |
| SharedWorkerThreads, with cache | 6.37s |
| SingleProcessTestPool, no cache | 8.11s |
| SingleProcessTestPool, with cache | 4.36s |

I didn't run these under a super controlled environment, so they might not be perfect estimates, but I thought it might be helpful to compare on actual projects using AVA.

@ijjk force-pushed the migrate-forked-worker branch from 107d32d to a14418f on February 1, 2019 at 23:22
@ijjk changed the title from "WIP: Migrate forking to a pooled worker design #1428" to "Migrate forking to a pooled worker design #1428" on Feb 2, 2019
@ijjk force-pushed the migrate-forked-worker branch from 99cebe9 to 2913fe7 on February 6, 2019 at 04:44
@novemberborn (Member) left a comment

Oops, looks like my comments were never posted. I'll do that now.

Again apologies for the lack of progress on this PR.

api.js (outdated):
workerOptions.updateSnapshots = true;
}

this.Fork = Fork;
@novemberborn (Member):

Why expose this?

@ijjk (Author):

This is mostly for mocking during testing. I thought it would be easier to only mock here instead of in each test pool that needs Fork.

lib/cli.js (outdated):
ranFromCli: true,
shareForks: conf.shareForks,
singleProcess: conf.singleProcess,
workerThreads: conf.workerThreads
@novemberborn (Member):

Which combinations do you think should be allowed? I'm tempted to argue it's either --share-forks, or --single-process, *or* --worker-threads.

@ijjk (Author):

Yeah, that makes sense; combining them can add a lot of complexity.

exit(1);
});

/* istanbul ignore next */
@novemberborn (Member):

Why disable code coverage, here and elsewhere?

@ijjk (Author):

Removed all except those related to worker_threads; since I can't test them on unsupported Node versions, they bring down overall code coverage.

@novemberborn (Member):

I'd have expected codecov to combine coverage reports across all the Node.js versions we test on. This isn't happening?

const {children} = require.cache[file] || {};
if (children) {
	const dependencies = children.map(mod => mod.id);
	statusListener({type: 'dependencies', dependencies});
}
@novemberborn (Member):

This looks like a rather different approach; I guess because of caching we can't build as good a dependency map. But children here won't give us a full picture either. Is there a case for not emitting any of these dependencies? It'll affect the watcher, so we should document that. The same will go for shared forks.

@ijjk (Author):

This has been updated now that I'm reusing subprocess for the SingleProcessTestPool.

const runner = require('./subprocess').getRunner();
// If getRunner is missing AVA was probably required
// directly instead of in a test
if (typeof global.getRunner === 'undefined') {
@novemberborn (Member):

We can't introduce globals. You could still rely on the require cache to access a shared runner.
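
For example, a small module can act as the shared registry, because require() caches it and every importer then sees the same instance (a sketch of the suggestion; the module name is illustrative):

```js
// shared-runner.js: the require cache guarantees a single instance of this
// module per process, so no global is needed.
let runner = null;

exports.setRunner = newRunner => {
	runner = newRunner;
};

exports.getRunner = () => runner;
```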

@ijjk (Author):

I updated this to only be used in SingleProcess mode, to fix an issue with esm causing the getRunner export not to be set here.

@abrenneke
@ijjk sorry about the late reply. Unfortunately I can't share either of the projects :( I'm happy to keep testing the branch, though. I hope we can get the speedup I first experienced, without any errors.

@ijjk force-pushed the migrate-forked-worker branch from a06f467 to de3c3d6 on March 18, 2019 at 22:28
@ijjk (Author) commented Mar 18, 2019

@SneakyMax I understand. I added an option, cacheRequire, that defaults to true and allows you to either require options.require every time before new tests or to only require them once before all tests in that process.

Note: this is only used in sharedForks and singleProcess mode. It might help increase performance a bit if your tests are set up to allow cacheRequire to be enabled.
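
As a usage sketch, opting in might look like this (the shareForks and cacheRequire names follow this PR's discussion; the exact config shape is assumed):

```js
// ava.config.js
export default {
	// shareForks and cacheRequire are the opt-in options from this PR
	shareForks: true,
	// Require the modules in `require` once per shared process instead of
	// before every test file.
	cacheRequire: true,
	require: ['./testBootstrap.js']
};
```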

@ijjk force-pushed the migrate-forked-worker branch from de3c3d6 to fb01ed5 on March 19, 2019 at 01:15
// Snapshot the initial require cache and module extensions; entries in
// preventCache are excluded from caching between tests
const storedKeys = {};
const storedCache = Object.assign({}, require.cache);
const storedExtensions = Object.assign({}, module.constructor._extensions);
const preventCache = ['bluebird', '../index.js', './worker/main.js'];

Why is Bluebird being prevented from caching?

I've tried running my tests against this branch, and all of our tests that involve Bluebird cancellation are failing. We currently have a testBootstrap.js file that we import before all of our tests, which enables Bluebird cancellation, but it's not being respected, likely due to this logic here.


Perhaps .getNewLibraryCopy() would be useful so that the test framework can have its own copy of Bluebird without stepping on the application code?
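
That suggestion would look roughly like this (getNewLibraryCopy() is a real Bluebird API; its use here is only a sketch):

```js
// Give the test framework its own copy of Bluebird, so configuring or
// clearing it never affects the copy the application code requires.
const Bluebird = require('bluebird').getNewLibraryCopy();
```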

@ijjk (Author):

I was clearing bluebird there to try not to interfere with the tests' usage of bluebird, but you might be right that clearing it there ended up interfering anyway. I moved it to only clear initially, so hopefully it doesn't conflict now.


Yup, it looks like that fixed it. All of my cancellation tests are running correctly now.

@novemberborn (Member)

Hi @ijjk, sorry I haven't gotten round to picking this up again.

@ijjk force-pushed the migrate-forked-worker branch from 1e9e946 to d1689e5 on April 27, 2019 at 21:00
@ijjk (Author) commented Apr 27, 2019

@novemberborn I just updated the branch to resolve conflicts; it should be good to review when you have a chance.

@thasmo commented Jul 19, 2019

Awesome work @ijjk! Hope this gets reviewed soon. 👍

@novemberborn (Member)

Hope this gets reviewed soon. 👍

Yes me too 😛

This is the next big PR I'm meaning to look at once we land #1947. Apologies for taking so long. It's really hard to find the time (several uninterrupted hours) to go through PRs with this kind of impact.

@novemberborn (Member)

Once more, apologies for taking so long in looking at this.

The bad news is, this is still a really large PR! But the good news is that we're now supporting experimental features. See this commit for an example: 2b8ba3a

@ijjk if you're still keen, perhaps you could make a PR for just one of the options we discussed here. Preferably not anything that needs sandboxes and cleanups. If we put that behind a feature flag, we can ship it as a new, opt-in feature without it being a breaking change. We can then build on that feature, one PR at a time. It'll be a lot easier to review multiple smaller changes than one big one. We'll also be free to change the implementation as we learn more.

If you're burned out by this experience I totally understand, and again I'm sorry it got to this point. Let us know and perhaps somebody else is interested in picking this up.

This PR is still too daunting for me so I'm going to go ahead and close it. But I'd really like to see these new worker modes 😍
