Skip to content

Sometimes build tests don't complete #5394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cristianoc opened this issue May 26, 2022 · 15 comments · Fixed by #5471
Closed

Sometimes build tests don't complete #5394

cristianoc opened this issue May 26, 2022 · 15 comments · Fixed by #5471

Comments

@cristianoc
Copy link
Collaborator

cristianoc commented May 26, 2022

Experience that on my m1 laptop, every few time build tests run, they stop halfway through and never progress.
It might be what's happening here too:
https://github.com/rescript-lang/rescript-compiler/actions/runs/2390862882

@cristianoc
Copy link
Collaborator Author

If the build tests are run in sequence, and not in parallel, they always complete.

@jchavarri
Copy link
Contributor

jchavarri commented Jun 25, 2022

I took a look at this, I can reproduce with some test suite like super errors, ~1 out of every 10 runs:

node ./jscomp/build_tests/super_errors/input.js

When it hangs, the callback to child_process.exec here is never called for the test that never finishes. The only reason why this could happen is that the test process is still running.

So I took a look at the processes that remain running (ps aux | grep bsc). The strangest thing is that the bsc.exe process ends up in zombie state, while the node process that is started from the bsc script (which the test input.js script calls) ends up in sleeping state:

j                76961   0.0  0.0        0      0 s003  Z+   11:00PM   0:00.00 (bsc.exe)
j                76909   0.0  0.1  4642632  21596 s003  S+   11:00PM   0:00.08 node /Users/j/Development/github/rescript-compiler/bsc -w +A -color always /Users/j/Development/github/rescript-compiler/jscomp/build_tests/super_errors/fixtures/partial_app.res

So, either something broken here:

https://github.com/rescript-lang/rescript-compiler/blob/3c5f1a2430c67c4c09afb94c1e1ce5b7bb9582c1/bsc#L16

Or bsc.exe is not terminating properly? Although zombie state seems to indicate otherwise.

The only related issue I could find is nodejs/node#40189.

@cristianoc
Copy link
Collaborator Author

So you hit the mystery hanging bug.
It used to happen more often, when execFile was used, instead of the Sync one, and the tests were run in parallel.
After trying to simplify tests, reached the same conclusion that it could be some low-level mechanism at play here. But then the question is why it does not happen more often to more projects.
Perhaps the fact that processes write to stdout and stderr, flushing etc, is relevant, but this is pure speculation.

@cristianoc
Copy link
Collaborator Author

Long shot: wondering whether this could be implicated #5206

@cristianoc
Copy link
Collaborator Author

cristianoc commented Jun 26, 2022

I had not noticed there was parallelism left within a single test.
Looks like this:

var  bsc = path.join(__dirname,'..','..','..','darwinarm64', 'bsc.exe')

speeds up the tests a lot and removed the hanging.
So it must be something about resolving the executable name.

@cristianoc
Copy link
Collaborator Author

cristianoc commented Jun 26, 2022

@jchavarri this solved it for me: #5469
But I don't know why.

Even if one wants to do this, the logic should be used in the other build tests too. Though a bunch of tests call rescript not bsc.

@cristianoc
Copy link
Collaborator Author

It appears that the trigger is inherit in bsc here:

child_process.execFileSync(exe, delegate_args, { stdio: "inherit" });

Again not completely sure why but, this seems relevant https://www.freecodecamp.org/news/node-js-child-processes-everything-you-need-to-know-e69498fe970a/
esp. the sentence "When those streams get closed, the child process that was using them will emit the close event. This close event is different than the exit event because multiple child processes might share the same stdio streams and so one child process exiting does not mean that the streams got closed.".

@jchavarri
Copy link
Contributor

jchavarri commented Jun 26, 2022

I can confirm calling bsc.exe directly fixes the problem for me as well.

I got a bit carried away and added some helper bin_path.js file to derive paths to binaries in #5471, but maybe it's too intrusive -> can lead to build issues, so the more lean approach in #5469 might be better for now.

@cristianoc
Copy link
Collaborator Author

That's the kind of thing I had in mind, except: bsc itself is untouchable.
So 2 copies of the logic in total in the repo should do it.

jchavarri added a commit to jchavarri/rescript-compiler that referenced this issue Jun 26, 2022
@jchavarri
Copy link
Contributor

Why is that? removed changes in a373cb7.

@cristianoc
Copy link
Collaborator Author

Why is that? removed changes in a373cb7.

It's the most run thing in production. So any change there is dangerous. From perf to corner cases or anything.

@cristianoc
Copy link
Collaborator Author

cristianoc commented Jun 26, 2022

The other build tests, it would be good to go back to being able to run them in parallel.
So their calls to bsc and rescript might need the same treatment.

@jchavarri
Copy link
Contributor

jchavarri commented Jun 26, 2022

It's the most run thing in production. So any change there is dangerous. From perf to corner cases or anything.

I thought for production projects ninja rules would point to bsc.exe directly. If it's calling bsc node script, that'd be very bad for performance 😱

@cristianoc
Copy link
Collaborator Author

It's the most run thing in production. So any change there is dangerous. From perf to corner cases or anything.

I thought for production projects ninja rules would point to bsc.exe directly. If it's calling bsc node script, that'd be very bad for performance 😱

Fair point. We just need to double check it's the case.
Plus bsc is part of the "official api" so we just need to make sure it works in an installation (the relevant files are included).

@jchavarri
Copy link
Contributor

jchavarri commented Jun 26, 2022

let me know if d16bf32 is what you had in mind. There might be some additional cleanup in ciTest.js now that installing npm package is not a requirement to run the tests 🤔

cristianoc pushed a commit that referenced this issue Jun 26, 2022
mununki pushed a commit to mununki/rescript-compiler that referenced this issue Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants