Skip to content

test: fix test-repl-import-referrer flakiness #58014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dario-piotrowicz
Copy link
Member

@dario-piotrowicz dario-piotrowicz commented Apr 24, 2025

Fixes #58009

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Apr 24, 2025
Copy link

codecov bot commented Apr 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.26%. Comparing base (b665127) to head (01e3f9d).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #58014      +/-   ##
==========================================
- Coverage   90.27%   90.26%   -0.01%     
==========================================
  Files         630      630              
  Lines      186158   186158              
  Branches    36475    36474       -1     
==========================================
- Hits       168047   168031      -16     
- Misses      10980    10988       +8     
- Partials     7131     7139       +8     

see 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

setTimeout(() => {
child.stdin.write('await import(\'./message.mjs\');\n');
child.stdin.write('.exit');
}, 250);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 250? Did you try other values? Could we use a lower one? We should probably be using common.platformTimeout() btw

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 250? Did you try other values? Could we use a lower one?

250 is totally arbitrary 😓 , I did try different values, 100 seems to be working fine on my
machine but I increased the value as I think a bigger delay would probably be needed in CI

would you prefer me to put 100 instead and see how this performs in CI?

We should probably be using common.platformTimeout() btw

ah yeah for sure, sorry I forgot about this 😓

@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

please let me know if you think there are better alternatives 🙏

The following patch seems to fix the issue on my machine.

diff --git a/test/parallel/test-repl-import-referrer.js b/test/parallel/test-repl-import-referrer.js
index 9c3e961e5e1..57aae7858ab 100644
--- a/test/parallel/test-repl-import-referrer.js
+++ b/test/parallel/test-repl-import-referrer.js
@@ -8,23 +8,24 @@ const args = ['--interactive'];
 const opts = { cwd: fixtures.path('es-modules') };
 const child = cp.spawn(process.execPath, args, opts);
 
-const outputs = [];
+let output = '';
 child.stdout.setEncoding('utf8');
 child.stdout.on('data', (data) => {
-  outputs.push(data);
-  if (outputs.length === 3) {
+  output += data;
+
+  if (
+    !child.stdin.writableEnded &&
+    output.includes('[Module: null prototype] { message: \'A message\' }')
+  ) {
     // All the expected outputs have been received
     // so we can close the child process's stdin
     child.stdin.end();
   }
 });
 
-child.on('exit', common.mustCall(() => {
-  const results = outputs[2].split('\n')[0];
-  assert.strictEqual(
-    results,
-    '[Module: null prototype] { message: \'A message\' }'
-  );
+child.on('exit', common.mustCall((code, signal) => {
+  assert.strictEqual(code, 0);
+  assert.strictEqual(signal, null);
 }));
 
 child.stdin.write('await import(\'./message.mjs\');\n');

@dario-piotrowicz
Copy link
Member Author

please let me know if you think there are better alternatives 🙏

The following patch seems to fix the issue on my machine.

...

Thanks a lot @lpinca 🙂

Checking for the output string in the on data listener is something that occurred to me too, but I don't attempt to go that route as I think it doesn't leads to nice patterns here 😕 😓

In your suggested solution specifically I have concerns about the readability of the test, using '[Module: null prototype] { message: \'A message\' }' as signal for when to exit seems to me much much less clear than asserting that the value is in the output (basically we're asserting that value in an indirect way, we could re-assert it in the on exit listener but that feels pretty unnecessary/ugly 😓)

Anyways if you strongly believe that your suggested solution is better (I'd also be curious to see other opinions on it hopefully) I am totally fine to go with it, as using setTimeout is indeed pretty nasty in its own way 😓

@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

Checking for the output string in the on data listener is something that occurred to me too, but I don't attempt to go that route as I think it doesn't leads to nice patterns here 😕 😓

In your suggested solution specifically I have concerns about the readability of the test, using '[Module: null prototype] { message: \'A message\' }' as signal for when to exit seems to me much much less clear than asserting that the value is in the output

What is the difference? Currently it is based on counting data chunks which is not reliable. I definitely think it is way better than adding a random delay. Timers are known to cause flakiness.

@dario-piotrowicz
Copy link
Member Author

What is the difference?

The main difference I see is that we're not directly asserting that the string is in the output which is the whole point of the test 😖

Currently it is based on counting data chunks which is not reliable.

Is it not? how come? isn't that deterministic? 😮

I definitely think it is way better than adding a random delay.

yeah I think that this is a very fair point of view 👍

@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

The main difference I see is that we're not directly asserting that the string is in the output which is the whole point of the test 😖

We do, if there isn't the test times out.

@dario-piotrowicz
Copy link
Member Author

We do, if there isn't the test times out.

yes, of course we do, I was saying that it's not being done in a direct way (there is not assert.strictEqual() being used for that)

this in my opinion reduces readability and it might make the issue more difficult to diagnose since the failure would be a timeout instead of an assert failure (although someone could argue that given the current flakes that's almost already the case 😓)

@dario-piotrowicz dario-piotrowicz changed the title test: add setTimeout to remove test-repl-import-referrer flakiness test: fix test-repl-import-referrer flakiness Apr 25, 2025
@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

yes, of course we do, I was saying that it's not being done in a direct way (there is not assert.strictEqual() being used for that)

There are hundreds of tests like this.

Is it not? how come? isn't that deterministic? 😮

Here is an example when test currently fails found by applying this patch

diff --git a/test/parallel/test-repl-import-referrer.js b/test/parallel/test-repl-import-referrer.js
index 9c3e961e5e1..41204efda17 100644
--- a/test/parallel/test-repl-import-referrer.js
+++ b/test/parallel/test-repl-import-referrer.js
@@ -12,6 +12,7 @@ const outputs = [];
 child.stdout.setEncoding('utf8');
 child.stdout.on('data', (data) => {
   outputs.push(data);
+  console.log(outputs)
   if (outputs.length === 3) {
     // All the expected outputs have been received
     // so we can close the child process's stdin
=== release test-repl-import-referrer ===                   
Path: parallel/test-repl-import-referrer
[
  'Welcome to Node.js v24.0.0-pre.\nType ".help" for more information.\n> '
]
[
  'Welcome to Node.js v24.0.0-pre.\nType ".help" for more information.\n> ',
  "[Module: null prototype] { message: 'A message' }\n> "
]
Command: out/Release/node --test-reporter=./test/common/test-error-reporter.js --test-reporter-destination=stdout /Users/luigi/code/node/test/parallel/test-repl-import-referrer.js
--- TIMEOUT ---

As you can see there are only 2 chunks, the test waits for 3 before calling child.stdin.end();. This is issue.

@dario-piotrowicz dario-piotrowicz force-pushed the dario/58009/test-repl-import-referrer-flaky branch from c8c1e6c to 7f7cea4 Compare April 25, 2025 08:31
@dario-piotrowicz
Copy link
Member Author

There are hundreds of tests like this.

👍

As you can see there are only 2 chunks, the test waits for 3 before calling child.stdin.end();. This is issue.

ah ok, sorry I thought that the chunking would be pretty stable/deterministic (I guess it is on the same machine?), thanks for pointing this 🙏

@dario-piotrowicz
Copy link
Member Author

Thanks a lot for the suggested change @lpinca 🙏, I've applied it 🙂

@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 25, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 25, 2025
@nodejs-github-bot
Copy link
Collaborator

@aduh95
Copy link
Contributor

aduh95 commented Apr 25, 2025

@lpinca lpinca added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 25, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 25, 2025
@nodejs-github-bot
Copy link
Collaborator

@aduh95 aduh95 added the commit-queue Add this label to land a pull request using GitHub Actions. label Apr 25, 2025
@lpinca lpinca added the commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. label Apr 25, 2025
@aduh95 aduh95 added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Apr 25, 2025
@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

I was thinking of a simpler fix with fewer changes than the original test

diff --git a/test/parallel/test-repl-import-referrer.js b/test/parallel/test-repl-import-referrer.js
index 9c3e961e5e1..c1b82198a25 100644
--- a/test/parallel/test-repl-import-referrer.js
+++ b/test/parallel/test-repl-import-referrer.js
@@ -8,24 +8,19 @@ const args = ['--interactive'];
 const opts = { cwd: fixtures.path('es-modules') };
 const child = cp.spawn(process.execPath, args, opts);
 
-const outputs = [];
+let output = '';
 child.stdout.setEncoding('utf8');
 child.stdout.on('data', (data) => {
-  outputs.push(data);
-  if (outputs.length === 3) {
-    // All the expected outputs have been received
-    // so we can close the child process's stdin
-    child.stdin.end();
-  }
+  output += data;
 });
 
 child.on('exit', common.mustCall(() => {
-  const results = outputs[2].split('\n')[0];
-  assert.strictEqual(
-    results,
-    '[Module: null prototype] { message: \'A message\' }'
-  );
+  const results = output.replace(/^> /mg, '').split('\n').slice(2);
+  assert.deepStrictEqual(results, [
+    '[Module: null prototype] { message: \'A message\' }',
+    ''
+  ]);
 }));
 
 child.stdin.write('await import(\'./message.mjs\');\n');
-child.stdin.write('.exit');
+child.stdin.write('.exit\n');

but it fails as an ERR_USE_AFTER_CLOSE error is thrown. Is this expected or it is an unwanted effect of 8e7f32f? I think no error should be thrown.

@dario-piotrowicz
Copy link
Member Author

I was thinking of a simpler fix with fewer changes than the original test

...

but it fails as an ERR_USE_AFTER_CLOSE error is thrown. Is this expected or it is an unwanted effect of 8e7f32f? I think no error should be thrown.

No, for sure this is not expected... and I would imagine it to indeed most likely be an unwanted effect of 8e7f32f, the fact that you see the error without manually closing the child's stdin is pretty concerning to me 😕

I'll have a look as soon as I can (later today/tomorrow), I wonder if this is so severe that 8e7f32f should be reverted 😖

@lpinca lpinca removed the commit-queue Add this label to land a pull request using GitHub Actions. label Apr 25, 2025
@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

I wonder if this is so severe that 8e7f32f should be reverted 😖

I was thinking the same thing.

@dario-piotrowicz
Copy link
Member Author

I wonder if this is so severe that 8e7f32f should be reverted 😖

I was thinking the same thing.

I'm happy to investigate this, but if we're concerned about user's impact it could be safer to revert sooner rather than later...

since #57680 was marked as semver-major can I assume that only users using node 25 (when it will come out) would potentially be affected?

@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

Yes, there are also nightly builds, but I don't expect people to use them in production 😄.

FWIW, I can also reproduce it manually

$ ./node
Welcome to Node.js v24.0.0-pre.
Type ".help" for more information.
> await import('./test/fixtures/es-modules/message.mjs');
[Module: null prototype] { message: 'A message' }
> .exit
node:internal/readline/interface:603
      throw new ERR_USE_AFTER_CLOSE('readline');
      ^

Error [ERR_USE_AFTER_CLOSE]: readline was closed
    at REPLServer.pause (node:internal/readline/interface:603:13)
    at REPLServer.<anonymous> (node:internal/main/repl:48:16)
    at REPLServer.emit (node:events:519:35)
    at REPLServer.emit (node:domain:489:12)
    at REPLServer.emitExit (node:repl:841:10)
    at REPLServer.emit (node:events:519:35)
    at REPLServer.emit (node:domain:489:12)
    at REPLServer.close (node:internal/readline/interface:594:10)
    at node:repl:1115:5
    at process.processTicksAndRejections (node:internal/process/task_queues:85:11) {
  code: 'ERR_USE_AFTER_CLOSE'
}

Node.js v24.0.0-pre

@dario-piotrowicz
Copy link
Member Author

@lpinca yes I can as well! 😓

ok this is pretty bad... (I'm surprised that no tests consistently failed because of this 😓)

maybe I should close this PR and revert 8e7f32f right away to be on the safe side, then try to reintroduce the change but also making sure that there's no repl error being thrown... what do you think? 😓

(PS: I'm really sorry for the inconvenience caused here, I'm really surprised that something like this managed to sneak through 😓)

@lpinca
Copy link
Member

lpinca commented Apr 25, 2025

I think it makes sense.

(PS: I'm really sorry for the inconvenience caused here, I'm really surprised that something like this managed to sneak through 😓)

No problem, it happens.

@dario-piotrowicz
Copy link
Member Author

Closing in favour of #58024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parallel/test-repl-import-referrer is flaky
4 participants