[lit] Echo full RUN lines in case of external shells #65267
llvm/utils/lit/lit/TestRunner.py
    commands[j] += f": {shlex.quote(command.lstrip())} >&2 " \
                   f"&& {command}"
else:
    commands[j] += " has no command after substitutions"
Could you add a testcase for this codepath? Thanks!
Good idea, especially given that I forgot the stderr redirection in that case. I pushed a commit.
LGTM, thank you!
I like this. Thank you!
LGTM too.
0054ad4 to 21444d6
Before <https://reviews.llvm.org/D154984> and <https://reviews.llvm.org/D156954>, lit reported full RUN lines in a `Script:` section. Now, in the case of lit's internal shell, it's the execution trace that includes them. However, if lit is configured to use an external shell (e.g., bash, Windows `cmd`), they aren't reported at all.

A fix was requested at the following:

* <https://reviews.llvm.org/D154984#4627605>
* <https://discourse.llvm.org/t/rfc-improving-lits-debug-output/72839/35?u=jdenny-ornl>

This patch does not correctly address the case when the external shell is Windows `cmd`. As discussed at <llvm#65242>, it's not clear whether that's a use case that people still care about, and it seems to be generally broken anyway.
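For readers following along, here is a minimal sketch of the idea in the diff excerpt from `TestRunner.py`: prefix each external-shell command with an `echo` of the quoted RUN line, redirected to stderr. The function name `echo_run_line` and the `prefix` parameter are hypothetical and only illustrate the approach; this is not lit's actual code.

```python
import shlex


def echo_run_line(prefix: str, command: str) -> str:
    """Build one external-shell script line that reports a RUN line on stderr.

    `prefix` is an assumed label such as "RUN: at line 3" (hypothetical name).
    """
    if command.strip():
        # Quote the whole message so the shell prints the command verbatim
        # instead of interpreting its quotes, redirections, or variables.
        message = shlex.quote(f"{prefix}: {command.lstrip()}")
        return f"echo {message} >&2 && {command}"
    # Nothing left to run after substitutions: still report on stderr.
    message = shlex.quote(f"{prefix} has no command after substitutions")
    return f"echo {message} >&2"


if __name__ == "__main__":
    print(echo_run_line("RUN: at line 3", "grep 'a b' %t.log"))
    print(echo_run_line("RUN: at line 4", ""))
```

Both branches redirect to stderr, which is the detail the review thread above says was initially missed for the no-command case.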
21444d6 to 9191ba7
Thanks for the reviews.
It looks like this PR might cause unexpected test failures and kill the buildbot worker (also Python code).
Buildbots failed after this landed, as reported at: <#65267 (comment)> This reverts commit 9191ba7.
I pushed a revert (efec733). I cannot tell from the logs what happened.
@jsji Thanks for alerting me to the bot failures. The revert seems to have fixed at least one bot. Did you see any additional clue in the logs about what happened? All I see is:
I don't know how to debug that.
Sorry, no idea about why either.
It's probably not much help, but this affected one of my build bots, and when I ran check-all manually on the machine, it died with an error "User defined signal 2":
We saw the same issue on our bots as well, see for example: We did some investigation but haven't found the culprit either, but we saw trap faults in the kernel log.
Thanks for the info, and sorry for all the trouble. I am now able to reproduce on a local system:
I'm running into this issue on the s390x builder as well. It seems the problem happens in the ./compiler-rt/test/fuzzer/fork-sigusr.test case here:
Normally, the PID variable is set to the pid of the ForkSIGUSR process, which then gets killed with SIGUSR2 and terminates cleanly. However, with this patch applied, the PID variable gets set to the pid of a shell process executing
The shell eats the SIGUSR2 and prints a message, but the ForkSIGUSR process never gets the signal and keeps running indefinitely. This somehow causes the buildbot connection to terminate - not sure exactly why but possibly some bad reaction to that shell getting the signal? Unfortunately whenever I re-start the build bot, it thinks it needs to re-try the previous build since that had the failed connection - but that previous build still has this patch applied and therefore terminates again :-( I'm trying to work around this now.
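To make the mechanism concrete, here is a small sketch of how an `echo ... >&2 &&` prefix changes what a trailing `&` puts in the background. The RUN line `./long_running_tool & export PID=$!` and the helper `with_echo_prefix` are hypothetical stand-ins, not the contents of the actual test or of lit.

```python
import shlex


def with_echo_prefix(run_line: str) -> str:
    # Assumed shape of the prefix: echo the quoted line to stderr, then
    # chain the original line with `&&`.
    return f"echo {shlex.quote('RUN: ' + run_line)} >&2 && {run_line}"


# Hypothetical RUN line: start a tool in the background and record its pid.
original = "./long_running_tool & export PID=$!"
wrapped = with_echo_prefix(original)

# In POSIX shell grammar `&&` binds tighter than `&`, so the wrapped line is
# parsed as
#     { echo ... >&2 && ./long_running_tool; } &   export PID=$!
# The whole and-list runs asynchronously in a subshell, so `$!` is the pid of
# that subshell rather than of ./long_running_tool. A later `kill $PID` then
# signals the shell, not the tool.
if __name__ == "__main__":
    print(original)
    print(wrapped)
```

Without the prefix, `&` applies to the tool alone and `$!` is the tool's own pid, which matches the "normally" case described above.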
@uweigand Thanks! That was the clue I needed. I found there are four tests that have a similar pattern and cause trouble on my local system:
If I disable those tests, check-fuzzer behaves as at the parent commit. I have a simple lit fix I've pushed to this branch. It also makes check-fuzzer behave as at the parent commit. I'm hesitant to land the new version as last time was a bit of a disaster. Based on @uweigand's comments, some bots are still recovering. Sorry! Who knows what other trouble might be lurking on the next attempt. Any advice on how to proceed?
Here's the fix: jdenny-ornl@61e272e Unfortunately, I made the mistake of pushing the revert directly to git (old workflow) instead of clicking the revert button here in the PR. I'm not sure how to reopen the PR now. I can start a new PR if that's desirable.
@uweigand if you are still trying to recover your buildbot, this is the process I used to recover mine. Thanks to the info everyone found here, it seems there are 4 affected tests: fork-sigusr.test, merge-sigusr.test, sigint.test and sigusr.test. With the worker offline, I manually edited the files to add a "Completed" job: https://lab.llvm.org/buildbot/#/builders/247/builds/8680
Thanks for the tips. I've managed to get the bot going again by clearing the build queues in the web interface, so it would reset to current mainline. All s390x bots are now green again.
This reverts commit 19b44c2. The reason for the revert is discussed at: https://discourse.llvm.org/t/rfc-improving-lits-debug-output/72839/52
I'm trying to figure out how to resurrect this PR after rebasing onto main and applying fixes. It will need additional review. Is it best just to create a new PR, or is there some way to reopen this one?
Based on the information I found, it's not possible to reopen this PR, so I created a new one: #66408