Skip to content

Conversation

@sluongng
Copy link
Contributor

No description provided.

@sluongng sluongng requested a review from fmeum April 24, 2024 13:36
@sluongng sluongng force-pushed the sluongng/rules-go-47.0 branch from b098753 to a408be3 Compare April 24, 2024 13:39
@sluongng sluongng requested a review from bduffany April 24, 2024 14:10
// require.Error(t, res.Error)
// assert.Contains(t, res.Error.Error(), "signal: terminated")
// assert.Equal(t, -1, res.ExitCode)
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bduffany FYI.

I thought this test only killed the sh shell, but for some reason sigterm is being ignored here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can restore the original behavior with signal.Reset(os.SIGTERM), but its slightly suspicious that this is needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try bisecting rules_go tmr to verify the root cause

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bisect confirmed the culprit PR.

signal.Reset did not fix the test, but adding our own signal.Notify handler worked.

@sluongng sluongng force-pushed the sluongng/rules-go-47.0 branch 2 times, most recently from d2452e8 to 63880db Compare April 25, 2024 09:10
@sluongng sluongng force-pushed the sluongng/rules-go-47.0 branch from 63880db to 1f6b176 Compare April 25, 2024 10:48
Arguments: []string{"bash", "-c", `
echo foo-stdout >&1
echo bar-stderr >&2
kill 0 -KILL
Copy link
Member

@bduffany bduffany Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this change is needed? I'd be surprised if upgrading rules_go broke this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://buildbuddy.buildbuddy.io/invocation/d1ddf6d1-cf88-4de1-89bd-bf39c97ca43d here is an example of this failure.

From https://man7.org/linux/man-pages/man1/kill.1.html, we should use kill -KILL -- 0 to be sure.

It's unclear to me why bazel-contrib/rules_go#3920 triggered this though.

Comment on lines 125 to 124
// Since https://github.com/bazelbuild/rules_go/pull/3920
// We need to make sure that our go test binary properly capture and handle SIGTERM
_, stop := signal.NotifyContext(ctx, syscall.SIGTERM)

res := runSh(ctx, "kill -TERM $$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the SIGTERM signal here should only be getting sent to the sh process running the kill -TERM command so I'm a bit confused why this change is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I am not sure either... I tried to dump out the process tree plus the process group id but I could not tell for certain how the sigterm could propagate from the child process, the /bin/sh shell, to the parent process (the go test binary). Or if there is something else (i.e. Bazel) sending the go binary sigkterm.

A simpler solution here for me is to patch rules_go with the old signal.Notify behavior. Which, as I discussed with the rules_go PR author, seems like it would work.

Raised the rules_go PR for this and in the meantime, patch our own rules_go with that fix.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the error that was happening without the _, stop := signal.NotifyContext(ctx, syscall.SIGTERM) workaround? (before applying the revert patch to rules_go)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no the error was pretty much caused by the change in rules_go. I have bisected rules_go to that exact PR.

I think that PR made it so that the test itself could not use sigterm.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, what was the test failure you were seeing? I understand it was caused by rules_go but trying to understand why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood your question 🤦

Here is an example failure with the revert of patching rules_go https://buildbuddy.buildbuddy.io/invocation/f016c53f-2073-476b-a338-939a56ef720d?target=%2F%2Fenterprise%2Fserver%2Fremote_execution%2Fcommandutil%3Acommandutil_test&targetStatus=6

Here is with a small patch to get a bit more verbose output

--- a/enterprise/server/remote_execution/commandutil/commandutil_unix_test.go
+++ b/enterprise/server/remote_execution/commandutil/commandutil_unix_test.go
@@ -123,6 +123,7 @@ func TestRun_Killed_ErrorResult(t *testing.T) {
        {
                res := runSh(ctx, "kill -TERM $$")

+               t.Logf("ExitCode: %d\nStdout: %s \n---\nStderr: %s\n", res.ExitCode, string(res.Stdout), string(res.Stderr))
                require.Error(t, res.Error)
                assert.Contains(t, res.Error.Error(), "signal: terminated")
                assert.Equal(t, -1, res.ExitCode)

https://buildbuddy.buildbuddy.io/invocation/09926add-3892-4b97-8d24-69955437fb0b?target=%2F%2Fenterprise%2Fserver%2Fremote_execution%2Fcommandutil%3Acommandutil_test&targetStatus=6

We were expecting an error, but the command simply executed successfully.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, TIL about sigaction. IIUC, the go runtime is ignoring SIGTERM by ultimately doing a syscall like sigaction(SIGTERM, &sigaction{sa_handler: SIG_IGN}, NULL). The installed action handlers are then inherited by child processes, which means that child processes will ignore signals too.

So it does seem like rules_go should not be doing signal.Ignore as that will likely break more people than just us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sluongng sluongng force-pushed the sluongng/rules-go-47.0 branch from 1f6b176 to 600d4aa Compare April 29, 2024 09:46
@sluongng sluongng merged commit ee29cfb into master Apr 30, 2024
@sluongng sluongng deleted the sluongng/rules-go-47.0 branch April 30, 2024 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants