os/exec: syscall.forkExec hang when spawning multiple processes concurrently on darwin
#61080
Comments
I don't understand the test case in the GitHub repository. It starts a bunch of processes with a call The program also hangs when built with Go 1.20 or Go 1.19. Are you sure that the GitHub repo is a correct example of the problem?
Apologies, you are correct. I had trimmed the repro to try to make it concise by removing the actual communication between the processes since the hanging I'm talking about happens before that anyways. But I ended up creating a bad example that always hangs. I have updated the code and the original issue description to reflect the correct repro.
Hrm. I wonder if this is related to the other (Those failures are on
Notably, that stack trace seems like a solid match for the one observed in #59892 (comment).
@ianlancetaylor, I think we need an exclusive lock on On systems with
Marking as release-blocker because this appears to be a regression in Go 1.21 on a first-class platform.
Change https://go.dev/cl/507355 mentions this issue:
Change https://go.dev/cl/508755 mentions this issue:
We cancel the Context to unblock the test as soon as all of the "exit" processes have completed. If that happens to occur before all of the "hang" processes have started, the Start calls may fail with context.Canceled. Since those errors are possible in normal operation of the test, ignore them.
Fixes #61277.
Updates #61080.
Change-Id: I20db083ec89ca88eb085ceb2892b9f87f83705ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/508755
TryBot-Result: Gopher Robot <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
…is not atomic
In CL 421441, we changed syscall to allow concurrent calls to forkExec.
On platforms that support the pipe2 syscall that is the right behavior, because pipe2 atomically opens the pipe with CLOEXEC already set.
However, on platforms that do not support pipe2 (currently aix and darwin), syscall.forkExecPipe is not atomic, and the pipes do not initially have CLOEXEC set. If two calls to forkExec proceed concurrently, a pipe intended for one child process can be accidentally inherited by the other. If the process is long-lived, the pipe can be held open unexpectedly and prevent the parent process from reaching EOF reading the child's status from the pipe.
Fixes golang#61080.
Updates golang#23558.
Updates golang#54162.
Change-Id: I83edcc80674ff267a39d06260c5697c654ff5a4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/507355
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, it reproduces since this change.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
We have a code generator that we use at Uber that spawns many concurrent child processes that communicate via stdin & stdout. While doing internal testing with Go1.21rc2, we noticed the code generator hanging. A minimal runnable repro can be found in this repository: https://github.com/JacobOaks/Go1.21rc2-syscall.forkExec-hanging-repro.
Essentially, we are spinning up a bunch of external processes with stdin & stdout pipes concurrently. Something like (see link above for full repro):
Attaching delve to the hanging process, we notice the issue occurs in cmd.Start, where syscall.forkExec seems to hang:
This behavior is flaky and, in our investigation, only appears on Go1.21rc2 on darwin-arm64. git bisect indicated this change to be the culprit.
What did you expect to see?
I would expect the program in the linked repro to not hang, as in Go 1.20.
What did you see instead?
It occasionally hangs, see above.