syscall: Setgroups may hang in go 1.16+ when not using cgo #50113
Comments
|
Assign this to me. I'll try to figure out what is going on. |
First, I'll have to figure out the docker thing - debugging in that environment might take some setup on my part. |
Based on the In that and all prior versions of Go, calling @ianlancetaylor gives the best advice for restructuring this code: that the As to the code as it stands, it looks like this code is trying to work with two models of privilege at the same time.
If you really want to roll your own with |
Sorry, I didn't set CGO_ENABLED=0 for this go env command. The problem does not happen with CGO_ENABLED=1. As you can see, I set CGO_ENABLED=0 in the docker build, which is what causes the hang. |
The problem also happens with the latest go1.17.5. I've just tried go1.16.12 as well, and it also hangs. |
The |
@AndrewGMorgan I've assigned this issue to you per your request. |
There is a great deal of code in this combined system. I've been trying to isolate the code actually tripping whatever the issue is and build a minimal reproduction by skipping lots of it. If I remove the pty stuff and hard code the client to not request
|
Yes, it seems pty may influence the behavior of Setgroups, but I don't know why |
How reproducible is this failure mode? I'm not running under Docker since I want to debug in a familiar environment. Is this bug report claiming that Docker is a necessary ingredient to the failure mode? Are you 100% sure they are not locking up somewhere in the When you say it reproduces with go1.17.5 #50113 (comment) I wonder if you could do the following while in the locked-up state?
I re-enabled the
In your write-up, you say:
So, since things work for me when I run with golang's default |
FWIW When I ran |
I managed to reproduce a hang in a manner that I can debug a bit. It appears to involve the
|
Yes, this function may influence the behavior. But I don't know why this function would lead to the hang. |
Thanks for this bug report and the reproducer. Some code spelunking and web surfing later, this looks like a deadlock issue with blocking read system calls. It seems to share some underlying similarity to #38618 for example. However, the victim-vector here is that the It appears that the core of the issue is that This user code workaround is not that satisfying as a solution because it leaves open the possibility that the all threads syscall mechanism can deadlock in other situations like this. Hopefully, however, it can unblock (no pun intended!) your work. I'd also urge you to reconsider using the fragile single-OS thread I'm next going to explore if I can create a more minimal reproducer for this deadlock and then explore the patch @ianlancetaylor briefly suggested in #38618 which seems like it will cover more corner cases than this user code workaround.
|
Perhaps a more elegant workaround is to completely avoid the need to ever call While we still inline the pseudo terminal setup to avoid the blocking triggered by
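The details of this workaround are not preserved above. As a general illustration only, and not necessarily what this comment proposed, one way for a Go program to avoid a process-wide `Setgroups` call is to let the runtime apply credentials to the child between fork and exec through `syscall.SysProcAttr.Credential`. In the sketch below, `commandAs` and the numeric IDs are hypothetical names and values chosen for the example.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAs is a hypothetical helper: it builds a command whose child
// process is started with the given uid, gid, and supplementary groups.
// The credentials are applied in the child only, so the parent never has
// to change the groups of its own threads.
func commandAs(uid, gid uint32, groups []uint32, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: uid, Gid: gid, Groups: groups},
	}
	return cmd
}

func main() {
	// Example values only. Actually starting this command with IDs other
	// than the caller's own requires privilege, so the sketch just prints
	// the credentials that would be applied.
	cmd := commandAs(1000, 1000, []uint32{1000}, "id")
	fmt.Printf("%+v\n", *cmd.SysProcAttr.Credential)
}
```

Whether this fits depends on the program: it only helps when the changed credentials are needed in a child process rather than in the calling process itself.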
|
It seems like a general problem if a blocking system call can block |
The problem with delaying a syscall like this is that while this thread is blocked there may be another call to something that also needs to run a privileged operation on all threads. Typically, these sorts of things come in two or more back-to-back privilege manipulation sequences, each building on the last. I think we'd end up needing to manage a queue of pending changes per thread for that to have a hope. Also, there are some strange kernel ABI issues with stracing and other things when the privileges of a process' threads get out of sync. I don't much like the complexity of augmenting a signal, but it is unclear to me how else to break out of the syscall. First, I'm going to develop a simpler test case. Then I'll explore ways to address this. We can see how minimal a fix develops. Fortunately, there is a workaround for this case. Hopefully, that can be adapted for any others that emerge while investigating... |
Here is a minimal reproducer. No privilege needed, no non-standard packages. It needs to be compiled
|
This change to go (at HEAD) makes that minimal test run without deadlocking:
|
The original code runs for a while with the above patch to go, but eventually crashes. I'm still investigating. |
I think we do want something like the patch in #50113 (comment). The ability for a blocking system call anywhere in the program to block execution is (Some system calls may not even be expected to block indefinitely, but expect to be woken by an action on another thread. But since we have stopped the world, that action never occurs and this thread remains blocked forever, causing a deadlock). cc @aclements |
Change https://golang.org/cl/381534 mentions this issue: |
Re point 3 of #50113 (comment) there is a corner case that I can't quite get my head around. Namely, when the code is racing thread creation. That is, when the |
For the above CL, I've included a slightly more stressful variant of the minimal test case #50113 (comment). I've also validated that the deadlocking workload pointed to at the top of this bug (https://github.com/weixiao-huang/golang-setgroups-hang) also works without deadlock when this change is applied. |
@gopherbot, please backport to Go 1.17 |
@gopherbot, please backport to Go 1.16 |
Backport issue(s) opened: #50976 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
@gopherbot, please backport to Go 1.17 |
Change https://golang.org/cl/383434 mentions this issue: |
1.17 backport issue #51077 |
Change https://go.dev/cl/383996 mentions this issue: |
Change https://go.dev/cl/383999 mentions this issue: |
syscall_runtime_doAllThreadsSyscall is only used on Linux. In preparation of a follow-up CL that will modify the function to use other Linux-only functions, move it to os_linux.go with no changes.
For #50113.
Change-Id: I348b6130038603aa0a917be1f1debbca5a5a073f
Reviewed-on: https://go-review.googlesource.com/c/go/+/383996
Trust: Michael Pratt
Reviewed-by: Andrew G. Morgan
Reviewed-by: Austin Clements
Run-TryBot: Austin Clements
TryBot-Result: Gopher Robot
Add a generic syscall package for use by the runtime. Eventually we'd like to clean up system calls in the runtime to use more code generation and be moved out of the main runtime package.
The implementations of the assembly functions are based on copies of syscall.RawSyscall6, modified slightly for more consistency between arches, e.g., renamed trap to num, always set syscall num register first.
For now, this package is just the bare minimum needed for doAllThreadsSyscall to make an arbitrary syscall.
For #51087.
For #50113.
Change-Id: Ibecb5e6303279ce15286759e1cd6a2ddc52f7c72
Reviewed-on: https://go-review.googlesource.com/c/go/+/383999
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot
Reviewed-by: Austin Clements
We ended up with:
The Go team has opted to abandon the current implementation (leaving this present bug in 1.16, 1.17), and adopt the substantial rewrite in 1.18. For this present issue, for earlier releases, the following options are available:
|
The CGO_ENABLED=0 failure mode is discussed in: golang/go#50113.
At the present time, this only passes when the psx package is compiled CGO_ENABLED=1. The problem being that a blocking read cannot be interrupted by the CGO_ENABLED=0 build of package "psx". It does not deadlock when compiled CGO_ENABLED=1 because the psx signal wakes the reading thread up back into user space.
Signed-off-by: Andrew G. Morgan
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, this problem also exists in go 1.17.5, but does not exist in go 1.15.x.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
See https://github.com/weixiao-huang/golang-setgroups-hang
What did you expect to see?
client --key-path /.launch/key --server=localhost:2222
should not hang when compiled with golang 1.16.x and 1.17.x.
What did you see instead?
client --key-path /.launch/key --server=localhost:2222
hangs when compiled with golang 1.16.x and 1.17.x.