io: Copy to a pipe prevents process exit (Go 1.24rc2 on Linux regression) #71375
Comments
First, let's wait for kernel folks to confirm this. Thinking about a workaround, my best solution for now is to not use sendfile(2) on Linux when writing to a pipe. |
cc @golang/runtime |
Thanks for testing the RC and the great bug report! I am also curious to see what the kernel folks have to say. My immediate thought was that exit_group should never block indefinitely and anything causing it to do so is clearly a kernel bug. That said, Regardless, even if it is considered a kernel bug, we'll have to work around it for older kernels. It seems that the obvious options are:
Given how late we are in the release freeze, I'd lean towards (1) as the less risky option, and then we could consider applying a new version of the CL for 1.25. cc @panjf2000 |
I'll work on this. |
Change https://go.dev/cl/644015 mentions this issue: |
I spent more time thinking about this today. Thanks @kolyshkin for sending the fix CL https://go.dev/cl/644015. The fix seems fairly straightforward and not too concerning on its own, though I am concerned that we might find more (non-pipe) edge cases that break sendfile. That makes me continue to lean towards a revert, though I acknowledge that this bug took nearly 6mo to surface, so waiting another 6mo isn't guaranteed to uncover more bugs. Looking at the rationale of the original CL, if I understand correctly, most calls will use
It seems that pipe and socket copies would be the most valuable case here. We have to drop pipes, so that leaves sockets. Note that this change is only for All that said, I think I am in agreement that this feature is nice to have, but not critical enough to warrant a last minute fix, so revert still feels best to me. |
Change https://go.dev/cl/644895 mentions this issue: |
#71459 was a different report of blocking, I think on a pty? |
It's the same under the hood as the reproducer in this issue description, with Either https://go.dev/cl/644015 or https://go.dev/cl/644895 fixes it. |
Change https://go.dev/cl/644935 mentions this issue: |
Actually, CL 603295 was originally intended to handle data transfer between regular files; the pipe case is the real "nice to have". I'd suggest we just drop anything other than regular files, as we did for Solaris: CL 605355, CL 606135. @prattmic |
Huh, interesting. What is the common case where we expect Is it cross-filesystem copies? Looking at the implementation again, it sounds like, although cross-filesystem copy is theoretically allowed, very few filesystems opt in: https://elixir.bootlin.com/linux/v6.12.11/source/fs/read_write.c#L1520 |
As a matter of fact, copy_file_range(2) has a messy history: https://lwn.net/Articles/846403/. Even today, this system call is hardly perfect, so it's pragmatic to have sendfile(2) as its fallback. |
Given how close to the release we are, let's go with a revert to be safe: https://go.dev/cl/644895. I'd appreciate a look. The revert was messy in the test files due to the refactoring done for the Solaris and FreeBSD equivalent CLs, and I didn't want to revert those as well since they seem to be fine. The good news is that we plan to reopen the tree for 1.25 development this week or next week, so we can try again shortly. |
@kolyshkin The reproducer you provided is good with the revert. I'd appreciate if you could test runc itself at Go tip and make sure all seems well. |
@prattmic I've tested a recent git master (at commit 37f27fb) locally and it works fine, thanks! EDIT: fix commit id |
Change https://go.dev/cl/646415 mentions this issue: |
Go version
go version go1.24rc2 linux/amd64
Output of go env in your module/workspace:
What did you do?
There is a regression in Go 1.24rc2 (git-bisect points to [1]) caused by a (not yet confirmed) Linux kernel bug with sendfile(2)/splice(2), which I just reported [2].
In short, when sendfile(2) or splice(2) is used to copy data to a pipe, and another process holds the other end of this pipe, that other process is prevented from exiting. You can get a short C repro from [2], and here's a Go repro:
What did you see happen?
When the above repro is run with go1.24rc2, the process does not exit:
(and the process hangs here).
NOTE: if you can't reproduce this, you probably pressed Enter in the terminal. Do not do anything with this terminal!
In a different terminal, you can check the status of the child process (using the pid from the above output):
As you can see, ps thinks that the process is a kernel thread. This happens because the process is half-exited and some /proc/$PID/ entries (root, cwd, exe) are no longer valid, like it is with kernel threads.
Now, to see where the process is stuck:
What did you expect to see?
No stuck process.
With an older Go version, everything works as it should: