Add `kernel.Task.BlockFD[WithDeadline]`. #8044
Closed
Before this CL, using `kernel.Task.Block*` to wait for host FD readiness requires going through fdnotifier, resulting in a significant amount of overhead. As an example, consider an application's blocking `recvmsg` on a hostinet socket that has to block:

- The task goroutine invokes `recvmsg` and gets `EAGAIN`.

- The task goroutine heap-allocates a `waiter.Entry` and a channel of `waiter.EventMask`, and invokes `epoll_ctl` to add the socket FD to fdnotifier's epoll FD.

- The task goroutine invokes `recvmsg` and gets `EAGAIN`, again.

- The task goroutine blocks in Go (on the channel select in `kernel.Task.block`). If the thread that was running the task goroutine can find idle goroutines to run, then it does so; otherwise, it invokes `futex(FUTEX_WAIT)` to block in the host.

  Note that the vast majority of the sentry's "work" consists of executing application code, during which the corresponding task goroutines appear to the Go scheduler to be blocked in host syscalls; furthermore, time that *is* spent executing sentry code (in Go) is overhead relative to the application's execution. Consequently, the sentry has relatively little Go code to execute and is generally optimized to have less, making this tradeoff less favorable than in (presumably) more typical Go programs.

- When the socket FD becomes readable, fdnotifier's goroutine returns from `epoll_wait` and wakes the task goroutine, usually invoking `futex(FUTEX_WAKE)` to wake another thread. It then yields control of its thread to other goroutines, improving wakeup-to-execution latency for the task goroutine.

  The `futex(FUTEX_WAKE)` is skipped if any of the following are true:

  - `GOMAXPROCS` threads are already executing goroutines. For reasons described above, we expect this to occur infrequently.

  - At least one already-running thread is in the "spinning" state, because it was itself recently woken but has not yet started executing goroutines.

  - At least one already-running thread is in the "spinning" state, because it recently ran out of goroutines to run and is busy-polling before going to sleep.

  A "spinning" thread stops spinning either because it successfully busy-polls for an idle goroutine to run, or because it times out while busy-polling. In the former case the thread usually invokes `futex(FUTEX_WAKE)` to wake *another* thread as described above, and in the latter case the thread invokes `futex(FUTEX_WAIT)` to go to sleep.

- The task goroutine invokes `recvmsg` and succeeds.

- The task goroutine invokes `epoll_ctl` to remove the socket FD from fdnotifier's epoll FD.

This CL demonstrates how fdnotifier may be replaced by making host syscalls from task goroutine context. After this CL, once per-thread initialization (`sigprocmask`) has been done, the same scenario results in:

- The task goroutine invokes `recvmsg` and gets `EAGAIN`.

- The task goroutine invokes `ppoll` on the host FD, which returns when the socket FD becomes readable.

  The Go runtime maintains a thread called "sysmon" which runs periodically. When this thread determines that another thread has been blocked in a host syscall for "long enough" (20-40us + slack) and there are idle goroutines to run, it steals that thread's runqueue and invokes `futex(FUTEX_WAKE)` to wake another thread to run the stolen runqueue.

- The task goroutine invokes `recvmsg` and succeeds.

For now, this functionality is only used in hostinet where socket methods are responsible for blocking; applying it more generally (e.g. to `read(2)` from hostinet sockets) requires additional work to move e.g. `read(2)` blocking from `//pkg/sentry/syscalls/linux` into file description implementations.

Some of the overheads before this CL are tractable without removing fdnotifier. The `sleep` package only requires one allocation - of `sleep.Waker` - per registration, and the `syncevent` package requires none. `EventRegister` can return the last known readiness mask to avoid the second `recvmsg`. However, the interactions with the Go runtime - and in particular the many `FUTEX_WAKE`s we incur when waking goroutines due to `ready() -> wakep() -> schedule() -> resetspinning() -> wakep()` - are not tractable in the same way.

The leading alternative solutions to the same problem are `sleep.Sleeper.AssertAndFetch` and "change the Go runtime". The former is very dubiously safe; it works by transiently lying about `runtime.sched.nmspinning`, a global variable, and not calling `runtime.resetspinning()` when it stops lying, so side effects start at "`runtime.wakep()` is disabled globally rather than only on the caller's thread" and go from there. The latter is dubiously tractable for reasons including the atypicality of the sentry described above, though see golang/go#54622.

PiperOrigin-RevId: 478129654
b15f98e to a1c1542
A friendly reminder that this PR had no activity for 120 days.
This PR has been closed due to lack of activity.
Labels
- auto-closed
- exported: Issue was exported automatically
- stale-pr: This PR has not been updated in 120 days.