Add kernel.Task.BlockFD[WithDeadline]. #8044

Closed
wants to merge 1 commit into from

@copybara-service copybara-service bot commented Oct 3, 2022

Add kernel.Task.BlockFD[WithDeadline].

Before this CL, using kernel.Task.Block* to wait for host FD readiness
requires going through fdnotifier, resulting in a significant amount of
overhead. As an example, consider a blocking application recvmsg on a
hostinet socket that blocks:

  • The task goroutine invokes recvmsg and gets EAGAIN.

  • The task goroutine heap-allocates a waiter.Entry and a channel of
    waiter.EventMask, and invokes epoll_ctl to add the socket FD to
    fdnotifier's epoll FD.

  • The task goroutine invokes recvmsg and gets EAGAIN, again.

  • The task goroutine blocks in Go (on the channel select in
    kernel.Task.block). If the thread that was running the task goroutine can
    find idle goroutines to run, then it does so; otherwise, it invokes
    futex(FUTEX_WAIT) to block in the host.

    Note that the vast majority of the sentry's "work" consists of executing
    application code, during which the corresponding task goroutines appear to
    the Go scheduler to be blocked in host syscalls; furthermore, time that is
    spent executing sentry code (in Go) is overhead relative to the application's
    execution. Consequently, the sentry has relatively little Go code to execute
    and is generally optimized to have less, making the tradeoff of blocking in
    Go rather than in the host less favorable than in (presumably) more typical
    Go programs.

  • When the socket FD becomes readable, fdnotifier's goroutine returns from
    epoll_wait and wakes the task goroutine, usually invoking
    futex(FUTEX_WAKE) to wake another thread. It then yields control of its
    thread to other goroutines, improving wakeup-to-execution latency for the
    task goroutine.

    The futex(FUTEX_WAKE) is skipped if any of the following are true:

    • GOMAXPROCS threads are already executing goroutines. For reasons
      described above, we expect this to occur infrequently.

    • At least one already-running thread is in the "spinning" state, because it
      was itself recently woken but has not yet started executing goroutines.

    • At least one already-running thread is in the "spinning" state, because it
      recently ran out of goroutines to run and is busy-polling before going to
      sleep.

    A "spinning" thread stops spinning either because it successfully busy-polls
    for an idle goroutine to run or, if it was spinning because it ran out of
    goroutines, because it times out while busy-polling. On success, the thread
    usually invokes futex(FUTEX_WAKE) to wake another thread as described above;
    on timeout, it invokes futex(FUTEX_WAIT) to go to sleep.

  • The task goroutine invokes recvmsg and succeeds.

  • The task goroutine invokes epoll_ctl to remove the socket FD from
    fdnotifier's epoll FD.

This CL demonstrates how fdnotifier may be replaced by making host syscalls
from task goroutine context. After this CL, after per-thread initialization
(sigprocmask), the same scenario results in:

  • The task goroutine invokes recvmsg and gets EAGAIN.

  • The task goroutine invokes ppoll on the host FD, which returns when the
    socket FD becomes readable.

    The Go runtime maintains a thread called "sysmon" which runs periodically.
    When this thread determines that another thread has been blocked in a host
    syscall for "long enough" (20-40us + slack) and there are idle goroutines to
    run, it steals that thread's runqueue and invokes futex(FUTEX_WAKE) to wake
    another thread to run the stolen runqueue.

  • The task goroutine invokes recvmsg and succeeds.

For now, this functionality is only used in hostinet where socket methods are
responsible for blocking; applying it more generally (e.g. to read(2) from
hostinet sockets) requires additional work to move e.g. read(2) blocking from
//pkg/sentry/syscalls/linux into file description implementations.

Some of the overheads before this CL are tractable without removing fdnotifier.
The sleep package only requires one allocation - of sleep.Waker - per
registration, and the syncevent package requires none. EventRegister can
return the last known readiness mask to avoid the second recvmsg. However,
the interactions with the Go runtime - and in particular the many FUTEX_WAKEs
we incur when waking goroutines due to ready() -> wakep() -> schedule() ->
resetspinning() -> wakep() - are not. The leading alternative solutions to the
same problem are sleep.Sleeper.AssertAndFetch and "change the Go runtime".
The former is very dubiously safe; it works by transiently lying about
runtime.sched.nmspinning, a global variable, and not calling
runtime.resetspinning() when it stops lying, so side effects start at
"runtime.wakep() is disabled globally rather than only on the caller's
thread" and go from there. The latter is dubiously tractable for reasons
including the atypicality of the sentry described above, though see
golang/go#54622.

@copybara-service bot added the "exported" label (Issue was exported automatically) on Oct 3, 2022.

PiperOrigin-RevId: 478129654
@github-actions

A friendly reminder that this PR had no activity for 120 days.

@github-actions bot added the "stale-pr" label (This PR has not been updated in 120 days.) on Sep 13, 2023.

This PR has been closed due to lack of activity.

Labels: auto-closed, exported (Issue was exported automatically), stale-pr (This PR has not been updated in 120 days.)