Skip to content

Fix/vsock reset race#5882

Merged
JackThomson2 merged 9 commits into
firecracker-microvm:mainfrom
JackThomson2:fix/vsock_reset_race
May 28, 2026
Merged

Fix/vsock reset race#5882
JackThomson2 merged 9 commits into
firecracker-microvm:mainfrom
JackThomson2:fix/vsock_reset_race

Conversation

@JackThomson2

Copy link
Copy Markdown
Contributor

There's a certain race which can happen on vsock on resume, with a storm of connections going through which can arrive before the device reset has completed leading to dropped connections.

Update FC to now block these pending connections until the guest has acked that the reset has been completed.

We were allocating an empty vec every call anyway, so no new allocations added other than when we fill it

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.99%. Comparing base (eaa6239) to head (a5ca64c).

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/queue.rs 57.14% 3 Missing ⚠️
src/vmm/src/devices/virtio/vsock/event_handler.rs 75.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5882      +/-   ##
==========================================
+ Coverage   82.88%   82.99%   +0.11%     
==========================================
  Files         277      277              
  Lines       30086    30106      +20     
==========================================
+ Hits        24937    24987      +50     
+ Misses       5149     5119      -30     
Flag Coverage Δ
5.10-m5n.metal 83.30% <76.00%> (+0.12%) ⬆️
5.10-m6a.metal 82.65% <76.00%> (+0.12%) ⬆️
5.10-m6g.metal 79.94% <76.00%> (+0.12%) ⬆️
5.10-m6i.metal 83.30% <76.00%> (+0.11%) ⬆️
5.10-m7a.metal-48xl 82.64% <76.00%> (+0.12%) ⬆️
5.10-m7g.metal 79.94% <76.00%> (+0.12%) ⬆️
5.10-m7i.metal-24xl 83.28% <76.00%> (+0.12%) ⬆️
5.10-m7i.metal-48xl 83.27% <76.00%> (+0.11%) ⬆️
5.10-m8g.metal-24xl 79.94% <76.00%> (+0.12%) ⬆️
5.10-m8g.metal-48xl 79.94% <76.00%> (+0.12%) ⬆️
5.10-m8i.metal-48xl 83.27% <76.00%> (+0.12%) ⬆️
5.10-m8i.metal-96xl 83.28% <76.00%> (+0.12%) ⬆️
6.1-m5n.metal 83.33% <76.00%> (+0.12%) ⬆️
6.1-m6a.metal 82.67% <76.00%> (+0.12%) ⬆️
6.1-m6g.metal 79.94% <76.00%> (+0.12%) ⬆️
6.1-m6i.metal 83.33% <76.00%> (+0.12%) ⬆️
6.1-m7a.metal-48xl 82.67% <76.00%> (+0.13%) ⬆️
6.1-m7g.metal 79.94% <76.00%> (+0.12%) ⬆️
6.1-m7i.metal-24xl 83.34% <76.00%> (+0.12%) ⬆️
6.1-m7i.metal-48xl 83.34% <76.00%> (+0.11%) ⬆️
6.1-m8g.metal-24xl 79.94% <76.00%> (+0.12%) ⬆️
6.1-m8g.metal-48xl 79.94% <76.00%> (+0.12%) ⬆️
6.1-m8i.metal-48xl 83.35% <76.00%> (+0.12%) ⬆️
6.1-m8i.metal-96xl 83.34% <76.00%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JackThomson2 JackThomson2 force-pushed the fix/vsock_reset_race branch from a16431d to d7a3b0b Compare May 13, 2026 11:14
@JackThomson2 JackThomson2 added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label May 14, 2026

@Manciukic Manciukic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM, just a few minor issues.

Comment thread src/vmm/src/devices/virtio/vsock/device.rs Outdated
Comment thread src/vmm/src/devices/virtio/vsock/device.rs Outdated
Comment thread src/vmm/src/devices/virtio/vsock/device.rs Outdated
Comment thread tests/integration_tests/functional/test_vsock.py
Comment thread src/vmm/src/devices/virtio/vsock/event_handler.rs
Comment thread tests/integration_tests/functional/test_vsock.py Outdated
Comment thread tests/integration_tests/functional/test_vsock.py Outdated
@JackThomson2 JackThomson2 force-pushed the fix/vsock_reset_race branch from d2bec53 to afcfd09 Compare May 15, 2026 14:13
@JamesC1305 JamesC1305 self-requested a review May 18, 2026 09:18
Comment thread src/vmm/src/devices/virtio/vsock/device.rs
Manciukic
Manciukic previously approved these changes May 18, 2026
Comment thread src/vmm/src/devices/virtio/vsock/event_handler.rs
Comment thread tests/integration_tests/functional/test_vsock.py Outdated
Comment thread src/vmm/src/devices/virtio/vsock/device.rs
@JackThomson2 JackThomson2 force-pushed the fix/vsock_reset_race branch 5 times, most recently from e6a4d24 to 54173cb Compare May 21, 2026 14:12

@Manciukic Manciukic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one clarification needed on the enabling of notifications on the evq.

Comment thread src/vmm/src/devices/virtio/vsock/device.rs Outdated
@JackThomson2 JackThomson2 force-pushed the fix/vsock_reset_race branch 2 times, most recently from 25f0b12 to 0cba3a6 Compare May 26, 2026 13:36

@JamesC1305 JamesC1305 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments. The potential for unsoundness one is the one I'd like to clarify. The other two are just a suggestion and a nit (neither of which are blocking)

Comment thread tests/framework/utils_vsock.py
Comment thread tests/integration_tests/functional/test_vsock.py
Comment thread src/vmm/src/devices/virtio/queue.rs
Comment thread tests/integration_tests/functional/test_vsock.py Outdated
Add `__enter__`/`__exit__` to `HostEchoWorker` so callers can register
workers with `contextlib.ExitStack` and have their UDS sockets closed
deterministically on any exit path - including assertion failures and
exceptions raised mid-loop. The exit hook joins the worker thread if
still alive before closing the socket.

Convert the existing call sites in `check_host_connections` and
`test_vsock.py` to the `ExitStack` pattern. No behavior change on the
happy path; the only observable difference is that a failure no longer
leaks UDS sockets (and the `socat` echo server file descriptors they
hold open inside the guest).

Signed-off-by: Jack Thomson <jackabt@amazon.com>
Drop the leading underscore from `_vsock_connect_to_guest` so other
test modules can call it directly to open a vsock connection to the
guest echo server. Also harden the helper to close the socket if
either `connect()` or the CONNECT/ACK handshake raises - so callers
never receive a half-open descriptor that they then have to remember
to close on error paths.

The returned socket is still a plain `socket.socket`, so it works as
a context manager out of the box and can be registered with
`contextlib.ExitStack` for deterministic cleanup.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
`test_vsock_transport_reset_g2h` spawns a host-side `socat` listener
with `subprocess.Popen` and kills it at the end of each loop iteration.
If any of the assertions or API calls between `Popen` and the cleanup
raise, the loop variable rebinds on the next iteration and the
previous `socat` is orphaned - it keeps the listener bound to the
chroot socket path and is reported by the post-test process leak
check, masking the real assertion failure.

Wrap the iteration body in `try/finally` so the `host_socat.kill()`
and `new_vm.kill()` calls always run, even on failure. No happy-path
behavior change.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
Add a method to unconditionally arm `avail_event` at the driver's
current `avail.idx`, so the next driver publish to the avail ring is
guaranteed to produce a notification when EVENT_IDX is negotiated.

This complements the existing `try_enable_notification`, which is
designed for the drain-then-arm pattern used by descriptor-consuming
queues: it only arms when the queue is empty, so it can pair with a
subsequent pop of any descriptors the driver added concurrently.

Some queues are not normally drained by the device — the vsock event
virtqueue is the motivating case: the device only writes to it, and
the driver replenishes descriptors as the device consumes them. There
the next driver publish, not the next-after-drain one, is what the
device must learn about. `try_enable_notification` does not fit:

  * it returns early when `len() != 0`, leaving `avail_event`
    unchanged, and
  * it sets `avail_event = next_avail`, which on a partially-popped
    queue is behind `avail.idx` and would suppress the upcoming
    publish anyway.

The new method skips the drain check and writes the current
`avail.idx` directly. A Release fence orders the avail_event store
before any subsequent used-ring update or interrupt the caller
delivers; the caller is responsible for ensuring the driver does not
add to the avail ring until it has observed that update, since this
path does not recheck `avail.idx`.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
The event virtqueue is only ever written by the device; avail
descriptors are never popped except for an event publish, so
`avail_event` is never advanced through the normal
`pop_or_enable_notification` path. With VIRTIO_RING_F_EVENT_IDX, that
leaves `avail_event = 0` and any guest kick after `avail.idx` first
exceeds 1 evaluates `vring_need_event(0, new, old)` to false and is
suppressed.

The pre-existing logic does not depend on receiving the post-event
refill kick, so the suppression has no observable effect today. The
next commit gates RX delivery on that kick, at which point the
suppression becomes a deadlock. Fix it at the source: after publishing
the TRANSPORT_RESET event into the used ring, call the new
`Queue::enable_notification` to set `avail_event` to the current
`avail.idx` so the driver's refill of the consumed head reliably
notifies the host. Per virtio 1.2 §5.10.6.3, the driver SHOULD
replenish event virtqueue buffers promptly, so we expect this
notification to arrive shortly.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
After a snapshot restore, `kick()` signals the event virtqueue so the
guest driver processes the queued VIRTIO_VSOCK_EVENT_TRANSPORT_RESET
event, which iterates the connected-socket list and resets every
transport-bound socket. With virtio-pci + MSI-X each virtqueue has its
own interrupt vector, so the RX vq IRQ and EVQ IRQ can be delivered to
different vCPUs and the guest's `virtio_transport_rx_work` and
`virtio_transport_event_work` end up running concurrently on different
per-CPU workers.

That permits the following interleaving:

  rx_work:    recv_listen
              -> vsock_insert_connected(child)        // takes lock
  event_work: vsock_for_each_connected_socket(reset)  // takes lock,
                                                      // sees the
                                                      // child

The child is set to TCP_CLOSE/ECONNRESET *after* the virtio handshake
has already returned VSOCK_OP_RESPONSE to the host. The host then sends
data on what it believes is an established connection, the guest looks
up the (now closed) child and replies with VSOCK_OP_RST -- the symptom
seen in production. With virtio-mmio there is a single shared interrupt
so both work_structs queue onto the same per-CPU worker and run
strictly in FIFO order, hiding the race.

Close the window from the device side: while a TRANSPORT_RESET is
in-flight, do not deliver any RX packets. The guest driver always
re-arms the event vq (and so issues a virtqueue_kick) at the end of
`virtio_transport_event_work`, after the connected-socket sweep is
complete. We use that kick as the "event handled" signal: clear the
gate in `handle_evq_event` and drain any RX that accumulated in the
muxer while the gate was up.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
Add `test_vsock_post_restore_connect_storm`, a regression test for the
race fixed in the previous commit. Without the fix, on virtio-pci +
MSI-X the guest's `virtio_transport_rx_work` and
`virtio_transport_event_work` can run on different vCPUs; a fresh
CONNECT inserted into the connected-socket table just before the
TRANSPORT_RESET sweep iterates it gets reset to TCP_CLOSE/ECONNRESET
*after* the device has already returned OP_RESPONSE to the host. The
host then either sees a truncated read or a ConnectionResetError on a
connection it just opened.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
Add unit tests for the vsock device testing the new behaviour with the
locking on reset.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
@JackThomson2 JackThomson2 force-pushed the fix/vsock_reset_race branch from 0cba3a6 to f99114c Compare May 27, 2026 13:15
JamesC1305
JamesC1305 previously approved these changes May 27, 2026
Add changelog item for the vsock reset race fix.

Signed-off-by: Jack Thomson <jackabt@amazon.com>

@micz010 micz010 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, reviewed only the changelog change

Comment thread tests/framework/utils_vsock.py
Comment thread src/vmm/src/devices/virtio/queue.rs
@JackThomson2 JackThomson2 merged commit e90cef8 into firecracker-microvm:main May 28, 2026
8 of 9 checks passed
@JackThomson2 JackThomson2 deleted the fix/vsock_reset_race branch May 28, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants