Fix/vsock reset race by JackThomson2 · Pull Request #5882 · firecracker-microvm/firecracker

JackThomson2 · 2026-05-13T11:00:50Z

A race can occur on vsock after resume: a storm of incoming connections can arrive before the device reset has completed, leading to dropped connections.

Update FC to block these pending connections until the guest has acknowledged that the reset is complete.

We were already allocating an empty vec on every call, so this adds no new allocations other than when we fill it.

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

This functionality cannot be added in rust-vmm.

codecov · 2026-05-13T11:05:45Z

Codecov Report

❌ Patch coverage is 76.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.99%. Comparing base (eaa6239) to head (a5ca64c).

Files with missing lines	Patch %	Lines
src/vmm/src/devices/virtio/queue.rs	57.14%	3 Missing ⚠️
src/vmm/src/devices/virtio/vsock/event_handler.rs	75.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5882      +/-   ##
==========================================
+ Coverage   82.88%   82.99%   +0.11%     
==========================================
  Files         277      277              
  Lines       30086    30106      +20     
==========================================
+ Hits        24937    24987      +50     
+ Misses       5149     5119      -30

Flag	Coverage Δ
5.10-m5n.metal	`83.30% <76.00%> (+0.12%)`	⬆️
5.10-m6a.metal	`82.65% <76.00%> (+0.12%)`	⬆️
5.10-m6g.metal	`79.94% <76.00%> (+0.12%)`	⬆️
5.10-m6i.metal	`83.30% <76.00%> (+0.11%)`	⬆️
5.10-m7a.metal-48xl	`82.64% <76.00%> (+0.12%)`	⬆️
5.10-m7g.metal	`79.94% <76.00%> (+0.12%)`	⬆️
5.10-m7i.metal-24xl	`83.28% <76.00%> (+0.12%)`	⬆️
5.10-m7i.metal-48xl	`83.27% <76.00%> (+0.11%)`	⬆️
5.10-m8g.metal-24xl	`79.94% <76.00%> (+0.12%)`	⬆️
5.10-m8g.metal-48xl	`79.94% <76.00%> (+0.12%)`	⬆️
5.10-m8i.metal-48xl	`83.27% <76.00%> (+0.12%)`	⬆️
5.10-m8i.metal-96xl	`83.28% <76.00%> (+0.12%)`	⬆️
6.1-m5n.metal	`83.33% <76.00%> (+0.12%)`	⬆️
6.1-m6a.metal	`82.67% <76.00%> (+0.12%)`	⬆️
6.1-m6g.metal	`79.94% <76.00%> (+0.12%)`	⬆️
6.1-m6i.metal	`83.33% <76.00%> (+0.12%)`	⬆️
6.1-m7a.metal-48xl	`82.67% <76.00%> (+0.13%)`	⬆️
6.1-m7g.metal	`79.94% <76.00%> (+0.12%)`	⬆️
6.1-m7i.metal-24xl	`83.34% <76.00%> (+0.12%)`	⬆️
6.1-m7i.metal-48xl	`83.34% <76.00%> (+0.11%)`	⬆️
6.1-m8g.metal-24xl	`79.94% <76.00%> (+0.12%)`	⬆️
6.1-m8g.metal-48xl	`79.94% <76.00%> (+0.12%)`	⬆️
6.1-m8i.metal-48xl	`83.35% <76.00%> (+0.12%)`	⬆️
6.1-m8i.metal-96xl	`83.34% <76.00%> (+0.12%)`	⬆️

Manciukic

mostly LGTM, just a few minor issues.

Manciukic

LGTM, just one clarification needed on the enabling of notifications on the evq.

JamesC1305

A few minor comments. The one about potential unsoundness is the one I'd like clarified. The other two are a suggestion and a nit, neither of which is blocking.

Add `__enter__`/`__exit__` to `HostEchoWorker` so callers can register workers with `contextlib.ExitStack` and have their UDS sockets closed deterministically on any exit path - including assertion failures and exceptions raised mid-loop. The exit hook joins the worker thread if still alive before closing the socket. Convert the existing call sites in `check_host_connections` and `test_vsock.py` to the `ExitStack` pattern. No behavior change on the happy path; the only observable difference is that a failure no longer leaks UDS sockets (and the `socat` echo server file descriptors they hold open inside the guest). Signed-off-by: Jack Thomson <jackabt@amazon.com>
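The pattern in this commit message can be sketched as follows. This is a minimal illustration, not the test framework's code: `EchoWorker` is a hypothetical stand-in for `HostEchoWorker` (whose real constructor is not shown here), and `socket.socketpair()` replaces the UDS connection to the guest.

```python
import contextlib
import socket
import threading


class EchoWorker(threading.Thread):
    """Hypothetical stand-in for HostEchoWorker: send a payload, read the echo."""

    def __init__(self, sock, payload):
        super().__init__()
        self.sock = sock
        self.payload = payload
        self.reply = b""

    def run(self):
        self.sock.sendall(self.payload)
        self.reply = self.sock.recv(len(self.payload))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Join the thread if it is still running, then close the socket,
        # whether or not the enclosing block raised.
        if self.is_alive():
            self.join()
        self.sock.close()
        return False  # never swallow the caller's exception


def run_workers(count):
    """Start `count` workers, fail on purpose, and return the worker sockets."""
    socks = []
    try:
        with contextlib.ExitStack() as stack:
            for i in range(count):
                ours, peer = socket.socketpair()
                stack.callback(peer.close)
                worker = stack.enter_context(EchoWorker(ours, b"ping%d" % i))
                socks.append(ours)
                worker.start()
                peer.sendall(peer.recv(16))  # trivial echo server
            raise AssertionError("simulated mid-loop failure")
    except AssertionError:
        pass
    return socks
```

Because every worker is registered with the `ExitStack` before it starts, the simulated failure still unwinds through each `__exit__`, so the sockets returned by `run_workers` are already closed.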

Drop the leading underscore from `_vsock_connect_to_guest` so other test modules can call it directly to open a vsock connection to the guest echo server. Also harden the helper to close the socket if either `connect()` or the CONNECT/ACK handshake raises - so callers never receive a half-open descriptor that they then have to remember to close on error paths. The returned socket is still a plain `socket.socket`, so it works as a context manager out of the box and can be registered with `contextlib.ExitStack` for deterministic cleanup. Signed-off-by: Jack Thomson <jackabt@amazon.com>
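A rough sketch of the hardening described above, under stated assumptions: `connect_to_guest` is an illustrative name rather than the real helper, and the `CONNECT <port>\n` request with an `OK ...` reply is assumed as the handshake format.

```python
import socket


def connect_to_guest(uds_path, port, timeout=5.0):
    """Open a UDS connection and perform the CONNECT/ACK handshake.

    Sketch only: the wire format below is an assumption for illustration.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(uds_path)
        sock.sendall(f"CONNECT {port}\n".encode())
        ack = sock.recv(32)
        if not ack.startswith(b"OK "):
            raise ConnectionError(f"unexpected handshake reply: {ack!r}")
    except BaseException:
        # Never hand a half-open descriptor back to the caller.
        sock.close()
        raise
    return sock
```

Since the return value is a plain `socket.socket`, a caller can write `with connect_to_guest(path, port) as sock:` or pass it to `ExitStack.enter_context`.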

`test_vsock_transport_reset_g2h` spawns a host-side `socat` listener with `subprocess.Popen` and kills it at the end of each loop iteration. If any of the assertions or API calls between `Popen` and the cleanup raise, the loop variable rebinds on the next iteration and the previous `socat` is orphaned - it keeps the listener bound to the chroot socket path and is reported by the post-test process leak check, masking the real assertion failure. Wrap the iteration body in `try/finally` so the `host_socat.kill()` and `new_vm.kill()` calls always run, even on failure. No happy-path behavior change. Signed-off-by: Jack Thomson <jackabt@amazon.com>
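The shape of that fix can be sketched as below. The child process here is a sleeping Python interpreter standing in for the `socat` listener, `run_iteration` is a hypothetical name, and the extra `wait()` is an addition of this sketch so the child is reaped.

```python
import subprocess
import sys


def run_iteration(check):
    """Spawn a listener stand-in, run `check`, and always clean up the child."""
    # Stand-in for the host-side socat listener; any long-lived child would do.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        check(child)  # assertions and API calls that may raise
    finally:
        child.kill()
        child.wait()  # reap the process so it cannot linger as a zombie
```

If `check` raises, the exception still propagates to the caller, but only after the child has been killed, so a leak check afterwards would report the original failure rather than an orphaned process.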

Add a method to unconditionally arm `avail_event` at the driver's current `avail.idx`, so the next driver publish to the avail ring is guaranteed to produce a notification when EVENT_IDX is negotiated.

This complements the existing `try_enable_notification`, which is designed for the drain-then-arm pattern used by descriptor-consuming queues: it only arms when the queue is empty, so it can pair with a subsequent pop of any descriptors the driver added concurrently.

Some queues are not normally drained by the device. The vsock event virtqueue is the motivating case: the device only writes to it, and the driver replenishes descriptors as the device consumes them. There the next driver publish, not the next-after-drain one, is what the device must learn about. `try_enable_notification` does not fit:

* it returns early when `len() != 0`, leaving `avail_event` unchanged, and
* it sets `avail_event = next_avail`, which on a partially-popped queue is behind `avail.idx` and would suppress the upcoming publish anyway.

The new method skips the drain check and writes the current `avail.idx` directly. A Release fence orders the avail_event store before any subsequent used-ring update or interrupt the caller delivers; the caller is responsible for ensuring the driver does not add to the avail ring until it has observed that update, since this path does not recheck `avail.idx`.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
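To make the difference between the two arming policies concrete, here is a toy model of the three indices involved. It is only a sketch: `ToyQueue` is hypothetical, the boolean return value is this sketch's own convention, and guest memory access and the Release fence are left out entirely.

```python
class ToyQueue:
    """Toy model of the indices discussed in the commit message."""

    def __init__(self, avail_idx=0, next_avail=0):
        self.avail_idx = avail_idx    # written by the driver
        self.next_avail = next_avail  # device-side pop cursor
        self.avail_event = 0          # written by the device

    def len(self):
        # Descriptors published by the driver but not yet popped (16-bit wrap).
        return (self.avail_idx - self.next_avail) & 0xFFFF

    def try_enable_notification(self):
        """Drain-then-arm: only arms when nothing is left to pop."""
        if self.len() != 0:
            return False
        self.avail_event = self.next_avail
        return True

    def enable_notification(self):
        """Unconditional arm at the driver's current avail.idx."""
        self.avail_event = self.avail_idx
```

With `ToyQueue(avail_idx=4, next_avail=1)`, `try_enable_notification()` returns early and leaves `avail_event` at 0, whereas `enable_notification()` sets it to 4, the index the next driver publish will cross.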

The event virtqueue is only ever written by the device; avail descriptors are never popped except for an event publish, so `avail_event` is never advanced through the normal `pop_or_enable_notification` path. With VIRTIO_RING_F_EVENT_IDX, that leaves `avail_event = 0` and any guest kick after `avail.idx` first exceeds 1 evaluates `vring_need_event(0, new, old)` to false and is suppressed.

The pre-existing logic does not depend on receiving the post-event refill kick, so the suppression has no observable effect today. The next commit gates RX delivery on that kick, at which point the suppression becomes a deadlock.

Fix it at the source: after publishing the TRANSPORT_RESET event into the used ring, call the new `Queue::enable_notification` to set `avail_event` to the current `avail.idx` so the driver's refill of the consumed head reliably notifies the host. Per virtio 1.2 §5.10.6.3, the driver SHOULD replenish event virtqueue buffers promptly, so we expect this notification to arrive shortly.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
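The suppression arithmetic can be checked with a small Python transcription of the virtio `vring_need_event` comparison. The function below is a sketch for illustration; the 16-bit masking mirrors the `u16` casts in the C helper.

```python
def vring_need_event(event_idx, new_idx, old_idx):
    """Return True if moving the index from old_idx to new_idx crosses event_idx.

    Uses the same wrapping 16-bit comparison as the virtio helper of this name.
    """
    return ((new_idx - event_idx - 1) & 0xFFFF) < ((new_idx - old_idx) & 0xFFFF)


# avail_event stuck at 0: only the very first publish (0 -> 1) notifies.
first_publish = vring_need_event(0, 1, 0)
later_publish = vring_need_event(0, 5, 4)

# avail_event armed at the current avail.idx (4): the refill 4 -> 5 notifies.
armed_publish = vring_need_event(4, 5, 4)
```

Here `first_publish` and `armed_publish` evaluate to `True` while `later_publish` is `False`, which is the suppressed kick the commit message describes.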

After a snapshot restore, `kick()` signals the event virtqueue so the guest driver processes the queued VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event, which iterates the connected-socket list and resets every transport-bound socket.

With virtio-pci + MSI-X each virtqueue has its own interrupt vector, so the RX vq IRQ and EVQ IRQ can be delivered to different vCPUs and the guest's `virtio_transport_rx_work` and `virtio_transport_event_work` end up running concurrently on different per-CPU workers. That permits the following interleaving:

    rx_work:    recv_listen -> vsock_insert_connected(child)   // takes lock
    event_work: vsock_for_each_connected_socket(reset)         // takes lock, sees the child

The child is set to TCP_CLOSE/ECONNRESET *after* the virtio handshake has already returned VSOCK_OP_RESPONSE to the host. The host then sends data on what it believes is an established connection, the guest looks up the (now closed) child and replies with VSOCK_OP_RST -- the symptom seen in production.

With virtio-mmio there is a single shared interrupt so both work_structs queue onto the same per-CPU worker and run strictly in FIFO order, hiding the race.

Close the window from the device side: while a TRANSPORT_RESET is in-flight, do not deliver any RX packets. The guest driver always re-arms the event vq (and so issues a virtqueue_kick) at the end of `virtio_transport_event_work`, after the connected-socket sweep is complete. We use that kick as the "event handled" signal: clear the gate in `handle_evq_event` and drain any RX that accumulated in the muxer while the gate was up.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
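The gating idea can be modelled in a few lines. This is a toy sketch, not Firecracker's implementation: `ToyVsock`, `reset_in_flight`, `pending_rx`, and `delivered` are hypothetical names, and only `handle_evq_event` and `kick` echo names used in the commit message.

```python
from collections import deque


class ToyVsock:
    """Toy model of the RX gate; names are illustrative only."""

    def __init__(self):
        self.reset_in_flight = False
        self.pending_rx = deque()  # stands in for packets held in the muxer
        self.delivered = []        # stands in for the guest RX queue

    def kick(self):
        # TRANSPORT_RESET has been published on the event queue: raise the gate.
        self.reset_in_flight = True

    def host_packet(self, pkt):
        self.pending_rx.append(pkt)
        self.process_rx()

    def process_rx(self):
        if self.reset_in_flight:
            return  # hold everything until the guest has handled the reset
        while self.pending_rx:
            self.delivered.append(self.pending_rx.popleft())

    def handle_evq_event(self):
        # The guest re-armed the event queue, so its sweep is complete:
        # drop the gate and drain whatever accumulated in the meantime.
        self.reset_in_flight = False
        self.process_rx()
```

In this model, connection requests that arrive between `kick()` and `handle_evq_event()` are queued rather than dropped, and they reach the guest in arrival order once the gate clears.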

Add `test_vsock_post_restore_connect_storm`, a regression test for the race fixed in the previous commit. Without the fix, on virtio-pci + MSI-X the guest's `virtio_transport_rx_work` and `virtio_transport_event_work` can run on different vCPUs; a fresh CONNECT inserted into the connected-socket table just before the TRANSPORT_RESET sweep iterates it gets reset to TCP_CLOSE/ECONNRESET *after* the device has already returned OP_RESPONSE to the host. The host then either sees a truncated read or a ConnectionResetError on a connection it just opened. Signed-off-by: Jack Thomson <jackabt@amazon.com>

Add unit tests for the vsock device testing the new behaviour with the locking on reset. Signed-off-by: Jack Thomson <jackabt@amazon.com>

Add changelog item for the vsock reset race fix. Signed-off-by: Jack Thomson <jackabt@amazon.com>

micz010

Approved, reviewed only the changelog change

JackThomson2 force-pushed the fix/vsock_reset_race branch from a16431d to d7a3b0b Compare May 13, 2026 11:14

JackThomson2 added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label May 14, 2026

Manciukic reviewed May 15, 2026

JackThomson2 force-pushed the fix/vsock_reset_race branch from d2bec53 to afcfd09 Compare May 15, 2026 14:13

JamesC1305 self-requested a review May 18, 2026 09:18

JamesC1305 reviewed May 18, 2026

Comment thread src/vmm/src/devices/virtio/vsock/device.rs

Manciukic previously approved these changes May 18, 2026

JamesC1305 reviewed May 18, 2026

Comment thread src/vmm/src/devices/virtio/vsock/event_handler.rs

Comment thread tests/integration_tests/functional/test_vsock.py Outdated

Comment thread src/vmm/src/devices/virtio/vsock/device.rs

JackThomson2 dismissed Manciukic’s stale review via 499f43d May 18, 2026 14:14

JackThomson2 force-pushed the fix/vsock_reset_race branch 5 times, most recently from e6a4d24 to 54173cb Compare May 21, 2026 14:12

Manciukic reviewed May 22, 2026

Comment thread src/vmm/src/devices/virtio/vsock/device.rs Outdated

JackThomson2 force-pushed the fix/vsock_reset_race branch 2 times, most recently from 25f0b12 to 0cba3a6 Compare May 26, 2026 13:36

JamesC1305 reviewed May 26, 2026

Comment thread tests/framework/utils_vsock.py

Comment thread tests/integration_tests/functional/test_vsock.py

Comment thread src/vmm/src/devices/virtio/queue.rs

Comment thread tests/integration_tests/functional/test_vsock.py Outdated

JackThomson2 added 8 commits May 27, 2026 13:15

test(vsock): add unit tests

f99114c

JackThomson2 force-pushed the fix/vsock_reset_race branch from 0cba3a6 to f99114c Compare May 27, 2026 13:15

JamesC1305 previously approved these changes May 27, 2026

docs(changelog): add vsock reset fix

a5ca64c

JackThomson2 dismissed JamesC1305’s stale review via a5ca64c May 27, 2026 14:54

JackThomson2 requested a review from micz010 as a code owner May 27, 2026 14:54

micz010 approved these changes May 27, 2026

JamesC1305 approved these changes May 28, 2026

ShadowCurse reviewed May 28, 2026

Comment thread tests/framework/utils_vsock.py

Comment thread src/vmm/src/devices/virtio/queue.rs

ShadowCurse approved these changes May 28, 2026

JackThomson2 merged commit e90cef8 into firecracker-microvm:main May 28, 2026
8 of 9 checks passed

JackThomson2 deleted the fix/vsock_reset_race branch May 28, 2026 14:41

5 participants

codecov Bot commented May 13, 2026 •

edited

Loading