Add PCI hotplug support#5786
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5786 +/- ##
==========================================
- Coverage 82.82% 82.80% -0.02%
==========================================
Files 276 277 +1
Lines 29691 29889 +198
==========================================
+ Hits 24591 24751 +160
- Misses 5100 5138 +38
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0fcdd09 to
cdcf610
Compare
Manciukic
left a comment
There was a problem hiding this comment.
mostly LGTM, just a few things here and there. I'm happy to take some of the items (like extending test coverage) to a separate PR. We should also update documentation, but happy to take that on another PR.
edfdb0d to
b878e82
Compare
I extended the coverage in this PR, but will update documentation and CHANGELOG in a different PR after we get this merged. |
33c4989 to
70ad97e
Compare
ApiServerAdapter has a handle_request() method for dispatching requests received from the API thread to the right handler. This method is called from inside EventManager::run() which takes &mut self as an argument. This can be problematic if we want to modify the EventManager object from inside handle_request(). Work around that by having ApiServerAdapter::process() store the request but not call handle_request(). EventManager::run() returns after handling each event. Call handle_request() after EventManager::run() in the event loop. In a subsequent patch, handle_request() will take a &mut EventManager as an argument. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The handle_request() function will soon need access to the EventManager in order to add and remove new objects to support device hot-plugging. Pass a mutable reference of EventManager to handle_request() in preparation of that. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The EntropyDevice and PmemDevice error names are inconsistent with the error names used for all other devices all of which use the Config suffix. Rename them to EntropyConfig and PmemConfig for consistency. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Define VirtioDeviceId as (VirtioDeviceType, String) and use it in the PCI and MMIO device manager HashMaps instead of the raw tuple. This will be used in more places in subsequent patches and having an ID type increases readability. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Firecracker has been rejecting device attach API requests after the VM was started until now. Add hot-plugging for Block, Pmem and Net PCIe devices. This enables the relevant API calls and attaches the device to the PCIe "bus". No notification is delivered to the guest at the moment to notify it that a new device has been added. The guest has to manually rescan the bus in order to detect new devices. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
PciBus::add_device() silently overwrites the previous device if called with a device_id that is already occupied. Add an assert to catch this as a programming error. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
add_device() inserts a device into the PCI bus but does not mark the corresponding slot in the device_ids bitmap. This is ok without device hotplugging support because next_device_id() is never called after restore. However, with the upcoming device hotplugging support this is going to be a problem. The device_ids vector is set to all False after restore. Therefore if we try to hotplug a device after a snapshot restore then next_device_id() will return a slot that is not really free. Fix this by marking the slot as non-free in add_device(). After snapshot restore add_device() will be called for every device saved in the snapshot and the device_ids vector will be updated accordingly. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Wire up the HTTP DELETE method for device hot-unplug. Add the HotUnplugDevice VmmAction variant, route it through the pre-boot/runtime controllers, and define the hot_unplug_device() stub which will be implemented in a subsequent patch. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the actual device teardown: remove the device from the virtio devices map, the MMIO bus, the PCI bus, and the event manager. Also add PciBus::remove_device() to free the PCI device slot. No notification is delivered to the guest at the moment to notify it that a device has been removed. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The test generates bad syscall arguments by adding 1000000 to the allowed value, assuming this produces a value that will be rejected. This breaks for masked_eq operations: e.g. with mask=4 and val=0 (PROT_EXEC must not be set), (0 + 1000000) & 4 is still 0, so the bad value passes the check. Use XOR with the mask instead, which flips the relevant bit and guarantees a violation regardless of whether the rule expects the bit set or clear. A subsequent patch will add seccomp rules that use the 'masked_eq' operation. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The device hotplug path creates devices after boot, when seccomp filters are already active. This requires allowing syscalls that were previously only called during boot before seccomp was installed. Add the following to the vmm thread filter: - timerfd_create: RateLimiter creates a TimerFd for each new device - ioctl(KVM_IOEVENTFD): registering ioeventfds for virtqueue notification - ioctl(TUNSETIFF): opening tap device for net hotplug - ioctl(TUNSETOFFLOAD): configuring tap offload for net hotplug - ioctl(TUNSETVNETHDRSZ): setting vnet header size for net hotplug - mmap(MAP_SHARED|MAP_NORESERVE, !PROT_EXEC): pmem backing file mapping - mmap(MAP_PRIVATE|MAP_NORESERVE|MAP_ANONYMOUS, !PROT_EXEC): pmem aligned region - mmap(MAP_SHARED|MAP_NORESERVE|MAP_FIXED, !PROT_EXEC): pmem file overlay mapping - mmap(MAP_SHARED|MAP_FIXED, PROT_READ|PROT_WRITE): IovDeque ring buffer for net device - memfd_create(MFD_CLOEXEC|MFD_ALLOW_SEALING): IovDeque ring buffer for net device - fcntl(F_ADD_SEALS): IovDeque memfd sealing for net device Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add unit tests for testing the possible failure modes of hotplugging different classes of devices. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Extend the existing hotplug unit tests to also verify device hot-unplug and add new tests to verify that hot-unplugging root devices is rejected. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add integration tests for block, pmem and net hotplugging. The tests require a manual PCI bus rescan at the moment since no hotplug notification mechanism is implemented at the moment. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a delete() method to the test framework HTTP API client and extend the hotplug tests to verify device hot-unplug. Also extend the test_hotplug_max_devices() test to unplugs all devices, verifies slots are freed, and plug them again. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add PCI hotplug support. No hotplug notification mechanism is implemented yet, so the the guest needs to rescan the PCI "bus" manually in order to see new attachments.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.