Remove hypervisor_handler thread #533

ludfjig · 2025-05-27T21:51:24Z

Removes the per-VM hypervisor_handler thread in favor of running on the caller's thread.
Removes all timeout-based cancellation in favor of an explicit kill() API. Currently, only guest function calls can be cancelled, not the initial VM initialization, but this should be fine as long as you trust your guest binary. A guest can only be interrupted while it is in a blocking call to vcpufd.run(). Host function calls still cannot be interrupted (Make it possible to kill guest execution when running a host function. #192).

These changes should improve performance and throughput. They should also avoid the severe performance drop-off we observed under load when the hypervisor handler thread had to be joined after cancelling guest execution.

Added API changes:

impl MultiUseSandbox {
    /// Get a handle to the interrupt handler for this sandbox,
    /// capable of interrupting guest execution.
    pub fn interrupt_handle(&self) -> Arc<dyn InterruptHandle> {
        ...
    }
    ...
}

/// A trait for handling interrupts to a sandbox's vcpu
pub trait InterruptHandle: Send + Sync {
    /// Interrupt the corresponding sandbox's vcpu if it's running.
    ///
    /// - If this is called while the vcpu is running, then it will interrupt the vcpu and return `true`.
    /// - If this is called while the vcpu is not running, then it will do nothing and return `false`.
    ///
    /// # Note
    /// This function will block for the duration of the time it takes for the vcpu thread to be interrupted.
    fn kill(&self) -> bool;

    /// Returns true iff the corresponding sandbox has been dropped
    fn dropped(&self) -> bool;
}
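
A minimal sketch of the documented `kill()` return values, for illustration only. The trait is copied from the description above; `MockHandle`, its `running` flag, and `cancel_from_other_thread` are hypothetical stand-ins rather than hyperlight code, and the mock does not touch a real vcpu.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// The trait as described in this PR.
pub trait InterruptHandle: Send + Sync {
    fn kill(&self) -> bool;
    fn dropped(&self) -> bool;
}

/// Hypothetical stand-in for a sandbox's interrupt state (not hyperlight code).
pub struct MockHandle {
    running: AtomicBool,
    dropped: AtomicBool,
}

impl MockHandle {
    pub fn new() -> Self {
        MockHandle {
            running: AtomicBool::new(false),
            dropped: AtomicBool::new(false),
        }
    }

    /// Pretend the vcpu entered (true) or left (false) the guest.
    pub fn set_running(&self, running: bool) {
        self.running.store(running, Ordering::SeqCst);
    }

    /// Pretend the owning sandbox was dropped.
    pub fn mark_dropped(&self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

impl InterruptHandle for MockHandle {
    fn kill(&self) -> bool {
        // `swap` returns the previous value, so this is `true` only if the
        // (pretend) vcpu was running at the moment of the call.
        self.running.swap(false, Ordering::SeqCst)
    }

    fn dropped(&self) -> bool {
        self.dropped.load(Ordering::SeqCst)
    }
}

/// Call `kill()` from a different thread than the one holding the sandbox,
/// and report what it returned.
pub fn cancel_from_other_thread(handle: Arc<dyn InterruptHandle>) -> bool {
    std::thread::spawn(move || handle.kill())
        .join()
        .expect("killer thread panicked")
}
```

Because the handle is an `Arc<dyn InterruptHandle>` and the trait requires `Send + Sync`, it can be cloned and moved to another thread while the sandbox itself stays on the caller's thread.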

Removed API changes:

All timeout-based configuration
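
With the timeout configuration gone, a caller that still wants a deadline could layer one on top of `kill()`. The following is a sketch under stated assumptions, not part of this PR: `call_with_timeout` and `FlagHandle` are hypothetical names, and the closure stands in for a guest function call running on the caller's thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// The trait as described in this PR.
pub trait InterruptHandle: Send + Sync {
    fn kill(&self) -> bool;
    fn dropped(&self) -> bool;
}

/// Hypothetical test double: records that `kill()` was requested.
pub struct FlagHandle {
    pub killed: AtomicBool,
}

impl InterruptHandle for FlagHandle {
    fn kill(&self) -> bool {
        self.killed.store(true, Ordering::SeqCst);
        true
    }

    fn dropped(&self) -> bool {
        false
    }
}

/// Run `call` on the current thread. A watchdog thread calls `kill()` if
/// `call` has not finished within `timeout`. Returns the result of `call`
/// and whether the watchdog invoked `kill()`.
pub fn call_with_timeout<T>(
    handle: Arc<dyn InterruptHandle>,
    timeout: Duration,
    call: impl FnOnce() -> T,
) -> (T, bool) {
    let (done_tx, done_rx) = mpsc::channel::<()>();
    let watchdog = thread::spawn(move || match done_rx.recv_timeout(timeout) {
        // The call finished (or the sender went away): nothing to interrupt.
        Ok(()) | Err(RecvTimeoutError::Disconnected) => false,
        // Deadline passed while the call was still in progress.
        Err(RecvTimeoutError::Timeout) => {
            handle.kill();
            true
        }
    });
    let result = call();
    // Tell the watchdog we are done; ignore the error if it already exited.
    let _ = done_tx.send(());
    let fired = watchdog.join().expect("watchdog thread panicked");
    (result, fired)
}
```

This keeps the deadline policy entirely in the caller, which is the trade-off the explicit `kill()` API implies: the library no longer decides when a guest has run too long.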

closes #471

Note: On KVM, moving the vcpufd (sandbox) to a new thread will incur a performance overhead the first time the vcpu is run on the new thread, as per the KVM kernel docs.

simongdavies

Left some comments/questions. Not quite finished reviewing; I will finish in the morning.

src/hyperlight_host/src/func/call_ctx.rs

Justfile

src/hyperlight_host/src/func/guest_dispatch.rs

src/hyperlight_host/examples/logging/main.rs

src/hyperlight_host/examples/metrics/main.rs

src/hyperlight_host/examples/tracing/main.rs

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

src/hyperlight_host/src/hypervisor/kvm.rs

src/hyperlight_host/src/hypervisor/mod.rs

dblnz

Awesome work! 💯
I tested it locally. 😺

src/hyperlight_host/src/hypervisor/hyperv_windows.rs

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

src/hyperlight_host/src/hypervisor/kvm.rs

src/hyperlight_host/src/sandbox/config.rs

src/hyperlight_host/src/signal_handlers/mod.rs

src/hyperlight_host/tests/integration_test.rs

simongdavies

LGTM, nice work @ludfjig

src/hyperlight_host/src/hypervisor/mod.rs

src/hyperlight_host/src/sandbox/config.rs

danbugs

Looking good. Thanks for the fix on the SIGRTMIN offset stuff!

To finish off this PR, I think we just have to update some docs. Doing a quick search for "hypervisor handler" in the project shows references in shared_mem.rs, guest.rs, how-to-debug-a-hyperlight-guest.md, and the copilot-instructions.md.

Also, I think we should update the signal-handlers-development-notes.md doc to be appropriate to what we have rn.

src/hyperlight_host/src/hypervisor/hyperv_linux.rs


danbugs

LGTM!!

ludfjig force-pushed the cancel_vm3 branch 6 times, most recently from eb85e5f to 4abdd49 Compare May 27, 2025 22:56

ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label May 27, 2025

ludfjig force-pushed the cancel_vm3 branch 4 times, most recently from c4d0a52 to 210e506 Compare May 28, 2025 00:14

ludfjig marked this pull request as ready for review May 28, 2025 00:52

ludfjig requested review from danbugs, dblnz, devigned, syntactically, marosset, jprendes and simongdavies as code owners May 28, 2025 00:52

simongdavies reviewed May 28, 2025


ludfjig force-pushed the cancel_vm3 branch 10 times, most recently from 34a5772 to baf9912 Compare May 30, 2025 22:18

ludfjig force-pushed the cancel_vm3 branch 5 times, most recently from 2374355 to b1f7d4c Compare June 3, 2025 02:46

dblnz previously approved these changes Jun 3, 2025


dblnz reviewed Jun 3, 2025


src/hyperlight_host/src/hypervisor/hyperv_windows.rs

ludfjig dismissed dblnz’s stale review via 2220073 June 3, 2025 15:46

ludfjig force-pushed the cancel_vm3 branch 2 times, most recently from 2220073 to 0dc8686 Compare June 3, 2025 15:47

simongdavies requested changes Jun 3, 2025


ludfjig force-pushed the cancel_vm3 branch from 0dc8686 to 91ae36f Compare June 3, 2025 16:36

simongdavies self-requested a review June 3, 2025 16:59

simongdavies previously approved these changes Jun 3, 2025


danbugs requested changes Jun 3, 2025


src/hyperlight_host/src/hypervisor/mod.rs (outdated)

src/hyperlight_host/src/sandbox/config.rs

ludfjig dismissed simongdavies’s stale review via 9cea057 June 3, 2025 18:27

ludfjig force-pushed the cancel_vm3 branch 3 times, most recently from 7a21f4c to a931dd2 Compare June 3, 2025 20:05

danbugs reviewed Jun 3, 2025


src/hyperlight_host/src/hypervisor/hyperv_linux.rs

ludfjig added 6 commits June 3, 2025 13:26

Remove Hypervisor-Handler thread, and timeout-based config

54b3f4f

Signed-off-by: Ludvig Liljenberg <[email protected]>

Implement InterruptHandle API

19212f7

Signed-off-by: Ludvig Liljenberg <[email protected]>

Fix tests and examples that relied on timing out

9d23826

Signed-off-by: Ludvig Liljenberg <[email protected]>

Make interrupt retry delay configurable

e1492a4

Signed-off-by: Ludvig Liljenberg <[email protected]>

Allow configuring the signal number of the signal used to interrupt a…

883624e

… sandbox Signed-off-by: Ludvig Liljenberg <[email protected]>

Prevent ABA-problem, where the vcpu could be successfully interrupted…

de5c328

…, but a new function call could be scheduled, before the interruptor-thread has time to observe the fact that the vcpu was interrupted Signed-off-by: Ludvig Liljenberg <[email protected]>
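
One common way to guard against this kind of ABA race is to pair the "running" flag with a generation counter, so the interrupting thread can tell whether the call it targeted is still the one in progress. The sketch below is an assumption about the general technique, not a copy of the code in commit de5c328; `VcpuState`, `RUNNING_BIT`, and all method names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Top bit: the vcpu is currently running. Lower 63 bits: generation counter.
const RUNNING_BIT: u64 = 1 << 63;

/// Hypothetical shared state between the vcpu thread and an interruptor.
pub struct VcpuState(AtomicU64);

impl VcpuState {
    pub fn new() -> Self {
        VcpuState(AtomicU64::new(0))
    }

    /// Called by the vcpu thread just before entering the guest. Bumps the
    /// generation and sets the running bit. Returns the new generation.
    /// Assumes the vcpu thread is the only writer of the generation.
    pub fn enter(&self) -> u64 {
        let prev = self.0.load(Ordering::SeqCst);
        let generation = ((prev & !RUNNING_BIT) + 1) & !RUNNING_BIT;
        self.0.store(generation | RUNNING_BIT, Ordering::SeqCst);
        generation
    }

    /// Called by the vcpu thread after the guest exits: clears the running bit.
    pub fn exit(&self) {
        self.0.fetch_and(!RUNNING_BIT, Ordering::SeqCst);
    }

    /// Returns (running, generation) as one consistent snapshot.
    pub fn snapshot(&self) -> (bool, u64) {
        let v = self.0.load(Ordering::SeqCst);
        ((v & RUNNING_BIT) != 0, v & !RUNNING_BIT)
    }

    /// Used by the interruptor: keep signalling only while the call that was
    /// targeted (identified by its generation) is still the one running.
    pub fn still_running_call(&self, target_generation: u64) -> bool {
        let (running, generation) = self.snapshot();
        running && generation == target_generation
    }
}
```

With only a boolean flag, an interruptor that checks "is it running?" after the first call was interrupted and a second call started would see `true` and wrongly keep signalling; comparing generations makes that case return `false`.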

ludfjig force-pushed the cancel_vm3 branch from a931dd2 to de5c328 Compare June 3, 2025 20:26

danbugs approved these changes Jun 3, 2025


ludfjig merged commit 60ed739 into hyperlight-dev:main Jun 3, 2025
46 of 48 checks passed

ludfjig mentioned this pull request Jun 4, 2025

UninitializedSandbox::init hangs if no permissions on /dev/kvm #561

Closed
