
Remove hypervisor_handler thread #533

Merged
merged 6 commits into from
Jun 3, 2025

Conversation

Contributor

@ludfjig ludfjig commented May 27, 2025

  • Removes the per-VM hypervisor_handler thread in favor of running on the caller's thread.
  • Removes all timeout-based cancellation in favor of an explicit kill() API. Currently, only guest function calls can be cancelled, not the initial VM initialization, but this should be fine as long as you trust your guest binary. A guest can only be interrupted while it is in a blocking call to vcpufd.run(). Host function calls still cannot be interrupted (see "Make it possible to kill guest execution when running a host function", #192).

These changes should improve performance and throughput. They should also avoid the severe performance drop-off we observed under load when the hypervisor handler thread had to be joined after cancelling guest execution.

Added API changes:

impl MultiUseSandbox {
    /// Get a handle to the interrupt handler for this sandbox,
    /// capable of interrupting guest execution.
    pub fn interrupt_handle(&self) -> Arc<dyn InterruptHandle> {
         ...  
    }
    ...
}
/// A trait for handling interrupts to a sandbox's vcpu
pub trait InterruptHandle: Send + Sync {
    /// Interrupt the corresponding sandbox's vcpu if it's running.
    ///
    /// - If this is called while the vcpu is running, then it will interrupt the vcpu and return `true`.
    /// - If this is called while the vcpu is not running, then it will do nothing and return `false`.
    ///
    /// # Note
    /// This function will block for the duration of the time it takes for the vcpu thread to be interrupted.
    fn kill(&self) -> bool;

    /// Returns true iff the corresponding sandbox has been dropped
    fn dropped(&self) -> bool;
}
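
A minimal sketch of the kill() and dropped() contract described in the doc comments above. The trait is copied from this PR description; `MockHandle`, `set_running`, and `mark_dropped` are hypothetical names invented for illustration and are not part of Hyperlight. A real handle interrupts a vcpu, whereas this mock only flips a flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Trait as listed in the PR description.
pub trait InterruptHandle: Send + Sync {
    fn kill(&self) -> bool;
    fn dropped(&self) -> bool;
}

/// Hypothetical stand-in for a real interrupt handle (assumption, not Hyperlight code).
#[derive(Default)]
pub struct MockHandle {
    running: AtomicBool,
    dropped: AtomicBool,
}

impl MockHandle {
    /// Pretend the vcpu has entered or left guest execution.
    pub fn set_running(&self, running: bool) {
        self.running.store(running, Ordering::SeqCst);
    }

    /// Pretend the owning sandbox has been dropped.
    pub fn mark_dropped(&self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

impl InterruptHandle for MockHandle {
    fn kill(&self) -> bool {
        // `swap` returns the previous value, so this yields `true` only if
        // the mock vcpu was running at the moment of the call.
        self.running.swap(false, Ordering::SeqCst)
    }

    fn dropped(&self) -> bool {
        self.dropped.load(Ordering::SeqCst)
    }
}
```

With this mock, kill() returns false on an idle handle, true once after set_running(true), and false again afterwards, which mirrors the two cases in the doc comment.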

Removed API:

  • All timeout based configuration
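
With the timeout configuration gone, a caller that still wants a deadline would presumably build one on top of kill(). The following is a sketch under that assumption, not something this PR ships: `kill_after` and `FakeHandle` are hypothetical names, and the trait is repeated from the PR description only so the snippet compiles on its own.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Trait as listed in the PR description.
pub trait InterruptHandle: Send + Sync {
    fn kill(&self) -> bool;
    fn dropped(&self) -> bool;
}

/// Hypothetical helper: spawn a watchdog thread that calls `kill()` once
/// `timeout` has elapsed. The join result is whatever `kill()` returned,
/// or `false` if the sandbox was already dropped.
pub fn kill_after(handle: Arc<dyn InterruptHandle>, timeout: Duration) -> JoinHandle<bool> {
    thread::spawn(move || {
        thread::sleep(timeout);
        if handle.dropped() {
            false
        } else {
            handle.kill()
        }
    })
}

/// Hypothetical fake handle used only to exercise the helper.
pub struct FakeHandle {
    running: AtomicBool,
}

impl FakeHandle {
    pub fn new(running: bool) -> Self {
        Self { running: AtomicBool::new(running) }
    }
}

impl InterruptHandle for FakeHandle {
    fn kill(&self) -> bool {
        // Returns the previous state: `true` only if the fake vcpu was running.
        self.running.swap(false, Ordering::SeqCst)
    }

    fn dropped(&self) -> bool {
        false
    }
}
```

In real use the handle would come from `MultiUseSandbox::interrupt_handle()` and the watchdog would be started just before the guest call on the caller's thread. Since kill() is documented to block until the vcpu thread is interrupted, running it on a separate thread keeps the caller free.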

closes #471

Note: On KVM, moving the vcpufd (sandbox) to a new thread will incur a performance overhead the first time the vcpu is run on the new thread, as per the KVM kernel docs.

@ludfjig ludfjig force-pushed the cancel_vm3 branch 6 times, most recently from eb85e5f to 4abdd49 Compare May 27, 2025 22:56
@ludfjig ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label May 27, 2025
@ludfjig ludfjig force-pushed the cancel_vm3 branch 4 times, most recently from c4d0a52 to 210e506 Compare May 28, 2025 00:14
@ludfjig ludfjig marked this pull request as ready for review May 28, 2025 00:52
Contributor

@simongdavies simongdavies left a comment

Left some comments/questions. Not quite finished reviewing; I will finish in the morning.

@ludfjig ludfjig force-pushed the cancel_vm3 branch 10 times, most recently from 34a5772 to baf9912 Compare May 30, 2025 22:18
@ludfjig ludfjig force-pushed the cancel_vm3 branch 5 times, most recently from 2374355 to b1f7d4c Compare June 3, 2025 02:46
dblnz previously approved these changes Jun 3, 2025
Contributor

@dblnz dblnz left a comment

Awesome work! 💯
I tested it locally. 😺

@ludfjig ludfjig force-pushed the cancel_vm3 branch 2 times, most recently from 2220073 to 0dc8686 Compare June 3, 2025 15:47
@simongdavies simongdavies self-requested a review June 3, 2025 16:59
simongdavies previously approved these changes Jun 3, 2025
Contributor

@simongdavies simongdavies left a comment

LGTM, nice work @ludfjig

@ludfjig ludfjig force-pushed the cancel_vm3 branch 3 times, most recently from 7a21f4c to a931dd2 Compare June 3, 2025 20:05
Contributor

@danbugs danbugs left a comment

Looking good. Thanks for the fix on the SIGRTMIN offset stuff!

To finish off this PR, I think we just have to update some docs. Doing a quick search for "hypervisor handler" in the project shows references in shared_mem.rs, guest.rs, how-to-debug-a-hyperlight-guest.md, and the copilot-instructions.md.

Also, I think we should update the signal-handlers-development-notes.md doc so it reflects what we have right now.

ludfjig added 6 commits June 3, 2025 13:26
Signed-off-by: Ludvig Liljenberg <[email protected]>
…, but a new function call could be scheduled, before the interruptor-thread has time to observe the fact that the vcpu was interrupted

Signed-off-by: Ludvig Liljenberg <[email protected]>
Contributor

@danbugs danbugs left a comment

LGTM!!

@ludfjig ludfjig merged commit 60ed739 into hyperlight-dev:main Jun 3, 2025
46 of 48 checks passed
Labels
kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.
Development

Successfully merging this pull request may close these issues.

Remove the Sandbox Execution thread , replace elapsed time based timeouts with a Kill function
4 participants