Skip to content

[kv_offload] Merge FilterReusedOffloadingManager and add lifecycle hook to OffloadingManager#41727

Open
hickeyma wants to merge 3 commits intovllm-project:mainfrom
hickeyma:refactor-offload
Open

[kv_offload] Merge FilterReusedOffloadingManager and add lifecycle hook to OffloadingManager#41727
hickeyma wants to merge 3 commits intovllm-project:mainfrom
hickeyma:refactor-offload

Conversation

@hickeyma
Copy link
Copy Markdown
Contributor

@hickeyma hickeyma commented May 5, 2026

Purpose

This commit implements Tasks 5 and 6 of #33689:

  • Task 5: Move reuse_manager.py into cpu/manager.py
  • Task 6: Add request_finished lifecycle hook to OffloadingManager

Test Plan

CI passes

Tests pass

CI passes

…and add OffloadingManager.request_finished

This commit implements Tasks 5 and 6 of vllm-project#33689:

- Task 5: Move `reuse_manager.py` into `cpu/manager.py`
- Task 6: Add `request_finished` lifecycle hook to `OffloadingManager`

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
@hickeyma hickeyma requested review from ApostaC and orozery as code owners May 5, 2026 11:29
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the v1 label May 5, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request moves the FilterReusedOffloadingManager to vllm/v1/kv_offload/cpu/manager.py and introduces a request_finished hook to the OffloadingManager interface to track async work before freeing GPU blocks. The review feedback highlights a design limitation where the lack of req_id in the prepare_store method makes the new hook difficult to implement effectively. Additionally, there are suggestions to fix a garbled docstring and to ensure the shutdown method is properly delegated in the decorator class.

Comment thread vllm/v1/kv_offload/base.py
Comment thread vllm/v1/kv_offload/cpu/manager.py
Comment thread vllm/v1/kv_offload/cpu/manager.py
self.events.clear()


class FilterReusedOffloadingManager(OffloadingManager):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking to remove FilterReusedOffloadingManager and instead copy its logic inside CPUOffloadingManager

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orozery Do you want it to be done in this PR or in a separate PR?

hickeyma and others added 2 commits May 5, 2026 14:09
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants