[Core][Refactor] scheduler: refactor prefill chunk alignment logic #41728

Open
jzakrzew wants to merge 1 commit into vllm-project:main from jzakrzew:refactor-scheduler-alignment-logic

@jzakrzew jzakrzew commented May 5, 2026

Purpose

This is a behavior-preserving refactor that makes it possible to add prefill chunk alignment logic without introducing model-specific hacks directly into the scheduler. It introduces a PrefillChunkAlignmentPolicy interface, which separates prefill chunk-shaping decisions from the scheduler control flow. As part of this change, the existing Mamba align-mode chunk-splitting logic has been fully moved out of the scheduler and into the alignment policy layer.
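
As a rough illustration of the separation described above, the following is a minimal sketch and not the code in this PR. Only the name PrefillChunkAlignmentPolicy comes from the description; the method name `align`, its parameters, the `NoOpAlignmentPolicy` class, and the `schedule_prefill` helper are all hypothetical, chosen to show how a scheduler could delegate chunk shaping to a pluggable policy.

```python
from typing import Protocol


class PrefillChunkAlignmentPolicy(Protocol):
    """Assumed shape of the interface; the real method names may differ."""

    def align(self, num_computed_tokens: int, num_new_tokens: int) -> int:
        """Return how many of the proposed new tokens should be scheduled."""
        ...


class NoOpAlignmentPolicy:
    """Hypothetical default policy: schedule the proposed chunk unchanged."""

    def align(self, num_computed_tokens: int, num_new_tokens: int) -> int:
        return num_new_tokens


def schedule_prefill(
    policy: PrefillChunkAlignmentPolicy,
    num_computed_tokens: int,
    num_prompt_tokens: int,
    token_budget: int,
) -> int:
    """Hypothetical scheduler step: cap by budget, then defer to the policy."""
    remaining = num_prompt_tokens - num_computed_tokens
    proposed = min(remaining, token_budget)
    # The scheduler never inspects the model type; it only asks the policy.
    return policy.align(num_computed_tokens, proposed)


# A 100-token prompt with a 32-token budget is scheduled as a 32-token chunk.
chunk = schedule_prefill(NoOpAlignmentPolicy(), 0, 100, 32)
```

Under this assumed design, a model-specific policy would implement the same method and be passed in place of the no-op one, so the scheduler control flow stays unchanged.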

The immediate motivation for this PR is the lack of progress on #38561. This PR does not include any batch-invariance-related logic, but it may provide a cleaner foundation for that work.

Test Plan

Added new tests for scheduler alignment policy behavior in tests/v1/core/test_scheduler.py.
Added tests for the policies themselves in tests/v1/core/test_prefill_chunk_alignment.py.

pytest tests/v1/core/test_scheduler.py -k align
pytest tests/v1/core/test_prefill_chunk_alignment.py

Introduce a PrefillChunkAlignmentPolicy interface to separate prefill
chunk-shaping decisions from the scheduler control flow. Move the existing
Mamba align-mode chunk splitting into a dedicated policy.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the v1 label May 5, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the Mamba block-aligned splitting logic into a more flexible PrefillChunkAlignmentPolicy system. It introduces a protocol-based approach with a default no-op policy and a specific Mamba policy, allowing the scheduler to handle prefill chunk alignment more cleanly across different model types. A critical issue was identified in the Mamba alignment logic: a token budget smaller than the block size could cause num_scheduled_tokens to snap to zero, potentially stalling the scheduler. This contradicts the code's internal documentation, which suggests that such small chunks should be allowed through without caching.
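
The snap-to-zero concern can be reproduced with plain arithmetic. The sketch below is not taken from the PR; it assumes that alignment means rounding the end of the chunk down to a multiple of the block size, and both function names are hypothetical. The fallback variant shows one possible reading of "allowed through without caching", not necessarily the intended fix.

```python
def align_down(num_computed_tokens: int, num_new_tokens: int, block_size: int) -> int:
    """Round the chunk end down to a block boundary (assumed alignment rule)."""
    end = num_computed_tokens + num_new_tokens
    aligned_end = (end // block_size) * block_size
    # If the chunk never reaches the next boundary, nothing is left to schedule.
    return max(aligned_end - num_computed_tokens, 0)


def align_down_with_fallback(
    num_computed_tokens: int, num_new_tokens: int, block_size: int
) -> int:
    """Same rule, but let a sub-block chunk through unaligned instead of zero."""
    aligned = align_down(num_computed_tokens, num_new_tokens, block_size)
    return aligned if aligned > 0 else num_new_tokens


# Budget of 10 tokens with a block size of 16: the plain rule schedules nothing.
stalled = align_down(0, 10, 16)
# The fallback keeps the request moving with the unaligned 10-token chunk.
unstalled = align_down_with_fallback(0, 10, 16)
```

With a budget at or above the block size the two variants agree, for example both return 32 for a 40-token proposal starting at token 0 with block size 16.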

Comment thread vllm/v1/core/sched/prefill_chunk_alignment.py