[Core][Refactor] scheduler: refactor prefill chunk alignment logic by jzakrzew · Pull Request #41728 · vllm-project/vllm

jzakrzew · 2026-05-05T11:49:38Z

Purpose

This is a behavior-preserving refactor that makes it possible to add prefill chunk alignment logic without introducing model-specific hacks directly into the scheduler. It introduces a PrefillChunkAlignmentPolicy interface, which separates prefill chunk-shaping decisions from the scheduler control flow. As part of this change, the existing Mamba align-mode chunk-splitting logic has been fully moved out of the scheduler and into the alignment policy layer.
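The separation described above can be sketched in Python as a `typing.Protocol` with a default no-op policy and a block-aligned policy. This is a minimal sketch under stated assumptions, not the PR's code. Only the name `PrefillChunkAlignmentPolicy` comes from this PR. The method `align_num_new_tokens`, the classes `NoOpAlignmentPolicy` and `BlockAlignedPolicy`, and the helper `schedule_prefill_chunk` are hypothetical. The block-aligned logic is also a simplification of Mamba align mode, not vLLM's actual implementation.

```python
from typing import Protocol


class PrefillChunkAlignmentPolicy(Protocol):
    # Hypothetical method: given the scheduler's proposed chunk size,
    # return the number of prefill tokens to actually schedule.
    def align_num_new_tokens(
        self, num_computed_tokens: int, num_new_tokens: int, num_prompt_tokens: int
    ) -> int: ...


class NoOpAlignmentPolicy:
    """Default policy: keep the chunk size the scheduler proposed."""

    def align_num_new_tokens(
        self, num_computed_tokens: int, num_new_tokens: int, num_prompt_tokens: int
    ) -> int:
        return num_new_tokens


class BlockAlignedPolicy:
    """Simplified Mamba-style policy: end prefill chunks on block boundaries."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size

    def align_num_new_tokens(
        self, num_computed_tokens: int, num_new_tokens: int, num_prompt_tokens: int
    ) -> int:
        bs = self.block_size
        # Last prompt position that lies on a block boundary.
        last_boundary = num_prompt_tokens - num_prompt_tokens % bs
        if num_computed_tokens >= last_boundary:
            # Already past the last boundary: schedule the tail unchanged.
            return num_new_tokens
        end = num_computed_tokens + num_new_tokens
        if end >= last_boundary:
            # Stop exactly at the last boundary so its state could be cached.
            return last_boundary - num_computed_tokens
        # Otherwise round the chunk end down to a block boundary.
        aligned_end = end // bs * bs
        if aligned_end <= num_computed_tokens:
            # Budget too small to reach the next boundary: schedule the chunk
            # as-is (assumed: its state is simply not cached), never 0.
            return num_new_tokens
        return aligned_end - num_computed_tokens


def schedule_prefill_chunk(
    policy: PrefillChunkAlignmentPolicy,
    num_computed_tokens: int,
    token_budget: int,
    num_prompt_tokens: int,
) -> int:
    # The scheduler only applies the token budget; chunk shaping is delegated.
    num_new_tokens = min(token_budget, num_prompt_tokens - num_computed_tokens)
    return policy.align_num_new_tokens(num_computed_tokens, num_new_tokens, num_prompt_tokens)
```

Take `block_size=16` and a 70-token prompt. A 40-token budget starting from position 0 yields a 32-token chunk. The next step stops at position 64, and the final step schedules the remaining 6 tokens. With a large enough budget, `NoOpAlignmentPolicy` would schedule all 70 tokens at once. The scheduler call site stays the same whichever policy is plugged in.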

The immediate motivation for this PR is the lack of progress on #38561. This PR does not include any batch-invariance-related logic, but it may provide a cleaner foundation for that work.

Test Plan

Added new tests for scheduler alignment policy behavior in tests/v1/core/test_scheduler.py.
Added tests for the policies themselves in tests/v1/core/test_prefill_chunk_alignment.py.

pytest tests/v1/core/test_scheduler.py -k align
pytest tests/v1/core/test_prefill_chunk_alignment.py

Introduce a PrefillChunkAlignmentPolicy interface to separate prefill chunk-shaping decisions from the scheduler control flow. Move the existing Mamba align-mode chunk splitting into a dedicated policy.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request refactors the Mamba block-aligned splitting logic into a more flexible PrefillChunkAlignmentPolicy system. It introduces a protocol-based approach with a default no-op policy and a specific Mamba policy, allowing the scheduler to handle prefill chunk alignment more cleanly across different model types. A critical issue was identified in the Mamba alignment logic where a small token budget (less than the block size) could cause num_scheduled_tokens to snap to zero, potentially stalling the scheduler; this contradicts the code's internal documentation which suggests such small chunks should be allowed through without caching.
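The stall described in this review can be reproduced with simple arithmetic. The sketch below is a simplified model, not vLLM's code. `align_chunk` and `simulate` are hypothetical helpers, and only the mid-prompt rounding step is modeled. The prompt length is assumed to be a multiple of `block_size`, so final-chunk handling does not come into play. Without a guard, a budget of 8 tokens with `block_size=16` rounds down to 0 on every step. Letting sub-block chunks through, as the code's own documentation suggests, restores progress. This is one possible guard, not necessarily the fix adopted in the PR.

```python
from typing import Optional


def align_chunk(num_new_tokens: int, block_size: int, allow_sub_block: bool) -> int:
    """Round a mid-prompt prefill chunk down to whole blocks (simplified)."""
    aligned = num_new_tokens // block_size * block_size
    if aligned == 0 and allow_sub_block:
        # Guard: schedule the small chunk unchanged instead of snapping to 0.
        return num_new_tokens
    return aligned


def simulate(
    prompt_len: int,
    budget: int,
    block_size: int,
    allow_sub_block: bool,
    max_steps: int = 100,
) -> Optional[int]:
    """Return the number of steps a prefill takes, or None if it never finishes."""
    computed = 0
    for step in range(1, max_steps + 1):
        chunk = align_chunk(min(budget, prompt_len - computed), block_size, allow_sub_block)
        computed += chunk
        if computed >= prompt_len:
            return step
    return None  # no progress within max_steps: the request is stalled


print(simulate(64, 8, 16, allow_sub_block=False))  # None (chunk is always 0)
print(simulate(64, 8, 16, allow_sub_block=True))   # 8
```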

jzakrzew requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners May 5, 2026 11:49

claude Bot reviewed May 5, 2026


mergify Bot added the v1 label May 5, 2026

gemini-code-assist Bot reviewed May 5, 2026


Comment thread vllm/v1/core/sched/prefill_chunk_alignment.py

jzakrzew wants to merge 1 commit into vllm-project:main from jzakrzew:refactor-scheduler-alignment-logic
