fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices by jimmyzho · Pull Request #3163 · flashinfer-ai/flashinfer

jimmyzho · 2026-04-24T00:26:48Z

Summary

Removes the assert K % 4 == 0 guard from get_shuffle_matrix_sf_a_row_indices in flashinfer/utils.py. The assertion rejects valid inputs (e.g. K=90 for GPT-OSS-120B MXFP4 weights), even though the downstream kernel handles them correctly.

Why it's safe to remove:

get_shuffle_matrix_sf_a_row_indices only computes row permutation indices over the M dimension. K is unpacked from the shape but never used.
K is padded up to a multiple of 4 in the downstream path:
- C++ launcher (csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp:197): cols_padded = PadUpFn(cols, 4) and passes n_padded to the kernel.
- CUDA kernel (csrc/nv_internal/cpp/kernels/quantization.cu:400-418): loops up to numColsPadded, writes 0 for padded slots (T sf = 0; if (colIdx < numCols) sf = SFIn[...];).
- Python (flashinfer/quantization/fp4_quantization.py:51-54, 297-304): _compute_swizzled_layout_sf_size uses round_up(total_column, 4) to size the output buffer to match.

Fixes #2122.

Test plan

Verify get_shuffle_matrix_sf_a_row_indices succeeds for K=90 (the GPT-OSS-120B case that triggered the original assertion).
Verify K=1, K=4, K=88 (previously valid) still return correct row indices.
Verify the full shuffle_matrix_sf_a pipeline (row shuffle → block_scale_interleave) produces correctly-sized outputs for non-aligned K.
Run the GPT-OSS-120B MXFP4 MoE repro from FlashInfer MXFP4 moe crashes with GPT-OSS-120B #2122 and confirm it no longer asserts.

🤖 Generated with Claude Code

Summary by CodeRabbit

Improvements
- Relaxed a strict precondition on matrix dimensions: the system now accepts sizes previously rejected and relies on internal padding/alignment handled downstream, improving compatibility and reducing manual configuration.

coderabbitai · 2026-04-24T00:27:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6023ece9-ccd4-40b1-916f-328879616bfa

📥 Commits

Reviewing files that changed from the base of the PR and between 53aca62 and 3d21b6f.

📒 Files selected for processing (1)

flashinfer/utils.py

📝 Walkthrough

Walkthrough

The assertion enforcing that K is divisible by 4 was removed from get_shuffle_matrix_sf_a_row_indices in flashinfer/utils.py; a comment now notes that downstream block_scale_interleave pads K to a multiple of 4 and K is unused in the row-permutation computation.

Changes

Shuffle matrix logic

Layer / File(s)	Summary
Pre-shuffle check `flashinfer/utils.py`	Removed `assert K % 4 == 0`; added comment clarifying K is unused for row-permutation and that `block_scale_interleave` will pad K to a multiple of 4.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

op: moe

Suggested reviewers

sricketts
yongwww
yzh119
cyx-6
samuellees
bkryu

Poem

🐰 I hopped through code with nimble paws,
I nudged a check that caused some pause.
K may wobble, not divide by four,
The kernel pads and hums once more.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main change: removing an overly strict K%4 assertion from get_shuffle_matrix_sf_a_row_indices.
Description check	✅ Passed	The PR description comprehensively explains what was changed, why it's safe, provides evidence from three code locations, and includes a detailed test plan.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#2122` by removing the K%4 assertion that was rejecting K=90 for GPT-OSS-120B, while providing evidence that downstream code handles non-aligned K.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to removing the problematic assertion and adding a clarifying comment; no unrelated modifications are present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jimmyzho · 2026-04-24T00:27:17Z

/bot run

flashinfer-bot · 2026-04-24T00:29:11Z

GitLab MR !597 has been created, and the CI pipeline #49345749 is currently running. I'll report back once the pipeline job completes.

gemini-code-assist

Code Review

This pull request removes the K % 4 alignment assertion in the get_shuffle_matrix_sf_a_row_indices function within flashinfer/utils.py. This change allows for non-aligned K dimensions, such as those found in GPT-OSS-120B MXFP4 weights, as the function only computes row permutation indices over the M dimension and downstream kernels handle the necessary padding. I have no feedback to provide.

…ices The assertion `K % 4 == 0` in `get_shuffle_matrix_sf_a_row_indices` rejects valid inputs that the downstream kernel handles correctly. This function only computes row permutation indices over the M dimension and never uses K. The downstream `block_scale_interleave` CUDA kernel already pads K to a multiple of 4 internally via `_compute_swizzled_layout_sf_size` (round_up(total_column, 4)). Removing the assertion unblocks models with non-aligned K dimensions, e.g. K=90 in GPT-OSS-120B MXFP4 weights (issue flashinfer-ai#2122). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jimmyzho requested review from aleozlx, bkryu, cyx-6, kahyunnam, nv-yunzheq, saltyminty, samuellees, sricketts, yongwww, yyihuang and yzh119 as code owners April 24, 2026 00:26

jimmyzho added the run-ci label Apr 24, 2026

gemini-code-assist Bot reviewed Apr 24, 2026

View reviewed changes

aleozlx approved these changes May 8, 2026

View reviewed changes

aleozlx enabled auto-merge (squash) May 8, 2026 23:17

jimmyzho and others added 2 commits May 11, 2026 20:29

shorten comment string

3d21b6f

jimmyzho force-pushed the fix/issue-2122-remove-k4-assert branch from 20cd7be to 3d21b6f Compare May 12, 2026 00:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices#3163

fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices#3163
jimmyzho wants to merge 2 commits into
flashinfer-ai:mainfrom
jimmyzho:fix/issue-2122-remove-k4-assert

jimmyzho commented Apr 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

jimmyzho commented Apr 24, 2026

Uh oh!

flashinfer-bot commented Apr 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jimmyzho commented Apr 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

jimmyzho commented Apr 24, 2026

Uh oh!

flashinfer-bot commented Apr 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jimmyzho commented Apr 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading