Skip to content

fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices#3163

Open
jimmyzho wants to merge 2 commits into
flashinfer-ai:mainfrom
jimmyzho:fix/issue-2122-remove-k4-assert
Open

fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices#3163
jimmyzho wants to merge 2 commits into
flashinfer-ai:mainfrom
jimmyzho:fix/issue-2122-remove-k4-assert

Conversation

@jimmyzho
Copy link
Copy Markdown
Contributor

@jimmyzho jimmyzho commented Apr 24, 2026

Summary

Removes the assert K % 4 == 0 guard from get_shuffle_matrix_sf_a_row_indices in flashinfer/utils.py. The assertion rejects valid inputs (e.g. K=90 for GPT-OSS-120B MXFP4 weights), even though the downstream kernel handles them correctly.

Why it's safe to remove:

  • get_shuffle_matrix_sf_a_row_indices only computes row permutation indices over the M dimension. K is unpacked from the shape but never used.
  • K is padded up to a multiple of 4 in the downstream path:
    • C++ launcher (csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp:197): cols_padded = PadUpFn(cols, 4) and passes n_padded to the kernel.
    • CUDA kernel (csrc/nv_internal/cpp/kernels/quantization.cu:400-418): loops up to numColsPadded, writes 0 for padded slots (T sf = 0; if (colIdx < numCols) sf = SFIn[...];).
    • Python (flashinfer/quantization/fp4_quantization.py:51-54, 297-304): _compute_swizzled_layout_sf_size uses round_up(total_column, 4) to size the output buffer to match.

Fixes #2122.

Test plan

  • Verify get_shuffle_matrix_sf_a_row_indices succeeds for K=90 (the GPT-OSS-120B case that triggered the original assertion).
  • Verify K=1, K=4, K=88 (previously valid) still return correct row indices.
  • Verify the full shuffle_matrix_sf_a pipeline (row shuffle → block_scale_interleave) produces correctly-sized outputs for non-aligned K.
  • Run the GPT-OSS-120B MXFP4 MoE repro from FlashInfer MXFP4 moe crashes with GPT-OSS-120B #2122 and confirm it no longer asserts.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Improvements
    • Relaxed a strict precondition on matrix dimensions: the system now accepts sizes previously rejected and relies on internal padding/alignment handled downstream, improving compatibility and reducing manual configuration.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6023ece9-ccd4-40b1-916f-328879616bfa

📥 Commits

Reviewing files that changed from the base of the PR and between 53aca62 and 3d21b6f.

📒 Files selected for processing (1)
  • flashinfer/utils.py

📝 Walkthrough

Walkthrough

The assertion enforcing that K is divisible by 4 was removed from get_shuffle_matrix_sf_a_row_indices in flashinfer/utils.py; a comment now notes that downstream block_scale_interleave pads K to a multiple of 4 and K is unused in the row-permutation computation.

Changes

Shuffle matrix logic

Layer / File(s) Summary
Pre-shuffle check
flashinfer/utils.py
Removed assert K % 4 == 0; added comment clarifying K is unused for row-permutation and that block_scale_interleave will pad K to a multiple of 4.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

op: moe

Suggested reviewers

  • sricketts
  • yongwww
  • yzh119
  • cyx-6
  • samuellees
  • bkryu

Poem

🐰 I hopped through code with nimble paws,
I nudged a check that caused some pause.
K may wobble, not divide by four,
The kernel pads and hums once more.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately describes the main change: removing an overly strict K%4 assertion from get_shuffle_matrix_sf_a_row_indices.
Description check ✅ Passed The PR description comprehensively explains what was changed, why it's safe, provides evidence from three code locations, and includes a detailed test plan.
Linked Issues check ✅ Passed The PR successfully addresses issue #2122 by removing the K%4 assertion that was rejecting K=90 for GPT-OSS-120B, while providing evidence that downstream code handles non-aligned K.
Out of Scope Changes check ✅ Passed All changes are directly scoped to removing the problematic assertion and adding a clarifying comment; no unrelated modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jimmyzho
Copy link
Copy Markdown
Contributor Author

/bot run

@flashinfer-bot
Copy link
Copy Markdown
Collaborator

GitLab MR !597 has been created, and the CI pipeline #49345749 is currently running. I'll report back once the pipeline job completes.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the K % 4 alignment assertion in the get_shuffle_matrix_sf_a_row_indices function within flashinfer/utils.py. This change allows for non-aligned K dimensions, such as those found in GPT-OSS-120B MXFP4 weights, as the function only computes row permutation indices over the M dimension and downstream kernels handle the necessary padding. I have no feedback to provide.

@aleozlx aleozlx enabled auto-merge (squash) May 8, 2026 23:17
jimmyzho and others added 2 commits May 11, 2026 20:29
…ices

The assertion `K % 4 == 0` in `get_shuffle_matrix_sf_a_row_indices`
rejects valid inputs that the downstream kernel handles correctly.
This function only computes row permutation indices over the M
dimension and never uses K. The downstream `block_scale_interleave`
CUDA kernel already pads K to a multiple of 4 internally via
`_compute_swizzled_layout_sf_size` (round_up(total_column, 4)).

Removing the assertion unblocks models with non-aligned K dimensions,
e.g. K=90 in GPT-OSS-120B MXFP4 weights (issue flashinfer-ai#2122).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jimmyzho jimmyzho force-pushed the fix/issue-2122-remove-k4-assert branch from 20cd7be to 3d21b6f Compare May 12, 2026 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FlashInfer MXFP4 moe crashes with GPT-OSS-120B

3 participants