update moe_smooth_quant dispatch and v2 supports block_m is a multiple of 16#2333

Merged
valarLip merged 1 commit into main from jun/moe_smoothquant3 on Mar 19, 2026
Conversation

@junhaha666
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@junhaha666 junhaha666 requested review from a team and Copilot March 18, 2026 15:42
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  • ci:sglang: SGLang integration tests
  • ci:atom: ATOM benchmark (DeepSeek-R1 + GPT-OSS)
  • ci:vllm: vLLM benchmark
  • ci:all: all of the above

Add labels via the sidebar or gh pr edit 2333 --add-label <label>

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the MoE smooth-per-token scaled quantization dispatch to better support block_m values that are multiples of 16, and adjusts the Python-side routing logic for small-token MoE stage1 cases.

Changes:

  • Update moe_smooth_per_token_scaled_quant_v2 dispatch math to use a fixed block_split=16 and enforce block_m % 16 == 0.
  • Add an is_balanced flag to moe_smooth_per_token_scaled_quant to control whether stage1 uses the v1 kernel or a smooth_per_token_scaled_quant fallback for small M.
  • Remove outdated commented call examples in fused_moe_bf16_asm.py.
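Taken together, the new dispatch rules can be sketched in Python. This is a hypothetical illustration, not the actual aiter code: the selector name, the small-M cutoff value, and the string return convention are all assumptions; only the block_m % 16 constraint and the is_balanced fallback mirror what the PR describes.

```python
BLOCK_SPLIT = 16  # v2 now always splits blocks into 16-row sub-blocks

def select_quant_kernel(m, block_m, is_balanced=True, small_m_threshold=256):
    """Hypothetical sketch of the dispatch described in this PR.

    The function name, small_m_threshold value, and return strings are
    illustrative assumptions. The block_m % 16 check and the
    is_balanced-controlled fallback follow the PR description.
    """
    if block_m % BLOCK_SPLIT != 0:
        raise ValueError(f"block_m must be a multiple of {BLOCK_SPLIT}, got {block_m}")
    if m < small_m_threshold:
        # Small-M stage1: is_balanced keeps the v1 kernel; otherwise
        # fall back to plain smooth_per_token_scaled_quant.
        return "v1" if is_balanced else "smooth_per_token_scaled_quant"
    # Large M: the v2 kernel splits each block into block_m // 16 sub-blocks.
    return f"v2 ({block_m // BLOCK_SPLIT} sub-blocks per block)"
```

Note that block_m = 48 is now legal under this scheme (3 sub-blocks of 16 rows), whereas a power-of-two requirement would have rejected it.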

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • csrc/kernels/quant_kernels.cu: Enforces block_m divisibility by 16 and changes v2 block splitting/indexing to be consistent for non-power-of-2 block_m (as long as it is divisible by 16).
  • aiter/ops/quant.py: Adds the is_balanced flag and changes stage1 small-M dispatch behavior (v1 vs smooth-quant fallback).
  • aiter/fused_moe_bf16_asm.py: Removes commented-out alternative quant call paths around asm_moe.
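The grid math implied by the quant_kernels.cu change can be illustrated as follows. This is a minimal sketch under the assumption of a fixed 16-row split; the helper name and return shape are hypothetical, not taken from the kernel source.

```python
def v2_tiling(m, block_m, block_split=16):
    """Illustrative v2 launch-shape math (helper name is hypothetical).

    With a fixed block_split of 16, the sub-block count is a plain
    integer division, so block_m only needs to be a multiple of 16
    rather than a power of two.
    """
    if block_m % block_split != 0:
        raise ValueError("block_m must be a multiple of 16")
    num_blocks = (m + block_m - 1) // block_m      # ceil-div over rows
    subblocks_per_block = block_m // block_split   # e.g. block_m=48 -> 3
    return num_blocks, subblocks_per_block
```

Using division and modulo instead of shift/mask indexing is what makes the non-power-of-2 block_m cases consistent with the power-of-2 ones.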

@valarLip valarLip merged commit 3fb9c71 into main Mar 19, 2026
29 checks passed
@valarLip valarLip deleted the jun/moe_smoothquant3 branch March 19, 2026 06:20

3 participants