[Bugfix] Fix MoE flatten_tp_size unconditionally including dp_size #36240
AjAnubolu wants to merge 4 commits
Conversation
Code Review
This pull request addresses a bug in the Mixture of Experts (MoE) layer configuration where the data parallelism (DP) size was incorrectly flattened into the tensor parallelism (TP) size for non-expert-parallel (non-EP) MoE setups. The fix correctly separates the configuration logic for EP and non-EP cases. For non-EP MoE, it now ensures that DP ranks are treated as holding replicated weights and are not folded into the TP dimension, which aligns with the intended design. The logic for EP MoE, where DP is correctly flattened into the expert dimension, is preserved and moved into its respective code block for clarity. The documentation has also been updated to reflect this corrected behavior with clear examples. The changes appear to be correct and effectively resolve the issue.
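For readers following along, here is a minimal, self-contained sketch of the flattening arithmetic described above. The function name mirrors `FusedMoEParallelConfig.flatten_tp_across_dp_and_pcp`, but the body and the rank ordering are illustrative assumptions, not the vLLM implementation:

```python
# Illustrative sketch only -- the rank ordering below is an assumption,
# not vLLM's actual implementation.
def flatten_tp_across_dp_and_pcp(tp_size: int, tp_rank: int,
                                 dp_size: int, dp_rank: int,
                                 pcp_size: int, pcp_rank: int):
    # Fold the DP/PCP dimensions selected by the caller into a single
    # TP-like rank space.
    flat_tp_size = dp_size * pcp_size * tp_size
    flat_tp_rank = (dp_rank * pcp_size + pcp_rank) * tp_size + tp_rank
    return flat_tp_size, flat_tp_rank

# Non-EP MoE: the caller passes neutral DP values, so DP stays a
# replica dimension and each expert is sharded only tp_size ways.
assert flatten_tp_across_dp_and_pcp(4, 0, 1, 0, 1, 0) == (4, 0)

# EP MoE: the caller passes the real DP size/rank, so DP participates
# in the EP rank space.
assert flatten_tp_across_dp_and_pcp(4, 0, 2, 1, 1, 0) == (8, 4)
```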
Split flatten_tp_across_dp_and_pcp into separate EP and non-EP paths so DP ranks hold replicated MoE weights instead of being folded into TP.

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Force-pushed from fad2e99 to ba22024
Thanks for the contribution, @andakai will take a look.
Thanks for the fix. I checked and it can fix #36222. Here is another file with the same issue: the weight-loading paths in gpt_oss.py.
Now the comment inside flatten_tp_across_dp_and_pcp() may be a bit misleading:
In the non-EP path, the function is intentionally called with neutral DP values (dp_size=1, dp_rank=0) so that DP is not included in the flattened TP size. Could we reword this comment to make the EP vs non-EP behavior explicit? For example:
```python
# Flatten the DP/PCP/TP dimensions selected by the caller into a
# single TP-like rank space. In non-EP mode callers pass dp_size=1
# and dp_rank=0 so DP remains a replica dimension; in EP mode callers
# pass the real DP size/rank so DP participates in the EP rank space.
```
I think this would make it clearer.
Clarify the EP vs non-EP behavior: in non-EP mode callers pass dp_size=1, dp_rank=0 so DP stays a replica dimension; in EP mode they pass the real DP size/rank so DP joins the EP rank space.

Co-authored-by: Claude
Co-authored-by: andakai
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
The three weight-loading paths in gpt_oss.py (_load_weights_mxfp4, _load_weights_quark, _load_weights_other) call FusedMoEParallelConfig.flatten_tp_across_dp_and_pcp() with the real dp_size / dp_rank regardless of whether expert parallelism is enabled. In non-EP mode this folds DP into the TP-like rank space, slicing each expert's weights too thinly (e.g. TP=4, DP=2 gave intermediate_size//8 instead of intermediate_size//4). Match the canonical pattern in FusedMoEParallelConfig.make(): in non-EP mode pass dp_size=1, dp_rank=0 so DP stays a replica dimension. EP mode behavior is unchanged.

Reported-by: andakai
Co-authored-by: Claude
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
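A hedged, self-contained sketch of the call pattern that commit describes; `flatten_tp()` below stands in for `FusedMoEParallelConfig.flatten_tp_across_dp_and_pcp`, and the parameter names and the `use_ep` guard are assumptions for illustration, not the actual gpt_oss.py code:

```python
# Stand-in for FusedMoEParallelConfig.flatten_tp_across_dp_and_pcp;
# the rank ordering is an illustrative assumption.
def flatten_tp(tp_size, tp_rank, dp_size, dp_rank, pcp_size, pcp_rank):
    size = dp_size * pcp_size * tp_size
    rank = (dp_rank * pcp_size + pcp_rank) * tp_size + tp_rank
    return size, rank

def moe_flat_tp(use_ep, tp_size, tp_rank, dp_size, dp_rank,
                pcp_size, pcp_rank):
    if use_ep:
        # EP mode: DP participates in the EP rank space.
        return flatten_tp(tp_size, tp_rank, dp_size, dp_rank,
                          pcp_size, pcp_rank)
    # Non-EP mode: neutral DP values keep DP a replica dimension,
    # matching the canonical pattern in FusedMoEParallelConfig.make().
    return flatten_tp(tp_size, tp_rank, 1, 0, pcp_size, pcp_rank)

# TP=4, DP=2: the non-EP flat size is 4 (DP replicates), not 8.
assert moe_flat_tp(False, 4, 0, 2, 1, 1, 0)[0] == 4
assert moe_flat_tp(True, 4, 0, 2, 1, 1, 0)[0] == 8
```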
LGTM, thanks for addressing the comments.
Thanks for the feedback!
DP was folded into TP for non-EP MoE, causing incorrect partition sizes (e.g. 64 instead of 128 with TP=4 DP=2). Only flatten PCP into TP; keep DP ranks replicated.
Closes #36222
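A quick worked check of the partition sizes in the description, assuming intermediate_size = 512 (a hypothetical value chosen so the shards come out to the 64 and 128 quoted above):

```python
intermediate_size = 512  # hypothetical; any multiple of 8 works
tp_size, dp_size = 4, 2

# Buggy non-EP path: DP folded into TP, so each expert shard was
# 512 // (4 * 2) = 64 -- twice as thin as intended.
assert intermediate_size // (tp_size * dp_size) == 64

# Fixed non-EP path: DP replicates and only TP shards the weights,
# giving the intended 512 // 4 = 128.
assert intermediate_size // tp_size == 128
```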