[ROCm][Bugfix] Fix chunk alignment when using context parallelism with TRITON_MLA by micah-wil · Pull Request #46114 · vllm-project/vllm

micah-wil · 2026-06-18T23:39:16Z

mla_attention has a bug when context parallelism is used on ROCm. On CUDA, max_context_chunk is aligned properly because of the self.aot_schedule path, which is CUDA-only, so ROCm never gets that alignment. The resulting chunk misalignment causes zero accuracy in test_context_parallel.py on ROCm with dcp_size=4 using TRITON_MLA.

tests/distributed/test_context_parallel.py::test_cp_generation[deepseek-ai/DeepSeek-V2-Lite-Chat-parallel_setup0-mp-auto-test_options0]

>               raise RuntimeError(
                    f"Test subprocess '{f.__name__}' failed "
                    f"({_format_subprocess_exit(result.returncode)}):\n{tb}"
                )
E               RuntimeError: Test subprocess 'test_cp_generation' failed (exit code 1):
E               Traceback (most recent call last):
E                 File "<string>", line 12, in <module>
E                 File "/projects/vllm/tests/utils.py", line 1739, in wrapper
E                   return f(*args, **kwargs)
E                          ^^^^^^^^^^^^^^^^^^
E                 File "/projects/vllm/tests/distributed/test_context_parallel.py", line 296, in test_cp_generation
E                   _test_cp_gsm8k(
E                 File "/projects/vllm/tests/distributed/test_context_parallel.py", line 255, in _test_cp_gsm8k
E                   assert accuracy >= min_accuracy, (
E                          ^^^^^^^^^^^^^^^^^^^^^^^^
E               AssertionError: TP+DCP accuracy too low: 0.000 < 0.500

tests/utils.py:1795: RuntimeError

This PR resolves the issue by shrinking max_context_chunk to the nearest size that divides evenly across GPUs and lands on a clean cache-block boundary. With this change, the test case above passes. I also updated the corresponding test so it no longer uses CUDA-only attention backends, and I set the baseline accuracy now that the test runs successfully on ROCm. It was passing but flaky with MIN_ACCURACY=0.64; over 10 runs, the lowest accuracy I saw was about 0.51.
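
A minimal sketch of the alignment described above, not the code in this PR. The helper name align_context_chunk and its parameters (dcp_world_size, block_size) are hypothetical, and the sketch assumes that "divides evenly across GPUs and lands on a clean cache-block boundary" means rounding down to a multiple of dcp_world_size * block_size.

```python
def align_context_chunk(max_context_chunk: int,
                        dcp_world_size: int,
                        block_size: int) -> int:
    """Round max_context_chunk down to a multiple of dcp_world_size * block_size.

    Hypothetical helper for illustration only; the names and the exact
    alignment unit are assumptions, not taken from the vLLM source.
    """
    if dcp_world_size <= 0 or block_size <= 0:
        raise ValueError("dcp_world_size and block_size must be positive")
    # One alignment unit is a full cache block on every DCP rank.
    alignment = dcp_world_size * block_size
    return (max_context_chunk // alignment) * alignment


# Example with dcp_size=4 and an assumed cache block size of 16 tokens.
aligned = align_context_chunk(1000, dcp_world_size=4, block_size=16)
per_rank = aligned // 4
print(aligned, per_rank)  # 960 240
```

With these assumed values each rank receives 240 tokens, which is exactly 15 cache blocks. Note that an input smaller than one alignment unit rounds down to 0 in this sketch, a case a real implementation would have to handle separately.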

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

micah-wil added 2 commits June 18, 2026 22:09

fix dcp zero accuracy

7b989a7

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

add comment

bc14e58

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

micah-wil requested review from LucasWilkinson and MatthewBonanni as code owners June 18, 2026 23:39

mergify Bot added the rocm and bug labels Jun 18, 2026

github-project-automation Bot moved this to Todo in AMD Jun 18, 2026

github-project-automation Bot added this to AMD Jun 18, 2026

micah-wil changed the title ~~[ROCm][Bugfix]~~ [ROCm][Bugfix] Fix chunk alignment when using context parallelism with TRITON_ATTN Jun 19, 2026

micah-wil changed the title ~~[ROCm][Bugfix] Fix chunk alignment when using context parallelism with TRITON_ATTN~~ [ROCm][Bugfix] Fix chunk alignment when using context parallelism with TRITON_MLA Jun 19, 2026

micah-wil added 2 commits June 19, 2026 17:48

Merge branch 'main' into micah/fix-mla-attn

043ed7d

use lm_eval for more stable result

9118870

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

micah-wil wants to merge 4 commits into vllm-project:main from micah-wil:micah/fix-mla-attn
