[model] fix: Align Qwen VLM Ulysses + fused-kernel paths and harden multimodal position-id / vision-embed handling#5948

Open
jamindy wants to merge 3 commits into verl-project:main from jamindy:fix_qwen3_vl_vision_embedding

Conversation

@jamindy

@jamindy jamindy commented Apr 9, 2026

What does this PR do?

This PR fixes multiple multimodal runtime issues across qwen2_vl, qwen3_vl, and qwen3_5 under Ulysses sequence parallelism + fused kernels.

It includes:

  • fixing Qwen3-VL vision position-embedding behavior under FSDP2 + multimodal execution
  • refactoring shared bilinear vision position-embedding interpolation logic for Qwen3.5 and Qwen3-VL into a common helper
  • fixing Ulysses SP + fused-kernel label alignment errors in VLM forward paths
  • removing hard-coded rope_dim=4 assumptions and repairing broken nested position_ids layout handling
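As context for the interpolation refactor: the PR's shared helper is not reproduced here, but a minimal sketch of bilinearly resizing a learned `(grid*grid, dim)` vision position embedding to a new patch grid (function name and shapes are illustrative, not verl's actual API) could look like:

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Bilinearly resize a (grid*grid, dim) position embedding to (h*w, dim)."""
    g = int(pos_embed.shape[0] ** 0.5)  # assume a square source grid
    # (grid*grid, dim) -> (1, dim, g, g) so F.interpolate treats it as an image
    grid = pos_embed.reshape(1, g, g, -1).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(h, w), mode="bilinear", align_corners=False)
    # back to (h*w, dim) token-major layout
    return grid.permute(0, 2, 3, 1).reshape(h * w, -1)
```

Centralizing this in one helper avoids the Qwen3.5 / Qwen3-VL copies drifting apart.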

Previously observed errors:

  1. aten.index_select.default got mixed torch.Tensor and DTensor
  2. Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0
  3. The size of tensor a (xxx) must match the size of tensor b (4) at non-singleton dimension 2
  4. qwen3_vl / qwen3_5 paths previously failed when SP > 1 in specific fused-kernel multimodal flows

Root cause:
The monkey-patched Qwen VLM interpolation and SP/fused-kernel paths were not fully consistent with:

  • FSDP2 sharded embedding weights (DTensor)
  • device/offload behavior
  • local-label alignment requirements after Ulysses slicing

This led to tensor type/device mismatches and hidden/label shape mismatches.
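The first two error classes (DTensor/plain-tensor mixing and CPU-built indices hitting CUDA weights) suggest a lookup shaped roughly like the following sketch. This is a hypothetical simplification, not verl's patched code: it only assumes that FSDP2-sharded DTensor weights expose `full_tensor()`.

```python
import torch
import torch.nn.functional as F

def safe_pos_embed_lookup(weight: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    # DTensor weights can't be mixed with plain tensors in index ops;
    # materialize the full embedding table first.
    if hasattr(weight, "full_tensor"):
        weight = weight.full_tensor()
    # Indices built on CPU (e.g. from grid metadata) must be moved to the
    # weight's device before the lookup, or indexing raises a device mismatch.
    idx = idx.to(weight.device)
    return F.embedding(idx, weight)
```

With both guards in place, the same code path works for sharded, offloaded, and plain-module execution.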

Related issues

Validation

On the latest verl with Transformers 5.5.0, we validated:

  • qwen2_vl
  • qwen3_vl
  • qwen3_5

All tests passed with Ulysses sequence parallelism and fused kernels enabled.

Test script (example; `impl_backend` may be set to either `torch` or `triton`)

python3 -m verl.trainer.main_ppo \
  algorithm.adv_estimator=grpo \
  data.train_files=$train_files \
  data.val_files=$test_files \
  data.train_batch_size=4 \
  data.max_prompt_length=1024 \
  data.max_response_length=2048 \
  data.image_key=images \
  data.truncation=error \
  actor_rollout_ref.model.path=/models/Qwen3-VL-4B-Instruct \
  actor_rollout_ref.model.use_remove_padding=True \
  actor_rollout_ref.model.use_fused_kernels=True \
  actor_rollout_ref.model.fused_kernel_options.impl_backend=torch \
  actor_rollout_ref.actor.strategy=fsdp2 \
  actor_rollout_ref.actor.fsdp_config.model_dtype=bfloat16 \
  actor_rollout_ref.actor.fsdp_config.param_offload=True \
  actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
  actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
  actor_rollout_ref.rollout.name=vllm \
  trainer.n_gpus_per_node=4 \
  trainer.nnodes=1 \
  trainer.total_epochs=1


Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Fixes multimodal runtime issues for `qwen2_vl`, `qwen3_vl`, and `qwen3_5` under `FSDP2 + Ulysses SP + fused kernels`.

Key fixes:
- Align fused-kernel label handling after Ulysses slicing (avoid hidden/label mismatch).
- Fix Qwen3-VL vision position-embedding path for `DTensor`/offload cases.
- Refactor shared bilinear vision pos-embed interpolation into a common helper.
- Remove hard-coded `rope_dim=4` assumptions and fix broken nested `position_ids` layout recovery.
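The label-alignment fix comes down to slicing labels with the same per-rank offsets that Ulysses uses for hidden states. A simplified sketch of the shard arithmetic (not verl's actual helper; it assumes contiguous shards after padding the sequence to a multiple of the SP size):

```python
def ulysses_local_slice(seq_len: int, sp_size: int, sp_rank: int) -> tuple:
    """Return (start, end) of this rank's contiguous shard of a sequence
    padded up to a multiple of sp_size. Labels must be sliced with the
    same offsets as hidden states, or the loss sees mismatched shapes."""
    pad = (-seq_len) % sp_size          # padding needed to divide evenly
    shard = (seq_len + pad) // sp_size  # per-rank shard length
    start = sp_rank * shard
    end = min(start + shard, seq_len)   # last rank may hold only padding tail
    return start, end
```

If labels are rolled or sliced before this per-rank split while hidden states are sliced after (or vice versa), the fused loss kernel sees a hidden/label length mismatch, which matches error 3 above.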
@CLAassistant

CLAassistant commented Apr 9, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant updates to support Qwen3-VL models within the verl framework. Key changes include the implementation of a fast bilinear interpolation utility for vision position embeddings, monkey patching for device-safe position embedding lookups, and improved handling of 3D position IDs and sequence parallelism for vision-language models. Additionally, the PR refines loss calculation logic by removing unnecessary label rolling and adds robust input handling for multimodal sequence parallelism. I have no feedback to provide as all changes appear to be well-structured and address the stated requirements.

@Kirrito-k423 Kirrito-k423 mentioned this pull request Apr 10, 2026
@jamindy jamindy closed this Apr 14, 2026
@jamindy jamindy deleted the fix_qwen3_vl_vision_embedding branch April 14, 2026 03:50
@longboat2010

@jamindy Why was this PR closed? How can these issues be solved?

@jamindy jamindy restored the fix_qwen3_vl_vision_embedding branch April 14, 2026 09:38
@jamindy jamindy reopened this Apr 14, 2026
@jamindy jamindy requested a review from wucong25 as a code owner April 14, 2026 09:50
- **Qwen3.5 vision patch alignment**
  - Aligns Qwen3.5 `fast_pos_embed_interpolate` patch behavior with Qwen3-VL
  - Adds DTensor/sharded-weight-safe embedding lookup (`full_tensor()` + device-local `F.embedding`)
  - Switches monkey patch wiring to a dedicated, idempotent patch entrypoint
- **3D nested `position_ids` robustness**
  - Improves `maybe_fix_3d_position_ids` repair logic for broken jagged 3D layouts after TensorDict serialize/deserialize flows
  - Uses `input_ids` offsets as repair targets when valid; otherwise keeps safe fallback behavior
- **Tests**
  - Adds focused unit coverage in `tests/utils/test_padding_on_cpu.py` for:
    - successful repair
    - invalid-offset skip paths
    - warning path for invalid target offsets
    - empty-batch behavior
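For a text-only sample, the repaired mrope rows are fully determined by the attention mask, which is why `input_ids` offsets can serve as repair targets. A hypothetical sketch (not the `maybe_fix_3d_position_ids` implementation; it assumes a 3-row t/h/w mrope layout and left padding):

```python
import torch

def rebuild_text_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    """Rebuild (3, bsz, seq) text-only mrope position ids from a left-padded
    (bsz, seq) attention mask: padded slots get 0, valid slots 0..n_valid-1."""
    pos = (attention_mask.long().cumsum(-1) - 1).clamp(min=0)
    pos = pos.masked_fill(attention_mask == 0, 0)
    # text tokens share identical temporal/height/width rows
    return pos.unsqueeze(0).expand(3, -1, -1)
```

When the deserialized jagged layout disagrees with these offsets, rebuilding from the mask gives a consistent target; when offsets are invalid, the PR keeps the safe fallback path instead.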
