[doc] refactor: add constraints on the use of vpp and mbridge parameters#5763

Merged
wucong25 merged 2 commits into verl-project:main from zjchenn:doc/vpp
Mar 26, 2026

@zjchenn zjchenn commented Mar 26, 2026

What does this PR do?

This PR updates the Ascend backend documentation to clarify a current compatibility constraint: mbridge does not support VPP (virtual_pipeline_model_parallel_size) in our current stack.

  • Add a note in docs/ascend_tutorial/features/ascend_backend_features.md (line 276) that VPP should be used only when mbridge is disabled.
  • Clarify that, because verl now enables mbridge by default, users who enable VPP must explicitly set actor_rollout_ref.actor.megatron.use_mbridge=False.
  • This avoids confusing runtime failures in Megatron actor log-prob computation when VPP is used together with mbridge.

Related: #4528 (use_mbridge enabled by default)
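
The constraint above can be illustrated with a small pre-flight check. This is only a sketch, not code from verl: `check_vpp_mbridge`, `get_by_path`, and `make_cfg` are hypothetical helpers, the config is modeled as a plain nested dict, and the full path used for the VPP option is an assumption (the description names only `virtual_pipeline_model_parallel_size`). The `use_mbridge` key and its default-on behavior are taken from the description.

```python
from collections.abc import Mapping
from typing import Any

# Key taken from the PR description.
USE_MBRIDGE_KEY = "actor_rollout_ref.actor.megatron.use_mbridge"
# Assumption: the VPP option sits next to use_mbridge in the same config node.
VPP_KEY = "actor_rollout_ref.actor.megatron.virtual_pipeline_model_parallel_size"


def get_by_path(cfg: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Walk a nested mapping along a dotted key; return default if any part is missing."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


def check_vpp_mbridge(cfg: Mapping[str, Any]) -> None:
    """Hypothetical check: raise if VPP is set while mbridge is still enabled."""
    vpp = get_by_path(cfg, VPP_KEY)
    # An unset use_mbridge is treated as enabled, mirroring the default described above.
    use_mbridge = get_by_path(cfg, USE_MBRIDGE_KEY, default=True)
    if vpp is not None and use_mbridge:
        raise ValueError(
            f"{VPP_KEY}={vpp} requires {USE_MBRIDGE_KEY}=False "
            "(mbridge does not support VPP)"
        )


def make_cfg(**megatron: Any) -> dict:
    """Build a minimal nested config holding only the Megatron actor options."""
    return {"actor_rollout_ref": {"actor": {"megatron": dict(megatron)}}}


if __name__ == "__main__":
    # Accepted: VPP enabled with mbridge explicitly disabled.
    check_vpp_mbridge(make_cfg(virtual_pipeline_model_parallel_size=2, use_mbridge=False))
    # Rejected: VPP enabled while mbridge is left at its default.
    try:
        check_vpp_mbridge(make_cfg(virtual_pipeline_model_parallel_size=2))
    except ValueError as err:
        print(f"rejected: {err}")
```

Treating an unset `use_mbridge` as enabled is the design choice that matters here: it is the default-on behavior that makes the explicit `use_mbridge=False` override necessary when VPP is set.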

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes, if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

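Important

Please check all of the following items before requesting a review; otherwise, the reviewer might deprioritize this PR.

  • Read the Contribute Guide (https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
  • Apply pre-commit checks (https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add / Update the documentation (https://github.com/volcengine/verl/tree/main/docs).
  • Add unit or end-to-end test(s) to the CI workflow (https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
  • Once your PR is ready for CI, send a message in the ci-request channel (https://verl-project.slack.com/archives/C091TCESWB1) in the verl Slack workspace (https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try the Feishu group (飞书群) (https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
  • If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.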

Signed-off-by: zjchenn <zjchenn@gmail.com>
@zjchenn zjchenn requested a review from FightingZhen as a code owner March 26, 2026 11:04

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the ascend_backend_features.md documentation to include a note about the incompatibility between mbridge and VPP (virtual_pipeline_model_parallel_size) and provides instructions for their configuration. The feedback suggests improving the clarity and accuracy of this note by using full parameter paths instead of abbreviations to prevent configuration errors.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@zjchenn zjchenn changed the title [doc, megatron] add constraints on the use of vpp and mbridge parameters [doc] add constraints on the use of vpp and mbridge parameters Mar 26, 2026
@zjchenn zjchenn changed the title [doc] add constraints on the use of vpp and mbridge parameters [doc] refactor: add constraints on the use of vpp and mbridge parameters Mar 26, 2026
@wucong25 wucong25 merged commit 6aafaec into verl-project:main Mar 26, 2026
4 of 6 checks passed
sijyang pushed a commit to sijyang/verl that referenced this pull request Apr 1, 2026
…ers (verl-project#5763)

ZouKexin-522 pushed a commit to ZouKexin-522/verl that referenced this pull request Apr 8, 2026
…ers (verl-project#5763)
