[sglang] feat: add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends)#5062

Merged: tardis-key merged 5 commits into verl-project:main from xiazhahe:feat/sglang on Jan 29, 2026

@xiazhahe xiazhahe commented Jan 27, 2026

What does this PR do?

Adds NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends). Reward curves for this scenario are included as well.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
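
The title rules above can be sketched as a small validator. This is only an illustration of the stated format, not the project's actual CI check: the function name `check_pr_title` and the regular expression are assumptions, while the module and type names are copied from the checklist.

```python
import re

# Module and type names copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "veomni", "sglang", "vllm", "rollout", "trainer",
    "ci", "training_utils", "recipe", "hardware", "deployment", "ray",
    "worker", "single_controller", "misc", "perf", "model", "algo", "env",
    "tool", "ckpt", "doc", "data", "cfg", "reward",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern: optional [BREAKING], then "[modules] type: description".
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?\s*"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>\w+):\s+"
    r"(?P<description>\S.*)$"
)


def check_pr_title(title: str) -> bool:
    """Return True if the title follows [{modules}] {type}: {description}."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    # Multiple modules are separated by commas, e.g. [megatron, fsdp, doc].
    modules = [name.strip() for name in match.group("modules").split(",")]
    return all(name in MODULES for name in modules) and match.group("type") in TYPES
```

For instance, `check_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching")` accepts the example title from the checklist, while a title with an unknown module or type is rejected.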

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

[Images: three attached result screenshots (64b907c5ae7342249588ee2f42a461b0, 6a1371943e3847e4b0435c64fd6866da, 9cf5a7b8f2624822a53ba6b3d6df775b)]

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Comment thread examples/grpo_trainer/run_qwen2-32b_sglang_fsdp_npu.sh Outdated
@tardis-key tardis-key merged commit b178a3c into verl-project:main Jan 29, 2026
RobotGF pushed a commit to RobotGF/verl that referenced this pull request Jan 30, 2026
…Lang backends) (verl-project#5062)

### What does this PR do?
add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends).
The reward curves of this scenario are also shown.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

<img width="1672" height="965" alt="64b907c5ae7342249588ee2f42a461b0"
src="https://github.com/user-attachments/assets/3cf7379e-31dc-4113-8398-ad0381744468"
/>

<img width="1668" height="962" alt="6a1371943e3847e4b0435c64fd6866da"
src="https://github.com/user-attachments/assets/5d2bf9ad-8729-4e1e-9e11-0cf3b46fd47e"
/>

<img width="1667" height="958" alt="9cf5a7b8f2624822a53ba6b3d6df775b"
src="https://github.com/user-attachments/assets/fa285df8-5d7d-4737-b5f6-64f0ee66a8e7"
/>


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
@xiazhahe xiazhahe deleted the feat/sglang branch April 14, 2026 11:27
@xiazhahe xiazhahe restored the feat/sglang branch April 14, 2026 11:27
DaizeDong pushed a commit to DaizeDong/verl that referenced this pull request Apr 19, 2026
…Lang backends) (verl-project#5062)
