[veomni, trainer] feat: add rl support for veomni backend #4882

Merged
wuxibin89 merged 6 commits into verl-project:main from ji-huazhong:veomni
Feb 10, 2026

@ji-huazhong
Collaborator

@ji-huazhong ji-huazhong commented Jan 12, 2026

What does this PR do?

Add reinforcement learning (RL) support for the VeOmni backend.

Collaborated with @A1waysBeenHere

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for the veomni backend for RL training. It introduces new configuration files for veomni actor, critic, reference policy, and reward models, along with a main PPO trainer configuration. The changes also update the worker selection logic to handle the veomni strategy.

My review identified a few critical issues in the new configuration files that would lead to runtime errors: a misconfigured legacy worker flag, a non-portable model path, and a placeholder value that needs to be replaced. One of the configuration files also contains a minor redundancy. Addressing these points will improve the correctness and usability of the new veomni backend configuration.

Comment thread verl/trainer/config/ppo_veomni_trainer.yaml Outdated
Comment thread verl/trainer/config/reward_model/veomni_reward_loop.yaml Outdated
Comment thread verl/trainer/config/reward_model/veomni_reward_loop.yaml Outdated
Comment thread verl/trainer/config/reward_model/veomni_reward_model.yaml Outdated
@wuxibin89 wuxibin89 mentioned this pull request Jan 12, 2026
28 tasks
@ji-huazhong ji-huazhong changed the title from "add rl support for veomni backend" to "[WIP] add rl support for veomni backend" Jan 12, 2026
@ji-huazhong ji-huazhong force-pushed the veomni branch 5 times, most recently from a7ad8c1 to f9558aa January 17, 2026 06:37
@ji-huazhong ji-huazhong force-pushed the veomni branch 2 times, most recently from 74d6d7c to 27c4a88 January 20, 2026 11:13
@ji-huazhong ji-huazhong changed the title from "[WIP] add rl support for veomni backend" to "[veomni, trainer] feat: add rl support for veomni backend" Jan 20, 2026
@ji-huazhong ji-huazhong force-pushed the veomni branch 2 times, most recently from 8ee8b9f to 719ed8d January 26, 2026 01:23
veomni:

  # Target configuration dataclass
  _target_: verl.workers.config.VeOmniEngineConfig

Collaborator

@wuxibin89 wuxibin89 Jan 30, 2026

We're not going to support reward models with the model engine. Reward models, including GRM and DistRM, should use a rollout engine (vllm/sglang/trt-llm). cc @yyDing1

Collaborator Author

Here’s the relevant PR: #5194. I’ll delete the reward model code later today. Cheers for pointing this out!

Collaborator Author

fixed in 22ee2f4. cc @wuxibin89

@A1waysBeenHere
Contributor

A1waysBeenHere commented Jan 31, 2026

We ran comparative experiments on Qwen3-8B using GRPO. The results show that the reward trends of the FSDPEngine and the VeOmniEngine are consistent.

[image: yWpj3Jc7nz]

cc @ji-huazhong
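
One way to make "consistent reward trends" concrete is to smooth both reward curves and compute their Pearson correlation. The sketch below is illustrative only: the helper names (`moving_average`, `pearson`, `trend_correlation`) are hypothetical, and the sample values are placeholders, not data from the experiment above.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def trend_correlation(run_a, run_b, window=3):
    """Correlation of two smoothed reward curves (1.0 means identical trend)."""
    return pearson(moving_average(run_a, window), moving_average(run_b, window))


# Placeholder values for illustration only; NOT measurements from this PR.
fsdp_rewards = [0.10, 0.14, 0.19, 0.25, 0.28, 0.33]
veomni_rewards = [0.11, 0.13, 0.20, 0.24, 0.29, 0.32]
print(round(trend_correlation(fsdp_rewards, veomni_rewards), 3))
```

A correlation near 1.0 on the smoothed curves supports the "same trend" reading; it says nothing about absolute reward levels, which would need a separate check.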

@ji-huazhong ji-huazhong marked this pull request as ready for review February 4, 2026 05:59
@ji-huazhong ji-huazhong force-pushed the veomni branch 2 times, most recently from 642f4c8 to 22ee2f4 February 6, 2026 15:22
@ji-huazhong
Collaborator Author

The failed test case is unrelated to this PR.

@ji-huazhong ji-huazhong requested a review from wuxibin89 February 7, 2026 01:40
@ji-huazhong
Collaborator Author

ji-huazhong commented Feb 8, 2026

Due to current resource constraints, I tested the following script on an A3:

set -x
ENGINE=${1:-vllm}

export MULTI_STREAM_MEMORY_REUSE=2
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export CPU_AFFINITY_CONF=1


python3 -m verl.trainer.main_ppo --config-path=config \
    --config-name="ppo_veomni_trainer.yaml" \
    algorithm.adv_estimator=grpo \
    data.train_files=/home/data/geo3k/train.parquet \
    data.val_files=/home/data/geo3k/test.parquet \
    data.train_batch_size=256 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=/home/model/Qwen3-VL-30B-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.optimizer="anyprecision_adamw" \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.veomni.param_offload=True \
    actor_rollout_ref.actor.veomni.optimizer_offload=True \
    actor_rollout_ref.actor.veomni.data_parallel_size=16 \
    actor_rollout_ref.actor.veomni.expert_parallel_size=16 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=$ENGINE \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=True \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.veomni.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.val_before_train=True \
    trainer.use_legacy_worker_impl=disable \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='verl_grpo_example_geo3k' \
    trainer.experiment_name='qwen3_vl_30b_function_rm' \
    trainer.n_gpus_per_node=16 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=-1 \
    trainer.total_epochs=15 "$@"

and the partial results obtained are as follows:

[image] [image]
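
The batch-size and parallelism overrides in the script above are tied together by simple arithmetic. The sketch below recomputes the derived quantities from the values passed on the command line. It rests on an assumption that this PR does not confirm for the VeOmni engine: that the mini-batch is multiplied by `rollout.n` and split evenly across all ranks. The helper name `derive_batch_plan` is hypothetical.

```python
def derive_batch_plan(train_batch_size, rollout_n, ppo_mini_batch_size,
                      micro_batch_size_per_gpu, world_size, rollout_tp_size):
    """Recompute derived batch quantities from the script's overrides.

    Assumption (not confirmed by this PR for VeOmni): the mini-batch is
    multiplied by rollout_n and split evenly across world_size ranks.
    """
    if world_size % rollout_tp_size:
        raise ValueError("world size must be divisible by the rollout TP size")
    if train_batch_size % ppo_mini_batch_size:
        raise ValueError("train batch must be divisible by the mini-batch")
    mini_batch_responses = ppo_mini_batch_size * rollout_n
    if mini_batch_responses % world_size:
        raise ValueError("mini-batch responses must divide evenly across ranks")
    per_gpu_mini_batch = mini_batch_responses // world_size
    if per_gpu_mini_batch % micro_batch_size_per_gpu:
        raise ValueError("per-GPU mini-batch must be divisible by the micro-batch")
    return {
        "responses_per_step": train_batch_size * rollout_n,
        "optimizer_steps_per_rollout": train_batch_size // ppo_mini_batch_size,
        "per_gpu_mini_batch": per_gpu_mini_batch,
        "grad_accum_steps": per_gpu_mini_batch // micro_batch_size_per_gpu,
        "rollout_replicas": world_size // rollout_tp_size,
    }


# Values copied from the script: 1 node x 16 devices, rollout TP size 8.
plan = derive_batch_plan(
    train_batch_size=256,
    rollout_n=5,
    ppo_mini_batch_size=128,
    micro_batch_size_per_gpu=2,
    world_size=16 * 1,
    rollout_tp_size=8,
)
for name, value in plan.items():
    print(f"{name}: {value}")
```

Running a check like this before launching catches divisibility mistakes early, which matters when a single job occupies all 16 devices.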

cc @wuxibin89

defaults:
# <folder_name>@<field_name>.<field_name>: <yaml_file_name>
# actor_rollout_ref.actor: trainer/config/actor/veomni_actor.yaml
- actor@actor_rollout_ref.actor: veomni_actor
Collaborator

@wuxibin89 wuxibin89 Feb 9, 2026

Is it possible to unify ppo_veomni_trainer.yaml with ppo_trainer.yaml? Eventually, we will unify all trainers (fsdp/megatron/veomni/torchtitan) into one config file.
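
One possible shape for such a unification is a single trainer config whose Hydra defaults are derived from an engine name. The sketch below only illustrates the naming pattern visible in the defaults entry above (`actor@actor_rollout_ref.actor: veomni_actor`). The function `defaults_for_engine`, the `ref` and `critic` entries, and the `<engine>_<group>` file naming for engines other than veomni are assumptions for illustration, not verl's actual implementation; real config file names may differ per engine.

```python
# Engine names taken from the review comment above.
SUPPORTED_ENGINES = ("fsdp", "megatron", "veomni", "torchtitan")

# Hypothetical mapping from config group to the package it is mounted at.
# Only the "actor" entry is taken from the diff context in this thread.
ROLE_PACKAGES = {
    "actor": "actor_rollout_ref.actor",
    "ref": "actor_rollout_ref.ref",
    "critic": "critic",
}


def defaults_for_engine(engine):
    """Build Hydra-style defaults entries of the form group@package: name."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return [
        {f"{group}@{package}": f"{engine}_{group}"}
        for group, package in ROLE_PACKAGES.items()
    ]


for entry in defaults_for_engine("veomni"):
    print(entry)
```

With this shape, switching backends becomes a single override instead of a separate trainer YAML per engine.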

@ji-huazhong ji-huazhong force-pushed the veomni branch 3 times, most recently from 126202b to e058d04 February 10, 2026 06:30
@ji-huazhong ji-huazhong force-pushed the veomni branch 2 times, most recently from 1a55eee to 66db068 February 10, 2026 07:14
@wuxibin89 wuxibin89 merged commit 2487c36 into verl-project:main Feb 10, 2026
130 of 193 checks passed
@ji-huazhong ji-huazhong deleted the veomni branch February 10, 2026 11:52
Tjh-UKN pushed a commit to Tjh-UKN/verl that referenced this pull request Feb 13, 2026
…ct#4882)

### What does this PR do?

Add rl support for veomni backend.

Collaborated with @A1waysBeenHere

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.

---------

Co-authored-by: A1waysBeenHere <moyicong1999@163.com>
Superjomn pushed a commit to Superjomn/verl that referenced this pull request Mar 2, 2026
@ji-huazhong
Collaborator Author

set -x
ENGINE=${1:-vllm}

export WANDB_MODE=offline
export MULTI_STREAM_MEMORY_REUSE=2
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export CPU_AFFINITY_CONF=1


python3 -m verl.trainer.main_ppo \
    model_engine=veomni \
    algorithm.adv_estimator=grpo \
    data.train_files=/home/lynn/data/geo3k/train.parquet \
    data.val_files=/home/lynn/data/geo3k/test.parquet \
    data.train_batch_size=256 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=/home/lynn/models/Qwen3-VL-30B-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.optimizer="anyprecision_adamw" \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.veomni.param_offload=True \
    actor_rollout_ref.actor.veomni.optimizer_offload=True \
    actor_rollout_ref.actor.veomni.fsdp_size=16 \
    actor_rollout_ref.actor.veomni.expert_parallel_size=16 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=$ENGINE \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=True \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.veomni.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.val_before_train=True \
    trainer.use_legacy_worker_impl=disable \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='verl_grpo_example_geo3k' \
    trainer.experiment_name='qwen3_vl_30b_function_rm_baseline' \
    trainer.n_gpus_per_node=16 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=-1 \
    trainer.total_epochs=15 "$@"
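
This script differs from the earlier one in this thread in only a handful of overrides. A small helper like the one below can list such differences mechanically. `parse_overrides` and `diff_overrides` are hypothetical names, and the two short lists are excerpts copied from the two scripts rather than the full commands.

```python
def parse_overrides(tokens):
    """Turn Hydra-style key=value tokens into a dict.

    A leading "+" (Hydra's "append" marker) is dropped from the key, and
    surrounding quotes are dropped from the value.
    """
    overrides = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a key=value token
        overrides[key.lstrip("+")] = value.strip("'\"")
    return overrides


def diff_overrides(old, new):
    """Return (removed, added, changed) between two override dicts."""
    removed = {k: old[k] for k in old.keys() - new.keys()}
    added = {k: new[k] for k in new.keys() - old.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return removed, added, changed


# Excerpts from the first script in this thread.
first = parse_overrides([
    "--config-name=ppo_veomni_trainer.yaml",
    "actor_rollout_ref.actor.veomni.data_parallel_size=16",
    "actor_rollout_ref.actor.veomni.expert_parallel_size=16",
    "trainer.experiment_name=qwen3_vl_30b_function_rm",
])
# Excerpts from the script directly above.
second = parse_overrides([
    "model_engine=veomni",
    "actor_rollout_ref.actor.veomni.fsdp_size=16",
    "actor_rollout_ref.actor.veomni.expert_parallel_size=16",
    "trainer.experiment_name=qwen3_vl_30b_function_rm_baseline",
])

removed, added, changed = diff_overrides(first, second)
print("removed:", sorted(removed))
print("added:", sorted(added))
print("changed:", sorted(changed))
```

On these excerpts the helper reports the dedicated `--config-name` and `data_parallel_size` as removed, and `model_engine` and `fsdp_size` as added.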

sijyang pushed a commit to sijyang/verl that referenced this pull request Apr 1, 2026
3 participants