VeRL-Omni is a general RL training framework focused on multimodal generative models, built on top of verl.
It originated from the multimodal generation RL effort in verl, and now has a dedicated home so it can evolve in a more focused way.
Multimodal generative RL training differs from text-only LLM RL not only in model structure, but also in I/O patterns, compute characteristics, and runtime bottlenecks. As this space grows, it deserves a dedicated training repository that can evolve quickly around its own constraints.
VeRL-Omni targets RL post-training for three families of generative models:
- Diffusion generative models for image, video, and audio — e.g., Qwen-Image, Wan2.2.
- Unified multimodal understanding + generation models — e.g., BAGEL, HunyuanImage-3.0.
- Omni-modality models that jointly handle text, image, audio, and video — e.g., Qwen3-Omni.
- Specialized rollout via vLLM-Omni for high-throughput diffusion and multimodal generation.
- Flexible reward pipelines spanning rule-based rewards, model-based rewards, and multimodal reward computation (see the sketch after this list).
- Modular training backends that plug into existing parallelism (FSDP, USP) and other optimizations rather than rebuilding the stack from scratch.
- End-to-end examples and benchmarks validating co-located sync and fully-async RL on the model families above.
- High training throughput: on our reference Qwen-Image FlowGRPO setup, VeRL-Omni achieves ~25% higher end-to-end throughput than the diffusers-based `flow_grpo` implementation, driven by vLLM-Omni rollout, FSDP training, and asynchronous, overlapped reward computation.
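As a rough illustration of the last two points, the sketch below combines a rule-based and a model-based reward into one weighted pipeline, and scores each finished rollout batch in a background thread while the next batch generates. Every name in it is hypothetical; this is not the VeRL-Omni API, just the overlap pattern under those assumptions.

```python
# Hypothetical sketch: a weighted reward pipeline whose scoring overlaps
# with rollout. All names are illustrative, not the actual VeRL-Omni API.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List

def rule_reward(sample: str) -> float:
    # Rule-based reward: a toy format check on the generated sample.
    return 1.0 if sample.startswith("sample") else 0.0

def model_reward(sample: str) -> float:
    # Model-based reward: stand-in for a reward-model forward pass
    # (e.g., a CLIP-style prompt-image score for image generation).
    return min(1.0, len(sample) / 16.0)

REWARD_FNS: List[Callable[[str], float]] = [rule_reward, model_reward]
WEIGHTS: List[float] = [0.3, 0.7]

def score_batch(samples: List[str]) -> List[float]:
    # Weighted sum of all reward functions, per sample.
    return [sum(w * f(s) for f, w in zip(REWARD_FNS, WEIGHTS)) for s in samples]

def generate_batch(batch_id: int) -> List[str]:
    # Stand-in for a vLLM-Omni rollout step producing a batch of samples.
    return [f"sample-{batch_id}-{i}" for i in range(4)]

def run(num_batches: int = 3) -> List[List[float]]:
    pending: List[Future] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for b in range(num_batches):
            samples = generate_batch(b)
            # Score batch b in the background while the loop moves on to
            # generate batch b + 1 -- this is the reward/rollout overlap.
            pending.append(pool.submit(score_batch, samples))
        return [f.result() for f in pending]

if __name__ == "__main__":
    print(run())
```

In the real system the scorer would batch reward-model calls across samples; the point here is only the producer/consumer structure that hides reward latency behind generation.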
Visit our documentation to learn more.
| Model | Category | Modality | Algorithm | Status |
|---|---|---|---|---|
| Qwen-Image | Diffusion generator | Text → Image | FlowGRPO | ✅ |
| | | | MixGRPO | ✅ |
| | | | GRPO-Guard | ✅ |
| Wan2.2 | Diffusion generator | Text → Video | DanceGRPO | WIP |
| BAGEL | Unified understanding + generation | Text + Image | FlowGRPO | WIP |
| HunyuanImage-3.0 | Unified understanding + generation | Text + Image | MixGRPO | Planned |
| | | | SRPO | Planned |
| Qwen3-Omni-Thinker | Omni-modality | Text / Image / Video / Audio | GSPO | WIP |
| SD3.5 | Diffusion generator | Text → Image | DPO | WIP |
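Most of the GRPO-family entries above (FlowGRPO, MixGRPO, GRPO-Guard, DanceGRPO, GSPO) share the standard group-relative advantage: for each prompt, a group of G samples is generated, and every sample's reward is normalized against its own group. The variants differ mainly in how that advantage is applied, e.g., to the denoising steps of a diffusion trajectory; the shared quantity is:

```latex
% Group-relative advantage used by GRPO-style objectives: each of the
% G samples drawn for one prompt is scored against its own group.
\[
  \hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}
                   {\operatorname{std}(r_1, \dots, r_G)},
  \qquad i = 1, \dots, G.
\]
```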
VeRL-Omni now supports Ascend NPU. For instructions on how to install and get started with FlowGRPO training on Ascend NPU, please refer to our Ascend NPU Quickstart Guide.
Future work is tracked here:
Contributions are welcome; see the contribution guide.
VeRL-Omni builds on the engineering foundations developed in verl and is closely aligned with multimodal inference systems such as vLLM-Omni.
If you find the project helpful, please cite:
@misc{verlomni_github,
title = {{VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models}},
author = {Yongxiang Huang and Cheung Kawai and Jingan Zhou and Yingshu Chen and {openYuanrong Team} and Xibin Wu},
year = {2026},
howpublished = {\url{https://github.com/verl-project/verl-omni}},
urldate = {2026-04-28}
}