[trainer] feat: Add Nemo-Automodel as alternative training engine #5407
Merged: ISEEKYAN merged 21 commits into verl-project:main from HuiyingLi:add_automodel_sft_backend on Mar 20, 2026.

The diff below shows changes from the first 6 of 21 commits (all by HuiyingLi):
- 07554c9 init version with fsdp2
- 11eb1a9 add mp policy config
- 4641e75 add ep and expose more configs
- 697bf68 fix(dataset): call .tolist() before tokenizer.decode() for tiktoken c…
- 41dd4a8 add test
- c33321b format
- 9a14478 revert some format changes
- 4d7a193 fix eval ctx
- 6b3f061 fix exp name
- 3208fbd add expert torch_mm backend to config
- a0b51f8 change copyright
- d2eec66 Merge branch 'main' into add_automodel_sft_backend
- ec3b283 upgrade to automodel r0.3.0 with transformers v5.0.0
- c1e8025 add automodel examples scripts
- 6060737 add docs
- 20cd9dc update optimizer integration
- 1b9c6aa update example scripts
- db0d6ca add dependency req to examples
- 48b7315 format
- 5915ef3 add NVIDIA license header to check_license.py
- ff49467 Merge branch 'main' into add_automodel_sft_backend
New file (76 lines): Automodel engine configuration.

```yaml
# Target class for this configuration
_target_: verl.workers.config.AutomodelEngineConfig

# Backend strategy identifier
strategy: automodel

# Distributed training strategy: "fsdp2", "megatron_fsdp", or "ddp"
distributed_strategy: fsdp2

# Parallelism sizes
tp_size: 1
pp_size: 1
cp_size: 1
ep_size: 1
dp_replicate_size: 1
sequence_parallel: false
defer_fsdp_grad_sync: true

# Whether to offload model parameters to CPU
param_offload: false

# Whether to offload optimizer state to CPU
optimizer_offload: false

# Whether to enable activation checkpointing
activation_checkpointing: false

# Whether to enable FP8 training
enable_fp8: false

# Whether to enable torch.compile for the model
enable_compile: false

# Model data type for loading weights ("fp32", "bf16", "fp16")
model_dtype: fp32

# Attention implementation ("sdpa", "flash_attention_2", "eager", "te")
attn_implementation: sdpa

# Backend settings (nemo_automodel BackendConfig)
use_te_backend: false
rope_fusion: true
gate_precision: null
enable_hf_state_dict_adapter: true
enable_fsdp_optimizations: false

# MoE / Expert Parallelism settings
enable_deepep: false
reshard_after_forward: false
fake_balanced_gate: false
ignore_router_for_ac: false
lm_head_precision: null
wrap_outer_model: true

# Mixed precision policy (FSDP2 MixedPrecisionPolicy)
mp_param_dtype: bf16
mp_reduce_dtype: fp32
mp_output_dtype: bf16

# Random seed for reproducibility
seed: 42

# Whether to enable full determinism for distributed training, only for debugging
full_determinism: false

# Whether to use forward only mode
forward_only: false

# Whether to use torch compile for entropy computation
use_torch_compile: false

# Whether to use chunked entropy computation
entropy_from_logits_with_chunking: false

# Whether to use checkpointing for entropy computation
entropy_checkpointing: false
```
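As a hedged illustration of how a config like this might be consumed, the sketch below uses a simplified dataclass stand-in (not verl's actual `AutomodelEngineConfig`) to validate the enumerated fields documented in the comments above before handing them to an engine:

```python
from dataclasses import dataclass

# Simplified stand-in for verl's AutomodelEngineConfig. Field names mirror
# the config above, but this is an illustrative sketch, not the real class.
@dataclass
class EngineConfigSketch:
    strategy: str = "automodel"
    distributed_strategy: str = "fsdp2"
    model_dtype: str = "fp32"
    attn_implementation: str = "sdpa"
    tp_size: int = 1
    ep_size: int = 1

    def validate(self) -> None:
        # Enumerations taken from the config comments above.
        if self.distributed_strategy not in ("fsdp2", "megatron_fsdp", "ddp"):
            raise ValueError(f"unknown distributed_strategy: {self.distributed_strategy}")
        if self.model_dtype not in ("fp32", "bf16", "fp16"):
            raise ValueError(f"unknown model_dtype: {self.model_dtype}")
        if self.attn_implementation not in ("sdpa", "flash_attention_2", "eager", "te"):
            raise ValueError(f"unknown attn_implementation: {self.attn_implementation}")
        if self.tp_size < 1 or self.ep_size < 1:
            raise ValueError("parallelism sizes must be >= 1")

cfg = EngineConfigSketch(distributed_strategy="fsdp2", model_dtype="bf16")
cfg.validate()  # valid values pass silently
```

Validating enumerated strings up front surfaces typos at config-load time rather than deep inside engine construction.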
New file (47 lines): Automodel optimizer configuration.

```yaml
# Target class for this configuration
_target_: verl.workers.config.AutomodelOptimizerConfig

optimizer: AdamW

# Module path to import optimizer from
optimizer_impl: torch.optim

# Learning rate (maps to max_lr in Automodel's OptimizerParamScheduler)
lr: 1e-5

# LR warmup steps ratio (used when lr_warmup_steps <= 0)
lr_warmup_steps_ratio: 0.0

# Total training steps (injected at runtime)
total_training_steps: -1

# Weight decay
weight_decay: 0.01

# LR warmup steps (set > 0 to override lr_warmup_steps_ratio)
lr_warmup_steps: -1

# Betas for Adam optimizer
betas: [0.9, 0.999]

# Clip gradient norm
clip_grad: 1.0

# Initial LR ratio for warmup start (init_lr = lr * init_lr_ratio)
init_lr_ratio: 0.1

# Minimum LR ratio after decay (min_lr = lr * min_lr_ratio)
min_lr_ratio: 0.01

# LR scheduler type (Automodel OptimizerParamScheduler decay style)
# Options: "constant", "cosine", "linear", "inverse-square-root"
lr_scheduler_type: cosine

# Weight decay increment style: "constant", "linear", or "cosine"
wd_incr_style: constant

# Kept for backward compatibility (unused by Automodel scheduler)
num_cycles: 0.5
zero_indexed_step: true

override_optimizer_config: {}
```
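The warmup and decay semantics documented in the comments above (`init_lr = lr * init_lr_ratio`, `min_lr = lr * min_lr_ratio`, with cosine or linear decay) can be sketched in plain Python. This is an illustration of those documented formulas only, not Automodel's actual `OptimizerParamScheduler`:

```python
import math

def lr_at_step(step, total_steps, max_lr=1e-5, init_lr_ratio=0.1,
               min_lr_ratio=0.01, warmup_steps=0, style="cosine"):
    """Illustrative LR schedule following the config comments above.

    A sketch, not Automodel's real scheduler: linear warmup from
    init_lr to max_lr, then decay from max_lr down to min_lr.
    """
    init_lr = max_lr * init_lr_ratio   # warmup starts here
    min_lr = max_lr * min_lr_ratio     # decay floor
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup from init_lr to max_lr.
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    if style == "constant":
        return max_lr
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(total_steps - warmup_steps, 1)
    frac = min((step - warmup_steps) / decay_steps, 1.0)
    if style == "linear":
        return max_lr - (max_lr - min_lr) * frac
    # Default: cosine decay from max_lr to min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * frac))
```

With the defaults above, step 0 of a warmup phase yields `init_lr` (one tenth of `lr`), the end of warmup yields `max_lr`, and the final step decays to `min_lr` (one hundredth of `lr`).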
New file (20 lines): package `__init__.py` exporting the engine classes.

```python
# Copyright 2025 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .transformer_impl import AutomodelEngine, AutomodelEngineWithLMHead

__all__ = [
    "AutomodelEngine",
    "AutomodelEngineWithLMHead",
]
```