[algo] fix: Add seq mean mask denominator option (verl-project#4510)
## Summary
Refactor `agg_loss` function and fix entropy/KL loss scaling in
distributed training.
**Changes:**
- **Refactor**: Unify `seq-mean-*` modes with shared denominator logic
using `masked_sum`
- **Behavior change**: `seq-mean-token-sum-norm` now applies seq-mean
division (denominator = `global_batch_size * dp_size` or `local_bsz`),
matching the mode name
- **Simplification**: Remove fully-masked sequence exclusion from
denominator; use total batch size consistently
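The unified denominator logic above can be sketched as follows. This is an illustrative, stdlib-only approximation of the `seq-mean-*` modes, not verl's actual `agg_loss` implementation; the `seq_denominator` argument and the fallback to the local batch size are assumptions based on this PR's description.

```python
# Hedged sketch of seq-mean-* aggregation (illustration only, not verl's agg_loss).
def masked_sum(values, mask):
    """Sum of values at positions where mask == 1."""
    return sum(v * m for v, m in zip(values, mask))

def agg_loss(token_losses, masks, mode, seq_denominator=None):
    """token_losses/masks: per-sequence lists for one batch.
    seq_denominator: e.g. global_batch_size * dp_size; falls back to local
    batch size (assumption). Fully masked sequences are NOT excluded from
    the denominator, per this PR."""
    denom = seq_denominator if seq_denominator is not None else len(token_losses)
    per_seq_sums = [masked_sum(t, m) for t, m in zip(token_losses, masks)]
    if mode == "seq-mean-token-sum":
        return sum(per_seq_sums) / denom
    if mode == "seq-mean-token-mean":
        # Per-sequence token mean (guarding fully masked sequences), then seq mean.
        token_counts = [max(sum(m), 1.0) for m in masks]
        return sum(s / c for s, c in zip(per_seq_sums, token_counts)) / denom
    if mode == "seq-mean-token-sum-norm":
        # New behavior: divide by the fixed scale factor AND the seq denominator.
        loss_scale_factor = max(len(m) for m in masks)  # hypothetical choice of scale
        return sum(per_seq_sums) / loss_scale_factor / denom
    raise ValueError(f"unknown mode: {mode}")
```

For example, with two sequences where the second has only one unmasked token, `seq-mean-token-mean` averages each sequence's tokens first, so short sequences carry the same weight as long ones, while `seq-mean-token-sum` weights sequences by their total token loss.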
NOTE: The global loss aggregation logic is not compatible with the legacy
model engine, which performs aggregation outside `agg_loss` and is slated
for deprecation, so this PR leaves the legacy model engine unmodified.
⚠️ **Breaking**: `seq-mean-token-sum-norm` now divides by both
`loss_scale_factor` AND `seq_denominator`; previously it divided only by
`loss_scale_factor`.
## Test plan
- [ ] Verify PPO training with `seq-mean-token-sum` mode
- [ ] Verify PPO training with `seq-mean-token-mean` mode
- [ ] Verify PPO training with `seq-mean-token-sum-norm` mode (note:
behavior changed)
- [ ] Confirm entropy/KL loss values are correctly scaled in multi-GPU
training
---------
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>