[Algorithms] Add Clip-Cov and KL-Cov loss functions#251
SumanthRH merged 8 commits into NovaSky-AI:main
/gemini review
Code Review
This pull request introduces the Clip-Cov and KL-Cov loss functions, which is a valuable addition based on recent research. The implementation is well-structured and includes corresponding documentation, examples, and tests. My review has identified a couple of high-severity issues in skyrl_train/utils/ppo_utils.py related to performance and type correctness that should be addressed. Additionally, there are several minor suggestions to improve documentation and code style. Overall, this is a great contribution.
skyrl-train/examples/algorithms/clip_cov_kl_cov/run_clip_cov.sh
erictang000
left a comment
LGTM, ty! Just a few tiny nits.
    data.train_data="['$DATA_DIR/train.parquet']" \
    data.val_data="['$DATA_DIR/validation.parquet']" \
    trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
    trainer.algorithm.clip_cov.clip_ratio=0.0002 \
tiny nit, but maybe move these flags with placeholder under the "# Configure Clip-Cov parameters" comment?
    data.train_data="['$DATA_DIR/train.parquet']" \
    data.val_data="['$DATA_DIR/validation.parquet']" \
    trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
    trainer.algorithm.kl_cov.kl_cov_frac=0.2 \
    # dual clip parameters
    clip_ratio_c: 3.0

    # clip-cov parameters (only used when policy_loss_type: "clip_cov")
nit: can you update the example config in the docs (above this) to include clip_cov and kl_cov also
like this guy:
policy_loss_type: "regular" # "regular", "dual_clip", "gspo", or customizable with PolicyLossRegistry
can you also make sure training runs look normal for a few steps with these guys
# What does this PR do?
Adds Clip-Cov and KL-Cov loss functions based on https://arxiv.org/pdf/2505.22617
---------
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

What does this PR do?
Adds Clip-Cov and KL-Cov loss functions based on https://arxiv.org/pdf/2505.22617
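For readers unfamiliar with the two losses, here is a rough sketch of the token-selection idea as I understand it from the linked paper. It is not the code in this PR. All function names are hypothetical, plain Python lists stand in for tensors, and the `lower`/`upper` bounds are illustrative values. Only the two knobs `clip_cov.clip_ratio` and `kl_cov.kl_cov_frac` come from the example scripts. The sketch assumes a per-token covariance defined as the centered product of log-probability and advantage; Clip-Cov then randomly picks a small share of tokens inside a covariance band (whose gradients would be detached), while KL-Cov picks the top share by covariance (which would receive a KL penalty).

```python
import random


def token_covariance(log_probs, advantages):
    """Centered product (log_prob - mean) * (advantage - mean) for each token."""
    n = len(log_probs)
    mean_lp = sum(log_probs) / n
    mean_adv = sum(advantages) / n
    return [(lp - mean_lp) * (a - mean_adv) for lp, a in zip(log_probs, advantages)]


def kl_cov_indices(cov, frac):
    """Indices of the top `frac` share of tokens by covariance (KL penalty targets)."""
    k = int(len(cov) * frac)
    order = sorted(range(len(cov)), key=lambda i: cov[i], reverse=True)
    return sorted(order[:k])


def clip_cov_indices(cov, ratio, lower, upper, rng):
    """Random sample of tokens with covariance in [lower, upper] (gradient-detach targets)."""
    candidates = [i for i, c in enumerate(cov) if lower <= c <= upper]
    k = min(len(candidates), int(len(cov) * ratio))
    return sorted(rng.sample(candidates, k))


# Toy batch of five tokens; the values are made up for illustration.
log_probs = [-0.1, -2.0, -0.5, -1.5, -0.2]
advantages = [1.0, -1.0, 0.5, -0.5, 1.0]

cov = token_covariance(log_probs, advantages)
kl_idx = kl_cov_indices(cov, frac=0.4)
clip_idx = clip_cov_indices(cov, ratio=0.4, lower=0.4, upper=0.7, rng=random.Random(0))
print(kl_idx, clip_idx)
```

The fractions here are much larger than the `0.0002` and `0.2` used in the example scripts so that a five-token batch selects at least one token. With realistic batch sizes the same arithmetic applies.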
TODO: