[Algorithms] Add Clip-Cov and KL-Cov loss functions #251

Merged — SumanthRH merged 8 commits into NovaSky-AI:main from SumanthRH:add-clip-cov on Sep 8, 2025
Conversation

@SumanthRH (Member) commented Sep 8, 2025:

What does this PR do?

Adds Clip-Cov and KL-Cov loss functions based on https://arxiv.org/pdf/2505.22617

TODO:

  • Add CPU test
  • Cleanup
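As a rough sketch of the KL-Cov idea from the referenced paper (arXiv:2505.22617) — apply a KL penalty on the fraction of tokens whose (log-prob, advantage) covariance is highest, instead of the usual surrogate term — the forward pass might look like the following. All names, defaults, and the covariance proxy are illustrative assumptions, not this PR's actual API.

```python
import torch

def kl_cov_loss(log_probs, old_log_probs, advantages, kl_cov_frac=0.2, kl_coef=1.0):
    # Hypothetical sketch of KL-Cov: on the top-kl_cov_frac fraction of
    # tokens by (log-prob, advantage) covariance, replace the policy-gradient
    # surrogate with a KL penalty toward the old policy.
    ratio = torch.exp(log_probs - old_log_probs)
    pg_loss = -advantages * ratio  # standard surrogate objective

    # Per-token covariance proxy: centered log-prob times centered advantage.
    cov = (log_probs - log_probs.mean()) * (advantages - advantages.mean())

    # Select the highest-covariance tokens.
    k = max(1, int(kl_cov_frac * cov.numel()))
    _, idx = torch.topk(cov.detach().flatten(), k)
    mask = torch.zeros(cov.numel(), dtype=torch.bool)
    mask[idx] = True
    mask = mask.view_as(cov)

    # On selected tokens, a simple per-token KL estimate replaces the surrogate.
    kl = log_probs - old_log_probs
    loss = torch.where(mask, kl_coef * kl, pg_loss)
    return loss.mean()
```

The `torch.where` keeps the loss differentiable everywhere while swapping the objective only on the selected tokens.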

SumanthRH marked this pull request as ready for review on September 8, 2025 at 06:32
@SumanthRH (Member, Author) commented:

/gemini review

@gemini-code-assist (bot) left a comment:

Code Review

This pull request introduces the Clip-Cov and KL-Cov loss functions, which is a valuable addition based on recent research. The implementation is well-structured and includes corresponding documentation, examples, and tests. My review has identified a couple of high-severity issues in skyrl_train/utils/ppo_utils.py related to performance and type correctness that should be addressed. Additionally, there are several minor suggestions to improve documentation and code style. Overall, this is a great contribution.
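The review points at the loss implementations in skyrl_train/utils/ppo_utils.py. A minimal sketch of the companion Clip-Cov idea from the paper — detach the gradient on a small random fraction of tokens whose covariance falls inside a band — could look like this; the helper name, bounds, and selection details are assumptions for illustration, not the PR's code.

```python
import torch

def clip_cov_loss(log_probs, old_log_probs, advantages, clip_ratio=0.0002,
                  cov_lb=1.0, cov_ub=5.0):
    # Hypothetical sketch of Clip-Cov: stop gradients on a small random
    # subset (clip_ratio of all tokens) drawn from tokens whose covariance
    # lies in [cov_lb, cov_ub].
    ratio = torch.exp(log_probs - old_log_probs)
    pg_loss = -advantages * ratio  # standard surrogate objective

    # Detached covariance proxy used only for token selection.
    cov = ((log_probs - log_probs.mean()) * (advantages - advantages.mean())).detach()
    candidates = ((cov >= cov_lb) & (cov <= cov_ub)).flatten().nonzero().flatten()

    n_clip = int(clip_ratio * pg_loss.numel())
    if n_clip > 0 and candidates.numel() > 0:
        chosen = candidates[torch.randperm(candidates.numel())[:n_clip]]
        mask = torch.zeros(pg_loss.numel(), dtype=torch.bool)
        mask[chosen] = True
        mask = mask.view_as(pg_loss)
        # Clipped tokens keep their loss value but contribute no gradient.
        pg_loss = torch.where(mask, pg_loss.detach(), pg_loss)
    return pg_loss.mean()
```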

@erictang000 (Collaborator) left a comment:

LGTM, ty! Just a few tiny nits.

```shell
data.train_data="['$DATA_DIR/train.parquet']" \
data.val_data="['$DATA_DIR/validation.parquet']" \
trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
trainer.algorithm.clip_cov.clip_ratio=0.0002 \
```
A collaborator commented:

Tiny nit, but maybe move these flags with the placeholder under the "# Configure Clip-Cov parameters" comment?

```shell
data.train_data="['$DATA_DIR/train.parquet']" \
data.val_data="['$DATA_DIR/validation.parquet']" \
trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
trainer.algorithm.kl_cov.kl_cov_frac=0.2 \
```
A collaborator commented:

Same nit here.

```yaml
# dual clip parameters
clip_ratio_c: 3.0

# clip-cov parameters (only used when policy_loss_type: "clip_cov")
```
A collaborator commented:

Nit: can you update the example config in the docs (above this) to include clip_cov and kl_cov also, like this one:

```yaml
policy_loss_type: "regular" # "regular", "dual_clip", "gspo", or customizable with PolicyLossRegistry
```
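The requested docs update might look roughly like the fragment below. The flag names and the two values come from the snippets in this PR; the nesting and comments are illustrative assumptions about the config layout.

```yaml
policy_loss_type: "regular"  # "regular", "dual_clip", "gspo", "clip_cov", "kl_cov", or customizable with PolicyLossRegistry

# clip-cov parameters (only used when policy_loss_type: "clip_cov")
clip_cov:
  clip_ratio: 0.0002

# kl-cov parameters (only used when policy_loss_type: "kl_cov")
kl_cov:
  kl_cov_frac: 0.2
```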

@erictang000 (Collaborator) commented:

Can you also make sure training runs look normal for a few steps with these loss functions?

@SumanthRH (Member, Author) commented:

Tested for just 10 steps; convergence differs a bit, but that's expected. The implementation matches the reference, so we should be good:

[Screenshot: training-run comparison, Sep 8, 2025]

SumanthRH merged commit 826e821 into NovaSky-AI:main on Sep 8, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL on Feb 4, 2026.