[Algorithms] Add Clip-Cov and KL-Cov loss functions #251

Merged — SumanthRH merged 8 commits into NovaSky-AI:main from SumanthRH:add-clip-cov on Sep 8, 2025
Conversation

@SumanthRH (Member) commented Sep 8, 2025:

What does this PR do?

Adds Clip-Cov and KL-Cov loss functions based on https://arxiv.org/pdf/2505.22617

TODO:

  • Add CPU test
  • Cleanup
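As a rough sketch of the KL-Cov idea from the referenced paper (arXiv:2505.22617) — apply a KL penalty on the fraction of tokens whose (log-prob, advantage) covariance is highest, instead of the usual surrogate term — the forward pass might look like the following. All names, defaults, and the covariance proxy are illustrative assumptions, not this PR's actual API.

```python
import torch

def kl_cov_loss(log_probs, old_log_probs, advantages, kl_cov_frac=0.2, kl_coef=1.0):
    # Hypothetical sketch of KL-Cov: on the top-kl_cov_frac fraction of
    # tokens by (log-prob, advantage) covariance, replace the policy-gradient
    # surrogate with a KL penalty toward the old policy.
    ratio = torch.exp(log_probs - old_log_probs)
    pg_loss = -advantages * ratio  # standard surrogate objective

    # Per-token covariance proxy: centered log-prob times centered advantage.
    cov = (log_probs - log_probs.mean()) * (advantages - advantages.mean())

    # Select the highest-covariance tokens.
    k = max(1, int(kl_cov_frac * cov.numel()))
    _, idx = torch.topk(cov.detach().flatten(), k)
    mask = torch.zeros(cov.numel(), dtype=torch.bool)
    mask[idx] = True
    mask = mask.view_as(cov)

    # On selected tokens, a simple per-token KL estimate replaces the surrogate.
    kl = log_probs - old_log_probs
    loss = torch.where(mask, kl_coef * kl, pg_loss)
    return loss.mean()
```

The `torch.where` keeps the loss differentiable everywhere while swapping the objective only on the selected tokens.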

SumanthRH marked this pull request as ready for review on September 8, 2025 at 06:32
@SumanthRH (Member, Author) commented:

/gemini review

@gemini-code-assist (bot) left a comment:

Code Review

This pull request introduces the Clip-Cov and KL-Cov loss functions, which is a valuable addition based on recent research. The implementation is well-structured and includes corresponding documentation, examples, and tests. My review has identified a couple of high-severity issues in skyrl_train/utils/ppo_utils.py related to performance and type correctness that should be addressed. Additionally, there are several minor suggestions to improve documentation and code style. Overall, this is a great contribution.
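The review points at the loss implementations in skyrl_train/utils/ppo_utils.py. A minimal sketch of the companion Clip-Cov idea from the paper — detach the gradient on a small random fraction of tokens whose covariance falls inside a band — could look like this; the helper name, bounds, and selection details are assumptions for illustration, not the PR's code.

```python
import torch

def clip_cov_loss(log_probs, old_log_probs, advantages, clip_ratio=0.0002,
                  cov_lb=1.0, cov_ub=5.0):
    # Hypothetical sketch of Clip-Cov: stop gradients on a small random
    # subset (clip_ratio of all tokens) drawn from tokens whose covariance
    # lies in [cov_lb, cov_ub].
    ratio = torch.exp(log_probs - old_log_probs)
    pg_loss = -advantages * ratio  # standard surrogate objective

    # Detached covariance proxy used only for token selection.
    cov = ((log_probs - log_probs.mean()) * (advantages - advantages.mean())).detach()
    candidates = ((cov >= cov_lb) & (cov <= cov_ub)).flatten().nonzero().flatten()

    n_clip = int(clip_ratio * pg_loss.numel())
    if n_clip > 0 and candidates.numel() > 0:
        chosen = candidates[torch.randperm(candidates.numel())[:n_clip]]
        mask = torch.zeros(pg_loss.numel(), dtype=torch.bool)
        mask[chosen] = True
        mask = mask.view_as(pg_loss)
        # Clipped tokens keep their loss value but contribute no gradient.
        pg_loss = torch.where(mask, pg_loss.detach(), pg_loss)
    return pg_loss.mean()
```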

@erictang000 (Collaborator) left a comment:

LGTM, ty! Just a few tiny nits.

```shell
data.train_data="['$DATA_DIR/train.parquet']" \
data.val_data="['$DATA_DIR/validation.parquet']" \
trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
trainer.algorithm.clip_cov.clip_ratio=0.0002 \
```
A collaborator commented:

Tiny nit, but maybe move these flags with the placeholder under the "# Configure Clip-Cov parameters" comment?

```shell
data.train_data="['$DATA_DIR/train.parquet']" \
data.val_data="['$DATA_DIR/validation.parquet']" \
trainer.algorithm.policy_loss_type="$POLICY_LOSS" \
trainer.algorithm.kl_cov.kl_cov_frac=0.2 \
```
A collaborator commented:

Same nit here.

```yaml
# dual clip parameters
clip_ratio_c: 3.0

# clip-cov parameters (only used when policy_loss_type: "clip_cov")
```
A collaborator commented:

Nit: can you update the example config in the docs (above this) to include clip_cov and kl_cov also, like this one:

```yaml
policy_loss_type: "regular" # "regular", "dual_clip", "gspo", or customizable with PolicyLossRegistry
```
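The requested docs update might look roughly like the fragment below. The flag names and the two values come from the snippets in this PR; the nesting and comments are illustrative assumptions about the config layout.

```yaml
policy_loss_type: "regular"  # "regular", "dual_clip", "gspo", "clip_cov", "kl_cov", or customizable with PolicyLossRegistry

# clip-cov parameters (only used when policy_loss_type: "clip_cov")
clip_cov:
  clip_ratio: 0.0002

# kl-cov parameters (only used when policy_loss_type: "kl_cov")
kl_cov:
  kl_cov_frac: 0.2
```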

@erictang000 (Collaborator) commented:

Can you also make sure training runs look normal for a few steps with these loss functions?

@SumanthRH (Member, Author) commented:

Tested for just 10 steps; convergence differs a bit, but that's expected. The implementation matches the reference, so we should be good:

[Screenshot: training-run comparison, Sep 8, 2025]

SumanthRH merged commit 826e821 into NovaSky-AI:main on Sep 8, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL on Feb 4, 2026.