Emit loss_fn_outputs with logprobs for RL losses in forward_backward #1047
tyler-griggs merged 2 commits into main
Conversation
The RL branch of `_forward_backward_micro()` now packages per-sample logprobs into `loss_fn_outputs`, matching the `cross_entropy` branch. This is required by tinker-cookbook's train.py, which reads logprobs from forward_backward results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
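A minimal sketch of the packaging step, assuming per-token logprobs arrive as one flat sequence per micro-batch; the helper and variable names here are hypothetical, not skyrl-train's actual API:

```python
def package_loss_fn_outputs(token_logprobs, sample_lengths):
    # Hypothetical helper: split flat per-token logprobs into one
    # {"logprobs": [...]} dict per sample, the shape the cross_entropy
    # branch already emits and downstream consumers expect.
    outputs = []
    offset = 0
    for length in sample_lengths:
        outputs.append({"logprobs": token_logprobs[offset:offset + length]})
        offset += length
    return outputs

# Two samples of 3 and 2 tokens respectively
outs = package_loss_fn_outputs([-0.1, -0.2, -0.3, -0.4, -0.5], [3, 2])
# → [{"logprobs": [-0.1, -0.2, -0.3]}, {"logprobs": [-0.4, -0.5]}]
```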
Code Review
This pull request correctly adds loss_fn_outputs with log probabilities to the RL training path in _forward_backward_micro, aligning it with the existing cross_entropy path. This change is crucial for downstream consumers that rely on these outputs. The accompanying test updates are thorough and validate the new functionality. I've provided one suggestion to refactor the newly added code block for improved efficiency and maintainability by reducing code duplication.
Relax the restriction in `reduce_metrics` to just continue if metrics are not all numbers rather than erroring out, and pop `loss_fn_outputs` to avoid polluting the logs. Nightly E2E CI (and all training scripts) were failing after #1047. Also deletes the claude folder introduced in #999.
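A sketch of the relaxed reduction described above, under the assumption that metrics arrive as one dict per micro-batch; the function name follows the PR text, but the signature and key handling are illustrative, not the real implementation:

```python
from numbers import Number

def reduce_metrics(metrics_list):
    # Pop bulky per-sample outputs first so they never reach the logs.
    for m in metrics_list:
        m.pop("loss_fn_outputs", None)
    reduced = {}
    keys = set().union(*(m.keys() for m in metrics_list)) if metrics_list else set()
    for key in keys:
        values = [m[key] for m in metrics_list if key in m]
        # Skip non-numeric metrics instead of erroring out (the #1059 fix).
        if not all(isinstance(v, Number) for v in values):
            continue
        reduced[key] = sum(values) / len(values)
    return reduced

metrics = [
    {"loss": 1.0, "tag": "rl", "loss_fn_outputs": [{"logprobs": [-0.1]}]},
    {"loss": 3.0, "tag": "rl"},
]
reduced = reduce_metrics(metrics)  # {"loss": 2.0}: "tag" skipped, outputs popped
```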
cc: @tyler-griggs can you be a little more careful when merging to main without review? This PR broke the main training loop on main; it would have been caught by just running the gsm8k example script. Fixed in: #1059
Also a case to be made for better testing/alerting on nightly CI.
…tinker RL to work (#1102) applies the same changes from #1047 but to the Megatron backend. Double PR on both skyrl-train and skyrl for now.
Summary
- `_forward_backward_micro()` now packages per-sample logprobs into `loss_fn_outputs`, matching the existing `cross_entropy` branch
- `_training_logprobs_from_fwd_bwd()` reads `fwd_bwd_result.loss_fn_outputs[i]["logprobs"]` for KL diagnostics
- Without this change, the `code_rl` and `math_rl` recipes crash with `KeyError: 'loss_fn_outputs'`
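The consumer path the summary refers to can be sketched as follows; `FwdBwdResult` is a minimal stand-in for the real result type, and the function mirrors (rather than reproduces) `_training_logprobs_from_fwd_bwd()`:

```python
class FwdBwdResult:
    # Minimal stand-in for the forward_backward result object; the real
    # type lives in skyrl-train / tinker-cookbook and carries more fields.
    def __init__(self, loss_fn_outputs):
        self.loss_fn_outputs = loss_fn_outputs

def training_logprobs_from_fwd_bwd(result):
    # One logprob list per sample, used downstream for KL diagnostics.
    # If the backend never populated loss_fn_outputs, this access is
    # where the KeyError: 'loss_fn_outputs' surfaced before the fix.
    return [result.loss_fn_outputs[i]["logprobs"]
            for i in range(len(result.loss_fn_outputs))]

res = FwdBwdResult([{"logprobs": [-0.1, -0.2]}, {"logprobs": [-0.3]}])
lps = training_logprobs_from_fwd_bwd(res)
# → [[-0.1, -0.2], [-0.3]]
```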