
Emit loss_fn_outputs with logprobs for RL losses in forward_backward #1047

Merged
tyler-griggs merged 2 commits into main from
tyler/rl-loss-fn-outputs
Feb 7, 2026


@tyler-griggs
Member

@tyler-griggs tyler-griggs commented Feb 7, 2026

Summary

  • The RL branch of _forward_backward_micro() now packages per-sample logprobs into loss_fn_outputs, matching the existing cross_entropy branch
  • Required by tinker-cookbook's _training_logprobs_from_fwd_bwd() which reads fwd_bwd_result.loss_fn_outputs[i]["logprobs"] for KL diagnostics
  • Without this, code_rl and math_rl recipes crash with KeyError: 'loss_fn_outputs'
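The shape described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `package_loss_fn_outputs`, `training_logprobs`, the plain-list representation, the dict-shaped result, and the right-padding assumption are all hypothetical stand-ins. Only the `loss_fn_outputs` key and the per-sample `"logprobs"` entry come from the description above.

```python
from typing import Dict, List


def package_loss_fn_outputs(
    action_logprobs: List[List[float]],
    valid_lengths: List[int],
) -> List[Dict[str, List[float]]]:
    """Build one {"logprobs": [...]} entry per sample, dropping padding.

    Assumption: rows are right-padded, so the first valid_lengths[i]
    entries of row i are the real tokens.
    """
    if len(action_logprobs) != len(valid_lengths):
        raise ValueError("need exactly one valid length per sample")
    return [
        {"logprobs": list(row[:n])}
        for row, n in zip(action_logprobs, valid_lengths)
    ]


def training_logprobs(result: Dict[str, object]) -> List[List[float]]:
    """Consumer side: read loss_fn_outputs[i]["logprobs"] for every sample."""
    return [out["logprobs"] for out in result["loss_fn_outputs"]]


# Two samples padded to length 3; only the first 2 and 1 tokens are real.
batch_logprobs = [[-0.1, -0.2, 0.0], [-0.5, 0.0, 0.0]]
result = {
    "loss": 0.42,
    "loss_fn_outputs": package_loss_fn_outputs(batch_logprobs, [2, 1]),
}
per_sample = training_logprobs(result)
```

If the producer omits the `loss_fn_outputs` key entirely, the lookup in `training_logprobs` raises `KeyError: 'loss_fn_outputs'`, which is the failure mode the last bullet describes.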


The RL branch of _forward_backward_micro() now packages per-sample
logprobs into loss_fn_outputs, matching the cross_entropy branch.
This is required by tinker-cookbook's train.py which reads logprobs
from forward_backward results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly adds loss_fn_outputs with log probabilities to the RL training path in _forward_backward_micro, aligning it with the existing cross_entropy path. This change is crucial for downstream consumers that rely on these outputs. The accompanying test updates are thorough and validate the new functionality. I've provided one suggestion to refactor the newly added code block for improved efficiency and maintainability by reducing code duplication.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tyler-griggs tyler-griggs force-pushed the tyler/rl-loss-fn-outputs branch from 1327e67 to 9bac20f Compare February 7, 2026 21:26
@tyler-griggs tyler-griggs merged commit c7f2d9f into main Feb 7, 2026
4 checks passed
erictang000 added a commit that referenced this pull request Feb 10, 2026
relax the restriction in `reduce_metrics` so that it just continues when
a metric's values are not all numbers rather than erroring out, and pop
loss_fn_outputs to avoid polluting the logs.

Nightly E2E CI (and all training scripts) were failing after #1047.

Also deletes claude folder introduced in #999.
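
The follow-up fix described above can be pictured with a small sketch. It is an assumption-laden illustration rather than the actual `reduce_metrics` from the repository: `reduce_metrics_sketch`, the mean reduction, and the `status` dict layout are hypothetical. Only the two behaviors come from the commit message: skip non-numeric metrics instead of raising, and pop `loss_fn_outputs` before logging.

```python
from typing import Any, Dict, List


def reduce_metrics_sketch(metrics: Dict[str, List[Any]]) -> Dict[str, float]:
    """Average each metric, skipping keys whose values are not all numeric."""
    reduced: Dict[str, float] = {}
    for key, values in metrics.items():
        if not values or not all(isinstance(v, (int, float)) for v in values):
            # Relaxed behavior: skip this key instead of raising an error.
            continue
        reduced[key] = sum(values) / len(values)
    return reduced


# Hypothetical per-micro-batch status collected during a training step.
status = {
    "policy_loss": [0.5, 0.25],
    "loss_fn_outputs": [[{"logprobs": [-0.1]}], [{"logprobs": [-0.2]}]],
}

# Pop the structured outputs first so they never reach the logged metrics.
loss_fn_outputs = status.pop("loss_fn_outputs", None)
logged = reduce_metrics_sketch(status)
```

Popping the key keeps the per-sample logprobs available to callers while the logged metrics stay scalar. The `continue` acts as a second guard in case some other non-numeric entry slips through.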
@erictang000
Collaborator

cc: @tyler-griggs can you be a little more careful when merging to main without review?

this pr broke the main training loop on main, would have been caught by just running the gsm8k example script

fixed in: #1059

@erictang000
Collaborator

also a case to be made for better testing/alerting on nightly CI

erictang000 added a commit that referenced this pull request Feb 13, 2026
…tinker RL to work (#1102)

applies the same changes from #1047 but to the megatron backend.

Double PR on both skyrl-train and skyrl for now.

