Emit loss_fn_outputs with logprobs for RL losses in forward_backward #1047
tyler-griggs merged 2 commits into main
Conversation
The RL branch of `_forward_backward_micro()` now packages per-sample logprobs into `loss_fn_outputs`, matching the `cross_entropy` branch. This is required by tinker-cookbook's train.py, which reads logprobs from forward_backward results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
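A minimal sketch of the packaging step, assuming per-token logprobs arrive as one flat sequence per micro-batch; the helper and variable names here are hypothetical, not skyrl-train's actual API:

```python
def package_loss_fn_outputs(token_logprobs, sample_lengths):
    # Hypothetical helper: split flat per-token logprobs into one
    # {"logprobs": [...]} dict per sample, the shape the cross_entropy
    # branch already emits and downstream consumers expect.
    outputs = []
    offset = 0
    for length in sample_lengths:
        outputs.append({"logprobs": token_logprobs[offset:offset + length]})
        offset += length
    return outputs

# Two samples of 3 and 2 tokens respectively
outs = package_loss_fn_outputs([-0.1, -0.2, -0.3, -0.4, -0.5], [3, 2])
# → [{"logprobs": [-0.1, -0.2, -0.3]}, {"logprobs": [-0.4, -0.5]}]
```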
Code Review
This pull request correctly adds loss_fn_outputs with log probabilities to the RL training path in _forward_backward_micro, aligning it with the existing cross_entropy path. This change is crucial for downstream consumers that rely on these outputs. The accompanying test updates are thorough and validate the new functionality. I've provided one suggestion to refactor the newly added code block for improved efficiency and maintainability by reducing code duplication.
Relax the restriction in `reduce_metrics` to just continue if metrics are not all numbers rather than erroring out, and pop `loss_fn_outputs` to avoid polluting the logs. Nightly E2E CI (and all training scripts) were failing after #1047. Also deletes the claude folder introduced in #999.
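A sketch of the relaxed reduction described above, under the assumption that metrics arrive as one dict per micro-batch; the function name follows the PR text, but the signature and key handling are illustrative, not the real implementation:

```python
from numbers import Number

def reduce_metrics(metrics_list):
    # Pop bulky per-sample outputs first so they never reach the logs.
    for m in metrics_list:
        m.pop("loss_fn_outputs", None)
    reduced = {}
    keys = set().union(*(m.keys() for m in metrics_list)) if metrics_list else set()
    for key in keys:
        values = [m[key] for m in metrics_list if key in m]
        # Skip non-numeric metrics instead of erroring out (the #1059 fix).
        if not all(isinstance(v, Number) for v in values):
            continue
        reduced[key] = sum(values) / len(values)
    return reduced

metrics = [
    {"loss": 1.0, "tag": "rl", "loss_fn_outputs": [{"logprobs": [-0.1]}]},
    {"loss": 3.0, "tag": "rl"},
]
reduced = reduce_metrics(metrics)  # {"loss": 2.0}: "tag" skipped, outputs popped
```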
cc: @tyler-griggs can you be a little more careful when merging to main without review? This PR broke the main training loop on main; it would have been caught by just running the gsm8k example script. Fixed in: #1059
Also a case to be made for better testing/alerting on nightly CI.
…tinker RL to work (#1102) applies the same changes from #1047 but to the Megatron backend. Double PR on both skyrl-train and skyrl for now.
Summary
- `_forward_backward_micro()` now packages per-sample logprobs into `loss_fn_outputs`, matching the existing `cross_entropy` branch
- `_training_logprobs_from_fwd_bwd()` reads `fwd_bwd_result.loss_fn_outputs[i]["logprobs"]` for KL diagnostics
- Without this change, the `code_rl` and `math_rl` recipes crash with `KeyError: 'loss_fn_outputs'`
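The consumer path the summary refers to can be sketched as follows; `FwdBwdResult` is a minimal stand-in for the real result type, and the function mirrors (rather than reproduces) `_training_logprobs_from_fwd_bwd()`:

```python
class FwdBwdResult:
    # Minimal stand-in for the forward_backward result object; the real
    # type lives in skyrl-train / tinker-cookbook and carries more fields.
    def __init__(self, loss_fn_outputs):
        self.loss_fn_outputs = loss_fn_outputs

def training_logprobs_from_fwd_bwd(result):
    # One logprob list per sample, used downstream for KL diagnostics.
    # If the backend never populated loss_fn_outputs, this access is
    # where the KeyError: 'loss_fn_outputs' surfaced before the fix.
    return [result.loss_fn_outputs[i]["logprobs"]
            for i in range(len(result.loss_fn_outputs))]

res = FwdBwdResult([{"logprobs": [-0.1, -0.2]}, {"logprobs": [-0.3]}])
lps = training_logprobs_from_fwd_bwd(res)
# → [[-0.1, -0.2], [-0.3]]
```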