Skip to content

TerminalBenchGenerator: logprobs + session ID#448

Merged
tyler-griggs merged 5 commits intoNovaSky-AI:mainfrom
li-boxuan:tbench-generator-session-and-logprobs
Oct 14, 2025
Merged

TerminalBenchGenerator: logprobs + session ID#448
tyler-griggs merged 5 commits intoNovaSky-AI:mainfrom
li-boxuan:tbench-generator-session-and-logprobs

Conversation

@li-boxuan
Copy link
Contributor

@li-boxuan li-boxuan commented Oct 10, 2025

With laude-institute/harbor#45, Sandboxes now returns logprobs for terminus agent, so TerminalBenchGenerator could leverage it if applicable. This PR doesn't enable trainer.algorithm.use_tis=true so it should be a no-op.

With laude-institute/harbor#50, Terminus agent now accepts "session_id" parameter, and it will show up in the litellm request body. TerminalBenchGenerator could leverage this for better routing.

Note that I don't have access to GPU yet so this is not tested.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for logprobs and session_id in TerminalBenchGenerator. The changes look good overall, but I have identified a few areas for improvement. Specifically, I've suggested clarifying an error message, adding a validation check to prevent data misalignment between tokens and logprobs, and fixing a potential bug related to how logprobs are handled when they are empty. These changes will improve the robustness and debuggability of the new functionality.

Copy link
Member

@tyler-griggs tyler-griggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks.

@tyler-griggs tyler-griggs merged commit fe08892 into NovaSky-AI:main Oct 14, 2025
1 check failed
erictang000 added a commit that referenced this pull request Oct 14, 2025
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
With laude-institute/harbor#45, Sandboxes now
returns logprobs for terminus agent, so TerminalBenchGenerator could
leverage it if applicable. This PR doesn't enable
`trainer.algorithm.use_tis=true` so it should be a no-op.

With laude-institute/harbor#50, Terminus
agent now accepts "session_id" parameter, and it will show up in the
litellm request body. TerminalBenchGenerator could leverage this for
better routing.

Note that I don't have access to GPU yet so this is not tested.

---------

Co-authored-by: Tyler Griggs <131809874+tyler-griggs@users.noreply.github.com>
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants