
get chat_end_index after env init#184

Merged
tyler-griggs merged 1 commit into NovaSky-AI:main from DataDog:etnbrd/fix-chat_end_index
Aug 22, 2025

Conversation

Contributor

@etnbrd etnbrd commented Aug 22, 2025

This PR moves the calculation of `chat_end_index` to after `chat_history` has been modified by the `env.init` function, so that it reflects the correct length.

In our use case, we inject the system prompt in addition to the user message in `env.init`, so the length of the chat history changes before and after the call.
As a result, `chat_end_index` is 1 instead of 2. We think this causes bugs further down the line, leading to this assertion error:

```
AssertionError: Response ids and loss masks must have the same length, for sample 0 got 27 and 50
```

Resolves #183
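To illustrate the ordering bug, here is a minimal standalone sketch (the `Env` class and variable names are illustrative, not SkyRL's actual API): capturing the index before `env.init` runs yields a stale length when `init` prepends a system prompt.

```python
# Illustrative sketch of the ordering issue fixed by this PR.
# Names (Env, chat_end_index) are hypothetical stand-ins, not SkyRL's real API.

class Env:
    def init(self, chat_history):
        # env.init may prepend a system prompt, changing the history length.
        system = {"role": "system", "content": "You are a helpful agent."}
        return [system] + chat_history

chat_history = [{"role": "user", "content": "solve the task"}]
env = Env()

# Before the fix: index captured BEFORE env.init -> stale value of 1.
stale_end_index = len(chat_history)

chat_history = env.init(chat_history)

# After the fix: index captured AFTER env.init -> correct value of 2.
chat_end_index = len(chat_history)

print(stale_end_index, chat_end_index)  # 1 2
```

A loss mask sized with the stale index would then disagree with the tokenized response length, which is consistent with the assertion error quoted above.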


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly moves the calculation of chat_end_index to be after the call to env.init. As the description points out, env.init can modify the chat_history by adding messages like a system prompt. By calculating chat_end_index after this modification, you ensure it accurately reflects the length of the initial chat history, which should resolve the assertion error mentioned. The change is logical and well-contained.

@tyler-griggs
Member

Great catch, thanks

@tyler-griggs tyler-griggs merged commit 9e5db05 into NovaSky-AI:main Aug 22, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026

Development

Successfully merging this pull request may close these issues.

When env.init changes the prompt length, the chat_end_index is wrong
