
[eval] fix eval batch < dp_size edge case #62

Merged
erictang000 merged 2 commits into main from erictang000/eval_edge_case
Jul 4, 2025
Conversation

@erictang000
Collaborator

What does this PR do

  • removes call to _remove_tail_data from eval loop

Previously, we called _remove_tail_data for every batch in the eval loop

```python
for _, prompts in enumerate(self.eval_dataloader):
    prompts = self._remove_tail_data(prompts)
    ...
```

This caused `len(prompts)` to be 0 whenever `len(prompts) < dp_size`, since `_remove_tail_data` (in `skyrl_train/trainer.py`) does floor division:

```python
return entries[: (len(entries) // dp_size) * dp_size]
```
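To make the edge case concrete, here is a minimal standalone sketch of that truncation (a hypothetical `remove_tail_data` function mirroring the floor-division line above, not the actual SkyRL method):

```python
def remove_tail_data(entries, dp_size):
    # Keep only the largest multiple of dp_size; the tail is dropped so the
    # batch divides evenly across dp_size data-parallel workers.
    return entries[: (len(entries) // dp_size) * dp_size]

# Normal case: a tail of 2 items is dropped.
print(remove_tail_data(list(range(10)), dp_size=4))  # -> [0, 1, 2, 3, 4, 5, 6, 7]

# Edge case: an eval batch smaller than dp_size is truncated to nothing,
# because (3 // 4) * 4 == 0.
print(remove_tail_data(list(range(3)), dp_size=4))  # -> []
```

This is why a small final eval batch silently produced zero prompts before this fix.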

We don't need `_remove_tail_data` in the eval loop, since we are just doing generation, which doesn't require explicit sharding. Even in the batched generation case, this is handled cleanly in `skyrl_train/inference_engines/inference_engine_client.py`:

```python
dp_item_size = (len(prompts_or_tokens) + num_inference_engines - 1) // num_inference_engines
```
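For illustration, a small sketch (a hypothetical `shard_prompts` helper, not SkyRL's actual API) of why this ceil division handles `len(prompts) < num_inference_engines` gracefully:

```python
def shard_prompts(prompts, num_inference_engines):
    # Ceil division: round the per-engine chunk size up so no prompt is dropped.
    dp_item_size = (len(prompts) + num_inference_engines - 1) // num_inference_engines
    return [
        prompts[i * dp_item_size : (i + 1) * dp_item_size]
        for i in range(num_inference_engines)
    ]

# Fewer prompts than engines: each prompt still gets generated;
# surplus engines simply receive an empty slice.
print(shard_prompts(["a", "b", "c"], num_inference_engines=4))
# -> [['a'], ['b'], ['c'], []]
```

Contrast with the floor division in `_remove_tail_data`, which would have discarded all three prompts in this situation.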

@CharlieFRuan (Collaborator) left a comment:

Thank you!

erictang000 merged commit 87a301c into main Jul 4, 2025
3 checks passed
erictang000 deleted the erictang000/eval_edge_case branch July 4, 2025 19:40
fannie1208 pushed a commit to vinid/SkyRL that referenced this pull request Aug 19, 2025
