
[eval] fix eval batch < dp_size edge case #62

Merged
erictang000 merged 2 commits into main from erictang000/eval_edge_case
Jul 4, 2025
Conversation

@erictang000
Collaborator

What does this PR do

  • removes call to _remove_tail_data from eval loop

Previously, we called _remove_tail_data for every batch in the eval loop

```python
for _, prompts in enumerate(self.eval_dataloader):
    prompts = self._remove_tail_data(prompts)
    ...
```

This caused `len(prompts)` to be 0 whenever `len(prompts) < dp_size`, since `_remove_tail_data` (in `skyrl_train/trainer.py`) does floor division:

```python
return entries[: (len(entries) // dp_size) * dp_size]
```
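To make the edge case concrete, here is a minimal standalone sketch of that truncation (a hypothetical `remove_tail_data` function mirroring the floor-division line above, not the actual SkyRL method):

```python
def remove_tail_data(entries, dp_size):
    # Keep only the largest multiple of dp_size; the tail is dropped so the
    # batch divides evenly across dp_size data-parallel workers.
    return entries[: (len(entries) // dp_size) * dp_size]

# Normal case: a tail of 2 items is dropped.
print(remove_tail_data(list(range(10)), dp_size=4))  # -> [0, 1, 2, 3, 4, 5, 6, 7]

# Edge case: an eval batch smaller than dp_size is truncated to nothing,
# because (3 // 4) * 4 == 0.
print(remove_tail_data(list(range(3)), dp_size=4))  # -> []
```

This is why a small final eval batch silently produced zero prompts before this fix.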

We don't need `_remove_tail_data` in the eval loop, since we are just doing generation, which doesn't require explicit sharding. Even in the batched generation case, this is handled cleanly in `skyrl_train/inference_engines/inference_engine_client.py`:

```python
dp_item_size = (len(prompts_or_tokens) + num_inference_engines - 1) // num_inference_engines
```
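For illustration, a small sketch (a hypothetical `shard_prompts` helper, not SkyRL's actual API) of why this ceil division handles `len(prompts) < num_inference_engines` gracefully:

```python
def shard_prompts(prompts, num_inference_engines):
    # Ceil division: round the per-engine chunk size up so no prompt is dropped.
    dp_item_size = (len(prompts) + num_inference_engines - 1) // num_inference_engines
    return [
        prompts[i * dp_item_size : (i + 1) * dp_item_size]
        for i in range(num_inference_engines)
    ]

# Fewer prompts than engines: each prompt still gets generated;
# surplus engines simply receive an empty slice.
print(shard_prompts(["a", "b", "c"], num_inference_engines=4))
# -> [['a'], ['b'], ['c'], []]
```

Contrast with the floor division in `_remove_tail_data`, which would have discarded all three prompts in this situation.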

@CharlieFRuan (Collaborator) left a comment:

Thank you!

erictang000 merged commit 87a301c into main Jul 4, 2025
3 checks passed
erictang000 deleted the erictang000/eval_edge_case branch July 4, 2025 19:40
fannie1208 pushed a commit to vinid/SkyRL that referenced this pull request Aug 19, 2025
