[Data] Avoid slicing block when total_pending_rows < target #58699
Merged
raulchen merged 2 commits into ray-project:master on Nov 17, 2025
Code Review
This pull request aims to fix a crash in streaming repartition that occurs when flushing remaining blocks that have fewer rows than the target block size. The change correctly separates the flushing logic from the main bundling logic, which resolves the crash. However, the implementation has a subtle bug where it may not flush all pending data if there are enough rows for at least one full block when `done_adding_bundles` is called, potentially leading to data loss. I've suggested a more robust implementation that ensures all full blocks are created before flushing the remainder.
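The no-data-loss property this review asks for can be stated as a simple invariant on row counts. The following is a minimal sketch, not Ray's implementation: `planned_block_sizes` is a hypothetical helper, and it only models how many rows each output block should hold when everything pending is flushed at the end of input.

```python
from typing import List


def planned_block_sizes(total_pending_rows: int, target_num_rows: int) -> List[int]:
    """Row counts of the blocks emitted when all pending rows are flushed.

    Hypothetical helper for illustration: full blocks come first, then a
    single smaller block holding the remainder, if any.
    """
    if target_num_rows <= 0:
        raise ValueError("target_num_rows must be positive")
    full, remainder = divmod(total_pending_rows, target_num_rows)
    sizes = [target_num_rows] * full
    if remainder:
        sizes.append(remainder)
    return sizes


# Invariant: flushing must account for every pending row.
for total in (0, 3, 5, 12):
    assert sum(planned_block_sizes(total, 5)) == total
```

Under this model, 12 pending rows with a target of 5 should leave the operator as two full blocks plus one 2-row block; emitting only the full blocks, or only the remainder, would violate the invariant.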
raulchen approved these changes on Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request on Nov 19, 2025
…ect#58699)

## Description
Previously, we tried to slice the block when `self._total_pending_rows >= self._target_num_rows` or `flush_remaining` was True. However, `flush_remaining` being True does not imply `self._total_pending_rows >= self._target_num_rows`, so the slicing could fail: the slicing logic assumes there is at least one full block. This PR fixes the logic and adds a test for this case.

---------

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request on Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request on Dec 1, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request on Dec 7, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
Description
Previously, we tried to slice the block when `self._total_pending_rows >= self._target_num_rows` or `flush_remaining` was True. However, `flush_remaining` being True does not imply `self._total_pending_rows >= self._target_num_rows`, so the slicing could fail: the slicing logic assumes there is at least one full block.

This PR fixes the logic and adds a test for this case.
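To make the failure mode concrete, here is a small self-contained model of the two gating conditions. It is a sketch under assumptions, not the code changed by this PR: `slice_full_blocks`, `should_slice_old`, `should_slice_new`, and `emit` are hypothetical names, and plain Python lists stand in for blocks of rows.

```python
from typing import List, Tuple


def slice_full_blocks(pending: List[int], target: int) -> Tuple[List[List[int]], List[int]]:
    """Cut `pending` into full blocks of `target` rows; return (blocks, leftover).

    Models the assumption from the description: slicing is only valid
    when at least one full block is available.
    """
    if len(pending) < target:
        raise ValueError("slicing requires at least one full block")
    n_full = len(pending) // target
    blocks = [pending[i * target:(i + 1) * target] for i in range(n_full)]
    return blocks, pending[n_full * target:]


def should_slice_old(total: int, target: int, flush_remaining: bool) -> bool:
    # Old gate (as described above): flushing alone was enough to trigger slicing.
    return total >= target or flush_remaining


def should_slice_new(total: int, target: int) -> bool:
    # Fixed gate: slice only when a full block actually exists.
    return total >= target


def emit(pending: List[int], target: int, flush_remaining: bool) -> Tuple[List[List[int]], List[int]]:
    """Emit full blocks first, then flush the short remainder separately."""
    out: List[List[int]] = []
    if should_slice_new(len(pending), target):
        blocks, pending = slice_full_blocks(pending, target)
        out.extend(blocks)
    if flush_remaining and pending:
        out.append(pending)
        pending = []
    return out, pending
```

With 3 pending rows and a target of 5, `should_slice_old(3, 5, True)` is True, so the old gate would call the slicing step and hit its precondition. In this model, `emit([1, 2, 3], 5, True)` skips slicing and flushes the 3 rows as one short block instead.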