[Data] Streaming Partition enforce row_num per block#57984
raulchen merged 62 commits into ray-project:master
Conversation
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Bug: Test Fails to Verify Row Counts Post-Repartitioning
@owenowenisme My first pass looks good. @bveeramani will do a review of the implementation design inside MapOperator.
srinathk10
left a comment
LGTM, but @alexeykudinkin or @bveeramani need to review the impl design.
bveeramani
left a comment
I think the high-level idea sounds reasonable, but the current implementation adds a lot of complexity to the MapOperator interfaces.
Could you figure out how to implement this in a way that:
- Avoids introducing abstractions that overlap with existing ones (e.g., _TaskInput/TaskContext and StreamingRepartitionTaskBuilder/BlockRefBundler)
- Avoids adding streaming-repartition-specific methods to the MapOperator base class (e.g., _submit_task_input and set_task_input_builder)
- Makes the correctness easy to test without requiring tens of E2E test cases?
elif metadata.num_rows != block_slice.num_rows:
    # Partial block - estimate size based on rows
    per_row = metadata.size_bytes / metadata.num_rows
    total += max(1, int(math.ceil(per_row * block_slice.num_rows)))
Looks like we are double-slicing the metadata: once here and once in _slice_block_metadata.
I think let's remove _slice_block_metadata and document that when slices are present, metadata is still the original metadata.
Actually, _slice_block_metadata is wrong, because then you cannot slice an already-sliced block.
Let's fix it and add a unit test.
Added a unit test and removed _slice_block_metadata.
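To make the "cannot slice an already-sliced block" point concrete, here is a minimal, hypothetical sketch (the names BlockSlice and compose_slice are illustrative, not the actual Ray Data API) of slice composition that keeps offsets relative to the original block, so slicing a sliced view stays correct:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockSlice:
    start: int  # inclusive, relative to the ORIGINAL block
    end: int    # exclusive, relative to the ORIGINAL block

    @property
    def num_rows(self) -> int:
        return self.end - self.start


def compose_slice(existing: Optional[BlockSlice], start: int, end: int) -> BlockSlice:
    """Take rows [start, end) of the (possibly already sliced) view.

    Offsets are shifted by the existing slice's start, so the result is
    still expressed against the original block's row numbering.
    """
    base = existing.start if existing is not None else 0
    new_start, new_end = base + start, base + end
    assert new_start <= new_end, "slice must be non-decreasing"
    if existing is not None:
        assert new_end <= existing.end, "slice exceeds the existing view"
    return BlockSlice(new_start, new_end)
```

Because the composed slice is always expressed against the original block, the original metadata can be kept unchanged alongside it, which matches the resolution above.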
else:
    assert len(self.blocks) == len(
        self.slices
    ), "Number of blocks and slices must match"
Let's also validate that the slices have valid ranges.
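A hypothetical sketch of the range validation being requested (the names blocks_num_rows and validate_slices are illustrative, not the actual fields):

```python
def validate_slices(blocks_num_rows, slices):
    """Check that each (start, end) slice is a valid range for its block."""
    assert len(blocks_num_rows) == len(slices), (
        "Number of blocks and slices must match"
    )
    for num_rows, (start, end) in zip(blocks_num_rows, slices):
        # Each slice must be an in-bounds, non-negative [start, end) range.
        assert 0 <= start <= end <= num_rows, (
            f"invalid slice ({start}, {end}) for block with {num_rows} rows"
        )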
| """ | ||
|
|
||
|
|
||
| class StreamingRepartitionRefBundler(BaseRefBundler): |
Please also add unit tests for this class.
Added in test_operators, just like BlockRefBundler.
if self._total_pending_rows >= self._target_num_rows or flush_remaining:
    rows_needed_from_last_bundle = (
        self._total_pending_rows % self._target_num_rows
    )
this seems wrong.
should be
self._total_pending_rows % self._target_num_rows - self._total_pending_rows % self._target_num_rows
I think you meant self._pending_bundles[-1].num_rows() - self._total_pending_rows % self._target_num_rows ?
Btw, self._pending_bundles[-1].num_rows() - self._total_pending_rows % self._target_num_rows will never be negative, but I added an assertion just in case.
class StreamingRepartitionRefBundler(BaseRefBundler):
    """Incrementally builds task inputs to produce target-sized outputs.
Does this ref bundler generate blocks of exactly target_num_rows_per_block, or multiples of target_num_rows_per_block?
Updated the description.
elif metadata.num_rows != block_slice.num_rows:
    # Partial block - estimate size based on rows
    per_row = metadata.size_bytes / metadata.num_rows
    total += max(1, int(math.ceil(per_row * block_slice.num_rows)))
Bug: Incorrect Size for Empty Data
When calculating size_bytes() for a slice with zero rows, the code uses max(1, int(math.ceil(per_row * block_slice.num_rows))) which returns 1 byte even when block_slice.num_rows is 0. An empty slice (0 rows) should contribute 0 bytes to the total size, not 1 byte. The max(1, ...) guard appears intended to prevent zero-byte estimates for non-empty slices but incorrectly applies to empty slices as well.
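A minimal sketch of the fix described above, assuming a standalone helper (the name sliced_size_bytes and its signature are illustrative): return 0 for an empty slice, and apply the 1-byte clamp only to non-empty partial slices.

```python
import math


def sliced_size_bytes(size_bytes: int, num_rows: int, slice_num_rows: int) -> int:
    """Estimate the byte size of a slice of a block.

    An empty slice contributes 0 bytes; a full slice uses the exact
    metadata; a non-empty partial slice is estimated per-row and clamped
    to at least 1 byte.
    """
    if slice_num_rows == 0:
        return 0  # empty slices contribute nothing
    if slice_num_rows == num_rows:
        return size_bytes  # full block: use the exact metadata
    per_row = size_bytes / num_rows
    # The max(1, ...) guard now only applies to non-empty slices.
    return max(1, math.ceil(per_row * slice_num_rows))
```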
        rows_needed_from_last_bundle
    )
    pending_bundles.append(sliced_bundle)
    self._ready_bundles.append(RefBundle.merge_ref_bundles(pending_bundles))
Bug: Bundle Exclusion Fails on Exact Completion
When rows_needed_from_last_bundle equals zero, the last bundle should be excluded from the ready bundle but isn't. This occurs when the last bundle's row count exactly equals the remainder (_total_pending_rows % _target_num_rows). For example, with 15 total rows, target of 10, and last bundle of 5 rows, the code outputs all 15 rows instead of outputting 10 rows and keeping 5 pending. The condition at line 39 should handle the zero case by removing the last bundle from pending_bundles before merging.
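The exact-completion case can be modeled on plain per-bundle row counts rather than real RefBundles. This split_ready helper is a hypothetical simplification, not the actual bundler code; it shows the last bundle being excluded whole when its row count equals the remainder:

```python
def split_ready(pending, target):
    """Split per-bundle row counts into (ready_counts, leftover_counts).

    `pending` holds the row count of each pending bundle, with
    sum(pending) >= target. Rows beyond the largest multiple of `target`
    stay pending for the next flush.
    """
    total = sum(pending)
    leftover = total % target
    rows_needed_from_last = pending[-1] - leftover
    assert rows_needed_from_last >= 0
    if rows_needed_from_last == 0:
        # Exact completion: the last bundle is entirely leftover, so it is
        # excluded from the ready output and kept pending whole.
        return pending[:-1], [pending[-1]]
    # Otherwise only a prefix of the last bundle joins the ready output.
    ready = pending[:-1] + [rows_needed_from_last]
    return ready, ([leftover] if leftover else [])
```

With the example above (15 total rows, target of 10, last bundle of 5 rows), split_ready([10, 5], 10) returns ([10], [5]): 10 rows are emitted and 5 stay pending, instead of all 15 being merged.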
assert flat_out == list(range(n))

@pytest.mark.parametrize(
Nit: this should be put under tests/unit, as it's a unit test.
    # Test with empty blocks
    3,
    [[[1]], [[]], [[2, 3]], [[]], [[4, 5]]],
    [3, 2],  # Expected: [1,2,3] and [4,5]
Let's also check the block contents.
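A hypothetical sketch of what checking contents (rather than only row counts) could look like; repartition_blocks is a stand-in for the code under test that simply concatenates rows and re-chunks them, and the names are illustrative:

```python
def repartition_blocks(blocks, target_num_rows):
    """Reference model: flatten all rows, then chunk by target_num_rows."""
    rows = [row for block in blocks for row in block]
    return [
        rows[i:i + target_num_rows]
        for i in range(0, len(rows), target_num_rows)
    ]


def check_repartition(target_num_rows, blocks, expected_blocks):
    out = repartition_blocks(blocks, target_num_rows)
    # Asserting contents also implies the expected per-block row counts.
    assert out == expected_blocks


# Empty input blocks are skipped, and ordering is preserved.
check_repartition(3, [[1], [], [2, 3], [], [4, 5]], [[1, 2, 3], [4, 5]])
```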
Description
Currently, streaming repartition applies a map transform to each block independently and does not merge leftover rows across blocks, so it cannot guarantee exact row counts per output block. This PR introduces a new design that computes, on the driver, the input block ranges for every output block. It avoids driver-side block fetching while ensuring correctness and leveraging the efficiency of parallel map tasks.
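The driver-side planning described above can be sketched as follows. This is a hypothetical simplification, not the actual Ray Data implementation: given only per-block row counts (no block data is fetched), it computes, for each output block, the list of (input_block_index, start, end) row ranges that parallel map tasks would then slice and merge.

```python
def compute_output_ranges(block_num_rows, target_num_rows):
    """Plan output blocks as lists of (block_index, start, end) input ranges."""
    outputs, current, filled = [], [], 0
    for i, n in enumerate(block_num_rows):
        start = 0
        while start < n:
            # Take as many rows as fit in the current output block.
            take = min(n - start, target_num_rows - filled)
            current.append((i, start, start + take))
            start += take
            filled += take
            if filled == target_num_rows:
                outputs.append(current)
                current, filled = [], 0
    if current:  # leftover rows form a final, smaller output block
        outputs.append(current)
    return outputs
```

For example, with input blocks of 5 and 7 rows and a target of 4 rows, the second output block spans both inputs: rows 4-5 of block 0 plus rows 0-3 of block 1. Every output block except possibly the last has exactly target_num_rows rows.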
Related issues
Closes #57165
Additional information