[data] HashShuffleAggregator break down block on finalize #58603
bveeramani merged 14 commits into ray-project:master from
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Code Review
This pull request correctly identifies and aims to solve an important issue with HashShuffleAggregator handling very large blocks, which can lead to out-of-memory errors. The approach of using BlockOutputBuffer to break down large blocks is sound. However, the current implementation of the finalize method introduces several critical issues, including a risk of deadlocks, potential data loss, and incorrect metrics reporting. My review provides a detailed comment with a suggested replacement for the finalize method that addresses these problems while preserving the original intent of the change.
Force-pushed the …/aggregator-yield-block-size branch from 26404f8 to ec0e610
if partition_id in self._finalizing_tasks:
    self._finalizing_tasks.pop(partition_id)

# Update Finalize Metrics on task completion
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
who is reviewing this PR?
# so we do not break the block down further.
if target_max_block_size is not None:
    # Creating a block output buffer per partition finalize task because:
    # 1. Need to keep track of which tasks have already been finalized
I couldn't understand what (1) means. Could you elaborate/revise?
Updated. I don't think (1) makes sense either, lol. My intent was to keep track of re-finalizing tasks, but that would mean additional stats and possibly additional locks, i.e. more complexity, so I kept it simple.
@@ -1560,17 +1570,38 @@ def submit(self, input_seq_id: int, partition_id: int, partition_shard: Block):
    def finalize(
        self, partition_id: int
    ) -> AsyncGenerator[Union[Block, "BlockMetadataWithSchema"], None]:
Nit: Out-of-scope for this PR, but I think this is a regular generator, not async
-    ) -> AsyncGenerator[Union[Block, "BlockMetadataWithSchema"], None]:
+    ) -> Generator[Union[Block, "BlockMetadataWithSchema"], None]:
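The nit above is easy to verify: in Python, a `def` containing `yield` is a regular synchronous generator regardless of its return annotation; only `async def` with `yield` produces an async generator. A minimal illustration (the `counts` function is hypothetical, not from the PR):

```python
import inspect
from typing import Generator


def counts(n: int) -> Generator[int, None, None]:
    # `def` + `yield` makes a plain generator function;
    # `async def` + `yield` would be needed for an AsyncGenerator.
    for i in range(n):
        yield i


print(list(counts(3)))  # → [0, 1, 2]
print(inspect.isgeneratorfunction(counts))  # → True
print(inspect.isasyncgenfunction(counts))  # → False
```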
Description

`HashShuffleAggregator` currently doesn't break big blocks into smaller blocks (or combine smaller blocks into bigger ones). For large blocks this can be very problematic, because the block being returned will spill to disk.

Why this is better:
- Downstream consumers can rely on `streaming_gen` backpressure to avoid materializing the entire object.
- An alternative is `StreamingRepartition`, but that is more work for the user.

This PR addresses this by using `BlockOutputBuffer` to reshape the blocks back to `data_context.target_max_block_size`.

Related issues

None

Additional information

Encountered this personally with a 180GiB block, which would OOD.
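As a toy illustration of the reshaping idea (not Ray's actual implementation, which operates on Arrow blocks via `BlockOutputBuffer`), an oversized block can be yielded as chunks bounded by a target size instead of being returned as one object:

```python
from typing import Iterator


def split_block(block: bytes, target_max_block_size: int) -> Iterator[bytes]:
    """Yield slices of `block`, each at most `target_max_block_size` bytes,
    so no single returned object exceeds the target size."""
    for start in range(0, len(block), target_max_block_size):
        yield block[start : start + target_max_block_size]


chunks = list(split_block(b"x" * 10, 4))
print([len(c) for c in chunks])  # → [4, 4, 2]
```

With a 180GiB input and a typical target block size, this kind of chunking keeps each yielded object small enough to stay in memory instead of triggering spilling or OOD.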