[Data] Replace number of bundles with proper number of blocks#58030

Merged
alexeykudinkin merged 20 commits into master from ak/bndl-blk-fix
Oct 27, 2025

Conversation

@alexeykudinkin

Description

Currently, we implicitly assume that a RefBundle holds exactly one block. That assumption is not safe, so this change refers explicitly to the number of blocks instead of the number of bundles.
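
To illustrate why the two counts can diverge, here is a minimal sketch. `FakeRefBundle` is a hypothetical stand-in written for this example, not Ray Data's actual `RefBundle` class; it only assumes that a bundle carries a list of block references.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRefBundle:
    """Hypothetical stand-in for a bundle; it only tracks its block refs."""

    block_refs: List[str]


# One bundle with a single block, one bundle with three blocks.
bundles = [
    FakeRefBundle(block_refs=["b0"]),
    FakeRefBundle(block_refs=["b1", "b2", "b3"]),
]

# Counting bundles silently assumes one block per bundle.
num_bundles = len(bundles)

# Counting blocks explicitly sums over every bundle's block refs.
num_blocks = sum(len(b.block_refs) for b in bundles)

print(num_bundles, num_blocks)
```

Any metric that reports `num_bundles` as a block count under-reports as soon as a single bundle holds more than one block.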

@alexeykudinkin alexeykudinkin requested review from a team as code owners October 23, 2025 01:06
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a good step towards making block counting more accurate by explicitly using the number of blocks instead of assuming one block per bundle. The refactoring is applied across many files. However, I've found a few critical issues where the new counting logic is implemented incorrectly, which could lead to bugs in metrics, backpressure, and autoscaling. I've also included some minor suggestions for improving code style and efficiency.
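
As a rough illustration of how a miscount could affect backpressure, consider the sketch below. The `should_backpressure` function, its `max_blocks` threshold, and the policy itself are assumptions made up for this example; they are not Ray Data's backpressure implementation.

```python
from typing import Sequence


def should_backpressure(
    queued: Sequence[Sequence[str]], max_blocks: int, count_blocks: bool
) -> bool:
    """Hypothetical policy: throttle once the queue reaches max_blocks.

    Each element of `queued` models one bundle as a sequence of block refs.
    """
    if count_blocks:
        # Sum the blocks held by every bundle.
        size = sum(len(bundle) for bundle in queued)
    else:
        # Count bundles, implicitly assuming one block per bundle.
        size = len(queued)
    return size >= max_blocks


# Two bundles holding five blocks in total.
queued = [["a", "b", "c"], ["d", "e"]]

by_bundles = should_backpressure(queued, max_blocks=4, count_blocks=False)
by_blocks = should_backpressure(queued, max_blocks=4, count_blocks=True)

print(by_bundles, by_blocks)
```

With the same queue and threshold, the bundle-based check never triggers while the block-based check does, which is the kind of discrepancy the review is concerned about.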

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Oct 23, 2025
@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Oct 23, 2025
@ray.remote
def test_import():
    import file_module

Is this change, along with the one in python/ray/tests/test_runtime_env_working_dir_3.py, unrelated to this PR?

Comment on lines 612 to +613
self._bundle_buffer_size = 0
self._bundle_buffer_size_bytes = 0

What's the difference between these two variables?

def internal_queue_size(self) -> int:
    return len(self._buffer)
def internal_queue_num_blocks(self) -> int:
    return sum(len(b.block_refs) for b in self._buffer)

If `b` is a `RefBundle`, you can use `len(b)` instead.
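
A minimal sketch of that suggestion follows. It assumes, as the comment implies, that the bundle type defines `__len__` to return its number of blocks. `SketchBundle` and `SketchQueue` are hypothetical classes written for this example; only the `_buffer` attribute and the `internal_queue_num_blocks` name are taken from the snippet under review.

```python
from collections import deque
from typing import Deque, Iterable


class SketchBundle:
    """Hypothetical bundle whose length is its number of blocks."""

    def __init__(self, block_refs: Iterable[str]) -> None:
        self.block_refs = tuple(block_refs)

    def __len__(self) -> int:
        return len(self.block_refs)


class SketchQueue:
    """Hypothetical operator-like object that buffers bundles."""

    def __init__(self) -> None:
        self._buffer: Deque[SketchBundle] = deque()

    def add(self, bundle: SketchBundle) -> None:
        self._buffer.append(bundle)

    def internal_queue_num_blocks(self) -> int:
        # len(b) delegates to SketchBundle.__len__, so there is no need
        # to reach into b.block_refs here.
        return sum(len(b) for b in self._buffer)


queue = SketchQueue()
queue.add(SketchBundle(["b0", "b1"]))
queue.add(SketchBundle(["b2"]))

total_blocks = queue.internal_queue_num_blocks()
print(total_blocks)
```

Relying on `len(b)` keeps the caller independent of how the bundle stores its blocks internally.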

@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 27, 2025 05:05
@github-actions github-actions bot disabled auto-merge October 27, 2025 05:05
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 27, 2025 05:55
@github-actions github-actions bot disabled auto-merge October 27, 2025 17:16
…bundle

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
… num bytes held inside internal queue;

Revisited `InternalQueueOperatorMixin.internal_queue_size` to remove assumption that every bundle holds just 1 block

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Fixing tests

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 27, 2025 20:02
@alexeykudinkin alexeykudinkin merged commit 7dd8eb2 into master Oct 27, 2025
7 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/bndl-blk-fix branch October 27, 2025 20:31
@alexeykudinkin alexeykudinkin restored the ak/bndl-blk-fix branch October 27, 2025 20:37
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

3 participants