do not submit - just provide comparison baseline #63388

Closed
TimothySeah wants to merge 1 commit into ray-project:master from TimothySeah:tseah/test-no-split-do-not-submit
Conversation

@TimothySeah TimothySeah commented May 16, 2026

Contributor

See #63309 for more details

Signed-off-by: Timothy Seah <tseah@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request modifies the ray.train.DataConfig in ray_dataloader_factory.py by hardcoding datasets_to_split to an empty list. Feedback indicates that this change disables default sharding, which may lead to incorrect performance metrics or resource issues in distributed training; it is recommended to make this setting configurable or provide a clear explanation for disabling sharding.
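To make the cost of disabling sharding concrete, here is a minimal sketch in plain Python. It does not call Ray; `rows_per_worker` is a hypothetical helper, and the even split is a simplifying assumption rather than a description of how Ray Train actually partitions data.

```python
from typing import List


def rows_per_worker(total_rows: int, num_workers: int, split: bool) -> List[int]:
    """Rows each worker reads per epoch (illustrative model, not Ray code)."""
    if not split:
        # No sharding: every worker iterates over the full dataset.
        return [total_rows] * num_workers
    # Sharding: spread rows as evenly as possible across workers.
    base, extra = divmod(total_rows, num_workers)
    return [base + (1 if rank < extra else 0) for rank in range(num_workers)]


sharded = rows_per_worker(1000, 4, split=True)
unsharded = rows_per_worker(1000, 4, split=False)

# Total rows read across the cluster in one epoch.
print(sum(sharded), sum(unsharded))  # prints "1000 4000"
```

Under this model, total data read grows linearly with the number of workers when nothing is split, which is why throughput numbers from the two modes are not directly comparable.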


    def get_ray_data_config(self) -> ray.train.DataConfig:
        return ray.train.DataConfig(
            datasets_to_split=[],

medium

Hardcoding datasets_to_split=[] disables the default sharding behavior in Ray Train. This causes every training worker to process the entire dataset rather than a shard, which is generally not the intended behavior for distributed training benchmarks and can lead to excessive resource consumption or incorrect performance metrics. If this is for a specific baseline comparison, it would be better to make this configurable in RayDataConfig or add a comment explaining why sharding is being disabled.
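One way the "make this configurable" suggestion could look is sketched below. The fields of the real `RayDataConfig` are not shown in this PR, so `RayDataConfigSketch` and `data_config_kwargs` are hypothetical stand-ins. The sketch also assumes that `"all"` is the value that corresponds to the default sharding behavior for `datasets_to_split`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class RayDataConfigSketch:
    """Hypothetical stand-in for the benchmark's RayDataConfig."""

    # Assumption: "all" keeps the default behavior of sharding every dataset.
    # An empty list disables sharding, as in this baseline PR.
    datasets_to_split: Union[str, List[str]] = "all"


def data_config_kwargs(cfg: RayDataConfigSketch) -> Dict[str, Any]:
    """Build keyword arguments for ray.train.DataConfig from the config."""
    return {"datasets_to_split": cfg.datasets_to_split}


default_kwargs = data_config_kwargs(RayDataConfigSketch())
baseline_kwargs = data_config_kwargs(RayDataConfigSketch(datasets_to_split=[]))

# Inside the factory, the hardcoded call could then become something like:
#     return ray.train.DataConfig(**data_config_kwargs(self.config))
# where self.config is a hypothetical attribute holding the config object.
```

With this shape, the no-split baseline becomes an explicit opt-in in the benchmark configuration instead of a code change that silently affects every run.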
