[Train] Add local mode support to Ray Train v2 (num_workers=0) #55487
Merged: matthewdeng merged 44 commits into ray-project:master from xinyuangui2:use-fnutils-in-trainer on Sep 3, 2025
Commits
309eb3a first commit (xinyuangui2)
0750c05 only support single process for now (xinyuangui2)
4701f1d rename some classes (xinyuangui2)
69ea86c rename some classes (xinyuangui2)
b91d65e fix unittest and update experiment name (xinyuangui2)
6ea2acf merge master (xinyuangui2)
ef4871b move to distributedtrainer (xinyuangui2)
0cad925 update config (xinyuangui2)
c4de5bb fix some namings (xinyuangui2)
879ebbe Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
09e245d add unittests for trainers (xinyuangui2)
4ee1dda fix v2 import for xgboost config (xinyuangui2)
0fe6cc2 remove unused changes (xinyuangui2)
a4217c8 clean (xinyuangui2)
e2a888d Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
e0bb04a move local tests to one single file (xinyuangui2)
2772212 add build file for test_local_mode (xinyuangui2)
1dfbdb8 fix config (xinyuangui2)
ea7991e fix (xinyuangui2)
3e919f5 Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
046cbe3 remove local_mode_controller as parameter (xinyuangui2)
28f46c9 Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
849e533 fix field (xinyuangui2)
0e139be remove unused changes (xinyuangui2)
1052f3b Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
329db7a Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
bdf8391 merge master (xinyuangui2)
6cb513e Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
cd6da46 Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
b87a5c7 resolve comments (xinyuangui2)
29a1ce6 add one more local model log (xinyuangui2)
56f7557 resolve comments (xinyuangui2)
416913e refactor the xgboostConfig to avoid circular import (xinyuangui2)
9881ee4 Revert "refactor the xgboostConfig to avoid circular import" (xinyuangui2)
48a161a exclude xgboosttrainer from local mode for now (xinyuangui2)
09ea48f Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
4203f72 resolve comments (xinyuangui2)
c62ecb6 resolve comments (xinyuangui2)
5aac3d0 remove unneeded parameters (xinyuangui2)
3551146 add xgboost into local mode (xinyuangui2)
62c5f46 Merge branch 'master' into use-fnutils-in-trainer (xinyuangui2)
f5fcaee resolve comments (xinyuangui2)
34390e6 Update .gitignore (xinyuangui2)
067777a Apply suggestions from code review (matthewdeng)
100 changes: 100 additions & 0 deletions
python/ray/train/v2/_internal/execution/torch_without_ray_train_controller.py
@@ -0,0 +1,100 @@

```python
import logging
from typing import Any, Callable, Dict, Optional

from ray.data import DataIterator
from ray.train import Checkpoint, Result
from ray.train.trainer import GenDataset
from ray.train.v2._internal.execution.train_fn_utils import (
    TrainFnUtils,
    get_train_fn_utils,
    set_train_fn_utils,
)
from ray.train.v2._internal.util import date_str
from ray.train.v2.api.context import (
    TrainContext as ExternalTrainContext,
    TrainContextWithoutRayTrainController,
)

logger = logging.getLogger(__name__)


class TorchWithoutRayTrainControllerFnUtils(TrainFnUtils):
    """TrainFnUtils for jobs launched without ray train controller.
    This is more for testing purposes, and some functionality is missing.
    """

    def __init__(
        self,
        experiment_name: str,
        local_world_size: int,
        local_rank: int,
        dataset_shards: Optional[Dict[str, DataIterator]] = None,
    ):
        self._context = TrainContextWithoutRayTrainController(
            experiment_name=experiment_name,
            local_world_size=local_world_size,
            local_rank=local_rank,
        )
        self._dataset_shards = dataset_shards
        self._last_metrics = None

    def report(
        self,
        metrics: Dict[str, Any],
        checkpoint: Optional[Checkpoint] = None,
        checkpoint_dir_name: Optional[str] = None,
    ) -> None:
        self._last_metrics = metrics

    def get_checkpoint(self) -> Optional[Checkpoint]:
        return None

    def get_dataset_shard(self, dataset_name: str) -> DataIterator:
        assert (
            self._dataset_shards is not None and dataset_name in self._dataset_shards
        ), f"Dataset shard {dataset_name} not found."
        return self._dataset_shards[dataset_name]

    def get_context(self) -> ExternalTrainContext:
        return self._context

    def is_running_with_ray_train_controller(self) -> bool:
        return False

    def _get_last_metrics(self) -> Optional[Dict[str, Any]]:
        """return the last metrics reported by the training function.
        This function should only be called by TorchBackendWithoutRayTrainController
        """
        return self._last_metrics


class TorchBackendWithoutRayTrainController:
    def __init__(self, datasets: Optional[Dict[str, GenDataset]] = None):
        if datasets is not None:
            datasets = {k: v() if callable(v) else v for k, v in datasets.items()}

        self.local_world_size = 1
        self.local_rank = 0

        set_train_fn_utils(
            TorchWithoutRayTrainControllerFnUtils(
                experiment_name=self._get_experiment_name(),
                local_world_size=self.local_world_size,
                local_rank=self.local_rank,
                dataset_shards=datasets,
            )
        )

    def _get_experiment_name(self) -> str:
        return f"train_without_ray_train_controller-{date_str()}"

    def fit(self, train_func: Callable[[], None]) -> Result:
        train_func()
        train_fn_utils = get_train_fn_utils()
        assert isinstance(train_fn_utils, TorchWithoutRayTrainControllerFnUtils)
        return Result(
            metrics=train_fn_utils._get_last_metrics(),
            checkpoint=None,
            path=None,
            error=None,
        )
```