[train] Improve error message if users call training function utils outside of a Ray Train worker #57863

Merged
justinvyu merged 5 commits into ray-project:master from justinvyu:train_fn_utils_msg
Oct 22, 2025

@justinvyu
Contributor

Description

Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train.

Additional information

Before:

>>> import ray.train
>>> ray.train.get_context()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/justin/Developer/ray/python/ray/train/v2/api/train_fn_utils.py", line 153, in get_context
    return get_train_fn_utils().get_context()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/justin/Developer/ray/python/ray/train/v2/_internal/execution/train_fn_utils.py", line 264, in get_train_fn_utils
    raise RuntimeError("TrainFnUtils has not been initialized.")
RuntimeError: TrainFnUtils has not been initialized.

After:

>>> import ray.train
>>> ray.train.get_context()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/justin/Developer/ray/python/ray/train/v2/_internal/util.py", line 281, in _wrapped_fn
    raise RuntimeError(
RuntimeError: `get_context` cannot be used outside of a Ray Train training function. You are calling this API from the driver or another non-training process. These utilities are only available within a function launched by `trainer.fit()`.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a decorator requires_train_worker to enforce that certain functions are only called within a Ray Train worker process. This enhances error messaging and prevents misuse of training utilities outside of the intended environment. The changes include adding the decorator to relevant functions in collective.py, train_fn_utils.py, and torch/train_loop_utils.py, and updating a test case in test_api_migrations.py to reflect the new error handling.


@TimothySeah TimothySeah left a comment

lgtm with a few nits

os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
ray.get_gpu_ids() == [2]
torch.cuda.is_available() == True
get_device() == torch.device("cuda:0")

Is this supposed to be cuda:2? The other examples make it seem like it should be cuda:<get_gpu_ids()>

Contributor Author


cuda:0 is the "logical" CUDA device. It refers to index 0 of the comma-separated CUDA_VISIBLE_DEVICES list, which here is physical GPU 2: CUDA_VISIBLE_DEVICES.split(",")[0] == "2"
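The logical-versus-physical mapping described in this reply can be sketched in a few lines. This is an illustration only, not Ray's `get_device` implementation: `logical_cuda_index` is a hypothetical helper, and it assumes `CUDA_VISIBLE_DEVICES` holds plain integer IDs (the variable can also hold GPU UUIDs, which this sketch does not handle).

```python
def logical_cuda_index(visible_devices: str, physical_gpu_id: int) -> int:
    """Map a physical GPU ID to its position in CUDA_VISIBLE_DEVICES.

    Hypothetical helper: CUDA renumbers the visible devices starting at 0,
    so the logical index is the position of the ID in the list.
    """
    visible = [d.strip() for d in visible_devices.split(",") if d.strip()]
    return visible.index(str(physical_gpu_id))


# Values taken from the docstring example under review:
# CUDA_VISIBLE_DEVICES = "2,3" and ray.get_gpu_ids() == [2].
idx = logical_cuda_index("2,3", 2)
device_str = f"cuda:{idx}"
print(device_str)  # prints "cuda:0"
```

With the same visible list, physical GPU 3 would map to logical index 1, which is why the device string does not contain the physical ID.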

@ray-gardener ray-gardener bot added the train Ray Train Related Issue label Oct 18, 2025
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@justinvyu justinvyu enabled auto-merge (squash) October 22, 2025 19:22
@github-actions github-actions bot disabled auto-merge October 22, 2025 19:23
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 22, 2025
@justinvyu justinvyu merged commit e4e9399 into ray-project:master Oct 22, 2025
8 checks passed
@justinvyu justinvyu deleted the train_fn_utils_msg branch October 22, 2025 21:03
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
…utside of a Ray Train worker (ray-project#57863)

Introduce a decorator to mark functions that require running inside a
worker process spawned by Ray Train.

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Blaze-DSP pushed a commit to Blaze-DSP/ray that referenced this pull request Dec 18, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026