[train] Improve error message if users call training function utils outside of a Ray Train worker #57863
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Code Review
This pull request introduces a decorator requires_train_worker to enforce that certain functions are only called within a Ray Train worker process. This enhances error messaging and prevents misuse of training utilities outside of the intended environment. The changes include adding the decorator to relevant functions in collective.py, train_fn_utils.py, and torch/train_loop_utils.py, and updating a test case in test_api_migrations.py to reflect the new error handling.
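The guard pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the module-level `_worker_context` variable, the `set_worker_context` helper, the example `get_world_rank` utility, and the error wording are all assumptions made for the example, and Ray Train's actual context lookup is not shown.

```python
import functools
from typing import Any, Callable, Optional

# Hypothetical stand-in for the per-worker state that a training framework
# would populate inside each worker process. Not Ray Train's real context.
_worker_context: Optional[dict] = None


def set_worker_context(ctx: Optional[dict]) -> None:
    """Hypothetical helper: simulate entering or leaving a worker process."""
    global _worker_context
    _worker_context = ctx


def requires_train_worker(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Raise a descriptive error if `fn` is called outside a worker."""

    @functools.wraps(fn)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if _worker_context is None:
            raise RuntimeError(
                f"`{fn.__name__}` can only be called inside a training "
                "worker process, for example from the training function."
            )
        return fn(*args, **kwargs)

    return wrapper


@requires_train_worker
def get_world_rank() -> int:
    # Hypothetical utility that reads a value from the worker context.
    return _worker_context["world_rank"]
```

Calling `get_world_rank()` before `set_worker_context(...)` raises the `RuntimeError` that names the offending function, rather than failing later with an unrelated attribute or key error.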
TimothySeah
left a comment
lgtm with a few nits
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
    ray.get_gpu_ids() == [2]
    torch.cuda.is_available() == True
    get_device() == torch.device("cuda:0")
Is this supposed to be cuda:2? The other examples make it seem like it should be cuda:<get_gpu_ids()>
cuda:0 is the "logical" CUDA device: the index refers to a position in the comma-separated CUDA_VISIBLE_DEVICES list, not to the physical GPU ID. Here the entry at position 0 of "2,3" is "2", so cuda:0 maps to physical GPU 2.
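The logical-to-physical mapping in this reply can be illustrated with plain Python. This is a sketch under the assumption that CUDA_VISIBLE_DEVICES holds comma-separated integer IDs (it can also hold device UUIDs, which this does not handle); both function names are made up for the example.

```python
from typing import List


def parse_visible_devices(visible_devices: str) -> List[int]:
    """Split a CUDA_VISIBLE_DEVICES-style string into integer GPU IDs."""
    return [int(part) for part in visible_devices.split(",") if part.strip()]


def logical_index_of(visible_devices: str, physical_id: int) -> int:
    """Return the logical device index (the N in cuda:N) for a physical GPU ID."""
    return parse_visible_devices(visible_devices).index(physical_id)


# Mirrors the docstring example under review: GPUs 2 and 3 are visible,
# and the worker was assigned physical GPU 2.
visible = "2,3"
assigned_physical_id = 2
logical = logical_index_of(visible, assigned_physical_id)
print(f"cuda:{logical}")  # cuda:0
```

With the same visible list, a worker assigned physical GPU 3 would map to logical index 1.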
…utside of a Ray Train worker (ray-project#57863) Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
…utside of a Ray Train worker (ray-project#57863) Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com>
…utside of a Ray Train worker (ray-project#57863) Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…utside of a Ray Train worker (ray-project#57863) Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
…utside of a Ray Train worker (ray-project#57863) Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train. --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
Introduce a decorator to mark functions that require running inside a worker process spawned by Ray Train.
Additional information
Before:
After: