[Train] Update Lightning RayDDPStrategy docstring#40376

Merged
matthewdeng merged 1 commit into ray-project:master from
woshiyyya:train/update_lightning_ddp_args_docstring
Oct 18, 2023

Conversation


@woshiyyya woshiyyya commented Oct 16, 2023

Why are these changes needed?

Ray Train starts a distributed process group with the arguments specified in TorchConfig before running the training function. If the distributed process group has already been initialized, Lightning ignores some of its own arguments (backend, timeout).

We need to direct users to specify these arguments in TorchConfig instead of RayDDPStrategy.
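As a sketch of the recommended setup (assuming Ray Train's public `TorchTrainer`, `TorchConfig`, and `ScalingConfig` APIs; the training function body is a placeholder, not code from this PR):

```python
# Sketch: set process-group options in TorchConfig, not in RayDDPStrategy.
# Assumes Ray Train's public API; the training function body is illustrative.
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer


def train_func():
    # Lightning Trainer code goes here, using RayDDPStrategy *without*
    # process_group_backend/timeout/start_method: Ray Train has already
    # initialized the process group before this function runs.
    ...


trainer = TorchTrainer(
    train_func,
    # The backend and timeout for the process group belong here; Lightning
    # ignores them once the group already exists.
    torch_config=TorchConfig(backend="gloo", timeout_s=1800),
    scaling_config=ScalingConfig(num_workers=2),
)
```

Running `trainer.fit()` would then launch the workers with the configured backend and timeout; this is a configuration sketch and requires a Ray cluster to execute.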

Related issue number

Closes #36315

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
@woshiyyya woshiyyya marked this pull request as ready for review October 18, 2023 23:04
For a full list of initialization arguments, please refer to:
https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.strategies.DDPStrategy.html

Note that `process_group_backend`, `timeout`, and `start_method` are disabled here; specify these arguments in TorchConfig instead.
Contributor

We can also parse the args and print a warning if these values are set.
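One way to implement that suggestion would be to intercept the disabled keyword arguments and emit a warning before dropping them. The class name and argument list below mirror RayDDPStrategy but are a hypothetical sketch, not the actual patch:

```python
import warnings

# Arguments that Ray Train overrides via TorchConfig, per the PR description.
_IGNORED_ARGS = ("process_group_backend", "timeout", "start_method")


class RayDDPStrategySketch:
    """Illustrative stand-in for RayDDPStrategy's argument handling."""

    def __init__(self, **kwargs):
        for arg in _IGNORED_ARGS:
            if arg in kwargs:
                # Warn and discard: the process group is already set up by
                # Ray Train, so Lightning would ignore this value anyway.
                warnings.warn(
                    f"`{arg}` is ignored by RayDDPStrategy because Ray Train "
                    "initializes the process group. Set it in "
                    "`ray.train.torch.TorchConfig` instead.",
                    UserWarning,
                )
                kwargs.pop(arg)
        self.kwargs = kwargs  # remaining args would be forwarded to Lightning
```

Passing `process_group_backend="nccl"` to this sketch raises a `UserWarning` and strips the argument, while unrelated arguments pass through untouched.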

@matthewdeng matthewdeng merged commit 779c08a into ray-project:master Oct 18, 2023


Development

Successfully merging this pull request may close these issues.

[Ray Train] Explain how to set timeout when using PyTorch Lightning Trainer

2 participants