[Distributed][refactor] Add base class for device-specific communicator #11324
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
CI failed due to network issues. This PR is ready for review now; thanks in advance! @youkaichao
Yikun left a comment:
@youkaichao Would you mind taking another look?
Or, if you are worried that the code changes are too big and would like us to split the PR, for example:
- a separate PR for `CommunicatorBase` and the interface change
- separate adaptations for cuda/rocm, hpu, tpu, and xpu, split into three follow-up PRs

please let us know; we'd be happy to do so.
Sorry, I've been super busy recently. Will review this week.
| f"{current_platform.device_type}:{local_rank}") | ||
| else: | ||
| import torch_xla.core.xla_model as xm | ||
| self.device = xm.xla_device(local_rank) |
Hi @youkaichao, I'm not sure whether this initialization of `self.device` is correct for the neuron, openvino, and tpu devices. I'd appreciate your help!
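For context, the quoted fragment is only the tail of the initialization branch. Below is a minimal sketch of how the surrounding code plausibly reads; the enclosing class name and the exact branch condition are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical reconstruction of the device-initialization logic around the
# reviewed fragment; the class name and branch condition are assumed.
import torch

from vllm.platforms import current_platform


class DeviceCommunicatorSketch:

    def __init__(self, local_rank: int):
        if current_platform.device_type != "tpu":
            # cuda/rocm, hpu, xpu: a local rank maps directly onto an
            # indexed torch.device such as "cuda:0".
            self.device = torch.device(
                f"{current_platform.device_type}:{local_rank}")
        else:
            # TPU devices are resolved through torch_xla rather than torch;
            # this is the branch the review comment is anchored to.
            import torch_xla.core.xla_model as xm
            self.device = xm.xla_device(local_rank)
```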
Part of #11162.
This PR provides a base class `CommunicatorBase` for the device-specific communicators (`HpuCommunicator`, `TpuCommunicator`, and `XpuCommunicator`), avoiding the cumbersome per-device dispatch in each communication op of `GroupCoordinator`, e.g. https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L342-L353
In this PR, the communication-related classes are organized as shown in the figure below. This allows new backends to implement their own communicators and have the platform dispatch to them dynamically.
