(Part 2) feat: allow for tp_size attr for tplizing the model #37054
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button.
SunMarc left a comment
Thanks, left a couple of comments
    tp_size (`str`, *optional*):
        The torch tensor parallel degree. If not provided, it defaults to the world size.
Not needed for this specific PR. I don't know if we want to add this option yet cc @ArthurZucker
We can have it in a separate PR as well; however, it's needed to support TP + FSDP/DDP.

> I don't know if we want to add this option yet

Sure. @ArthurZucker, let me know your thoughts.
@SunMarc I'd appreciate your input here; I've been looking at enabling TP + FSDP and this is exactly what I used myself.
cc @ArthurZucker
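For illustration, a minimal sketch of how the proposed `tp_size` could be combined with `tp_plan` when composing TP with FSDP/DDP; the values and launch setup are assumptions, and the model-level `tp_size` attribute check follows the PR description:

```python
# Illustrative sketch only: assumes a distributed launch, e.g.
# `torchrun --nproc-per-node 8 run.py`, and the tp_size kwarg from this PR.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tp_plan="auto",  # use the model's built-in tensor parallel plan
    tp_size=2,       # shard over 2 ranks instead of the full world size
)

# Per the PR description, tp_size is set on the model only after TP sharding
# has completed, so downstream code (e.g. accelerate) can detect it.
print(getattr(model, "tp_size", None))
```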
Please fix the conflicts and I will merge this PR!
Force-pushed from a33e9ef to b7abb2a
@SunMarc Fixed the conflicts, and the failing test seems to be unrelated. Thanks.
@SunMarc Looks like even the recently merged commit is failing for this test case, so it's totally unrelated to this PR.
SunMarc left a comment
A few minor nits, thanks !
    import torch

    from transformers import AutoModelForCausalLM

    # Load the same checkpoint twice: once without tensor parallelism and once
    # with the automatic TP plan, then verify that gathering the sharded
    # lm_head weight reproduces the unsharded weight.
    m2 = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan=None)
    m = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan="auto")

    ft = m.lm_head.weight.full_tensor().to("cpu")
    assert torch.equal(ft, m2.lm_head.weight.to("cpu"))
let's add this in the tensor_parallel test file instead of having this here. Please also add a description of what you are trying to do
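For reference, a hypothetical sketch of what that check could look like once moved into the tensor parallel test file, with a short description of the intent (the test name and placement are assumptions, and a multi-rank distributed launch is assumed):

```python
import torch
from transformers import AutoModelForCausalLM


def test_tp_sharded_weights_match_unsharded():
    """Gathering a TP-sharded weight should reproduce the unsharded weight."""
    unsharded = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan=None
    )
    sharded = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan="auto"
    )
    full = sharded.lm_head.weight.full_tensor().to("cpu")
    assert torch.equal(full, unsharded.lm_head.weight.to("cpu"))
```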
Apologies, this file was not intended for this PR, so I removed it. Thanks.
    generation_config = kwargs.pop("generation_config", None)
    gguf_file = kwargs.pop("gguf_file", None)
    tp_plan = kwargs.pop("tp_plan", None)
    tp_size = kwargs.pop("tp_size", None)
let's raise an error if tp_size was set but not tp_plan
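A minimal sketch of the kind of guard being requested, placed right after the kwargs above are popped (the exact error message wording is an assumption):

```python
# tp_size only makes sense together with a tensor parallel plan.
if tp_size is not None and tp_plan is None:
    raise ValueError(
        "tp_size was set but tp_plan is None; pass tp_plan='auto' "
        "(or a custom plan) when specifying tp_size."
    )
```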
@SunMarc Addressed this comment, thank you.
Force-pushed from 307fc4e to 33af129
S1ro1 left a comment
LGTM!
SunMarc left a comment
Thanks !
Force-pushed from ccf1889 to 43bb071
@SunMarc Rebased the branch; are we waiting on something?
Waiting for the tests to pass ;) I will merge it as soon as the CI is green!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment
Nice!
…face#37054)

* feat: custom tp_size, new transformers tp interface
  Signed-off-by: Mehant Kammakomati <[email protected]>
* fix: review cmt - error when tp_plan not set for tp_size
  Signed-off-by: Mehant Kammakomati <[email protected]>
* fix: nit in docs
  Signed-off-by: Mehant Kammakomati <[email protected]>

---------

Signed-off-by: Mehant Kammakomati <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
What does this PR do?
Discussed at huggingface/accelerate#3457
- Adds `tp_size` to allow for TP sharding apart from the world size.
- Makes `tp_size` an attribute of the model, only initialized after TP sharding has completed, which can serve as an indicator that the model has undergone TP sharding, for usage in accelerate (discussed with @SunMarc); see the sketch below.
- Removes `tp_size` from the train arguments, since from now on TP training is performed only if the model has already undergone TP sharding.
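As a rough illustration of the second point, a downstream library such as accelerate could check the attribute along these lines (the helper below is hypothetical, not part of this PR):

```python
# Hypothetical helper: decide whether TP preparation can be skipped because
# the model was already tensor-parallelized at load time (tp_size is only
# set on the model after TP sharding has completed).
def model_already_tp_sharded(model) -> bool:
    return getattr(model, "tp_size", None) is not None

# Usage sketch:
# model = AutoModelForCausalLM.from_pretrained(..., tp_plan="auto", tp_size=2)
# if model_already_tp_sharded(model):
#     ...  # compose with FSDP/DDP instead of re-applying TP
```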
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.