Don't accidentally mutate the base_model_tp_plan #36677
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Lots of failing checks, and I think these are either Hub timeouts or disguised Hub timeouts.
cc @SunMarc @muellerzr
gante
left a comment
LGTM 👍
@IlyasMoutawwakil is looking at it!
Cyrilvallez
left a comment
Left a comment for robustness!
src/transformers/modeling_utils.py
Outdated
    if not self._tp_plan:
        if isinstance(self.config.base_model_tp_plan, dict):
            self._tp_plan = self.config.base_model_tp_plan.copy()
        else:
            self._tp_plan = {}
Just a heads-up: if a model that sits above the base model in the class hierarchy does not have a tp_plan, we are going to wrongly add the base_model_tp_plan again.
That's not the case anywhere in the library as of now, though. But maybe
    if self.base_model is self:
        self._pp_plan = self.config.base_model_pp_plan.copy() if self.config.base_model_pp_plan is not None else None
        self._tp_plan = self.config.base_model_tp_plan.copy() if self.config.base_model_tp_plan is not None else {}
    else:
        self._tp_plan = self._tp_plan or {}
        for name, module in self.named_children():
            if plan := getattr(module, "_tp_plan", None):
                self._tp_plan.update({f"{name}.{k}": v for k, v in plan.items()})
is more robust and should still work with composite models.
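For readers following the suggestion above, here is a minimal, torch-free sketch of the same branching. `ToyModule`, its constructor arguments and the plan keys are hypothetical stand-ins, not the library's classes; it only illustrates that a base model copies its default plan while a wrapper collects its children's plans under prefixed names.

```python
class ToyModule:
    """Hypothetical stand-in for a model class (not the transformers API)."""

    def __init__(self, base_plan=None, children=None, is_base=False):
        self._children = children or {}
        self._tp_plan = None
        if is_base:
            # Base model: take a copy of the default plan, never the dict itself.
            self._tp_plan = base_plan.copy() if base_plan is not None else {}
        else:
            # Wrapper: start from its own plan (or empty) and merge children plans.
            self._tp_plan = self._tp_plan or {}
            for name, module in self.named_children():
                if plan := getattr(module, "_tp_plan", None):
                    self._tp_plan.update({f"{name}.{k}": v for k, v in plan.items()})

    def named_children(self):
        return self._children.items()


shared_plan = {"layers.*.q_proj": "colwise"}  # assumed example plan
base = ToyModule(base_plan=shared_plan, is_base=True)
wrapper = ToyModule(children={"model": base})
print(wrapper._tp_plan)
```

In this sketch the wrapper never touches `shared_plan`, so the default plan is not added a second time and the shared dict stays unchanged.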
Done!
sorry i just edited haha - the part with pp is less important as we don't modify the dict later
ArthurZucker
left a comment
Nice thanks!
All credit goes to @gante and @ArthurZucker for figuring this one out and I'm just swooping in and stealing credit because I want this to be merged quickly!
If no tp_plan is provided, our code uses self.config.base_model_tp_plan as the default. The problem is that this is a mutable, instance-level dict, and we do in fact mutate it, which causes the instance dict to get very large and weird over time. We resolve the issue by correctly copying the base dict as an instance attribute instead of mutating it in-place.
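The aliasing problem described above can be reproduced in a few lines. This is a rough sketch under stated assumptions: `ToyConfig`, `build_plan_aliasing`, `build_plan_copying` and `extend_plan` are hypothetical names invented for illustration, and the plan keys are made up.

```python
class ToyConfig:
    """Hypothetical stand-in for a model config carrying a default TP plan."""

    def __init__(self):
        self.base_model_tp_plan = {"layers.*.q_proj": "colwise"}


def build_plan_aliasing(config):
    # Buggy pattern: the "default" is the config's own dict, not a copy.
    return config.base_model_tp_plan


def build_plan_copying(config):
    # Fixed pattern: shallow-copy so later updates stay local to the model.
    plan = config.base_model_tp_plan
    return plan.copy() if isinstance(plan, dict) else {}


def extend_plan(tp_plan, prefix, child_plan):
    # Mimics merging a child's plan into the model's plan under a prefix.
    tp_plan.update({f"{prefix}.{k}": v for k, v in child_plan.items()})
    return tp_plan


config = ToyConfig()

extend_plan(build_plan_copying(config), "model", {"layers.*.k_proj": "colwise"})
keys_after_copy = len(config.base_model_tp_plan)

extend_plan(build_plan_aliasing(config), "model", {"layers.*.k_proj": "colwise"})
keys_after_alias = len(config.base_model_tp_plan)

print(keys_after_copy, keys_after_alias)
```

With the copy, the config's dict keeps its single entry; with the alias, every merge leaks into the config, which is how the dict grows over time.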