[generate] skip compilation on cpu offload #37709
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
SunMarc
left a comment
Thanks !
```python
# Exception 1: Some quantization methods do not support compilation
if getattr(self, "hf_quantizer", None) is not None:
    can_compile &= self.hf_quantizer.is_compileable

# Exception 2: Never compile if the model is using CPU offload (as of April 2025, this results in a crash)
if hasattr(self, "hf_device_map"):
    all_model_devices = set(self.hf_device_map.values())
    has_cpu_offload = "cpu" in all_model_devices and len(all_model_devices) > 1
    can_compile &= not has_cpu_offload
```
Thanks for adding these cases. Maybe we should also check for disk offload?
Good point! Adding that too.
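A minimal sketch of how the device-map check might be extended to cover disk offload as well as CPU offload, as suggested above. The helper name and standalone shape are illustrative, not the actual transformers implementation:

```python
# Sketch: decide whether generation may compile, given an accelerate-style
# device map. Offload only counts when part of the model also sits on an
# accelerator; a model living entirely on CPU is not "offloaded".

def can_compile_with_device_map(hf_device_map: dict) -> bool:
    """Return False when any module is offloaded to CPU or disk."""
    all_model_devices = set(hf_device_map.values())
    offload_devices = {"cpu", "disk"}
    has_offload = bool(all_model_devices & offload_devices) and len(all_model_devices) > 1
    return not has_offload

print(can_compile_with_device_map({"layers.0": 0, "layers.1": "cpu"}))   # False
print(can_compile_with_device_map({"layers.0": 0, "layers.1": "disk"}))  # False
print(can_compile_with_device_map({"layers.0": 0, "layers.1": 1}))       # True
```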
zucchini-nlp
left a comment
Nice, thanks for fixing!
One question: do we expose "auto-compile" in generate to users through the config, or is this still under the hood? We might raise a small warning in case users forced "auto-compile" but the model doesn't meet all the criteria.
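A hypothetical sketch of the warning behavior suggested here: if the user explicitly requested compilation but the model fails one of the criteria, warn instead of silently skipping. The function name and signature are illustrative, not the actual transformers API:

```python
import warnings

def resolve_compilation(user_requested_compile: bool, can_compile: bool) -> bool:
    """Return whether to compile; warn if the user asked for it but we can't."""
    if user_requested_compile and not can_compile:
        warnings.warn(
            "Compilation options were set, but the model does not meet the "
            "criteria for compilation (e.g. quantization without compile "
            "support, or CPU/disk offload). Falling back to eager generation."
        )
        return False
    return user_requested_compile and can_compile
```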
* skip compilation on cpu offload
* add test
* better logic
* docstring
* boolean logic
* add disk offload check
* warn users if compilation options are set but compilation doesn't happen
* fix test

Co-authored-by: Marc Sun <[email protected]>
What does this PR do?
torch.compile + model CPU offload results in crashes. It should work in theory, but it's not working at the moment. This PR: