Autoquant #82
Summary: Adds autoquantization functionality. Using the do_quant API, we can test kernel speeds and pick the best quantization type (or no quantization) for each layer.
Test Plan: python test/test.py -k "autoquant"
Also tested on SAM and SDXL:
pytorch-labs/segment-anything-fast#114
HDCharles/sdxl-fast@8d9942a
Reviewers:
Subscribers:
Tasks:
Tags:
[ghstack-poisoned]
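The per-layer selection described in the summary can be illustrated roughly as follows. This is a minimal, framework-free sketch and not the code added by this PR: `time_call`, `pick_fastest`, `autoquant_sketch`, and the option names in the usage note are all hypothetical, and the real implementation benchmarks actual quantized kernels rather than plain Python callables.

```python
import time
from typing import Callable, Dict, Optional


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(
    candidates: Dict[str, Callable[[], object]],
    timer: Callable[[Callable[[], object]], float] = time_call,
) -> str:
    """Time every candidate and return the name of the fastest one.

    Ties are resolved in favor of the candidate listed first.
    """
    best_name: Optional[str] = None
    best_time = float("inf")
    for name, fn in candidates.items():
        elapsed = timer(fn)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    if best_name is None:
        raise ValueError("no candidates given")
    return best_name


def autoquant_sketch(
    layers: Dict[str, Dict[str, Callable[[], object]]],
    timer: Callable[[Callable[[], object]], float] = time_call,
) -> Dict[str, str]:
    """Map each layer name to the fastest option measured for that layer."""
    return {name: pick_fastest(options, timer) for name, options in layers.items()}
```

Each layer would be given one callable per option (for example a hypothetical "no_quant" entry next to the quantized variants), so that leaving a layer unquantized competes on equal terms. The `timer` argument is injectable so the selection logic can be checked without relying on real timings.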
from . import dtypes
from .quantization.quant_api import apply_dynamic_quant
from .quantization.quant_api import apply_weight_only_int8_quant

__all__ = [
I don't think we should make all of these public right away. For example, `apply_dynamic_quant` is a bit of a duplicate of `change_linear_weights_to_int8_dqtensors`, and `swap_conv2d_1x1_to_linear` might be something we just want to do automatically instead of making it a top-level API.
Also, `dtypes` is listed twice.
Is it possible to only add autoquant for now?
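A duplicated entry like the one flagged above is easy to catch mechanically. The helper below is a hypothetical sketch, not part of this PR; the list in the usage note is illustrative only, since the full `__all__` is not shown in the diff.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_exports(names: Iterable[str]) -> List[str]:
    """Return the names that appear more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]
```

For instance, `duplicate_exports(["dtypes", "apply_dynamic_quant", "dtypes", "autoquant"])` reports only `dtypes`. A check along these lines could run in a unit test against the package's `__all__`.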
@@ -136,10 +143,14 @@ def apply_dynamic_quant(model, filter_fn=None):


def _get_subclass_inserter(cls, **kwargs):

    # pyre-fixme[53]: Captured variable `cls` is not annotated.
You don't need the pyre-fixmes anymore
This reverts commit 8119319.