
Autoquant #82


Merged

merged 1 commit into main on Mar 25, 2024

Conversation

HDCharles
Contributor

Summary: Adds autoquantization functionality. Using the do_quant API, we can test kernel speeds and pick the best quantization type (or no quantization) for each layer.

Test Plan: python test/test.py -k "autoquant"

Also tested on SAM and SDXL:
pytorch-labs/segment-anything-fast#114 HDCharles/sdxl-fast@8d9942a

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
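For illustration only (this is not the code added in this PR): the idea described in the summary is to benchmark a few candidate quantization transforms per layer and keep the fastest, including the option of leaving the layer unquantized. A minimal sketch of that selection loop follows; the candidates are stand-ins (torch.ao.quantization.quantize_dynamic rather than the kernels do_quant actually benchmarks).

# Sketch only: per-module candidate selection by timing. The "int8_dynamic"
# candidate uses torch.ao.quantization.quantize_dynamic as a stand-in; the
# real PR benchmarks its own quantized kernels per layer.
import copy
import time

import torch
import torch.nn as nn


def _time_forward(module, x, iters=20):
    # Warm up, then return the average forward latency in seconds (CPU timing).
    with torch.no_grad():
        for _ in range(3):
            module(x)
        start = time.perf_counter()
        for _ in range(iters):
            module(x)
    return (time.perf_counter() - start) / iters


def pick_fastest(model, example_input):
    # "none" keeps the model as-is; "int8_dynamic" quantizes Linear weights.
    candidates = {
        "none": lambda m: m,
        "int8_dynamic": lambda m: torch.ao.quantization.quantize_dynamic(
            m, {nn.Linear}, dtype=torch.qint8
        ),
    }
    timings = {}
    for name, transform in candidates.items():
        timings[name] = _time_forward(transform(copy.deepcopy(model)), example_input)
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
    x = torch.randn(32, 1024)
    print(pick_fastest(model, x))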

@facebook-github-bot added the CLA Signed label on Mar 25, 2024
@HDCharles merged commit 8119319 into main on Mar 25, 2024
cpuhrsch added a commit that referenced this pull request Mar 26, 2024
This reverts commit 8119319.
@cpuhrsch mentioned this pull request on Mar 26, 2024
cpuhrsch added a commit that referenced this pull request Mar 26, 2024
from . import dtypes
from .quantization.quant_api import apply_dynamic_quant
from .quantization.quant_api import apply_weight_only_int8_quant

__all__ = [
Contributor

I don't think we should make all of these public right away. For example, apply_dynamic_quant is largely a duplicate of change_linear_weights_to_int8_dqtensors, and swap_conv2d_1x1_to_linear might be something we just want to do automatically instead of making it a top-level API.

Also, dtypes appears twice.

Is it possible to only add autoquant for now?
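As an editorial illustration of the suggestion above (not a change made in this PR), a narrower export surface for torchao/__init__.py could look like the following; the import path and symbol name for the autoquant entry point are assumptions, not the repo's final API.

# Hypothetical torchao/__init__.py keeping the public surface small.
# The autoquant import path below is an assumption.
from . import dtypes
from .quantization.quant_api import autoquant

__all__ = [
    "dtypes",
    "autoquant",
]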

@@ -136,10 +143,14 @@ def apply_dynamic_quant(model, filter_fn=None):


def _get_subclass_inserter(cls, **kwargs):

# pyre-fixme[53]: Captured variable `cls` is not annotated.
Contributor

You don't need the pyre-fixmes anymore
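For illustration (a guess at the shape of the helper, not code copied from the repo): annotating the closure removes the need for the suppression comment. The cls.from_float conversion hook below is an assumption.

# Hypothetical annotated version of _get_subclass_inserter.
from typing import Any, Callable

import torch.nn as nn


def _get_subclass_inserter(cls: Any, **kwargs: Any) -> Callable[[nn.Linear], nn.Linear]:
    def insert_subclass(lin: nn.Linear) -> nn.Linear:
        # Swap the float weight for an instance of the tensor subclass `cls`.
        lin.weight = nn.Parameter(
            cls.from_float(lin.weight, **kwargs), requires_grad=False
        )
        return lin

    return insert_subclass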

cpuhrsch added a commit that referenced this pull request Apr 1, 2024
This reverts commit a6c7367.
cpuhrsch pushed a commit that referenced this pull request Apr 1, 2024
cpuhrsch added a commit that referenced this pull request Apr 5, 2024
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024