
Conversation

@drisspg (Contributor) commented Mar 4, 2025

What does this PR do?

This PR updates the TorchAO integration. In torchao release 0.9, quantization moved to being specified via config objects; this PR supports those configs and allows them to be serialized to config.json when saving the model.

Main Changes

  • Enhanced Configuration Support:

    • Extended TorchAoConfig to accept two types of configurations (see the sketch after this list):
      • String-based configurations (the original approach, kept for backward-compatibility concerns)
      • New AOBaseConfig object instances for more advanced configuration (the new recommended path)
  • Serialization & Deserialization:

    • Added functionality to properly serialize and deserialize AOBaseConfig objects
    • Allows configs to be saved to disk, shared between applications, and versioned for compatibility
    • Implemented through new to_dict() and from_dict() methods
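
For illustration, a minimal sketch of the two configuration styles (assuming torchao >= 0.9, where Int4WeightOnlyConfig is one of the new config classes; the checkpoint name is just a placeholder):

from torchao.quantization import Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TorchAoConfig

# Original string-based path (kept for backward compatibility):
string_config = TorchAoConfig("int4_weight_only", group_size=128)

# New AOBaseConfig path (the recommended approach going forward):
object_config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", device_map="auto", quantization_config=object_config
)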

The serialization format uses a structured dictionary with:

{
    "_type": "ConfigClassName",  # Class name, not full module path
    "_version": 1,               # Version from the class's VERSION attribute
    "_data": {                   # Actual configuration parameters
        "param1": value1,
        "param2": value2,
        # Nested objects also get serialized with their types
    }
}
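
A hypothetical round trip through this format (to_dict() and from_dict() are the methods this PR adds; the exact dictionary contents shown in the comment are illustrative assumptions):

config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))

serialized = config.to_dict()
# serialized["quant_type"] would look roughly like:
# {"_type": "Int4WeightOnlyConfig", "_version": 1, "_data": {"group_size": 128, ...}}

restored = TorchAoConfig.from_dict(serialized)  # rebuilds the AOBaseConfig instance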

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@drisspg drisspg mentioned this pull request Mar 5, 2025
@drisspg drisspg mentioned this pull request Mar 6, 2025
@drisspg drisspg force-pushed the ao-base-configs branch 2 times, most recently from f527770 to c39df74 Compare March 12, 2025 22:54
if version.parse(importlib.metadata.version("accelerate")) > version.parse("0.19.0"):
    from accelerate.utils import CustomDtype

return None
@drisspg (Contributor, Author) commented Mar 12, 2025:
Not all configs map cleanly to CustomDtype; could someone tell me what should be used here?

Member:
You can use the closest dtype that the quantized model will have on average! This is used to approximate the size of the model so that we can dispatch the weights correctly across the GPUs.

@drisspg (Contributor, Author):
Makes sense. Hmm, this is a little hard to do in general without requiring extra info. I can do a fuzzy match based on the class name, since we have a pretty standard naming scheme there. Is it okay if we default to torch.int8 when it doesn't match, or do things blow up? All our techniques are int8 or less.
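
For context, a minimal sketch of what such a class-name fuzzy match could look like (this body is an illustrative assumption; the actual fuzzy_match_size helper in the PR may differ):

import re

def fuzzy_match_size(config_name: str):
    # Pull the bit-width digit out of names like "Int4WeightOnlyConfig" -> "4"
    # or "Int8WeightOnlyConfig" -> "8"; return None when no digit is present.
    match = re.search(r"(\d+)", config_name)
    return match.group(1) if match else None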

@drisspg drisspg force-pushed the ao-base-configs branch 3 times, most recently from 79c7b13 to 2d02d5d Compare March 13, 2025 00:08
@drisspg drisspg marked this pull request as ready for review March 13, 2025 00:08
@SunMarc (Member) left a comment:
Thanks for the update! Left a couple of comments. If it makes sense, in the future we can also force the user to use torchao >= 0.10.0 after a deprecation cycle, so that we don't have to maintain the old behavior.


@MekkCyber (Contributor) commented:

Thanks for the update!

@drisspg drisspg force-pushed the ao-base-configs branch 12 times, most recently from e5e7c32 to fa21f9a Compare March 17, 2025 21:08
@drisspg drisspg requested a review from SunMarc March 18, 2025 03:33
@drisspg drisspg requested a review from MekkCyber March 18, 2025 03:33
@SunMarc (Member) left a comment:
Thanks for fixing everything. One last thing would be to make sure that autoquant still works with the latest torchao.


> **⚠️ DEPRECATION WARNING**
>
> Starting with version 0.10.0, the string-based API for quantization configuration (e.g., `TorchAoConfig("int4_weight_only", group_size=128)`) is **deprecated** and will be removed in a future release.
Contributor:
I remember it's fine for transformers to always depend on the most recent torchao versions

Member:
If possible, we would like to support older versions of torchao too, but I feel like for now it's fine for the user to download the most recent version of torchao.

if isinstance(quant_type, AOBaseConfig):
    # Extract size digit using fuzzy match on the class name
    config_name = quant_type.__class__.__name__
    size_digit = fuzzy_match_size(config_name)
Contributor:
This seems a bit fragile? E.g., what would it look like for mx, fp4, etc.?

Comment on lines +156 to +160
if size_digit == "4":
    return CustomDtype.INT4
else:
    # Default to int8
    return torch.int8
@jerryzh168 (Contributor) commented Mar 19, 2025:
I'm wondering if these are really needed. cc @SunMarc, when are these used?

Member:
They are used when calculating the appropriate device_map (e.g., to know how to dispatch the layers across the different GPUs). This is needed in the torchao case because the model architecture is not changed prior to calculating the device_map.
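
To make that concrete: the estimate is roughly parameter count times bytes per element, so the dtype reported here directly drives how layers get packed onto devices (a simplified sketch; accelerate's real logic walks modules and buffers):

def approx_quantized_size_bytes(num_params: int, bits_per_weight: int) -> float:
    # e.g. INT4 -> 0.5 bytes per weight, int8 -> 1 byte per weight
    return num_params * bits_per_weight / 8

# A 7B-parameter model quantized to 4 bits is estimated at ~3.5 GB of weights:
approx_quantized_size_bytes(7_000_000_000, 4)  # 3.5e9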

@SunMarc (Member) left a comment:
Thanks for iterating!

@SunMarc SunMarc merged commit e8d9603 into huggingface:main Mar 19, 2025
21 checks passed
@BenjaminBossan (Member) commented:

Hey @drisspg @SunMarc, after this PR I get an error with this code:

from transformers import AutoModelForCausalLM, TorchAoConfig

quantization_config = TorchAoConfig(quant_type="int8_weight_only")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", device_map=0, quantization_config=quantization_config
)

It raises:

    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/work/forks/transformers/src/transformers/models/auto/auto_factory.py", line 573, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/work/forks/transformers/src/transformers/modeling_utils.py", line 272, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/work/forks/transformers/src/transformers/modeling_utils.py", line 4442, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/work/forks/transformers/src/transformers/modeling_utils.py", line 4871, in _load_pretrained_model
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/anaconda3/envs/peft/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/name/work/forks/transformers/src/transformers/modeling_utils.py", line 853, in _load_state_dict_into_meta_model
    hf_quantizer.create_quantized_param(
  File "/home/name/work/forks/transformers/src/transformers/quantizers/quantizer_torchao.py", line 239, in create_quantized_param
    quantize_(module, self.quantization_config.get_apply_tensor_subclass(), set_inductor_config=False)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'TorchAoConfig' object has no attribute 'get_apply_tensor_subclass'

This is because get_apply_tensor_subclass was renamed to get_quantize_config but the old name is still being used:

https://github.com/huggingface/transformers/blob/1ddb64937cd31bd25df3213b1dc275396ef695cd/src/transformers/quantizers/quantizer_torchao.py#L239C56-L239C81

If there is no important reason to use the new name, I would suggest reverting to the old name, because other libraries may also rely on it (for instance, PEFT would otherwise need to be updated).
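
One hypothetical stopgap, if the rename were kept, would be to alias the old name (an illustrative sketch, not the fix that was merged):

from transformers import TorchAoConfig

# Hypothetical shim: expose the old name as an alias of the renamed method
# so downstream callers such as PEFT keep working during a transition.
if not hasattr(TorchAoConfig, "get_apply_tensor_subclass"):
    TorchAoConfig.get_apply_tensor_subclass = TorchAoConfig.get_quantize_config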

@SunMarc (Member) commented Mar 20, 2025:

I will do a quick PR to revert to the old name!

@SunMarc (Member) commented Mar 20, 2025:

Fixed it here #36849

jerryzh168 added a commit to jerryzh168/transformers that referenced this pull request Mar 25, 2025
Summary:
We add the new torchao API support in HF transformers (huggingface#36526). One thing that's missing is that it does not account for the int4 weight-only quant config only working on CUDA; this PR adds back the workaround.

Also updated the version requirement to > 0.9 temporarily so that we can use the torchao nightly before 0.10 is released; we should change this back before landing.

Test Plan:
local test: https://gist.github.com/jerryzh168/0e749d0dab40e2a62a7f2e48639f77b5
(we can set up a deserialization test later, once we can quantize a small model and host it in a stable place like TinyLlama/TinyLlama-1.1B-Chat-v1.0)

jerryzh168 added a commit to jerryzh168/transformers that referenced this pull request Mar 25, 2025
assert (
    len(quant_type) == 1 and "default" in quant_type
), "Expected only one key 'default' in quant_type dictionary"
quant_type = quant_type["default"]
A reviewer commented:
This one breaks

def check_serialization_expected_output(self, device, expected_output):

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025