
Conversation

@MekkCyber (Contributor)

What does this PR do?

Since HQQ overrides the load_state_dict method for HQQLinear, it directly loads both the weight and bias parameters. This differs from our approach, where we iterate through the parameters one by one and load the bias separately from the weights.

This PR updates the behavior to simply ignore the bias parameter, assuming it was already loaded alongside the weights in the case of pre-quantized models.
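
For reference, a minimal sketch of the idea (hypothetical function and names, not the actual quantizer code): during the per-parameter pass, bias entries of a pre-quantized HQQ module are skipped, since HQQLinear.load_state_dict has already restored them together with the quantized weights.

import torch
import torch.nn as nn

def load_module_params(module: nn.Module, state_dict: dict, prequantized: bool) -> None:
    # Hypothetical per-parameter loader, illustrative only.
    params = dict(module.named_parameters())
    for name, tensor in state_dict.items():
        if prequantized and name.endswith("bias"):
            # The bias was already loaded alongside the weights -> ignore it here.
            continue
        target = params.get(name)
        if target is not None:
            with torch.no_grad():
                target.copy_(tensor)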

@github-actions github-actions bot marked this pull request as draft April 15, 2025 13:16
@github-actions (Contributor)

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@MekkCyber MekkCyber requested a review from SunMarc April 15, 2025 13:16
@MekkCyber MekkCyber marked this pull request as ready for review April 15, 2025 13:17
@SunMarc (Member) left a comment


SGTM! Let's add a small test if this isn't tested.

@mobicham (Contributor)

Thank you @MekkCyber. Can you also change this:
https://github.com/huggingface/transformers/blob/main/tests/quantization/hqq/test_hqq.py#L85
to facebook/opt-125m
The current HQQ test uses a model without a bias.

@MekkCyber (Contributor, Author)

We already have this test: https://github.com/huggingface/transformers/blob/main/tests/quantization/hqq/test_hqq.py#L151, we just need to add the pre_quantized case.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mobicham (Contributor)

> We already have this test: https://github.com/huggingface/transformers/blob/main/tests/quantization/hqq/test_hqq.py#L151, we just need to add the pre_quantized case.

It doesn't check for serialization of a model with a bias. If it did, the tests would actually have failed.

@mobicham (Contributor)

Can you please run this script:
https://gist.github.com/mobicham/701dd564c52590203ee09631425ad797

If it doesn't throw an error, the fix works as expected.

@MekkCyber (Contributor, Author)

> > https://github.com/huggingface/transformers/blob/main/tests/quantization/hqq/test_hqq.py#L151, we just need to add the pre_quantized case
>
> It doesn't check for serialization of a model with a bias. If it did, the tests would have failed actually

Yep, that's what I meant, we only need to add the case of pre_quantized (serialized) models.
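
Roughly, such a pre_quantized case could look like the sketch below (a hedged sketch only, not the test that actually landed; the quantization settings, model id, and tolerance are assumptions):

import torch
from transformers import AutoModelForCausalLM, HqqConfig

def test_hqq_serialization_with_bias(tmp_path):
    # facebook/opt-125m uses Linear layers with a bias, unlike the model in
    # the existing HQQ tests, so it exercises the bias path.
    quant_config = HqqConfig(nbits=4, group_size=64)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m",
        device_map="cuda",
        torch_dtype=torch.float16,
        quantization_config=quant_config,
    )
    model.save_pretrained(tmp_path)

    # Reload the serialized (pre-quantized) checkpoint and compare outputs;
    # before this fix the HQQLinear bias could be left on the meta device.
    reloaded = AutoModelForCausalLM.from_pretrained(
        tmp_path, device_map="cuda", torch_dtype=torch.float16
    )
    input_ids = torch.ones((1, 8), dtype=torch.long, device="cuda")
    with torch.no_grad():
        out_a = model(input_ids).logits
        out_b = reloaded(input_ids).logits
    assert torch.allclose(out_a, out_b, atol=1e-3)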

@MekkCyber (Contributor, Author)

The snippet works well

@mobicham (Contributor)

Awesome, thank you again @MekkCyber 🙏

@MekkCyber MekkCyber merged commit 7752e74 into main Apr 16, 2025
21 checks passed
@MekkCyber MekkCyber deleted the fix_hqq branch April 16, 2025 11:58
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
@mobicham (Contributor)

@MekkCyber unfortunately it seems that it's not fully resolved. For example, when I tried to load a quantized Qwen model that has a bias:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model_id = "mobiuslabsgmbh/Qwen2.5-VL-3B-Instruct_4bitgs64_hqq_hf"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16, device_map="cuda:0")

This fails with:

File /opt/conda/lib/python3.11/site-packages/accelerate/utils/modeling.py:283, in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics, tied_params_map)
    280     return
    282 if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
--> 283     raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
    285 param = module._parameters[tensor_name] if tensor_name in module._parameters else None
    286 param_cls = type(param)

ValueError: bias is on the meta device, we need a `value` to put in on cuda:0.

@MekkCyber (Contributor, Author)

Thanks for reporting @mobicham, will take a look

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025