Bias quantization for prequantized checkpoints #1821
andrewor14 added commits referencing this issue (Mar 5–6, 2025) with the following message:

**Summary:** Previously, when we saw a linear with bias, we simply did not swap it to `Int8DynActInt4WeightLinear` and left it as is. Now we do swap it, but the bias is not quantized and is instead passed to F.linear in full precision. Fixes #1821

**Test Plan:** python test/quantization/test_quant_api.py -k test_8da4w_quantizer_linear_bias
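The sketch below illustrates the behavior described in that summary, under stated assumptions: it is not torchao's actual implementation. `QuantizedLinearWithFp32Bias` is a hypothetical name, and the symmetric per-channel int8 weight scheme stands in for the real int8-dynamic-activation/int4-weight scheme. The point it shows is that the weight is stored quantized while the bias is kept unquantized and simply forwarded to `F.linear`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizedLinearWithFp32Bias(nn.Module):
    """Illustrative swap target: quantized weight, full-precision bias."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        # Symmetric per-channel int8 weight quantization (a stand-in for the
        # real 8da4w scheme used by Int8DynActInt4WeightLinear).
        self.scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        self.q_weight = torch.clamp((w / self.scale).round(), -128, 127).to(torch.int8)
        # The bias is kept unquantized and used as-is in the forward pass.
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize the weight, then call F.linear with the full-precision bias.
        w = self.q_weight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)


# Example: a linear with bias now gets swapped instead of being left as-is.
lin = nn.Linear(16, 8, bias=True)
swapped = QuantizedLinearWithFp32Bias(lin)
out = swapped(torch.randn(2, 16))
```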
Following up from a chat with @jainapurva:

For private internal model enablement purposes, we would like to request support for bias quantization when loading prequantized checkpoints. At the moment, we are doing a manual source transformation after loading the prequantized checkpoint (https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/pre_quantization.py#L40) into your deprecated `Int8DynActInt4WeightLinear`, which doesn't support bias quantization.
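For context, below is a minimal sketch of the kind of manual module-swap source transformation the request refers to. The real transform lives in the linked executorch file; `swap_linear_modules` and the `make_quantized_linear` factory here are hypothetical placeholders for constructing the replacement module (e.g. torchao's `Int8DynActInt4WeightLinear`), and the bias handling noted in the comment is exactly what this issue asks to support.

```python
import torch.nn as nn


def swap_linear_modules(module: nn.Module, make_quantized_linear) -> nn.Module:
    """Recursively replace nn.Linear children with quantized replacements."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # The request in this issue: the replacement should also handle
            # (and quantize) child.bias instead of skipping such layers or
            # only supporting bias in full precision.
            setattr(module, name, make_quantized_linear(child))
        else:
            swap_linear_modules(child, make_quantized_linear)
    return module
```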