
Commit 2a3f748

[docs] gptq formatting fix (#43216)
1 parent 3797426 commit 2a3f748

1 file changed (+6, -6)

docs/source/en/quantization/gptq.md

Lines changed: 6 additions & 6 deletions
@@ -18,15 +18,15 @@ rendered properly in your Markdown viewer.

The [GPT-QModel](https://github.com/ModelCloud/GPTQModel) project (Python package `gptqmodel`) implements the GPTQ algorithm, a post-training quantization technique where each row of the weight matrix is quantized independently to find a version of the weights that minimizes the error. These weights are quantized to int4, but they're restored to fp16 on the fly during inference. This can save memory usage by 4x because the int4 weights are dequantized in a fused kernel rather than a GPU's global memory. Inference is also faster because a lower bitwidth takes less time to communicate.

-AutoGPTQ is no longer supported in Transformers. Install GPT-QModel] instead.
+AutoGPTQ is no longer supported in Transformers. Install GPT-QModel instead.

Install Accelerate, Transformers and Optimum first.

```bash
pip install --upgrade accelerate optimum transformers
```

-Then run the command below to install GPT-QModel].
+Then run the command below to install GPT-QModel.

```bash
pip install gptqmodel --no-build-isolation
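
The hunk above covers the install steps for the per-row int4 quantization described in the intro paragraph. As a rough sketch of how that quantization step is driven from Transformers once `gptqmodel` is installed (the model id `facebook/opt-125m` and the `c4` calibration dataset are illustrative assumptions, not something this commit prescribes):

```python
# Sketch: quantize a small model to 4-bit GPTQ via Transformers' GPTQConfig.
# The model id and calibration dataset are example choices.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ needs calibration data; the tokenizer is used to tokenize it.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Passing quantization_config triggers quantization while loading.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)

quantized_model.save_pretrained("opt-125m-gptq")
tokenizer.save_pretrained("opt-125m-gptq")
```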
@@ -107,13 +107,13 @@ from transformers import AutoModelForCausalLM, GPTQConfig
model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto", quantization_config=GPTQConfig(bits=4, backend="marlin"))
```

-## GPT-QModel]
+## GPT-QModel

-GPT-QModel] is the actively maintained backend for GPTQ in Transformers. It was originally forked from AutoGPTQ, but has since diverged with significant improvements such as faster quantization, lower memory usage, and more accurate defaults.
+GPT-QModel is the actively maintained backend for GPTQ in Transformers. It was originally forked from AutoGPTQ, but has since diverged with significant improvements such as faster quantization, lower memory usage, and more accurate defaults.

-GPT-QModel] provides asymmetric quantization which can potentially lower quantization errors compared to symmetric quantization. It is not backward compatible with legacy AutoGPTQ checkpoints, and not all kernels (Marlin) support asymmetric quantization.
+GPT-QModel provides asymmetric quantization which can potentially lower quantization errors compared to symmetric quantization. It is not backward compatible with legacy AutoGPTQ checkpoints, and not all kernels (Marlin) support asymmetric quantization.

-GPT-QModel] also has broader support for the latest LLM models, multimodal models (Qwen2-VL and Ovis1.6-VL), platforms (Linux, macOS, Windows 11), and hardware (AMD ROCm, Apple Silicon, Intel/AMD CPUs, and Intel Datacenter Max/Arc GPUs, etc.).
+GPT-QModel also has broader support for the latest LLM models, multimodal models (Qwen2-VL and Ovis1.6-VL), platforms (Linux, macOS, Windows 11), and hardware (AMD ROCm, Apple Silicon, Intel/AMD CPUs, and Intel Datacenter Max/Arc GPUs, etc.).

The Marlin kernels are also updated for A100 GPUs and other kernels are updated to include auto-padding for legacy models and models with non-uniform in/out-features.
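
On the asymmetric-vs-symmetric point in the hunk above: the difference is where the quantization grid sits. A small NumPy sketch (illustrative only, not GPT-QModel's kernels) of the 4-bit round-trip error for a weight row whose values are not centered on zero:

```python
# Illustrative 4-bit symmetric vs asymmetric quantization of one weight row.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(loc=0.3, scale=0.5, size=4096).astype(np.float32)  # off-center row

# Symmetric: zero maps to zero, int4 grid [-8, 7] scaled by the max magnitude.
scale_sym = np.abs(w).max() / 7
w_sym = np.clip(np.round(w / scale_sym), -8, 7) * scale_sym

# Asymmetric: grid spans [min, max] with a zero-point, uint4 range [0, 15].
scale_asym = (w.max() - w.min()) / 15
zero_point = np.round(-w.min() / scale_asym)
q = np.clip(np.round(w / scale_asym) + zero_point, 0, 15)
w_asym = (q - zero_point) * scale_asym

print("symmetric  MSE:", float(np.mean((w - w_sym) ** 2)))
print("asymmetric MSE:", float(np.mean((w - w_asym) ** 2)))
```

For off-center distributions like this, the asymmetric grid wastes fewer of its 16 levels, which is where the potentially lower quantization error comes from; the trade-offs noted above are that it is not backward compatible with legacy AutoGPTQ checkpoints and that not all kernels (Marlin) support it.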
