Compared to GPTQ, [AWQ](https://github.com/mit-han-lab/llm-awq) is more accurate and delivers much better inference performance; see the [benchmark results](https://github.com/lm-sys/FastChat/blob/main/docs/awq.md#benchmark). ~~Note: Multi-Query Attention is [not yet supported](https://github.com/mit-han-lab/llm-awq/issues/53#issuecomment-1659417305).~~