Releases: dropbox/hqq

v0.2.8.post1

20 Oct 15:39
835e259

Minor toml file patch

v0.2.8

18 Aug 10:55

Bug fixes and improvements:
- Fix static cache initialization with newer transformers versions
- Add MXFP vLLM patching utils
- Improve CUDA graphs / torch.compile settings for transformer models

v0.2.7.post1

12 Jun 15:16

Bug fixes:
- Fix HIP graphs in generation: bc8f4c7
- Fix HQQLinear with `None` linear inputs: 3b86ac9

v0.2.7

02 Jun 08:07

  • Fix NaN bug when max - min is very small: 373cbea
  • Add a DISABLE_CUDA=1 environment variable to skip building the CUDA kernels for the aten backend, which speeds up pip builds: 861f690
  • Improve memory usage: a566c78
  • Fix vLLM torch fallback logic: d3f14b4
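
The NaN fix concerns the degenerate case where a quantization group is (nearly) constant. The sketch below assumes a generic asymmetric min-max quantizer; the helpers `minmax_params`, `quantize`, `dequantize` and the `eps` value are hypothetical and are not HQQ's API or the actual change in 373cbea. When max - min is 0, the scale collapses to 0 and dividing by it yields inf/NaN in tensor code, so one common guard is to clamp the range to a small epsilon.

```python
def minmax_params(group, nbits=4, eps=1e-6):
    """Scale and zero-point for asymmetric min-max quantization (hypothetical helper)."""
    qmax = 2**nbits - 1
    lo, hi = min(group), max(group)
    # Clamp the range: for a constant group, hi - lo == 0 would give scale == 0,
    # and x / scale would be inf/NaN in tensor code (ZeroDivisionError in plain Python).
    scale = max(hi - lo, eps) / qmax
    zero = -lo / scale
    return scale, zero


def quantize(group, nbits=4, eps=1e-6):
    """Round each value to an integer level in [0, 2**nbits - 1]."""
    scale, zero = minmax_params(group, nbits, eps)
    qmax = 2**nbits - 1
    q = [min(max(round(x / scale + zero), 0), qmax) for x in group]
    return q, scale, zero


def dequantize(q, scale, zero):
    """Map integer levels back to floats."""
    return [(v - zero) * scale for v in q]


# A constant group no longer breaks quantization.
q, s, z = quantize([0.5, 0.5, 0.5])
print(dequantize(q, s, z))  # each value is approximately 0.5
```

Clamping the range is only one option; an implementation could instead special-case degenerate groups or add the epsilon to the denominator.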

v0.2.6

13 May 11:05

  • Fix CUDA build
  • torch.compile() support for hqq_aten
  • bfloat16 support for vllm/hqq
  • Update vLLM utils to support the hqq_gemlite and hqq_torch aliases
  • Fix vLLM v1 issues
  • Extend save_to_safetensors to VLMs

Full Changelog: v0.2.5...0.2.6

v0.2.5

17 Mar 15:24

- Fix .name in backends
- Skip layers whose in/out feature sizes GemLite does not support during vLLM patching
- Faster vLLM packing via GemLite

v.0.2.3.post1

20 Feb 11:12

Bug fixes:

  • Check for W_q in the state dict to fix PEFT issue #151
  • Fix bugs related to AutoHQQHFModel.save_to_safetensors

v0.2.3

17 Feb 08:43
6e4c992

  • vLLM support via patching: GemLite backend + on-the-fly quantization
  • Add support for Aria
  • Add support for loading quantized SequenceClassification models
  • Faster decoding (custom CUDA graphs, SDPA math backend, etc.)
  • Fix torch.compile and hf_generator bugs with newer transformers versions
  • Fix bugs related to saving quantized models with no grouping
  • Fix bugs related to saving large quantized models
  • Update examples
  • Add support for HQQLinear .to(device)

v0.2.2

12 Sep 15:23

HQQ v0.2.2

  • Support static cache compilation without using HFGenerator
  • Fix various issues related to torch.compile

v.0.2.1

29 Aug 16:25

HQQ v0.2.1