Releases · dropbox/hqq

20 Oct 15:39

mobicham

v0.2.8.post1

835e259

v0.2.8.post1 Latest

Minor toml file patch

18 Aug 10:55

mobicham

v0.2.8

f62d06e

v0.2.8

Bug fixing:
-Fix static cache init with new transformers version
-Add mxfp vllm patching utils
-Improve cuda graphs/compile settings for transformer models

12 Jun 15:16

mobicham

0.2.7.post1

93f1903

v0.2.7.post1

Bug fixing:
-HIP graph fix in generation: bc8f4c7
-Fix HQQLinear with None linear inputs: 3b86ac9

02 Jun 08:07

mobicham

0.2.7

373cbea

v0.2.7

Fix nan bug when max - min is very small: 373cbea
Add DISABLE_CUDA=1 env variable to disable building cuda kernels for the aten backend. This allows a faster pip build: 861f690
Improve memory usage: a566c78
Fix vLLM torch fallback logic: d3f14b4
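
To illustrate the kind of failure behind the "max - min is very small" fix, here is a minimal sketch of min-max affine quantization for a single group of weights. It is not the code from commit 373cbea; the function names, the `eps` parameter, and the clamping strategy are assumptions made for illustration. When every value in a group is (nearly) identical, the range `max - min` approaches zero, so an unguarded division yields an infinite scale, which a tensor library would typically propagate as inf/nan (plain Python raises `ZeroDivisionError` instead). Clamping the range to a small floor is one common way to avoid this.

```python
import math

def quantize_group(values, nbits=4, eps=1e-8):
    """Min-max affine quantization of one group (illustrative sketch only).

    `eps` is a hypothetical floor on the value range, not an hqq parameter.
    """
    lo, hi = min(values), max(values)
    max_q = 2 ** nbits - 1
    # Guard: a constant or near-constant group would otherwise divide by ~0.
    value_range = max(hi - lo, eps)
    scale = max_q / value_range
    zero = -lo * scale
    # Map each value to an integer level and clip it to [0, max_q].
    q = [min(max(round(v * scale + zero), 0), max_q) for v in values]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Inverse mapping from integer levels back to approximate values."""
    return [(x - zero) / scale for x in q]

# A constant group: max - min is exactly 0, yet the results stay finite.
q, scale, zero = quantize_group([0.5, 0.5, 0.5])
restored = dequantize_group(q, scale, zero)
print(all(math.isfinite(x) for x in restored))
```

With the guard in place, a constant group quantizes to a single level and dequantizes back to finite values instead of failing.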

13 May 11:05

mobicham

0.2.6

a86e0f4

v0.2.6

Fix cuda build
torch.compile() support for hqq_aten
bfloat16 support for vllm/hqq
Update vllm utils to support hqq_gemlite and hqq_torch aliases
Fix vLLM v1 issues
Extend save_to_safetensors to VLMs

Full Changelog: v0.2.5...0.2.6

17 Mar 15:24

mobicham

v0.2.5

7418f59

v0.2.5

-Fix .name in backends
-Skip gemlite invalid in/out feature sizes in VLLM patching
-Faster VLLM packing via GemLite

20 Feb 11:12

mobicham

0.2.3.post1

c60218e

v.0.2.3.post1

Bug fixes:

Check W_q in state dict to fix peft issue #151
Fix bugs related to AutoHQQHFModel.save_to_safetensors

17 Feb 08:43

mobicham

0.2.3

6e4c992

v0.2.3

VLLM support via patching - GemLite backend + on-the-fly quantization
Add support for Aria
Add support to load quantized SequenceClassification
Faster decoding (via custom cudagraphs, sdpa math backend, etc.)
Fix torch.compile and hf_generator bugs related to newer transformers versions
Fix bugs related to saving quantized models with no grouping
Fix bugs related to saving large quantized models
Update examples
Add support for HQQLinear .to(device)

12 Sep 15:23

mobicham

0.2.2

126a8b2

v0.2.2

HQQ v0.2.2

Support static cache compilation without using HFGenerator
Fixing various issues related to torch.compile

29 Aug 16:25

mobicham

0.2.1

5f8f0d2

v.0.2.1

HQQ v0.2.1

Support HQQLinear.state_dict() for non-initialized layers. Mainly used for huggingface/transformers#33141
