Upgrade to exllama v2

### Feature request

Upgrade the existing exllama implementation to exllama v2: https://github.com/turboderp/exllamav2

### Motivation

Overview of differences compared to V1:

- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format




Model | Mode | Size | Group size | Act-order | V1: 3090Ti | V1: 4090 | V2: 3090Ti | V2: 4090
-- | -- | -- | -- | -- | -- | -- | -- | --
Llama | GPTQ | 7B | 128 | no | 143 t/s | 173 t/s | 175 t/s | 195 t/s
Llama | GPTQ | 13B | 128 | no | 84 t/s | 102 t/s | 105 t/s | 110 t/s
Llama | GPTQ | 33B | 128 | yes | 37 t/s | 45 t/s | 45 t/s | 48 t/s
OpenLlama | GPTQ | 3B | 128 | yes | 194 t/s | 226 t/s | 295 t/s | 321 t/s
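
To make the gain easier to read, here is a minimal sketch that computes the V2/V1 throughput ratio for each row of the table above. The numbers are copied from the table; the names `BENCHMARKS`, `speedup`, and `speedups` are made up for this illustration and are not part of exllama or exllamav2.

```python
# Throughput figures (tokens/second) copied from the table above.
# Keys are (model, size); values map each GPU to a (V1, V2) pair.
# All names here are hypothetical helpers for illustration only.
BENCHMARKS = {
    ("Llama", "7B"): {"3090Ti": (143, 175), "4090": (173, 195)},
    ("Llama", "13B"): {"3090Ti": (84, 105), "4090": (102, 110)},
    ("Llama", "33B"): {"3090Ti": (37, 45), "4090": (45, 48)},
    ("OpenLlama", "3B"): {"3090Ti": (194, 295), "4090": (226, 321)},
}


def speedup(v1_tps, v2_tps):
    """Return the V2/V1 throughput ratio (values above 1.0 mean V2 is faster)."""
    return v2_tps / v1_tps


def speedups(benchmarks=BENCHMARKS):
    """Return a dict mapping (model, size, gpu) to the V2/V1 ratio."""
    return {
        (model, size, gpu): speedup(v1, v2)
        for (model, size), gpus in benchmarks.items()
        for gpu, (v1, v2) in gpus.items()
    }


if __name__ == "__main__":
    for (model, size, gpu), ratio in speedups().items():
        print(f"{model} {size} on {gpu}: {ratio:.2f}x")
```

On these figures V2 is faster in every row, with ratios ranging from roughly 1.07x (Llama 33B on the 4090) to roughly 1.52x (OpenLlama 3B on the 3090Ti).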



### Your contribution

I could take a look at the current exllama implementation and what it would take to upgrade, if wanted.

Upgrade to exllama v2 #1016
