Upgrade to exllama v2 #1016

@flozi00

Description

Feature request

https://github.com/turboderp/exllamav2

Motivation

Overview of differences compared to V1:

- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format

| Model     | Mode | Size | grpsz | act | V1: 3090Ti | V1: 4090 | V2: 3090Ti | V2: 4090 |
|-----------|------|------|-------|-----|-----------:|---------:|-----------:|---------:|
| Llama     | GPTQ | 7B   | 128   | no  | 143 t/s    | 173 t/s  | 175 t/s    | 195 t/s  |
| Llama     | GPTQ | 13B  | 128   | no  | 84 t/s     | 102 t/s  | 105 t/s    | 110 t/s  |
| Llama     | GPTQ | 33B  | 128   | yes | 37 t/s     | 45 t/s   | 45 t/s     | 48 t/s   |
| OpenLlama | GPTQ | 3B   | 128   | yes | 194 t/s    | 226 t/s  | 295 t/s    | 321 t/s  |

Your contribution

I could take a look at the current exllama implementation and at what it would take to upgrade, if wanted.
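For reference, here is a minimal sketch of what standalone loading and generation with exllamav2 looks like, based on the basic inference example in the linked repository. The class and method names (ExLlamaV2Config, ExLlamaV2BaseGenerator, generate_simple, etc.) and the model path are taken as assumptions and should be checked against the current upstream code before wiring it into this project:

```python
# Sketch only: follows the basic inference example from the exllamav2 repo;
# exact names and signatures may differ in the current version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = "/path/to/quantized/model"  # placeholder path

# Build the config from the model directory
config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

# Load the model, tokenizer and attention cache
model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)

# Simple (non-streaming) generator with basic sampling settings
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_p = 0.8

output = generator.generate_simple("Hello, my name is", settings, 64)
print(output)
```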
