Quantization does not write the quantization version to ftype #1590

Description

@philpax

Expected Behavior

When quantizing with llama.cpp, the quantization version should be written to the ftype in the hyperparameters.

Current Behavior

The ftype produced by llama_model_quantize_internal is passed as-is to llama_file_saver, which writes it to disk without encoding it with GGML_QNT_VERSION:

https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/llama.cpp#L2052-L2068

https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/llama.cpp#L557
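
For reference, this is roughly what the expected behaviour above implies before the write, following the GGML_QNT_VERSION / GGML_QNT_VERSION_FACTOR convention from ggml.h. The helper name is hypothetical; this is a sketch, not the actual llama.cpp code:

```cpp
// Sketch (hypothetical helper): fold the quantization version into the ftype
// before llama_file_saver writes it, using the ggml.h convention where the
// stored value is ftype + GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR.
#include <cstdint>
#include "ggml.h"

static uint32_t llama_ftype_with_qnt_version(uint32_t ftype) {
    // e.g. LLAMA_FTYPE_MOSTLY_Q5_1 (9) with GGML_QNT_VERSION 2 -> 2009
    return ftype + GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR;
}
```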

Loaders that expect the quantization version to be encoded in the ftype, such as llm, therefore detect a quantization version of 0:

     Running `target/release/llm llama info -m models/llama/7B/koala-7B.ggmlv3.q5_1.bin`
[2023-05-25T00:10:05Z INFO  llm] Container type: Ggjt(3)
[2023-05-25T00:10:05Z INFO  llm] Hyperparameters: Hyperparameters { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, file_type: FileType { format: MostlyQ5_1, quantization_version: 0 } }
[2023-05-25T00:10:05Z INFO  llm] Vocabulary size: 32000
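
The quantization_version: 0 above follows from ggml's decoding convention: the loader divides the stored value by GGML_QNT_VERSION_FACTOR, so a raw, unencoded ftype always decodes to version 0. A minimal sketch of that decoding (not llm's actual code):

```cpp
// Sketch of the ggml decoding convention: stored = ftype + version * 1000,
// so an unencoded ftype such as 9 (mostly q5_1) decodes to version 0.
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t GGML_QNT_VERSION_FACTOR = 1000; // from ggml.h
    const uint32_t stored = 9;                     // raw ftype as currently written

    printf("quantization version: %u\n", stored / GGML_QNT_VERSION_FACTOR); // 0
    printf("file type:            %u\n", stored % GGML_QNT_VERSION_FACTOR); // 9
    return 0;
}
```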

Environment and Context

This was reproduced on commit ac7876a. I initially noticed it while testing one of the models on HuggingFace, then re-quantized a model locally to confirm it.

Steps to Reproduce


  1. make
  2. ./quantize ggml-model-f16.bin ggml-model-f16-q4_0.bin q4_0
  3. Check the ftype in the written hyperparameters (a sketch for doing this follows the list).
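
A minimal way to perform step 3, assuming the GGJT header layout (magic, file version, then seven uint32 hyperparameters ending in the ftype); the program is illustrative only:

```cpp
// Illustrative header dump: reads the first nine uint32 fields of a GGJT file
// (magic, version, n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype)
// and prints the ftype plus the quantization version it decodes to.
#include <cstdint>
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.bin>\n", argv[0]);
        return 1;
    }
    FILE * f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    uint32_t header[9];
    if (fread(header, sizeof(uint32_t), 9, f) != 9) {
        fprintf(stderr, "short read\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    const uint32_t ftype = header[8];
    printf("ftype:                %u\n", ftype);
    printf("quantization version: %u\n", ftype / 1000); // GGML_QNT_VERSION_FACTOR
    return 0;
}
```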
