Quantization does not write the quantization version to ftype #1590

Description

@philpax

Expected Behavior

When quantizing with llama.cpp, the quantization version should be written to the ftype in the hyperparameters.

Current Behavior

The ftype produced by llama_model_quantize_internal is passed as-is to llama_file_saver, which writes it to disk without encoding it with GGML_QNT_VERSION:

https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/llama.cpp#L2052-L2068

https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/llama.cpp#L557
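
For reference, this is roughly what the expected behaviour above implies before the write, following the GGML_QNT_VERSION / GGML_QNT_VERSION_FACTOR convention from ggml.h. The helper name is hypothetical; this is a sketch, not the actual llama.cpp code:

```cpp
// Sketch (hypothetical helper): fold the quantization version into the ftype
// before llama_file_saver writes it, using the ggml.h convention where the
// stored value is ftype + GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR.
#include <cstdint>
#include "ggml.h"

static uint32_t llama_ftype_with_qnt_version(uint32_t ftype) {
    // e.g. LLAMA_FTYPE_MOSTLY_Q5_1 (9) with GGML_QNT_VERSION 2 -> 2009
    return ftype + GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR;
}
```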

Loaders that expect the quantization version to be encoded in the ftype, such as llm, therefore detect a quantization version of 0:

     Running `target/release/llm llama info -m models/llama/7B/koala-7B.ggmlv3.q5_1.bin`
[2023-05-25T00:10:05Z INFO  llm] Container type: Ggjt(3)
[2023-05-25T00:10:05Z INFO  llm] Hyperparameters: Hyperparameters { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, file_type: FileType { format: MostlyQ5_1, quantization_version: 0 } }
[2023-05-25T00:10:05Z INFO  llm] Vocabulary size: 32000
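
The quantization_version: 0 above follows from ggml's decoding convention: the loader divides the stored value by GGML_QNT_VERSION_FACTOR, so a raw, unencoded ftype always decodes to version 0. A minimal sketch of that decoding (not llm's actual code):

```cpp
// Sketch of the ggml decoding convention: stored = ftype + version * 1000,
// so an unencoded ftype such as 9 (mostly q5_1) decodes to version 0.
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t GGML_QNT_VERSION_FACTOR = 1000; // from ggml.h
    const uint32_t stored = 9;                     // raw ftype as currently written

    printf("quantization version: %u\n", stored / GGML_QNT_VERSION_FACTOR); // 0
    printf("file type:            %u\n", stored % GGML_QNT_VERSION_FACTOR); // 9
    return 0;
}
```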

Environment and Context

This was reproduced on commit ac7876a. I initially noticed it while testing one of the models on HuggingFace, then re-quantized a model locally to confirm it.

Steps to Reproduce


  1. make
  2. ./quantize ggml-model-f16.bin ggml-model-f16-q4_0.bin q4_0
  3. Check the ftype in the written hyperparameters (a sketch for doing this follows the list).
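
A minimal way to perform step 3, assuming the GGJT header layout (magic, file version, then seven uint32 hyperparameters ending in the ftype); the program is illustrative only:

```cpp
// Illustrative header dump: reads the first nine uint32 fields of a GGJT file
// (magic, version, n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype)
// and prints the ftype plus the quantization version it decodes to.
#include <cstdint>
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.bin>\n", argv[0]);
        return 1;
    }
    FILE * f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    uint32_t header[9];
    if (fread(header, sizeof(uint32_t), 9, f) != 9) {
        fprintf(stderr, "short read\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    const uint32_t ftype = header[8];
    printf("ftype:                %u\n", ftype);
    printf("quantization version: %u\n", ftype / 1000); // GGML_QNT_VERSION_FACTOR
    return 0;
}
```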
