convert: add tensor hash general.hash.sha256 to kv store #8645


Closed
wants to merge 1 commit

Conversation

@mofosyne (Collaborator) commented Jul 23, 2024

While autogeneration of a UUID is a bit controversial, I decided to adapt the logic a bit for a straight-up sha256 tensor hash in the kv store as general.hash.sha256.

While there are already other choices like xxhash and sha1, in this context I think we get better value from a known strong cryptographic hash like sha256. I think this would also pave the way for self-signed gguf files, so you can be sure a file came from a known entity.

I also thought about a 'per tensor layer' hash, but I'm not sure how useful it would be, as per-layer hashing seems to be more of a 'developer debugging tool' at this stage. So it's best to stick to hashing the whole tensor data as one digest instead.

For model repo maintainers like Hugging Face, this has immediate use: it makes it possible to track models even when the KV metadata has been updated (e.g. fixing authorship metadata).

For anyone who may be interested, you might want to add logic to llama-gguf-hash to self-check a gguf file's tensor data when this hash is present in the kv store. I opted against doing that here, as I wasn't sure about the utility yet and it would have been more work than this current PR.
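
Roughly, the idea looks like this (a minimal sketch, not the exact PR diff; it assumes gguf-py's GGUFWriter.add_tensor/add_string and that the tensor bytes are available up front):

```python
import hashlib

import numpy as np
from gguf import GGUFWriter  # gguf-py

def add_tensors_with_hash(writer: GGUFWriter, tensors: dict[str, np.ndarray]) -> None:
    """Feed every tensor through a running SHA-256 while queuing it for writing."""
    h = hashlib.sha256()
    for name, data in tensors.items():
        h.update(data.tobytes())       # raw tensor bytes go into the running hash
        writer.add_tensor(name, data)  # queue the tensor as usual
    # record the digest in the kv store so it lands next to the other general.* keys
    writer.add_string("general.hash.sha256", h.hexdigest())
```

Note that calling .tobytes() materializes each tensor, which is exactly the memory concern for lazy conversion raised in the review below.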


Testing process I did

During conversion you get this new printout:

INFO:hf-to-gguf:blk.7.attn_v.weight,        torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:output_norm.weight,         torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:tensor hash (sha256): 8b3e00226cc2a55398b1ffbda7af8464040f9cd7b22ccbef8ba60b227924a2b1
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters

Checked with gguf-dump --markdown that I can see the new entry:

|   4 | STRING    |     1 | general.architecture                   | `llama`                                                                          |
|   5 | STRING    |     1 | general.type                           | `model`                                                                          |
|   6 | STRING    |     1 | general.hash.sha256                    | `8b3e00226cc2a55398b1ffbda7af84`...`0f9cd7b22ccbef8ba60b227924a2b1`              |
|   7 | STRING    |     1 | general.name                           | `TinyLLama`                                                                      |
|   8 | STRING    |     1 | general.author                         | `Maykeye`                                                                        |

Checked that the sha256 is consistent with llama-gguf-hash:

llama-gguf-hash --all --no-layer TinyLLama-4.6M-v0.0-F16.gguf
xxh64     cbd383cfd4c897e6  TinyLLama-4.6M-v0.0-F16.gguf
sha1      a9de42f2bbeee1eba49bc39b25cf69ff7a0937f6  TinyLLama-4.6M-v0.0-F16.gguf
sha256    8b3e00226cc2a55398b1ffbda7af8464040f9cd7b22ccbef8ba60b227924a2b1  TinyLLama-4.6M-v0.0-F16.gguf

So at least the sha256 process appears to be consistent. The logic is similar to my earlier attempt at autogenerated UUIDs, which was also consistent, so an error is less likely to creep in here.


@mofosyne added the "Review Complexity : Low" label on Jul 23, 2024
@github-actions bot added the "python" label on Jul 23, 2024
@Galunid (Collaborator) left a comment


I'm unconvinced this is desirable. One side effect of introducing this change is prolonging the conversion process. Say someone wants to convert Llama 3 405B that's stored on HDD. How much longer is it going to take?

It's also quite easy to overwrite the hash and other metadata. A small change to a tensor will result in a different hash, so if someone wants to keep a model from being trackable it's not difficult.

I think conversion as bf16 and f16 will also generate different hashes.

Another thing: doesn't hashlib need to store all the tensors in memory to calculate the hash? According to the docs, .update() concatenates the data across subsequent calls, which would suggest that at some point all the tensors would be held in memory. That would be an absolute deal-breaker when converting bigger models.

@compilade (Collaborator) commented

@mofosyne

I agree with @Galunid regarding the overhead (both CPU-wise and memory-wise).

This also has the exact same problems as the UUID autogeneration: an f32 model quantized to q8_0 (with llama-quantize) would not have the same hash as a model converted with --outtype q8_0, even though the tensor contents are actually equal (this was at least the case in #7234, and it should still be true on master).

If you truly want this to work as an integrity check, then llama-quantize should update the hash, otherwise it would never match the weights of the files most people actually use.

But the way llama-quantize is structured, this is not easy to do, because it writes the header completely before beginning to quantize the tensors (I think?), so the resulting data is not known beforehand, unless it's all kept in memory.

Another thing: doesn't hashlib need to store all the tensors in memory to calculate the hash?

@Galunid No, hashlib by itself doesn't, because hashing functions usually work in blocks and so only the inner state of the hash needs to be kept in memory.
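
For illustration (a small sketch, not from this PR), data can be fed to hashlib in fixed-size chunks and only the hash's internal state persists between update() calls:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks; memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)  # only the hash state is kept, not the chunks themselves
    return h.hexdigest()
```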

But in this case, reading the tensor data from a LazyNumpyTensor materializes it, and so yes, this would put all the tensors in memory, since they are only freed when writing them to a file, which is done after writing the metadata. (In GGUFWriter, the tensors are normally only materialized when writing them, since (usually) nothing reads their data before that)


An eventual solution would be to put metadata at the end of GGUF model files, which would also help with editing metadata without rewriting all of the data (good for tokenizer fixes too). But this requires deeper format changes, although it might be possible to fit this backward-compatibly into the existing GGUF v3. (if you have ideas for this, feel free to explore them)
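
As a rough illustration of that idea (a purely hypothetical layout, not part of the GGUF spec): write the tensor data first, then append the serialized metadata plus a small fixed-size footer recording where it starts, so the metadata can be rewritten without touching the tensor data:

```python
import json
import os
import struct

def append_trailing_metadata(path: str, metadata: dict) -> None:
    """Hypothetical sketch: append kv metadata after the existing data, followed by
    an 8-byte footer holding the offset where the metadata block begins."""
    offset = os.path.getsize(path)               # metadata starts right after the current data
    blob = json.dumps(metadata).encode("utf-8")  # stand-in serialization (CBOR comes up below)
    with open(path, "ab") as f:
        f.write(blob)
        f.write(struct.pack("<Q", offset))       # little-endian u64 offset as the footer
```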


But as this PR is now, I think it has the following problems:

  • convert_hf_to_gguf.py --outtype q8_0 and llama-quantize model-F32.gguf model-Q8_0.gguf q8_0 would not result in exactly the same files
  • This would cause a big memory regression for lazy conversion by making it the same as --no-lazy

@mofosyne (Collaborator, Author) commented

@Galunid the sha256 sum by itself will not consume memory; as @compilade said, it keeps a running hash as bytes come in.

However, I see your point about the impact on lazy loading, and I don't see any way around it, so I'm inclined to close this PR. Maybe, if they really need it, repo maintainers could just run llama-gguf-hash when loading a model into their database.
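
For example (a rough sketch; it assumes the llama-gguf-hash output format shown earlier and an expected value kept by the loader), one could shell out to llama-gguf-hash and compare the reported sha256 against the value on record:

```python
import subprocess

def gguf_sha256(path: str) -> str:
    """Run llama-gguf-hash on a file and return the whole-file sha256 it reports."""
    out = subprocess.run(
        ["llama-gguf-hash", "--all", "--no-layer", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        parts = line.split()
        if parts and parts[0] == "sha256":
            return parts[1]
    raise ValueError("no sha256 line found in llama-gguf-hash output")

# usage: compare gguf_sha256("model.gguf") against the value stored under general.hash.sha256
```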


@compilade regarding the idea of extending the end of the gguf file as an extension: GG is heavily against it unless it is truly unavoidable, as he would prefer to ensure backwards compatibility via the kv store. That's not to say it won't happen in the future, but if we do it then we'd better have a good reason... or spin off a new file format standard not encumbered by the past (if so, then I'd suggest using CBOR over inventing our own structured format for metadata... and of course sticking the metadata at the end like you suggest).


convert_hf_to_gguf.py --outtype q8_0 and llama-quantize model-F32.gguf model-Q8_0.gguf q8_0 would not result in exactly the same files

That's a bit strange; does llama-gguf-hash also show a difference? Is this a comparison between 'safetensors to GGUF Q8' and 'GGUF F32 to GGUF Q8'? If so then maybe it's valid to have a difference, since there might be a slight difference in behavior when converting from two different float formats to Q8?

@compilade (Collaborator) commented

@mofosyne

That's a bit strange; does llama-gguf-hash also show a difference?

No, the difference is only in the metadata: the hash introduced in this PR differs because it depends on the conversion output and is not updated by llama-quantize, so it doesn't reflect the tensor contents in that case.

Another solution (instead of updating the hash in llama-quantize) would be to hash the source tensors when converting, but this would not be usable as an integrity check; it would only mark provenance.

Is this a comparison between 'safetensors to GGUF Q8' and 'GGUF F32 to GGUF Q8'? If so then maybe it's valid to have a difference, since there might be a slight difference in behavior when converting from two different float formats to Q8?

There is no difference in behavior. See #7234. Internally, Q8_0 conversion is always done from F32.

@mofosyne closed this on Jul 26, 2024