Bug: convert-hf-to-gguf.py on Gemma model ValueError: Duplicated key name 'tokenizer.chat_template' #7923

bartowski1182 · 2024-06-13T19:09:23Z

What happened?

When trying to convert

https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/

I get the error in the title, but chat_template is defined only once, in tokenizer_config.json:

https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma/blob/main/tokenizer_config.json#L59

I verified this locally with cat *.json | grep chat_template, which returns only that one result.
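For anyone who wants to repeat that check without grep, here is a minimal Python sketch. The helper names count_key_in_file and find_key are made up for illustration and are not part of llama.cpp. It counts every occurrence of a key in each *.json file of a model directory, including nested objects and keys repeated inside a single object, which a plain json.load would silently collapse.

```python
import json
from pathlib import Path


def count_key_in_file(path, key):
    """Count how often `key` appears as an object key anywhere in one JSON file."""
    count = 0

    def hook(pairs):
        # Called once per decoded JSON object with its raw (key, value) pairs,
        # so repeated keys inside the same object are still visible here.
        nonlocal count
        count += sum(1 for k, _ in pairs if k == key)
        return dict(pairs)

    with open(path, encoding="utf-8") as f:
        json.load(f, object_pairs_hook=hook)
    return count


def find_key(model_dir, key="chat_template"):
    """Map file name -> number of occurrences, for files that contain `key`."""
    hits = {}
    for path in sorted(Path(model_dir).glob("*.json")):
        n = count_key_in_file(path, key)
        if n:
            hits[path.name] = n
    return hits
```

Calling find_key on the downloaded model directory should list only tokenizer_config.json with a count of 1 if the grep result above holds.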

Is it somehow trying to load it twice?

It looks like Gemma's set_vocab() first runs _set_vocab_sentencepiece(), which calls special_vocab.add_to_gguf() (which pulls in the chat_template), and then calls special_vocab.add_to_gguf() a second time.

But that would mean it has been broken since April 16:

#6689

Name and Version

b3145 ubuntu 22.04

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO:hf-to-gguf:Loading model: DiscoPOP-zephyr-7b-gemma
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 1
INFO:gguf.vocab:Setting special token type unk to 3
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
INFO:gguf.vocab:Setting special token type prefix to 67
INFO:gguf.vocab:Setting special token type suffix to 69
INFO:gguf.vocab:Setting special token type middle to 68
WARNING:gguf.vocab:No handler for special token type fsep with id 70 - skipping
INFO:gguf.vocab:Setting special token type eot to 107
INFO:gguf.vocab:Setting chat_template to {% if messages[0]['role'] == 'user' or messages[0]['role'] == 'system' %}{{ bos_token }}{% endif %}{% for message in messages %}{{ '<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% elif messages[-1]['role'] == 'assistant' %}{{ eos_token }}{% endif %}
Traceback (most recent call last):
  File "/llama.cpp/convert-hf-to-gguf.py", line 2882, in <module>
    main()
  File "/llama.cpp/convert-hf-to-gguf.py", line 2867, in main
    model_instance.set_vocab()
  File "/llama.cpp/convert-hf-to-gguf.py", line 2251, in set_vocab
    special_vocab.add_to_gguf(self.gguf_writer)
  File "/llama.cpp/gguf-py/gguf/vocab.py", line 73, in add_to_gguf
    gw.add_chat_template(self.chat_template)
  File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 565, in add_chat_template
    self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value)
  File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 206, in add_string
    self.add_key_value(key, val, GGUFValueType.STRING)
  File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 166, in add_key_value
    raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'tokenizer.chat_template'

bartowski1182 · 2024-06-13T19:15:30Z

Potential fix: setting special_vocab.chat_template = None before the second call to special_vocab.add_to_gguf() seems to work.
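A rough sketch of that workaround, using stand-in classes rather than the real gguf-py ones. StubWriter, StubSpecialVocab and set_vocab_with_workaround are hypothetical names, and the sketch assumes that add_to_gguf() skips the chat template when it is None, which is what the workaround relies on.

```python
CHAT_TEMPLATE_KEY = "tokenizer.chat_template"


class StubWriter:
    """Hypothetical stand-in for GGUFWriter: refuses to set the same key twice."""

    def __init__(self):
        self.kv = {}

    def add_key_value(self, key, val):
        if key in self.kv:
            raise ValueError(f"Duplicated key name {key!r}")
        self.kv[key] = val


class StubSpecialVocab:
    """Hypothetical stand-in for SpecialVocab: carries only a chat template."""

    def __init__(self, chat_template):
        self.chat_template = chat_template

    def add_to_gguf(self, gw):
        # Assumption: a template of None is skipped instead of written.
        if self.chat_template is not None:
            gw.add_key_value(CHAT_TEMPLATE_KEY, self.chat_template)


def set_vocab_with_workaround(writer, template):
    # First call: stands in for the one made inside _set_vocab_sentencepiece().
    StubSpecialVocab(template).add_to_gguf(writer)
    # Second call: clear the template first so the key is not written again.
    special_vocab = StubSpecialVocab(template)
    special_vocab.chat_template = None
    special_vocab.add_to_gguf(writer)
```

With the template cleared, the writer ends up holding the chat template exactly once, from the first call. Without the clearing line, the second add_to_gguf() raises the ValueError from the log.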

maab19 · 2024-06-14T09:11:41Z

It has been broken since #7827, so only for a couple of days. It broke because that change added a check to the add_key_value() function of the GGUFWriter class that raises an exception if a key is already set. Before, the second call to special_vocab.add_to_gguf() just updated the value without raising an exception.

I also already reported this issue in #7897.
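The difference in behavior can be illustrated with a toy writer. This is a simplified model and not the actual GGUFWriter code; the reject_duplicates flag and write_template_twice helper exist only in this sketch, to switch between the old and new behavior.

```python
class ToyWriter:
    """Simplified model of a key-value writer; not the real GGUFWriter."""

    def __init__(self, reject_duplicates):
        # Hypothetical flag: False models the old behavior, True the new check.
        self.reject_duplicates = reject_duplicates
        self.kv = {}

    def add_key_value(self, key, val):
        if self.reject_duplicates and key in self.kv:
            raise ValueError(f"Duplicated key name {key!r}")
        # Without the check, a repeated key silently overwrites the old value.
        self.kv[key] = val


def write_template_twice(writer):
    """Mimic the two add_to_gguf() calls that both write the chat template."""
    for _ in range(2):
        writer.add_key_value("tokenizer.chat_template", "{{ bos_token }}")
```

Calling write_template_twice(ToyWriter(reject_duplicates=False)) completes and leaves a single entry, while the same call with reject_duplicates=True raises on the second write, matching the traceback in the report.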

bartowski1182 · 2024-06-14T15:33:03Z

Ohhhh good catch, that makes a lot more sense than it being broken for months!

bartowski1182 added the bug-unconfirmed and low severity labels Jun 13, 2024

github-actions bot added the stale label Jul 15, 2024

compilade removed the stale label Jul 19, 2024

compilade mentioned this issue Jul 19, 2024 in convert_hf : fix Gemma v1 conversion #8597 (merged)

compilade added the bug label and removed the bug-unconfirmed label Jul 19, 2024

compilade closed this as completed in #8597 Jul 21, 2024
