Bug: missing option --vocab-type bpe in convert-hf-to-gguf.py #7912

Closed
gakugaku opened this issue Jun 13, 2024 · 3 comments

Labels
bug-unconfirmed, low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches)

Comments

@gakugaku

What happened?

README:

https://github.com/ggerganov/llama.cpp/blob/f578b86b2123d0f92afbaa98a031df4d4464e582/README.md?plain=1#L625-L626

Actual Output:

$ python convert-hf-to-gguf.py ./mymodels/ --vocab-type bpe
usage: convert-hf-to-gguf.py [-h] [--vocab-only] [--awq-path AWQ_PATH] [--outfile OUTFILE] [--outtype {f32,f16,bf16,q8_0,auto}] [--bigendian] [--use-temp-file] [--no-lazy] [--model-name MODEL_NAME] [--verbose] model
convert-hf-to-gguf.py: error: unrecognized arguments: --vocab-type bpe

Name and Version

$ ./llama-cli --version
version: 3143 (f578b86)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

gakugaku added the bug-unconfirmed and low severity labels on Jun 13, 2024
@Galunid
Collaborator

Galunid commented Jun 13, 2024

Hi, that's not a bug. convert-hf-to-gguf.py automatically detects which vocab should be used based on the model, so there's no need for --vocab-type anymore.

I'll fix the readme.
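For reference, the working invocation simply drops the flag; the remaining options come straight from the usage string printed above, and the output path here is only a placeholder:

$ python convert-hf-to-gguf.py ./mymodels/ --outfile ./mymodels/model-f16.gguf --outtype f16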

@cmp-nct
Contributor

cmp-nct commented Jun 14, 2024

> Hi, that's not a bug. convert-hf-to-gguf.py automatically detects which vocab should be used based on the model, so there's no need for --vocab-type anymore.
>
> I'll fix the readme.

I'm struggling with the same issue while trying to convert the minicpm-2.5 model, which is dynamically generated during the pre-conversion process (similar to llava-surgery). Forcing the tokenizer/model type should still be available as an option.

In the same process I also got an error asking me to trust remote code during the from_pretrained() call in get_vocab_base.
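That error appears to come from the Hugging Face AutoTokenizer.from_pretrained() call that the converter's get_vocab_base makes. As a rough sketch (the path is a placeholder and this assumes transformers is installed), the same failure can be reproduced and worked around outside the converter by loading the tokenizer directly with trust_remote_code=True:

$ python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('./mymodels/', trust_remote_code=True)"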

@Galunid
Collaborator

Galunid commented Jun 15, 2024

@cmp-nct Yup, convert-hf-to-gguf-update.py is a bit of a pain for models with new tokenizers. I replied in more detail in #7599, but you can still use examples/convert-legacy-llama.py, which is the old convert.py script. We should add some sort of option for this though, I agree. I'll take a look at that tomorrow.
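For anyone who still needs to force the vocab type, the legacy invocation would look roughly like this, assuming examples/convert-legacy-llama.py kept the old convert.py interface (the flag values and output path are illustrative, not verified against the current tree):

$ python examples/convert-legacy-llama.py ./mymodels/ --vocab-type bpe --outfile ./mymodels/model-f16.gguf --outtype f16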
