[User] Producing tokenizer.model from transformers tokenizers.json #2443
Comments
Also, just curious: has anyone compared inference latency between this project and a simple transformers text-generation pipeline?
I guess you could make it work by copying the original tokenizer.model to the folder.
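A minimal sketch of that workaround, assuming the original base-model checkout (with its tokenizer.model) is available locally; the paths are placeholders:

```python
import shutil
from pathlib import Path

# Hypothetical paths: the original Llama 2 checkout and the fine-tuned HF folder.
base_model_dir = Path("llama-2-7b")          # contains the original tokenizer.model
finetuned_dir = Path("my-finetuned-model")   # contains config.json, tokenizer.json, ...

# Copy the SentencePiece model next to the fine-tuned weights so convert.py can find it.
shutil.copy(base_model_dir / "tokenizer.model", finetuned_dir / "tokenizer.model")
```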
Hey @klosax, I added some special tokens for the downstream task, so I don't think I can do that, unfortunately.
It looks like a solution was added in PR #2228. You could try something like:
raise FileNotFoundError(f"Could not find any of {[self._FILES[vt] for vt in vocab_types]}")
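For context, here is a minimal sketch (not the actual convert.py code) of the kind of lookup that raises this error: each supported vocab type maps to the file it needs, and the loader takes the first one present in the model directory. The type names and file names below are assumptions; check the vocab-type choices in your convert.py revision.

```python
from pathlib import Path

# Hypothetical mapping from vocab type to the file the converter looks for.
_FILES = {"spm": "tokenizer.model", "bpe": "vocab.json", "hfft": "tokenizer.json"}

def find_vocab_file(model_dir: Path, vocab_types: list[str]) -> Path:
    # Try each requested vocab type in order; use the first matching file.
    for vt in vocab_types:
        candidate = model_dir / _FILES[vt]
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"Could not find any of {[_FILES[vt] for vt in vocab_types]}")
```

So if only tokenizer.json is present, selecting a vocab type that maps to it (rather than the default SentencePiece one) should avoid the error.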
I am running into this same issue trying to convert Llama 70B into GGUF ... there is no tokenizer.model file, only tokenizer.json. When I run
I have a Llama 2 7B model fine-tuned for a downstream task and stored in transformers format, i.e. my model file structure looks like this:
I know the convert.py file expects the original Llama 2 structure; how would I modify it to make this work? I'm not too sure what the tokenizer.model file format is like, or how to convert the tokenizer.json file into it.
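One way to get a tokenizer.model that includes the extra tokens, assuming the fine-tune only appended new special tokens on top of the stock Llama 2 SentencePiece vocabulary, is to patch the original tokenizer.model with the sentencepiece protobuf API. This is a sketch, not a general tokenizer.json-to-tokenizer.model converter; the token list and paths are placeholders:

```python
from sentencepiece import sentencepiece_model_pb2 as sp_model

# Load the original Llama 2 SentencePiece model (hypothetical path).
proto = sp_model.ModelProto()
with open("llama-2-7b/tokenizer.model", "rb") as f:
    proto.ParseFromString(f.read())

# Append the special tokens added during fine-tuning (placeholders).
for token in ["<extra_0>", "<extra_1>"]:
    piece = sp_model.ModelProto.SentencePiece()
    piece.piece = token
    piece.score = 0.0
    piece.type = sp_model.ModelProto.SentencePiece.USER_DEFINED
    proto.pieces.append(piece)

# Write the patched model next to the fine-tuned weights so convert.py can pick it up.
with open("my-finetuned-model/tokenizer.model", "wb") as f:
    f.write(proto.SerializeToString())
```

The appended tokens must end up with the same ids the fine-tuned model was trained with, so verify the resulting vocabulary against tokenizer.json (or added_tokens.json, if present) before converting.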