Export llama3.1 Runtime error: Missing out variants: {'quantized_decomposed::dequantize_per_token'....... #7775
Comments
Another thing is that the memory required is very high when running the conversion script ...
Added some debug logs, the ...
Also getting this error on the llama export when attempting to quantize with these params, but perhaps I am missing a param? The docs recommend using a pre-quantized model, but I am trying to better understand the ExecuTorch quantization options for another model, and I'm not quite sure what the correct params are, so this one might be my own error...
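For reference, a sketch of the kind of export invocation being discussed. The flag names follow the ExecuTorch Llama example README as far as I know; the paths and output name are placeholders, and the exact module path and supported flags may differ between versions, so treat this as an assumption rather than a known-good command.

```bash
# Hypothetical export command with pt2e/XNNPACK quantization (paths are placeholders).
# Check `python -m examples.models.llama.export_llama --help` for the flags your
# installed ExecuTorch version actually supports.
python -m examples.models.llama.export_llama \
  --checkpoint /path/to/consolidated.00.pth \
  --params /path/to/params.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  --pt2e_quantize xnnpack_dynamic \
  -d fp32 \
  --output_name "llama3_1_xnnpack.pte"
```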
Swapping out the pt2e quantize param with ... Note that I also had to downgrade numpy from the ExecuTorch default to run this at all... UPDATE: in my case, the -d bf16 parameter was not compatible with xnnpack, and switching that to fp16 was successful.
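In other words (a minimal sketch, assuming the standard `-d` dtype-override flag of the export script, with everything else elided):

```bash
# Reported incompatible: bf16 dtype override together with the XNNPACK backend (-X)
python -m examples.models.llama.export_llama ... -X -d bf16 ...
# Reported working: the same invocation with the dtype override switched to fp16
python -m examples.models.llama.export_llama ... -X -d fp16 ...
```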
Hi @mcr229, thanks a lot for the help, I can now export the model. However, when trying to run the model, I'm getting another tokenizer load error 😅:
There seem to be some errors when loading the tokenizer. [Update] ...
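For context, a sketch of how the exported model is typically run with the example runner (binary name, build path, and flags are my assumptions based on the ExecuTorch Llama example, not taken from this thread). As far as I know, for Llama 3.x the tiktoken `tokenizer.model` is passed to the runner directly, while Llama 2-style SentencePiece tokenizers need a separate conversion step first.

```bash
# Hypothetical runner invocation; the binary is built from examples/models/llama
# and the model/tokenizer paths here are placeholders.
cmake-out/examples/models/llama/llama_main \
  --model_path=llama3_1_xnnpack.pte \
  --tokenizer_path=/path/to/tokenizer.model \
  --prompt="Hello"
```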
Really appreciate the help here, I'm closing this now 😄
🐛 Describe the bug
Hello, we got a runtime error when trying to export the Llama 3.1 8B model. Here's the reproduction script:
Then we got this error:
Versions
env: