# Error: input 3 is none #7614
## Comments
Hi @jds250, thanks for trying. I believe it should not be necessary to have root access to run your model.
Hi, I am using the release/0.4 branch; here are my steps to reproduce. Exporting the .pte file is handled by the llama.py script in examples/qualcomm/oss_scripts/llama2, and the .pte file is generated in the llama2_qnn folder.

**Step 1: Setup**
**Step 2: Prepare model**

Download and prepare the stories110M model:

```bash
# tokenizer.model & stories110M.pt:
wget "https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt"
wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"
# tokenizer.bin:
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
# params.json:
echo '{"dim": 768, "multiple_of": 32, "n_heads": 12, "n_layers": 12, "norm_eps": 1e-05, "vocab_size": 32000}' > params.json
```
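As an optional sanity check (not part of the original steps), you can confirm that the params.json you just wrote is self-consistent before exporting. A minimal Python sketch:

```python
import json

# Load the params.json written above and check it is self-consistent.
with open("params.json") as f:
    params = json.load(f)

# The head dimension must divide evenly for multi-head attention (768 / 12 = 64 here).
assert params["dim"] % params["n_heads"] == 0, "dim must be divisible by n_heads"
print("head_dim:", params["dim"] // params["n_heads"])
print("layers:", params["n_layers"], "vocab:", params["vocab_size"])
```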
**Step 3: Run default examples**

The default example generates a story based on the given prompt, "Once".

```bash
# 16a4w quant:
python llama.py -b /home/jds/executorch/build-android -s 1f1fa994 -m SM8650 --ptq 16a4w --checkpoint stories110M.pt --params params.json --tokenizer_model tokenizer.model --tokenizer_bin tokenizer.bin --prompt "what is python?" --pre_gen_pte /home/jds/executorch/examples/qualcomm/oss_scripts/llama2/llama2_qnn/
```
Got it. Let me clarify one thing.
Oh, I see. It's a bug in how the input is set. We have a fix in this PR. If possible, could you use the main branch?
Yes, I have compiled it first.
Thank you! I will try it again.
BTW, if you are interested in Llama 3.2, we have provided this script to export and run it. To improve the user experience, we will integrate our script for Llama as soon as possible.
Hi @shewu-quic, I am experiencing a very similar issue where the model does not respond, and in the logcat I see the same error.

**Environment:**
**Steps Tried:**
**Request for Help:**

Could you please advise if there are any additional fixes or specific steps to resolve these issues? Thank you for your support! I appreciate any guidance you can provide.
Hi @michaelk77, thanks for trying.
Feel free to let me know if you need any further assistance!
Hi @shewu-quic, thank you for your response and clarification!

**Runtime Environment:**
**Updated Status:**
**PTE Generation:**

I generated the PTE with the following command:

```bash
python examples/qualcomm/oss_scripts/llama3_2/llama.py \
-b build-android \
-m SM8475 \
--checkpoint "consolidated.00.pth" \
--params "original_params.json" \
--ptq 16a4w \
--model_size 1B \
--tokenizer_model "tokenizer.model" \
--prompt "what is 1+1" \
--temperature 0 \
--model_mode kv \
--prefill_seq_len 32 \
--kv_seq_len 128 \
--compile_only
```

**Model Source:**

I am using the model files from Meta Llama 3.2 1B Instruct on Hugging Face. If you need additional logs or further details, please let me know. I appreciate your assistance!
Thanks for the information. Could you please use the following command to run the PTE?
Thank you for providing the command to run the PTE. I have executed the provided command with a minor addition to specify my device using the `-s` flag:

```bash
python examples/qualcomm/oss_scripts/llama3_2/llama.py \
-b build-android \
-m SM8475 \
--checkpoint "consolidated.00.pth" \
--params "original_params.json" \
--ptq 16a4w \
--model_size 1B \
--tokenizer_model "tokenizer.model" \
--prompt "what is 1+1" \
--temperature 0 \
--model_mode kv \
--prefill_seq_len 32 \
--kv_seq_len 128 \
--pre_gen_pte ${path_to_your_pte_directory} \
-s <device-serial> # my device code from ADB
```

**Observations:**
**Log Details:**

Here is the relevant portion of the logcat output during execution:
**Performance Stats:**
Could you let me know if there’s any misconfiguration or additional step I should take? Thank you for your assistance!
Hi @michaelk77, sorry for the late reply.
Thanks for your advice! I have successfully run the stories110M Llama and Llama 3.2 1B models on my device.
Any insights you could share would be very helpful! Thank you in advance for your time and assistance.
Congratulations on your success! I’m pleased to help.
If you have any questions, please don’t hesitate to let me know.
Thank you very much for your response! I have a few more questions that I hope you can help me with:
I would really appreciate any clarification on these points. If I misunderstood any part, I’d be grateful if you could point that out!
Hi @jds250, Of course. Happy to help.
I hope this helps! Let me know if you need any further assistance.
Thank you for your helpful response! I have a few follow-up questions regarding some details:
I appreciate your time and help!
I hope this clarifies things! Feel free to reach out if you need any more help.
Thank you for your detailed reply! I'm really looking forward to your upcoming documentation on KV cache management. In the meantime, I'd like to explore custom quantization algorithms to deploy a model on an NPU. I've noticed that Qualcomm provides the AIMET toolkit for quantization, and I'm also considering using ExecuTorch to implement a custom quantization flow. Could you share any guidance or examples on how to implement a custom quantization approach for NPU deployment? Specifically, I'm wondering if we need to implement custom quantization operators tailored to the NPU, or if there's existing support we can leverage. Any advice or references you could point me to would be incredibly helpful. Thanks again for all your assistance!
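For context, ExecuTorch's Qualcomm backend sits on the PT2E quantization flow, so a custom scheme usually means configuring (or subclassing) the backend quantizer rather than writing NPU kernels. Below is a minimal sketch, assuming `QnnQuantizer` is importable from `executorch.backends.qualcomm.quantizer.quantizer` and that your torch build provides `torch.export.export_for_training`; both import paths and capture APIs vary across versions, so treat this as a starting point rather than a definitive recipe:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Import path is version-dependent; recent ExecuTorch trees keep it here.
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Capture the graph (the capture API has changed across torch releases).
captured = torch.export.export_for_training(model, example_inputs).module()

# Configure the backend quantizer; a custom scheme plugs in here,
# e.g. by choosing bit widths/observers or annotating ops yourself.
quantizer = QnnQuantizer()

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration pass with representative data
quantized = convert_pt2e(prepared)
```

From there the quantized module would go through the usual edge lowering and QNN partitioning; whether custom NPU operators are needed depends on whether your scheme maps onto quantized ops QNN already supports.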
**Title:** Error: input 3 is none when running Llama example in QNN ExecuTorch on Android

**Description:**
I followed the instructions in the [Llama2 README](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/oss_scripts/llama2/README.md) to run the `llama.py` script using QNN ExecuTorch on Android. The execution fails with the error `input 3 is none`, and metadata seems to be read from the model twice during execution.

**Steps to Reproduce:**
1. Environment setup:
2. Run the following command:
```bash
python llama.py -b executorch/build-android -s 112dhb -m SM8650 \
  --ptq 16a4w --checkpoint stories110M.pt --params params.json \
  --tokenizer_model tokenizer.model --tokenizer_bin tokenizer.bin \
  --prompt "what is python?" \
  --pre_gen_pte executorch/examples/qualcomm/oss_scripts/llama2/llama2_qnn/
```
**Log:**
I found there is no output in my output file.

`adb logcat`:

BTW, I also noticed some fastrpc errors (maybe because I don't have root access):

Is it necessary to have root access to deploy the model?