Environment
- Ubuntu 24.04
- QNN: 2.37.0.250724 / 2.26.0.240828 (neither is working)
- NDK: 29.0.14206865
Cannot build the QNN backend following qualcomm_README.md. The export command below fails:

```
python -m extension.llm.export.export_llm \
    base.checkpoint="${MODEL_DIR}/consolidated.00.pth" \
    base.params="${MODEL_DIR}/params.json" \
    model.use_kv_cache=True \
    model.enable_dynamic_shape=False \
    backend.qnn.enabled=True \
    backend.qnn.quantization="qnn_16a4w" \
    model.dtype_override="fp32" \
    base.metadata='"{\"get_bos_id\":128000, \"get_eos_ids\":[128009, 128001]}"' \
    export.output_name="test.pte"
```
```
I tokenizers:regex.cpp:27] Registering override fallback regex
Could not override 'backend.qnn.quantization'.
To append to your config use +backend.qnn.quantization=qnn_16a4w
Key 'quantization' not in 'QNNConfig'
    full_key: backend.qnn.quantization
    reference_type=QNNConfig
    object_type=QNNConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```
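For context, Hydra builds structured configs in struct mode, so an override can only set keys that are already declared on the target dataclass; anything else has to be appended with a leading `+`. A minimal sketch of that mechanism (the `QNNConfig` below is a stand-in dataclass for illustration, not executorch's real config class):

```python
# Minimal sketch, assuming omegaconf is installed.
# QNNConfig here is a stand-in dataclass, not executorch's actual config.
from dataclasses import dataclass
from omegaconf import OmegaConf

@dataclass
class QNNConfig:
    enabled: bool = False  # 'quantization' is deliberately not declared

cfg = OmegaConf.structured(QNNConfig)  # structured configs start in struct mode

try:
    cfg.quantization = "qnn_16a4w"  # rejected: key not declared on the dataclass
except Exception as e:
    print(type(e).__name__, ":", e)

# Appending (roughly what Hydra's '+' prefix does) works once struct mode is relaxed:
OmegaConf.set_struct(cfg, False)
cfg.quantization = "qnn_16a4w"
print(OmegaConf.to_yaml(cfg))
```

This suggests the `backend.qnn.quantization` key from qualcomm_README.md is no longer declared on the config class this script builds, which would also explain why the appended key is silently ignored in the run below.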
After appending `+backend.qnn.quantization="qnn_16a4w"` as the error message suggests, Hydra accepts the command, but the log still shows an empty quantizer list and the process is eventually killed:

```
python -m extension.llm.export.export_llm \
    base.checkpoint="${MODEL_DIR}/consolidated.00.pth" \
    base.params="${MODEL_DIR}/params.json" \
    model.use_kv_cache=True \
    model.enable_dynamic_shape=False \
    backend.qnn.enabled=True \
    +backend.qnn.quantization="qnn_16a4w" \
    model.dtype_override="fp32" \
    base.metadata='"{\"get_bos_id\":128000, \"get_eos_ids\":[128009, 128001]}"' \
    export.output_name="test.pte"
```
```
I tokenizers:regex.cpp:27] Registering override fallback regex
[2025-11-06 21:52:57,430][root][INFO] - Applying quantizers: []
[2025-11-06 21:53:00,519][root][INFO] - Checkpoint dtype: torch.bfloat16
[2025-11-06 21:53:16,518][root][INFO] - Model after source transforms: Transformer(
  (tok_embeddings): Embedding(128256, 4096)
  (layers): ModuleList(
    (0-31): 32 x TransformerBlock(
      (attention): AttentionMHA(
        (wq): Conv2D(
          (conv): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wk): Conv2D(
          (conv): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wv): Conv2D(
          (conv): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wo): Conv2D(
          (conv): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (rope): Rope(
          (apply_rotary_emb): RotaryEmbedding()
        )
        (kv_cache): KVCacheSimple()
        (SDPA): SDPAFlex()
      )
      (feed_forward): FeedForward(
        (w1): Conv2D(
          (conv): Conv2d(14336, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (w2): Conv2D(
          (conv): Conv2d(4096, 14336, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (w3): Conv2D(
          (conv): Conv2d(14336, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
      )
      (attention_norm): RMSNorm()
      (ffn_norm): RMSNorm()
    )
  )
  (rope): Rope(
    (apply_rotary_emb): RotaryEmbedding()
  )
  (norm): RMSNorm()
  (output): Conv2D(
    (conv): Conv2d(128256, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
[2025-11-06 21:53:16,518][root][INFO] - Exporting with:
[2025-11-06 21:53:16,528][root][INFO] - inputs: (tensor([[1]]), {'input_pos': tensor([0])})
[2025-11-06 21:53:16,528][root][INFO] - kwargs: None
[2025-11-06 21:53:16,528][root][INFO] - dynamic shapes: None
[2025-11-06 21:53:22,331][root][INFO] - Running canonical pass: RemoveRedundantTransposes
[2025-11-06 21:53:22,387][root][INFO] - Using pt2e [] to quantizing the model...
[2025-11-06 21:53:22,387][root][INFO] - No quantizer provided, passing...
Killed
```
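The final `Killed` with no Python traceback is usually the Linux OOM killer terminating the process rather than the exporter raising an error; exporting a model of this size with `model.dtype_override="fp32"` can easily need several tens of GB of RAM. A quick way to check how much memory is available before retrying (a sketch, assuming a Linux host such as the Ubuntu 24.04 machine above):

```python
# Sketch: report total/available RAM and free swap before retrying the export.
# Assumes a Linux host with /proc/meminfo (true for Ubuntu 24.04).
def meminfo_gib():
    out = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            out[key] = int(rest.split()[0]) / (1024 ** 2)  # kB -> GiB
    return {k: round(out[k], 1) for k in ("MemTotal", "MemAvailable", "SwapFree") if k in out}

if __name__ == "__main__":
    print(meminfo_gib())
```

If MemAvailable plus free swap is well below what the fp32 export needs, the process will be killed regardless of the backend settings; the ignored `backend.qnn.quantization` override remains a separate problem.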