Key 'quantization' not in 'QNNConfig' #109

Description

@luffy-yu

Environment

  • Ubuntu 24.04
  • QNN: 2.37.0.250724 / 2.26.0.240828 (tried both; neither works)
  • NDK: 29.0.14206865

Cannot build the QNN backend following qualcomm_README.md. The documented export command fails with a Hydra config error:

python -m extension.llm.export.export_llm \
  base.checkpoint="${MODEL_DIR}/consolidated.00.pth" \
  base.params="${MODEL_DIR}/params.json" \
  model.use_kv_cache=True \
  model.enable_dynamic_shape=False \
  backend.qnn.enabled=True \
  backend.qnn.quantization="qnn_16a4w" \
  model.dtype_override="fp32" \
  base.metadata='"{\"get_bos_id\":128000, \"get_eos_ids\":[128009, 128001]}"' \
  export.output_name="test.pte"
I tokenizers:regex.cpp:27] Registering override fallback regex
Could not override 'backend.qnn.quantization'.
To append to your config use +backend.qnn.quantization=qnn_16a4w
Key 'quantization' not in 'QNNConfig'
    full_key: backend.qnn.quantization
    reference_type=QNNConfig
    object_type=QNNConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
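For context, this is standard OmegaConf structured-config behavior: overrides can only touch keys declared on the schema dataclass, and the reference_type=QNNConfig in the error indicates backend.qnn is backed by such a schema. A minimal sketch reproducing the mechanism (the QNNConfig field here is illustrative, not the real schema):

    from dataclasses import dataclass
    from omegaconf import OmegaConf

    @dataclass
    class QNNConfig:
        enabled: bool = False  # illustrative field, not the real schema

    cfg = OmegaConf.structured(QNNConfig)
    cfg.enabled = True                # declared key: override succeeds
    cfg.quantization = "qnn_16a4w"    # undeclared key: raises
    # omegaconf.errors.ConfigAttributeError: Key 'quantization' not in 'QNNConfig'

Hydra's + prefix force-adds the undeclared key instead of rejecting it, which is why the run below composes; but as the log shows, nothing in the export path ever consumes the injected value.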

After appending the key as the hint suggests (+backend.qnn.quantization="qnn_16a4w"), composition succeeds, but no quantizer is ever applied:

python -m extension.llm.export.export_llm \
  base.checkpoint="${MODEL_DIR}/consolidated.00.pth" \
  base.params="${MODEL_DIR}/params.json" \
  model.use_kv_cache=True \
  model.enable_dynamic_shape=False \
  backend.qnn.enabled=True \
  +backend.qnn.quantization="qnn_16a4w" \
  model.dtype_override="fp32" \
  base.metadata='"{\"get_bos_id\":128000, \"get_eos_ids\":[128009, 128001]}"' \
  export.output_name="test.pte"
I tokenizers:regex.cpp:27] Registering override fallback regex
[2025-11-06 21:52:57,430][root][INFO] - Applying quantizers: []
[2025-11-06 21:53:00,519][root][INFO] - Checkpoint dtype: torch.bfloat16
[2025-11-06 21:53:16,518][root][INFO] - Model after source transforms: Transformer(
  (tok_embeddings): Embedding(128256, 4096)
  (layers): ModuleList(
    (0-31): 32 x TransformerBlock(
      (attention): AttentionMHA(
        (wq): Conv2D(
          (conv): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wk): Conv2D(
          (conv): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wv): Conv2D(
          (conv): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (wo): Conv2D(
          (conv): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (rope): Rope(
          (apply_rotary_emb): RotaryEmbedding()
        )
        (kv_cache): KVCacheSimple()
        (SDPA): SDPAFlex()
      )
      (feed_forward): FeedForward(
        (w1): Conv2D(
          (conv): Conv2d(14336, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (w2): Conv2D(
          (conv): Conv2d(4096, 14336, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
        (w3): Conv2D(
          (conv): Conv2d(14336, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
        )
      )
      (attention_norm): RMSNorm()
      (ffn_norm): RMSNorm()
    )
  )
  (rope): Rope(
    (apply_rotary_emb): RotaryEmbedding()
  )
  (norm): RMSNorm()
  (output): Conv2D(
    (conv): Conv2d(128256, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
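Side note on the dump above: every Linear layer has been rewritten as a 1x1 Conv2D wrapper. That appears to be the QNN flow's linear-to-conv source transform rather than a bug; a 1x1 convolution over a 1x1 spatial map computes the same affine map as a Linear layer. A quick sanity sketch, with dimensions chosen arbitrarily:

    import torch

    # A 1x1 Conv2d over a 1x1 spatial map is equivalent to a Linear layer.
    lin = torch.nn.Linear(4096, 1024, bias=False)
    conv = torch.nn.Conv2d(4096, 1024, kernel_size=1, bias=False)
    with torch.no_grad():
        conv.weight.copy_(lin.weight.view(1024, 4096, 1, 1))

    x = torch.randn(2, 4096)
    y_lin = lin(x)
    y_conv = conv(x.view(2, 4096, 1, 1)).view(2, 1024)
    print(torch.allclose(y_lin, y_conv, atol=1e-5))  # True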
[2025-11-06 21:53:16,518][root][INFO] - Exporting with:
[2025-11-06 21:53:16,528][root][INFO] - inputs: (tensor([[1]]), {'input_pos': tensor([0])})
[2025-11-06 21:53:16,528][root][INFO] - kwargs: None
[2025-11-06 21:53:16,528][root][INFO] - dynamic shapes: None
[2025-11-06 21:53:22,331][root][INFO] - Running canonical pass: RemoveRedundantTransposes
[2025-11-06 21:53:22,387][root][INFO] - Using pt2e [] to quantizing the model...
[2025-11-06 21:53:22,387][root][INFO] - No quantizer provided, passing...
Killed
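Note the telltale lines near the end: "Applying quantizers: []" and "No quantizer provided, passing...". The force-added backend.qnn.quantization key is apparently never read, so the export runs unquantized at fp32, and the bare "Killed" is most likely the Linux OOM killer reaping the process while it lowers the full-precision checkpoint. If the schema has simply been reshuffled, the knob may live elsewhere now (for example, newer export_llm configs appear to expose quantization.pt2e_quantize="qnn_16a4w"; unverified here). A quick probe to list what the composed schema actually declares; the import path below is inferred from the python -m invocation above and may differ in your checkout:

    # Hypothetical probe: LlmConfig location guessed from the CLI module path.
    import dataclasses
    from extension.llm.export.config.llm_config import LlmConfig

    for f in dataclasses.fields(LlmConfig):
        print(f.name, f.type)  # top-level keys; inspect the backend field's
                               # dataclass the same way to see QNNConfig's keys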
