Overview
- Model Arch: Qwen2ForCausalLM
- Exllamav3 Version: v0.0.3 and v0.0.4
Conversion of moonshotai/Kimi-Dev-72B fails for several bpw variants (6.0_H6, 4.25_H6, 4.0_H6, 3.0_H6) with the following error:
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).
Other bpw variants (8.0_H8, 8.0_H6, 5.0_H6, 3.5_H6) converted successfully. The model itself runs correctly, so there is no issue with inference on the raw weights.
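For context, the same error is trivially reproducible in plain PyTorch whenever the first diagonal entry of the input matrix is not positive; a minimal sketch (not exllamav3 code, just the failure mode itself):

```python
import torch

# Any symmetric matrix whose first diagonal entry is <= 0 (or NaN) fails
# with the same "leading minor of order 1" message seen in the trace below.
H = torch.eye(4, dtype=torch.float64)
H[0, 0] = -1.0

try:
    torch.linalg.cholesky(H)
except torch.linalg.LinAlgError as e:
    print(e)  # linalg.cholesky: The factorization could not be completed ...
```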
Trace
Traceback (most recent call last):
File "/opt/exl/exllamav3/convert.py", line 11, in <module>
main(_in_args, _job_state)
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/exl/exllamav3/exllamav3/conversion/convert_model.py", line 417, in main
proxy_err = linear.convert_exl3(
^^^^^^^^^^^^^^^^^^^^
File "/opt/exl/exllamav3/exllamav3/modules/linear.py", line 235, in convert_exl3
weight_q, proxy_err, out_tensors = quantize_exl3(
^^^^^^^^^^^^^^
File "/opt/exl/exllamav3/exllamav3/modules/quant/exl3_lib/quantize.py", line 781, in quantize_exl3
H, L, su, H_diag = finalize_capture_H(H_data, quant_args, verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/exl/exllamav3/exllamav3/modules/quant/exl3_lib/quantize.py", line 480, in finalize_capture_H
L, H = block_ldl(H, 16, verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/exl/exllamav3/exllamav3/modules/quant/exl3_lib/quantize.py", line 287, in block_ldl
raise e
File "/opt/exl/exllamav3/exllamav3/modules/quant/exl3_lib/quantize.py", line 274, in block_ldl
L = torch.linalg.cholesky(H)
^^^^^^^^^^^^^^^^^^^^^^^^
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).
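Not a fix, but for reference: GPTQ-style quantizers typically recover from a non-positive-definite Hessian by adding diagonal damping and retrying the factorization. A hypothetical sketch of that pattern in plain PyTorch (`cholesky_with_damping` is my name, not an exllamav3 API):

```python
import torch

def cholesky_with_damping(H: torch.Tensor, max_tries: int = 8) -> torch.Tensor:
    """Retry Cholesky with growing diagonal damping (hypothetical helper)."""
    # Non-finite entries guarantee failure, so clear them first.
    H = torch.nan_to_num(H, nan=0.0, posinf=0.0, neginf=0.0)
    # Start damping at a small fraction of the mean diagonal magnitude.
    damp = 1e-4 * H.diagonal().abs().mean().clamp(min=1e-8)
    eye = torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
    for _ in range(max_tries):
        try:
            return torch.linalg.cholesky(H + damp * eye)
        except torch.linalg.LinAlgError:
            damp = damp * 10.0  # escalate and retry
    raise RuntimeError("could not make H positive-definite")
```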
Conversion log (restart)
- [4x] Linear
- RMSNorm
- GatedMLP
- [3x] Linear
- RMSNorm
- Linear
-- Loaded tokenizer
Vocab size: 151665
-- Resuming at: model.layers.79
-- Loading unquantized module: model.layers.79
-- Captured: model.layers.79
-- Quantized: model.layers.79.self_attn.q_proj bpw: 6.00 proxy_err: 0.000066 o g_sc: 0.821332 [2.97 s]
-- Quantized: model.layers.79.self_attn.k_proj bpw: 6.00 proxy_err: 0.000066 o g_sc: 0.812287 [1.36 s]
-- Quantized: model.layers.79.self_attn.v_proj bpw: 6.00 proxy_err: 0.000098 o g_sc: 0.835967 [1.36 s]
-- Quantized: model.layers.79.self_attn.o_proj bpw: 6.00 proxy_err: 0.000022 o g_sc: 0.806696 [2.35 s]
-- Quantized: model.layers.79.mlp.up_proj bpw: 6.00 proxy_err: 0.000023 o g_sc: 0.812287 [7.09 s]
-- Quantized: model.layers.79.mlp.gate_proj bpw: 6.00 proxy_err: 0.000027 o g_sc: 0.815741 [6.20 s]
-- Quantized: model.layers.79.mlp.down_proj bpw: 6.00 proxy_err: 0.000008 . g_sc: 0.821332 [14.25 s]
-- Quantized: model.layers.79 bpw: 6.00 rfn: 0.003657 cos: 0.000007 sqnr: 48.787139 [68.06 s]
-- Estimated remaining time: 3 minutes
-- Loading unquantized module: model.norm
-- Quantized: model.norm bpw: 16.00 rfn: 0.000000 cos: 0.000000 sqnr: 0.000000 [3.01 s]
-- Estimated remaining time: 1 minute
-- Loading unquantized module: lm_head
-- Captured: lm_head
!! Warning: block state has 0 inf values and 1 NaN values (out of 1,677,721,600)
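That single NaN in the captured block state is probably what sinks the factorization: once it propagates into H, the very first leading minor fails the positive-definiteness check. A quick diagnostic one could run on H before block_ldl (hypothetical, not in the codebase):

```python
import torch

def report_h_health(H: torch.Tensor) -> None:
    # Hypothetical diagnostic: count non-finite entries and, if clean,
    # report how far H is from positive-definite.
    nans = torch.isnan(H).sum().item()
    infs = torch.isinf(H).sum().item()
    print(f"H: {nans} NaN, {infs} inf out of {H.numel():,}")
    if nans == 0 and infs == 0:
        print("min eigenvalue:", torch.linalg.eigvalsh(H).min().item())
```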