Missing Out Variants When Running Llama3.2 Example Without XNNPack #6975
Comments
Can you try adding
I don't think we have a quantized linear kernel in ExecuTorch outside of XNNPACK or torchao, so I guess using those ops probably dequantizes the weights and does the linear computation in float32, and it might not be a good comparison. cc @larryliu0820 for missing ops and @digantdesai for XNNPACK
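To make that comparison caveat concrete, here is an illustrative sketch of what "dequantize the weights and do the linear computation in float32" means; the names, shapes, and per-channel scheme are assumptions for illustration, not ExecuTorch internals.

```python
import torch
import torch.nn.functional as F

def fallback_linear(x: torch.Tensor, w_int8: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Assumed layout: w_int8 is (out_features, in_features) int8 codes,
    # scales is (out_features,) per-channel scales.
    w_fp32 = w_int8.to(torch.float32) * scales.unsqueeze(1)  # dequantize
    # The matmul itself runs at full precision, so timing this path says
    # little about a true quantized linear kernel.
    return F.linear(x, w_fp32)
```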
Hmm... we should have
@digantdesai the only missing op in quantized.yaml is `quantized_decomposed::dequantize_per_channel_group`.

@metascroy To try the torchao ops I am currently using the main branch, but I am hitting some minor issues, like quantization args not getting passed to ModelArgs.
I encountered this problem as well. I found that it is necessary to register the out variants with the PyTorch system. The quantized library depends on the portable library and has to be loaded explicitly. However, why doesn't ExecuTorch load it implicitly?

```python
import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized
```
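A minimal sketch of where those imports fit, assuming the standard torch.export / to_edge flow (the toy model below is a placeholder, not Llama): the imports must run before to_executorch(), so the out variants are already registered when the out-variant pass looks them up.

```python
# Register the portable and quantized kernel libraries (and their out
# variants) with PyTorch before exporting anything.
import executorch.extension.pybindings.portable_lib  # noqa: F401
import executorch.kernels.quantized  # noqa: F401

import torch
from executorch.exir import to_edge

model = torch.nn.Linear(4, 4).eval()  # placeholder model
exported = torch.export.export(model, (torch.randn(1, 4),))
program = to_edge(exported).to_executorch()  # the out-variant pass runs here
```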
@AkiSakurai thanks, linking the portable library helped with all the defined ops in `executorch/kernels/quantized/cpu`. But `quantized_decomposed::dequantize_per_channel_group` is still missing.
No, it looks like this operation is not yet implemented.
@digantdesai can you help with the implementation of the missing op `quantized_decomposed::dequantize_per_channel_group`?
As @AkiSakurai correctly said, it seems like we do not have that op implemented in the quantized library.
Got it, thanks.
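For anyone picking this up, here is a rough pure-PyTorch sketch of what `quantized_decomposed::dequantize_per_channel_group` is expected to compute, namely group-wise dequantization along the last dimension. The argument layout is an assumption for illustration; the authoritative schema is the op's definition in PyTorch, and this is not the ExecuTorch kernel.

```python
import torch

def dequantize_per_channel_group_ref(
    w_int: torch.Tensor,        # (out_channels, in_features) integer codes
    scales: torch.Tensor,       # (out_channels, in_features // group_size)
    zero_points: torch.Tensor,  # same shape as scales
    group_size: int,
) -> torch.Tensor:
    out_c, in_f = w_int.shape
    g = w_int.to(torch.float32).reshape(out_c, in_f // group_size, group_size)
    g = (g - zero_points.to(torch.float32).unsqueeze(-1)) * scales.unsqueeze(-1)
    return g.reshape(out_c, in_f)
```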
Is #7775 a duplicate of this? We got a runtime error when trying to convert Llama 3.1 8B.
Thanks a lot for the clarifications; we added some debug logs and it turns out the
I am following the instructions in the Llama2 README to test the Llama model with ExecuTorch.
I want to compare the performance of the model with and without XNNPack. From the code, it seems that DQLinear operations are delegated to XNNPack by default. However, I would like to understand how to use the quantized ops defined in ExecuTorch, as listed in quantized.yaml. Could you provide guidance on configuring the model to use ExecuTorch's quantized ops instead of XNNPack?
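For context, the -X flag roughly toggles the delegation step in the export flow. Below is a sketch using the public to_edge / to_backend APIs, with a toy model standing in for the Llama graph (the export script wires this up internally):

```python
import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

model = torch.nn.Linear(8, 8).eval()  # stand-in for the real model
edge = to_edge(torch.export.export(model, (torch.randn(1, 8),)))

# With -X: matching (DQ)Linear subgraphs are delegated to XNNPACK.
delegated = edge.to_backend(XnnpackPartitioner())

# Without -X: nothing is delegated, so every quantized_decomposed op left in
# the graph must have a registered out variant when to_executorch() runs.
program = edge.to_executorch()
```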
I encounter the following error when the -X (--xnnpack) flag is removed from the Python export command:
```
raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::dequantize_per_channel', 'quantized_decomposed::dequantize_per_channel_group', 'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::quantize_per_token'}
```
What adjustments are required to resolve the "missing out variants" error when the -X flag is omitted?
Thank you for your assistance!
Versions
```
Collecting environment information...
PyTorch version: 2.6.0.dev20240927+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.0
Libc version: glibc-2.35
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.167.1-1.cm2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 12.6.77

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+20a157f
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20240927+cpu
[pip3] torchao==0.5.0+git0916b5b2
[pip3] torchaudio==2.5.0.dev20240927+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240927+cpu
[conda] executorch   0.5.0a0+20a157f        pypi_0  pypi
[conda] numpy        1.26.4                 pypi_0  pypi
[conda] torch        2.6.0.dev20240927+cpu  pypi_0  pypi
[conda] torchaudio   2.5.0.dev20240927+cpu  pypi_0  pypi
[conda] torchsr      1.0.4                  pypi_0  pypi
[conda] torchvision  0.20.0.dev20240927+cpu pypi_0  pypi
```
cc @digantdesai @mcr229 @JacobSzwejbka @dbort