Release 0.6 llama low-bit kernels #10166

Closed
lucylq opened this issue Apr 14, 2025 · 1 comment · Fixed by #10176
Labels
module: build/install Issues related to the cmake and buck2 builds, and to installing ExecuTorch

Comments

@lucylq
Contributor

lucylq commented Apr 14, 2025

🐛 Describe the bug

Going through the low-bit kernel section on a MacBook with an M1 chip. Note that this section requires an Arm-based Mac:

These are still experimental, and require you do development on an Arm-based Mac

Export succeeds, but the runtime build does not.
Build executorch

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out .
cmake --build cmake-out -j16 --target install --config Release

^ this is OK.

Build llama

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=OFF \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_TORCHAO=ON \
    -Bcmake-out/examples/models/llama \
    examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --config Release

See the error:

CMake Error at CMakeLists.txt:210 (add_executable):
  Impossible to link target 'llama_main' because the link item 'custom_ops',
  specified without any feature or 'DEFAULT' feature, has already occurred
  with the feature 'WHOLE_ARCHIVE', which is not allowed.


(The same error is reported three more times.)
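For context, CMake raises this diagnostic when the same link item reaches a target both with a link feature and without one. A minimal sketch of the conflict (target and file names are illustrative, not the actual ExecuTorch CMake):

```cmake
cmake_minimum_required(VERSION 3.24)  # $<LINK_LIBRARY:...> requires 3.24+
project(repro CXX)

add_library(custom_ops STATIC ops.cpp)
add_executable(llama_main main.cpp)

# One code path links custom_ops with the WHOLE_ARCHIVE feature...
target_link_libraries(llama_main PRIVATE
  "$<LINK_LIBRARY:WHOLE_ARCHIVE,custom_ops>")

# ...and another links the same item with no feature (DEFAULT).
# CMake rejects the mix with exactly the "Impossible to link target" error above.
target_link_libraries(llama_main PRIVATE custom_ops)
```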

Versions

(executorch) lfq@lfq-mbp executorch % python collect_env.py
Collecting environment information...
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.3.2 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.3)
CMake version: version 3.29.5
Libc version: N/A

Python version: 3.10.0 (default, Mar  3 2022, 03:54:28) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-15.3.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+4517126
[pip3] executorchcoreml==0.0.1
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==23.9.16
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.2.2
[pip3] torch==2.7.0
[pip3] torchao==0.10.0+git8b264ce1
[pip3] torchaudio==2.7.0
[pip3] torchsr==1.0.4
[pip3] torchtune==0.1.1
[pip3] torchvision==0.22.0
[conda] executorch                0.6.0a0+4517126          pypi_0    pypi
[conda] executorchcoreml          0.0.1                    pypi_0    pypi
[conda] numpy                     2.2.2                    pypi_0    pypi
[conda] torch                     2.7.0                    pypi_0    pypi
[conda] torchao                   0.10.0+git8b264ce1          pypi_0    pypi
[conda] torchaudio                2.7.0                    pypi_0    pypi
[conda] torchfix                  0.5.0                    pypi_0    pypi
[conda] torchsr                   1.0.4                    pypi_0    pypi
[conda] torchtune                 0.1.1                    pypi_0    pypi
[conda] torchvision               0.22.0                   pypi_0    pypi

cc @larryliu0820 @jathu

@lucylq lucylq added the module: build/install Issues related to the cmake and buck2 builds, and to installing ExecuTorch label Apr 14, 2025
@lucylq
Contributor Author

lucylq commented Apr 14, 2025

Also tried `bash .ci/scripts/test_llama_torchao_lowbit.sh`, since that test runs fine in CI, but hit the same error.

Ran `./install_executorch.sh --clean` in between.

lucylq added a commit that referenced this issue Apr 15, 2025
Fix #10166

target_link_options_shared_lib has different behavior on macOS than on
other platforms; it seems whole-archive linking was causing the issue.

Test Plan
`sh .ci/scripts/test_llama_torchao_lowbit.sh`

https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#running-with-low-bit-kernels
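The underlying platform difference: Apple's linker has no `--whole-archive`; it uses `-force_load` per library. CMake's `LINK_LIBRARY` generator expression abstracts this, expanding to the right flags on each platform. A sketch of the portable idiom (illustrative only, not the actual fix in #10176):

```cmake
# One declaration that CMake translates per platform:
#   Apple ld:  -force_load <lib>
#   GNU ld:    --whole-archive <lib> --no-whole-archive
target_link_libraries(llama_main PRIVATE
  "$<LINK_LIBRARY:WHOLE_ARCHIVE,custom_ops>")
```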
pytorchbot pushed a commit that referenced this issue Apr 15, 2025
(Same commit message as above; cherry picked from commit 5809428.)
metascroy added a commit that referenced this issue Apr 15, 2025
(Same commit message as above.)
Co-authored-by: lucylq <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
keyprocedure pushed a commit to keyprocedure/executorch that referenced this issue Apr 21, 2025
Fix pytorch#10166

(Same commit message as above.)