Support for Intel Neural Processing Unit (NPU) and Intel Arc GPU acceleration in llama.cpp #15883
ItsMeForLua started this conversation in Ideas
-
@ItsMeForLua There is a PR to enable OpenVINO as a new backend in llama.cpp, which could support Intel CPU/GPU/NPU.
-
Currently, llama.cpp provides CPU and NVIDIA CUDA GPU acceleration backends, but native support for Intel's hardware accelerators is not available yet.
My laptop is equipped with an Intel Arc GPU and an Intel NPU that is exposed through a well-documented C API (the Intel NPU Acceleration Library). Enabling llama.cpp to use these accelerators would significantly improve inference performance and efficiency, particularly on laptops and other systems without NVIDIA GPUs.
(I used an AI for portions of this report, so please excuse any awkward phrasing.)
The existing Intel NPU C API is accessible and should, in theory, integrate well (e.g., via OpenVINO and Intel's acceleration libraries). Given that hardware acceleration is critical for efficient large language model inference, this feature would benefit users with Intel hardware by increasing throughput while reducing CPU load and power consumption.
Possible Implementation:
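One possible route is the OpenVINO runtime, which already treats CPU, GPU, and NPU as selectable devices. The sketch below is not llama.cpp code and not a proposed patch; it only illustrates, using the OpenVINO 2.x C++ API, how a backend could enumerate Intel devices and compile a model for the NPU. The model path "model.xml" and the device string are illustrative placeholders.

```cpp
// Hypothetical sketch: compiling a model for the Intel NPU via the
// OpenVINO 2.x C++ runtime. Requires the OpenVINO SDK to build.
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;

    // Enumerate the devices OpenVINO can see on this machine
    // (typically "CPU", "GPU", and "NPU" on recent Intel laptops).
    for (const auto& device : core.get_available_devices()) {
        std::cout << "Available device: " << device << "\n";
    }

    // "model.xml" is a placeholder path to a model in OpenVINO IR format.
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");

    // Compile for the NPU; "GPU" or "CPU" could be passed here instead,
    // which is what makes a single backend cover all three accelerators.
    ov::CompiledModel compiled = core.compile_model(model, "NPU");

    ov::InferRequest request = compiled.create_infer_request();
    request.infer();
    return 0;
}
```

The appeal of this approach is that device selection is a single string argument to `compile_model`, so one OpenVINO backend could serve Intel CPU, Arc GPU, and NPU without separate code paths.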
Additional Context:
- Intel NPU Acceleration Library
- NPU illustration (image from the Intel NPU Acceleration Library documentation)