2626
2727### Llama.cpp + SYCL
2828
29- The llama.cpp SYCL backend is designed to support ** Intel GPU** firstly. Based on the cross-platform feature of SYCL, it could support other vendor GPUs: Nvidia GPU ( * AMD GPU coming * ) .
29+ The llama.cpp SYCL backend is designed primarily for ** Intel GPU**. Because SYCL is cross-platform, it also supports GPUs from other vendors: Nvidia and AMD.
3030
3131## Recommended Release
3232
@@ -115,10 +115,18 @@ SYCL backend supports Intel GPU Family:
115115
116116** Verified devices**
117117
118- | Nvidia GPU | Status | Verified Model |
119- | --------------------------| ---------| ----------------|
120- | Ampere Series | Support | A100, A4000 |
121- | Ampere Series * (Mobile)* | Support | RTX 40 Series |
118+ | Nvidia GPU | Status | Verified Model |
119+ | --------------------------| -----------| ----------------|
120+ | Ampere Series | Supported | A100, A4000 |
121+ | Ampere Series * (Mobile)* | Supported | RTX 40 Series |
122+
123+ | AMD GPU | Status | Verified Model |
124+ | --------------------------| --------------| ----------------|
125+ | Radeon Pro | Experimental | W6800 |
126+ | Radeon RX | Experimental | 6700 XT |
127+
128+ Note: AMD GPU support is highly experimental and is incompatible with F16.
129+ Additionally, it only supports GPUs with a sub_group_size (warp size) of 32.
122130
123131## Docker
124132The docker build option is currently limited to * intel GPU* targets.
@@ -190,6 +198,10 @@ Platform #0: Intel(R) OpenCL HD Graphics
190198
191199In order to target Nvidia GPUs through SYCL, please make sure the CUDA/CUBLAS native requirements * -found [ here] ( README.md#cuda ) -* are installed.
192200
201+ - ** AMD GPU**
202+
203+ To target AMD GPUs with SYCL, the ROCm stack must be installed first.
204+
1932052 . ** Install Intel® oneAPI Base toolkit**
194206
195207- ** For Intel GPU**
@@ -216,6 +228,19 @@ cmake -B buildWithCublas -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENAB
216228cmake --build buildWithCublas --config Release
217229```
218230
231+ - ** Adding support for AMD GPUs**
232+
233+ ** oneAPI Plugin** : In order to enable SYCL support on AMD GPUs, please install the [ Codeplay oneAPI Plugin for AMD GPUs] ( https://developer.codeplay.com/products/oneapi/amd/download ) . As with Nvidia GPUs, the user should also make sure the plugin version matches the installed base toolkit.
234+
235+ ** oneMKL for rocBLAS** : The current oneMKL releases * (shipped with the oneAPI base-toolkit)* do not contain the rocBLAS backend. Building the upstream [ oneMKL] ( https://github.com/oneapi-src/oneMKL ) from source with the * rocBLAS* backend enabled is therefore required to run on AMD GPUs.
236+
237+ ``` sh
238+ git clone https://github.com/oneapi-src/oneMKL
239+ cd oneMKL
240+ # Find your HIPTARGET with rocminfo, under the key 'Name:'
241+ cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS=${HIPTARGET} -DTARGET_DOMAINS=blas
242+ cmake --build buildWithrocBLAS --config Release
243+ ```
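The `HIPTARGET` lookup mentioned in the comment above can be sketched as a small shell helper. This is a minimal sketch, assuming `rocminfo` reports the GPU architecture as a `gfx...` name on its `Name:` lines, as the comment suggests. The function name `extract_gfx_targets` is hypothetical and is not part of llama.cpp, oneMKL, or ROCm.

```sh
#!/bin/sh
# Hypothetical helper: read rocminfo-style text on stdin and print each
# distinct gfx architecture name once (for example gfx1030).
# Assumption: the architecture appears on lines whose key is 'Name:'.
extract_gfx_targets() {
    grep -E '^[[:space:]]*Name:' | grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage (requires rocminfo from the ROCm stack on PATH):
#   HIPTARGET=$(rocminfo | extract_gfx_targets | head -n 1)
```

On a machine with several different AMD GPUs the helper prints one line per architecture, so the value passed to `-DHIPTARGETS` should still be chosen deliberately.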
219244
2202453 . ** Verify installation and environment**
221246
@@ -227,22 +252,32 @@ sycl-ls
227252
228253- ** Intel GPU**
229254
230- When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [ ` ext_oneapi_level_zero :gpu:0 ` ] in the sample output below:
255+ When targeting an Intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [ ` level_zero :gpu` ] in the sample output below:
231256
232257```
233- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
234- [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
235- [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
236- [ext_oneapi_level_zero :gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
258+ [opencl:acc][opencl :0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
259+ [opencl:cpu][opencl :1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
260+ [opencl:gpu][opencl :2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
261+ [level_zero :gpu][level_zero :0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
237262```
238263
239264- ** Nvidia GPU**
240265
241- Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [ ` ext_oneapi_cuda:gpu ` ] as bellow:
266+ Similarly, users targeting Nvidia GPUs should expect at least one SYCL-CUDA device [ ` cuda:gpu ` ] as below:
267+
268+ ```
269+ [opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
270+ [opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
271+ [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
272+ ```
273+
274+ - ** AMD GPU**
275+
276+ For AMD GPUs we should expect at least one SYCL-HIP device [ ` hip:gpu ` ] :
277+
242278```
243- [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
244- [opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
245- [ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.2]
279+ [opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
280+ [hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
246281```
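The checks above can also be scripted. The following is a minimal sketch, assuming `sycl-ls` prints one device per line in the newer `[backend:gpu][backend:N]` form shown in the sample outputs; older `ext_oneapi_*` style labels would not match. The function name `has_sycl_gpu` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed if sycl-ls-style text on stdin lists a GPU
# device for the given backend (level_zero, cuda or hip).
has_sycl_gpu() {
    backend=$1
    # Match lines that start with "[<backend>:gpu]".
    grep -Eq "^\[${backend}[[:space:]]*:gpu\]"
}

# Usage (requires sycl-ls from the oneAPI toolkit on PATH):
#   sycl-ls | has_sycl_gpu hip && echo "SYCL-HIP GPU found"
```

The helper only inspects the text it is given, so it can be checked against a saved `sycl-ls` log as well as live output.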
247282
248283### II. Build llama.cpp
@@ -270,6 +305,7 @@ cmake --build build --config Release -j -v
270305```
271306
272307#### Nvidia GPU
308+
273309``` sh
274310# Export relevant ENV variables
275311export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithCublas/lib:$LD_LIBRARY_PATH
@@ -287,7 +323,25 @@ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -
287323
288324# build all binary
289325cmake --build build --config Release -j -v
326+ ```
327+
328+ #### AMD GPU
290329
330+ ``` sh
331+ # Export relevant ENV variables
332+ export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LD_LIBRARY_PATH
333+ export LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LIBRARY_PATH
334+ export CPLUS_INCLUDE_DIR=/path/to/oneMKL/buildWithrocBLAS/include:$CPLUS_INCLUDE_DIR
335+
336+ # Build LLAMA with rocBLAS acceleration through SYCL
337+
338+ # # AMD
339+ # Use FP32, FP16 is not supported
340+ # Find your GGML_SYCL_HIP_TARGET with rocminfo, under the key 'Name:'
341+ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD -DGGML_SYCL_HIP_TARGET=${GGML_SYCL_HIP_TARGET} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
342+
343+ # build all binary
344+ cmake --build build --config Release -j -v
291345```
292346
293347### III. Run the inference
@@ -623,11 +677,11 @@ use 1 SYCL GPUs: [0] with Max compute units:512
623677
624678#### Build
625679
626- | Name | Value | Function |
627- | --------------------| -----------------------------------| ---------------------------------------------|
628- | GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br >FP32 path - recommended for better perforemance than FP16 on quantized model|
629- | GGML_SYCL_TARGET | INTEL * (default)* \| NVIDIA | Set the SYCL target device type. |
630- | GGML_SYCL_F16 | OFF * (default)* \| ON * (optional)* | Enable FP16 build with SYCL code path. |
680+ | Name | Value | Function |
681+ | --------------------| --------------------------------------- | ---------------------------------------------|
682+ | GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br >FP32 path - recommended for better performance than FP16 on quantized model|
683+ | GGML_SYCL_TARGET | INTEL * (default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
684+ | GGML_SYCL_F16 | OFF * (default)* \| ON * (optional)* | Enable FP16 build with SYCL code path. |
631685| CMAKE_C_COMPILER | ` icx ` * (Linux)* , ` icx/cl ` * (Windows)* | Set ` icx ` compiler for SYCL code path. |
632686| CMAKE_CXX_COMPILER | ` icpx ` * (Linux)* , ` icx ` * (Windows)* | Set ` icpx/icx ` compiler for SYCL code path. |
633687