
Commit cda673c

New backend for RDNA3-igpu
There are several versions, and (for now) only mulmat is implemented.
- small optim for iGPU (V6) backend
- iGPU backend (V7): optimised tensor loading
- iGPU backend (V8)
- iGPU backend (V9)
It should be functional at least from V9 onward.
- given the results of the benchmark campaign, we can do better (even with V9 ...)
- add cmake option to enable compiler output of kernel resource usage metrics
1 parent f10ee3a commit cda673c

33 files changed: +9722 -0 lines changed

README.md

Lines changed: 132 additions & 0 deletions
@@ -1,3 +1,135 @@
# Experimental support of Ryzen 7x40 (Linux)

In my case, a 7940HS with 64 GB of RAM, running Fedora 41 / rocm-hip 6.2.1.

The backend only adds a mulmat (BF16) OP implemented with HIP (rocBLAS is not used). There is no limit on RAM usage (GTT/VRAM): weights are allocated in RAM.
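For illustration only, a naive HIP BF16 mat-vec kernel (roughly the operation this backend accelerates) could look like the sketch below. This is not one of the V1..V9 kernels from this commit; the kernel name and data layout are assumptions.

```cpp
// Illustrative sketch, NOT the backend's optimized kernel.
// Computes y[r] = sum_k A[r*cols + k] * x[k], with A stored as bf16 and x, y as fp32.
#include <hip/hip_runtime.h>
#include <hip/hip_bf16.h>

__global__ void mulmat_bf16_naive(const __hip_bfloat16 * A, const float * x,
                                  float * y, int rows, int cols) {
    const int r = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per output row
    if (r >= rows) return;
    float acc = 0.0f;
    for (int k = 0; k < cols; ++k) {
        acc += __bfloat162float(A[(size_t)r * cols + k]) * x[k];
    }
    y[r] = acc;
}
```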
If you want to test:
```sh
# build:
rm -rf build/igpu
cmake -S . -B build/igpu -DGGML_IGPU=ON -DAMDGPU_TARGETS=gfx1103 -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build/igpu --config Release -- -j 8

# run: (please use -ngl 999 --no-mmap -ctk bf16 -ctv bf16 for best results)
build/igpu/bin/llama-cli --color -ngl 999 --no-mmap -ctk bf16 -ctv bf16 -m Meta-Llama-3.1-8B-Instruct.BF16.gguf
```

To be fair, there are some random crashes with an 'MES' error; this may require a fix in the AMD firmware.

01/03/2025: first version of the kernel (V1) (supports only BF16 quantization)
14/03/2025: new kernel (V2) (supports only BF16 quantization)
01/04/2025: V4 optimizes the small-N case
15/04/2025: V5 kernel supports BF16 & FP16 quants
25/05/2025: V7 optimized tensor loading (WIP)

Next:
- create a kernel for FP8 and support optional conversion of weights (FP16/BF16/FP32) to BFP on load.
- create a true block kernel for the CPU ("BLIS"-like)?

Some results, in tokens per second (when it did not crash):

## Llama-3.2-1B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU | V1 | V2 | V4 | V9 | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | ------: | ------: | ------: | ------: | ------: |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp1 | 23.26 | 18.53 | 27.59 | 30.14 | 30.21 | 30.99 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp2 | 45.39 | 36.20 | 34.22 | 57.68 | 57.89 | 60.76 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp4 | 90.47 | 71.78 | 65.12 | 111.07 | 111.81 | 117.07 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp8 | 176.86 | 139.26 | 119.79 | 200.94 | 201.25 | 229.28 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp16 | 344.33 | 266.42 | 200.51 | 315.39 | 314.93 | 196.28 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp32 | 562.30 | 422.50 | 429.52 | 423.95 | 596.81 | 366.10 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp48 | 665.70 | 653.25 | 601.83 | 597.82 | 912.48 | 594.74 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp64 | 679.13 | 717.96 | 760.94 | 764.79 | 1134.05 | 744.75 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp128 | 723.15 | 990.37 | 1062.69 | 1061.43 | 1632.71 | 1007.61 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp192 | 738.65 | 1131.50 | 1304.20 | 1298.02 | 1904.42 | 1054.13 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp256 | 746.87 | 1151.29 | 1326.96 | 1329.72 | 1832.45 | 1153.88 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp384 | 714.54 | 1178.65 | 1220.25 | 1197.43 | 1355.90 | 1238.02 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp512 | 677.09 | 963.16 | 950.69 | 946.85 | 958.19 | 1207.43 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp768 | 665.30 | 901.93 | 884.07 | 874.94 | 913.08 | 1162.78 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | tg16 | 23.00 | 18.26 | 27.69 | 30.13 | 30.16 | 31.17 |

## Llama-3.2-3B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU | V1 | V2 | V4 | V9 | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: | -----: |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp1 | 8.94 | 7.85 | 11.03 | 11.84 | 11.83 | 12.07 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp2 | 17.56 | 15.67 | 14.61 | 23.08 | 22.86 | 23.67 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp4 | 35.02 | 31.11 | 27.86 | 44.61 | 44.23 | 44.96 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp8 | 69.18 | 61.01 | 51.21 | 82.57 | 81.46 | 90.41 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp16 | 131.72 | 117.77 | 86.80 | 135.50 | 135.25 | 78.39 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp32 | 209.28 | 185.05 | 178.08 | 176.60 | 258.01 | 142.46 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp48 | 232.70 | 273.60 | 249.61 | 251.45 | 364.37 | 196.73 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp64 | 237.90 | 300.62 | 313.17 | 316.92 | 445.82 | 246.77 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp128 | 261.37 | 390.84 | 438.12 | 438.36 | 673.04 | 316.93 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp192 | 263.82 | 445.00 | 506.12 | 504.17 | 760.73 | 368.65 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp256 | 265.27 | 450.11 | 516.21 | 512.75 | 750.77 | 373.97 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp384 | 261.27 | 470.54 | 485.27 | 476.42 | 682.73 | 400.52 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp512 | 254.72 | 441.51 | 480.40 | 479.50 | 559.39 | 390.60 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp768 | 253.87 | 429.79 | 462.86 | 462.20 | 538.43 | 384.43 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | tg16 | 8.90 | 7.85 | 11.02 | 11.88 | 11.89 | 12.30 |

## Meta-Llama-3.1-8B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU | V1 | V2 | V4 | V9 | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: | -----: |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp1 | 3.88 | 3.88 | 4.88 | 5.21 | 5.21 | 5.35 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp2 | 7.59 | 7.74 | 7.40 | 10.12 | 10.12 | 10.60 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp4 | 15.04 | 15.43 | 14.20 | 19.67 | 19.62 | 20.59 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp8 | 29.73 | 30.23 | 26.37 | 36.74 | 36.60 | 40.71 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp16 | 56.55 | 58.55 | 45.95 | 61.51 | 61.39 | 41.17 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp32 | 84.81 | 91.54 | 83.38 | 81.09 | 117.81 | 75.68 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp48 | 90.43 | 114.77 | 116.55 | 114.14 | 163.96 | 106.00 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp64 | 85.45 | 137.17 | 139.46 | 142.46 | 200.57 | 132.83 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp128 | 103.68 | 152.59 | 195.33 | 192.79 | 277.65 | 150.98 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp192 | 107.07 | 183.30 | 215.62 | 217.06 | 294.23 | 159.43 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp256 | 107.43 | 185.74 | 235.19 | 233.90 | 304.70 | 164.52 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp384 | 106.74 | 213.56 | 230.65 | 229.00 | 316.93 | 168.15 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp512 | 104.39 | 203.01 | 232.16 | 231.73 | 306.62 | 167.31 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp768 | 104.19 | 194.98 | 225.46 | 225.09 | 290.90 | 165.74 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | tg16 | 3.88 | 3.88 | 4.87 | 5.21 | 5.21 | 5.36 |

## Mistral-Nemo-Instruct-2407/BF16.gguf
| model | size | params | type_k | type_v | test | CPU | V1 | V2 | V4 | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp1 | 2.52 | 2.76 | 3.16 | 3.39 | 3.47 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp2 | 4.94 | 5.49 | 4.90 | 6.59 | 6.89 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp4 | 9.82 | 10.92 | 9.42 | 12.85 | 13.38 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp8 | 19.40 | 21.60 | 17.56 | 23.92 | 25.51 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp16 | 36.85 | 42.03 | 30.77 | 40.88 | 12.83 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp32 | 50.40 | 65.33 | 56.43 | 55.22 | 22.44 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp48 | 52.77 | 77.46 | 76.93 | 75.94 | 37.75 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp64 | 54.65 | 94.48 | 93.57 | 94.02 | 48.15 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp128 | 65.72 | 103.87 | 127.90 | 128.54 | 51.19 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp192 | 67.66 | 121.43 | 143.60 | 147.41 | 54.16 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp256 | 68.45 | 130.03 | 156.00 | 155.52 | 54.07 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp384 | 67.64 | 142.89 | 154.52 | 153.33 | 54.42 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp512 | 67.02 | 136.18 | 156.22 | 156.51 | 46.71 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp768 | 66.74 | 130.78 | 151.59 | 151.78 | 46.73 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | tg16 | 2.52 | 2.76 | 3.16 | 3.39 | 3.48 |

## Mistral-Small-24B-Instruct-2501/BF16.gguf
| model | size | params | type_k | type_v | test | CPU | V1 | V2 | V4 |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | ------: |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp1 | 1.28 | 1.39 | 1.64 | 1.73 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp2 | 2.52 | 2.76 | 2.71 | 3.40 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp4 | 5.02 | 5.50 | 5.26 | 6.63 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp8 | 9.87 | 10.89 | 9.94 | 12.52 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp16 | 18.32 | 21.32 | 17.86 | 22.36 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp32 | 25.53 | 34.65 | 31.50 | 30.18 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp48 | 24.53 | 36.05 | 43.93 | 43.43 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp64 | 25.88 | 47.87 | 53.96 | 53.73 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp128 | 29.69 | 52.03 | 69.64 | 65.84 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp192 | 29.99 | 61.00 | 79.73 | 80.14 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp256 | 30.94 | 63.11 | 87.30 | 87.01 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp384 | 32.51 | 75.00 | 86.26 | - |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp512 | 32.28 | 71.11 | 88.11 | - |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp768 | 32.02 | 67.33 | 85.47 | - |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | tg16 | 1.28 | 1.38 | 1.62 | - |

-------------------------------
# llama.cpp

![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)

ggml/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
@@ -154,6 +154,8 @@ set(GGML_BLAS_VENDOR ${GGML_BLAS_VENDOR_DEFAULT} CACHE STRING
     "ggml: BLAS library vendor")
 option(GGML_LLAMAFILE "ggml: use LLAMAFILE" ${GGML_LLAMAFILE_DEFAULT})
 
+option(GGML_IGPU "ggml: use IGPU" OFF)
+
 option(GGML_CUDA "ggml: use CUDA" OFF)
 option(GGML_MUSA "ggml: use MUSA" OFF)
 option(GGML_CUDA_FORCE_MMQ "ggml: use mmq kernels instead of cuBLAS" OFF)
@@ -272,6 +274,7 @@ set(GGML_PUBLIC_HEADERS
     include/ggml-cann.h
     include/ggml-cpp.h
     include/ggml-cuda.h
+    include/ggml-igpu.h
     include/ggml-opt.h
     include/ggml-metal.h
     include/ggml-rpc.h

ggml/cmake/ggml-config.cmake.in

Lines changed: 5 additions & 0 deletions
@@ -83,6 +83,11 @@ if (NOT GGML_SHARED_LIB)
         set(GGML_HIP_INTERFACE_LINK_LIBRARIES hip::host roc::rocblas roc::hipblas)
     endif()
 
+    if (GGML_IGPU)
+        find_package(hip REQUIRED)
+        list(APPEND GGML_HIP_INTERFACE_LINK_LIBRARIES hip::host)
+    endif()
+
     if (GGML_SYCL)
         set(GGML_SYCL_INTERFACE_LINK_LIBRARIES "")
         find_package(DNNL)

ggml/include/ggml-alloc.h

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,8 @@ GGML_API size_t ggml_gallocr_get_buffer_size(ggml_gallocr_t galloc, int buffer_i
 
 // Utils
 // Create a buffer and allocate all the tensors in a ggml_context
+// There is a problem: the tensors are initialized, but their type is not "right"
+// TODO: add "enum ggml_backend_buffer_usage usage = GGML_BACKEND_BUFFER_USAGE_ANY" as a parameter!
 GGML_API struct ggml_backend_buffer * ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, ggml_backend_buffer_type_t buft);
 GGML_API struct ggml_backend_buffer * ggml_backend_alloc_ctx_tensors(struct ggml_context * ctx, ggml_backend_t backend);

ggml/include/ggml-igpu.h

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
#pragma once

#include "ggml.h"
#include "ggml-backend.h"


#ifdef __cplusplus
extern "C" {
#endif

GGML_BACKEND_API ggml_backend_reg_t ggml_backend_igpu_reg(void);


#ifdef __cplusplus
}
#endif
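
For context, a minimal sketch of how this entry point might be used with the standard ggml-backend registry API (the device index 0 and the NULL init params are assumptions, and the backend may expose a different number of devices):

```cpp
// Sketch: obtain the iGPU backend registration and initialize its first device.
#include "ggml-backend.h"
#include "ggml-igpu.h"
#include <cstdio>

int main(void) {
    ggml_backend_reg_t reg = ggml_backend_igpu_reg();
    const size_t n_dev = ggml_backend_reg_dev_count(reg);
    std::printf("igpu backend: %zu device(s)\n", n_dev);
    if (n_dev > 0) {
        ggml_backend_dev_t dev     = ggml_backend_reg_dev_get(reg, 0);
        ggml_backend_t     backend = ggml_backend_dev_init(dev, /*params=*/NULL);
        // ... build a ggml graph and run it with ggml_backend_graph_compute(backend, graph) ...
        ggml_backend_free(backend);
    }
    return 0;
}
```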

ggml/src/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -378,6 +378,7 @@ ggml_add_backend(BLAS)
 ggml_add_backend(CANN)
 ggml_add_backend(CUDA)
 ggml_add_backend(HIP)
+ggml_add_backend(IGPU)
 ggml_add_backend(METAL)
 ggml_add_backend(MUSA)
 ggml_add_backend(RPC)

ggml/src/ggml-backend-reg.cpp

Lines changed: 7 additions & 0 deletions
@@ -65,6 +65,10 @@
 #include "ggml-cann.h"
 #endif
 
+#ifdef GGML_USE_IGPU
+#include "ggml-igpu.h"
+#endif
+
 // disable C++17 deprecation warning for std::codecvt_utf8
 #if defined(__clang__)
 #    pragma clang diagnostic push
@@ -165,6 +169,9 @@ struct ggml_backend_registry {
     std::vector<ggml_backend_dev_t> devices;
 
     ggml_backend_registry() {
+#ifdef GGML_USE_IGPU
+        register_backend(ggml_backend_igpu_reg());
+#endif
 #ifdef GGML_USE_CUDA
         register_backend(ggml_backend_cuda_reg());
 #endif

ggml/src/ggml-igpu/CMakeLists.txt

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
if (WIN32)
    message(FATAL_ERROR "Not tested on Windows")
endif()

# not yet right in:
# => /usr/share/cmake/Modules/CMakeDetermineHIPCompiler.cmake
# => /usr/lib64/cmake/hip-lang/hip-lang-config.cmake vs /usr/local/lib64/cmake/hip-lang/hip-lang-config.cmake
# hipcc -v -print-targets | grep "Found HIP installation" => wrong?
# /usr/lib64/llvm18/bin/clang++ -v -print-targets | grep "Found HIP installation" => wrong !!!
# clang-18 -v -print-targets | grep "Found HIP installation" => OK
# hipconfig --rocmpath => OK => we force it

# force the last method...
if(NOT CMAKE_HIP_COMPILER_ROCM_ROOT)
    execute_process(
        COMMAND hipconfig --rocmpath
        OUTPUT_VARIABLE _CMAKE_HIPCONFIG_ROCMPATH
        RESULT_VARIABLE _CMAKE_HIPCONFIG_RESULT
    )
    if(_CMAKE_HIPCONFIG_RESULT EQUAL 0 AND EXISTS "${_CMAKE_HIPCONFIG_ROCMPATH}")
        set(CMAKE_HIP_COMPILER_ROCM_ROOT "${_CMAKE_HIPCONFIG_ROCMPATH}")
    endif()
endif()

# Forward AMDGPU_TARGETS to CMAKE_HIP_ARCHITECTURES.
if (AMDGPU_TARGETS AND NOT CMAKE_HIP_ARCHITECTURES)
    set(CMAKE_HIP_ARCHITECTURES ${AMDGPU_TARGETS})
endif()
cmake_minimum_required(VERSION 3.21)

enable_language(HIP)

find_package(hip REQUIRED)
find_package(OpenMP REQUIRED)

# no other versions tested...
if (${hip_VERSION} VERSION_LESS 6.2)
    message(FATAL_ERROR "At least ROCM/HIP V6.2 is required")
endif()

message(STATUS "HIP found")

set(TARGET_NAME ggml-igpu)

file(GLOB GGML_SOURCES_ROCM "*.cpp")
#file(GLOB SRCS "*.hip")
#list(APPEND GGML_SOURCES_ROCM ${SRCS})

ggml_add_backend_library(${TARGET_NAME}
    ../../include/ggml-igpu.h
    ggml-hip.h
    mulmat.h
    mulmat-imp.h
    tensor.h
    types.h
#   mulmat-bf16.h
    mulmat-bf16bloc_V1.h
    mulmat-bf16bloc_V2.h
    mulmat-bf16bloc_V3.h
    mulmat-bf16bloc_V4.h
    mulmat-bf16bloc_V5.h
    ${GGML_SOURCES_ROCM}
)

add_compile_definitions(GGML_USE_IGPU)

set_source_files_properties(${GGML_SOURCES_ROCM} PROPERTIES LANGUAGE HIP)

target_compile_features(${TARGET_NAME} PRIVATE c_std_11 cxx_std_20)

target_compile_options(${TARGET_NAME} PRIVATE ${OpenMP_CXX_FLAGS})
# target_link_libraries(${TARGET_NAME} PRIVATE hip::device ${OpenMP_CXX_FLAGS})

if (GGML_STATIC)
    message(FATAL_ERROR "Static linking not supported for HIP/ROCm")
endif()

if (GGML_HIP_EXPORT_METRICS)
    set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} -Rpass-analysis=kernel-resource-usage --save-temps")
endif()


target_link_libraries(${TARGET_NAME} PRIVATE ggml-base hip::host OpenMP::OpenMP_CXX ${OpenMP_CXX_FLAGS})

message(STATUS "OpenMP_CXX_FLAGS ${OpenMP_CXX_FLAGS} ")

ggml/src/ggml-igpu/ggml-hip.cpp

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
#include "ggml-hip.h"

namespace ggml::hip {
void setDevice(int id) {
    HIP_CHECK_ERROR(hipSetDevice(id));
}
}

ggml/src/ggml-igpu/ggml-hip.h

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
#pragma once
#include <hip/hip_runtime.h>

#include "ggml.h"

#define HIP_CHECK_ERROR(trt) \
    do { \
        hipError_t _tmpVal; \
        if((_tmpVal = trt) != hipSuccess) { \
            GGML_ABORT("HIP_ERROR(%s => %s)", hipGetErrorString(_tmpVal), #trt); \
        } \
    } while(0)

namespace ggml::hip {

template<typename T>
T* allocateHost(const std::size_t size) {
    void * ptr;
    HIP_CHECK_ERROR(hipHostMalloc(&ptr, size*sizeof(T), hipHostMallocNonCoherent));
    return reinterpret_cast<T*>(ptr);
}

template<typename T>
T* allocateDevice(const std::size_t size) {
    void * ptr;
    HIP_CHECK_ERROR(hipMalloc(&ptr, size*sizeof(T)));
    return reinterpret_cast<T*>(ptr);
}

template<typename T>
void deallocateHost(T * ptr) {
    HIP_CHECK_ERROR(hipHostFree((void*)ptr));
}

template<typename T>
void deallocateDevice(T * ptr) {
    HIP_CHECK_ERROR(hipFree((void*)ptr));
}

template<typename T>
T* getDeviceMem(T* host_adr) {
    void * ptr=nullptr;
    HIP_CHECK_ERROR(hipHostGetDevicePointer(&ptr, host_adr, 0));
    return reinterpret_cast<T*>(ptr);
}

void setDevice(int id);
}
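
For context, a small usage sketch of these helpers (the buffer size is arbitrary and the kernel launch is elided): the host allocation is pinned via hipHostMalloc, and getDeviceMem returns the device-visible pointer for that same buffer.

```cpp
// Sketch: allocate pinned host memory, obtain its device alias, then release it.
#include "ggml-hip.h"
#include <cstddef>

void igpu_buffer_example() {
    ggml::hip::setDevice(0);                             // select the (i)GPU

    const std::size_t n = 1 << 20;                       // hypothetical element count
    float * host = ggml::hip::allocateHost<float>(n);    // pinned, non-coherent host memory
    float * dev  = ggml::hip::getDeviceMem(host);        // device pointer aliasing the same memory

    for (std::size_t i = 0; i < n; ++i) host[i] = 0.0f;  // CPU writes through the host pointer
    // ... a kernel would read/write through `dev` here ...
    (void)dev;

    ggml::hip::deallocateHost(host);
}
```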
