# Experimental support for Ryzen 7x40 iGPUs (Linux)

In my case: a 7940HS with 64 GB of RAM, running Fedora 41 with rocm-hip 6.2.1.

The backend only adds a mulmat (BF16) op implemented directly with HIP (rocBLAS is not used). There is no limit on RAM usage (GTT/VRAM): weights are allocated in regular RAM.

If you want to test it:

```sh
# build:
rm -rf build/igpu
cmake -S . -B build/igpu -DGGML_IGPU=ON -DAMDGPU_TARGETS=gfx1103 -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build/igpu --config Release -- -j 8

# run (use -ngl 999 --no-mmap -ctk bf16 -ctv bf16 for best results):
build/igpu/bin/llama-cli --color -ngl 999 --no-mmap -ctk bf16 -ctv bf16 -m Meta-Llama-3.1-8B-Instruct.BF16.gguf
```

To be fair, there are occasional random crashes with an 'MES' error; this may need a fix in the AMD firmware.

- 01/03/2025: first version of the kernel (V1), supports only BF16
- 14/03/2025: new kernel (V2), supports only BF16
- 01/04/2025: V4, optimized for small N
- 15/04/2025: V5, kernel supports BF16 & FP16
- 25/05/2025: V7, optimized tensor loading (WIP)

Next:
 - create a kernel for FP8 and support optional conversion of weights (FP16/BF16/FP32) to BFP on load.
 - create a true block kernel for the CPU ("blis"-like)?
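The "blis"-like CPU kernel mentioned above amounts to blocking the matmul so each tile of the operands stays resident in cache. A minimal, unoptimized C++ sketch of the idea (the tile size and row-major layout are illustrative assumptions, not the planned implementation):

```cpp
#include <algorithm>
#include <vector>

// Blocked matmul: C[M x N] += A[M x K] * B[K x N], all row-major.
// BLOCK is chosen so one tile each of A, B and C fits in L1/L2 cache.
constexpr int BLOCK = 32;

void matmul_blocked(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, int M, int N, int K) {
    for (int i0 = 0; i0 < M; i0 += BLOCK)
        for (int k0 = 0; k0 < K; k0 += BLOCK)
            for (int j0 = 0; j0 < N; j0 += BLOCK)
                // micro-kernel: accumulate one tile of C
                for (int i = i0; i < std::min(i0 + BLOCK, M); ++i)
                    for (int k = k0; k < std::min(k0 + BLOCK, K); ++k) {
                        const float a = A[i * K + k];
                        for (int j = j0; j < std::min(j0 + BLOCK, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

A real BLIS-style kernel would additionally pack the tiles into contiguous buffers and use a vectorized micro-kernel, but the loop blocking above is the core of the technique.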

Some results (from runs that did not crash):

## Llama-3.2-1B-Instruct/BF16.gguf
| model           | size       | params   | type_k | type_v | test  | CPU    | V1      | V2      | V4      | V9      | Vulkan  |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | ------: | ------: | ------: | ------: | ------: |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp1   | 23.26  | 18.53   | 27.59   | 30.14   | 30.21   | 30.99   |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp2   | 45.39  | 36.20   | 34.22   | 57.68   | 57.89   | 60.76   |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp4   | 90.47  | 71.78   | 65.12   | 111.07  | 111.81  | 117.07  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp8   | 176.86 | 139.26  | 119.79  | 200.94  | 201.25  | 229.28  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp16  | 344.33 | 266.42  | 200.51  | 315.39  | 314.93  | 196.28  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp32  | 562.30 | 422.50  | 429.52  | 423.95  | 596.81  | 366.10  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp48  | 665.70 | 653.25  | 601.83  | 597.82  | 912.48  | 594.74  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp64  | 679.13 | 717.96  | 760.94  | 764.79  | 1134.05 | 744.75  |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp128 | 723.15 | 990.37  | 1062.69 | 1061.43 | 1632.71 | 1007.61 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp192 | 738.65 | 1131.50 | 1304.20 | 1298.02 | 1904.42 | 1054.13 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp256 | 746.87 | 1151.29 | 1326.96 | 1329.72 | 1832.45 | 1153.88 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp384 | 714.54 | 1178.65 | 1220.25 | 1197.43 | 1355.90 | 1238.02 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp512 | 677.09 | 963.16  | 950.69  | 946.85  | 958.19  | 1207.43 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | pp768 | 665.30 | 901.93  | 884.07  | 874.94  | 913.08  | 1162.78 |
| llama 1B BF16   | 2.30 GiB   | 1.24 B   | bf16   | bf16   | tg16  | 23.00  | 18.26   | 27.69   | 30.13   | 30.16   | 31.17   |

## Llama-3.2-3B-Instruct/BF16.gguf
| model           | size       | params   | type_k | type_v | test  | CPU    | V1     | V2     | V4     | V9     | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: | -----: |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp1   | 8.94   | 7.85   | 11.03  | 11.84  | 11.83  | 12.07  |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp2   | 17.56  | 15.67  | 14.61  | 23.08  | 22.86  | 23.67  |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp4   | 35.02  | 31.11  | 27.86  | 44.61  | 44.23  | 44.96  |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp8   | 69.18  | 61.01  | 51.21  | 82.57  | 81.46  | 90.41  |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp16  | 131.72 | 117.77 | 86.80  | 135.50 | 135.25 | 78.39  |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp32  | 209.28 | 185.05 | 178.08 | 176.60 | 258.01 | 142.46 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp48  | 232.70 | 273.60 | 249.61 | 251.45 | 364.37 | 196.73 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp64  | 237.90 | 300.62 | 313.17 | 316.92 | 445.82 | 246.77 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp128 | 261.37 | 390.84 | 438.12 | 438.36 | 673.04 | 316.93 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp192 | 263.82 | 445.00 | 506.12 | 504.17 | 760.73 | 368.65 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp256 | 265.27 | 450.11 | 516.21 | 512.75 | 750.77 | 373.97 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp384 | 261.27 | 470.54 | 485.27 | 476.42 | 682.73 | 400.52 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp512 | 254.72 | 441.51 | 480.40 | 479.50 | 559.39 | 390.60 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | pp768 | 253.87 | 429.79 | 462.86 | 462.20 | 538.43 | 384.43 |
| llama 3B BF16   | 5.98 GiB   | 3.21 B   | bf16   | bf16   | tg16  | 8.90   | 7.85   | 11.02  | 11.88  | 11.89  | 12.30  |

## Meta-Llama-3.1-8B-Instruct/BF16.gguf
| model           | size       | params   | type_k | type_v | test  | CPU    | V1     | V2     | V4     | V9     | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: | -----: |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp1   | 3.88   | 3.88   | 4.88   | 5.21   | 5.21   | 5.35   |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp2   | 7.59   | 7.74   | 7.40   | 10.12  | 10.12  | 10.60  |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp4   | 15.04  | 15.43  | 14.20  | 19.67  | 19.62  | 20.59  |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp8   | 29.73  | 30.23  | 26.37  | 36.74  | 36.60  | 40.71  |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp16  | 56.55  | 58.55  | 45.95  | 61.51  | 61.39  | 41.17  |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp32  | 84.81  | 91.54  | 83.38  | 81.09  | 117.81 | 75.68  |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp48  | 90.43  | 114.77 | 116.55 | 114.14 | 163.96 | 106.00 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp64  | 85.45  | 137.17 | 139.46 | 142.46 | 200.57 | 132.83 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp128 | 103.68 | 152.59 | 195.33 | 192.79 | 277.65 | 150.98 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp192 | 107.07 | 183.30 | 215.62 | 217.06 | 294.23 | 159.43 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp256 | 107.43 | 185.74 | 235.19 | 233.90 | 304.70 | 164.52 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp384 | 106.74 | 213.56 | 230.65 | 229.00 | 316.93 | 168.15 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp512 | 104.39 | 203.01 | 232.16 | 231.73 | 306.62 | 167.31 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | pp768 | 104.19 | 194.98 | 225.46 | 225.09 | 290.90 | 165.74 |
| llama 8B BF16   | 14.96 GiB  | 8.03 B   | bf16   | bf16   | tg16  | 3.88   | 3.88   | 4.87   | 5.21   | 5.21   | 5.36   |

## Mistral-Nemo-Instruct-2407/BF16.gguf
| model           | size       | params   | type_k | type_v | test  | CPU    | V1     | V2     | V4     | Vulkan |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | -----: | -----: |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp1   | 2.52   | 2.76   | 3.16   | 3.39   | 3.47   |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp2   | 4.94   | 5.49   | 4.90   | 6.59   | 6.89   |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp4   | 9.82   | 10.92  | 9.42   | 12.85  | 13.38  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp8   | 19.40  | 21.60  | 17.56  | 23.92  | 25.51  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp16  | 36.85  | 42.03  | 30.77  | 40.88  | 12.83  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp32  | 50.40  | 65.33  | 56.43  | 55.22  | 22.44  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp48  | 52.77  | 77.46  | 76.93  | 75.94  | 37.75  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp64  | 54.65  | 94.48  | 93.57  | 94.02  | 48.15  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp128 | 65.72  | 103.87 | 127.90 | 128.54 | 51.19  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp192 | 67.66  | 121.43 | 143.60 | 147.41 | 54.16  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp256 | 68.45  | 130.03 | 156.00 | 155.52 | 54.07  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp384 | 67.64  | 142.89 | 154.52 | 153.33 | 54.42  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp512 | 67.02  | 136.18 | 156.22 | 156.51 | 46.71  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | pp768 | 66.74  | 130.78 | 151.59 | 151.78 | 46.73  |
| llama 12B BF16  | 22.81 GiB  | 12.25 B  | bf16   | bf16   | tg16  | 2.52   | 2.76   | 3.16   | 3.39   | 3.48   |

## Mistral-Small-24B-Instruct-2501/BF16.gguf
| model           | size       | params   | type_k | type_v | test  | CPU    | V1     | V2     | V4      |
| --------------- | ---------: | -------: | -----: | -----: | ----: | -----: | -----: | -----: | ------: |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp1   | 1.28   | 1.39   | 1.64   | 1.73    |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp2   | 2.52   | 2.76   | 2.71   | 3.40    |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp4   | 5.02   | 5.50   | 5.26   | 6.63    |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp8   | 9.87   | 10.89  | 9.94   | 12.52   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp16  | 18.32  | 21.32  | 17.86  | 22.36   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp32  | 25.53  | 34.65  | 31.50  | 30.18   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp48  | 24.53  | 36.05  | 43.93  | 43.43   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp64  | 25.88  | 47.87  | 53.96  | 53.73   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp128 | 29.69  | 52.03  | 69.64  | 65.84   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp192 | 29.99  | 61.00  | 79.73  | 80.14   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp256 | 30.94  | 63.11  | 87.30  | 87.01   |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp384 | 32.51  | 75.00  | 86.26  | -       |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp512 | 32.28  | 71.11  | 88.11  | -       |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | pp768 | 32.02  | 67.33  | 85.47  | -       |
| llama 24B BF16  | 43.91 GiB  | 23.57 B  | bf16   | bf16   | tg16  | 1.28   | 1.38   | 1.62   | -       |

-------------------------------

# llama.cpp

