Commit 3e61258

Merge branch 'concedo_experimental' into _Frankenstein

2 parents: b918325 + 15deabd

18 files changed: +1004 -1280 lines

README_sycl.md renamed to README-sycl.md

Lines changed: 184 additions & 10 deletions
@@ -8,10 +8,14 @@
 
 [Linux](#linux)
 
+[Windows](#windows)
+
 [Environment Variable](#environment-variable)
 
 [Known Issue](#known-issue)
 
+[Q&A](#q&a)
+
 [Todo](#todo)
 
 ## Background
@@ -33,7 +37,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |OS|Status|Verified|
 |-|-|-|
 |Linux|Support|Ubuntu 22.04|
-|Windows|Ongoing| |
+|Windows|Support|Windows 11|
 
 
 ## Intel GPU
@@ -42,7 +46,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |-|-|-|
 |Intel Data Center Max Series| Support| Max 1550|
 |Intel Data Center Flex Series| Support| Flex 170|
-|Intel Arc Series| Support| Arc 770|
+|Intel Arc Series| Support| Arc 770, 730M|
 |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|
 
@@ -131,6 +135,7 @@ cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
 #build all binary
 cmake --build . --config Release -v
 
+cd ..
 ```
 
 or
@@ -195,7 +200,7 @@ GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building
 or run by script:
 
 ```
-./examples/sycl/run_llama2.sh
+./examples/sycl/run-llama2.sh
 ```
 
 Note:
@@ -205,11 +210,175 @@ Note:
 
 5. Check the device ID in output
 
-Like
+Like:
 ```
 Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 ```
 
+## Windows
+
+### Setup Environment
+
+1. Install Intel GPU driver.
+
+Please install Intel GPU driver by official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
+
+2. Install Intel® oneAPI Base toolkit.
+
+a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
+
+Recommend to install to default folder: **/opt/intel/oneapi**.
+
+Following guide uses the default folder as example. If you use other folder, please modify the following guide info with your folder.
+
+b. Enable oneAPI running environment:
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+c. Check GPU
+
+In oneAPI command line:
+
+```
+sycl-ls
+```
+
+There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+
+Output (example):
+```
+[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
+[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
+
+```
+
+3. Install cmake & make
+
+a. Download & install cmake for windows: https://cmake.org/download/
+
+b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/
+
+
+### Build locally:
+
+In oneAPI command line window:
+
+```
+mkdir -p build
+cd build
+@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
+
+:: for FP16
+:: faster for long-prompt inference
+:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON
+
+:: for FP32
+cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
+
+
+:: build example/main only
+:: make main
+
+:: build all binary
+make -j
+cd ..
+```
+
+or
+
+```
+.\examples\sycl\win-build-sycl.bat
+```
+
+Note:
+
+- By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only.
+
+### Run
+
+1. Put model file to folder **models**
+
+2. Enable oneAPI running environment
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+3. List device ID
+
+Run without parameter:
+
+```
+build\bin\ls-sycl-device.exe
+
+or
+
+build\bin\main.exe
+```
+
+Check the ID in startup log, like:
+
+```
+found 4 SYCL devices:
+Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
+max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
+max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
+Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
+max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
+Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
+max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+
+```
+
+|Attribute|Note|
+|-|-|
+|compute capability 1.3|Level-zero running time, recommended|
+|compute capability 3.0|OpenCL running time, slower than level-zero in most cases|
+
+4. Set device ID and execute llama.cpp
+
+Set device ID = 0 by **set GGML_SYCL_DEVICE=0**
+
+```
+set GGML_SYCL_DEVICE=0
+build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0
+```
+or run by script:
+
+```
+.\examples\sycl\win-run-llama2.bat
+```
+
+Note:
+
+- By default, mmap is used to read model file. In some cases, it leads to the hang issue. Recommend to use parameter **--no-mmap** to disable mmap() to skip this issue.
+
+
+5. Check the device ID in output
+
+Like:
+```
+Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
+```
 
 ## Environment Variable
 
@@ -220,7 +389,7 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 |LLAMA_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, LLAMA_SYCL=ON is mandatory.|
 |LLAMA_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path. Faster for long-prompt inference. <br>For FP32, not set it.|
 |CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path|
-|CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path|
+|CMAKE_CXX_COMPILER|icpx (Linux), icx (Windows)|use icpx/icx for SYCL code path|
 
 #### Running
 
@@ -232,18 +401,23 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
 ## Known Issue
 
+- Hang during startup
+
+llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.
+
+Solution: add **--no-mmap**.
+
+## Q&A
+
 - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
 
 Miss to enable oneAPI running environment.
 
 Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`.
 
+- In Windows, no result, not error.
 
-- Hang during startup
-
-llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.
-
-Solution: add **--no-mmap**.
+Miss to enable oneAPI running environment.
 
 ## Todo
 
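The new Windows run flow mirrors the existing Linux one: enable the oneAPI environment, list the SYCL devices, pick one via `GGML_SYCL_DEVICE`, then launch `main`. As a quick reference, here is a condensed sketch of the Linux sequence assembled only from commands that appear in this README; it assumes the build above succeeded and that the Linux build also produces `ls-sycl-device`, as the Windows steps show.

```
# Condensed Linux run flow (sketch, assembled from the README steps above).
source /opt/intel/oneapi/setvars.sh   # enable the oneAPI runtime first

./build/bin/ls-sycl-device            # list devices; prefer a "compute capability 1.3" (level-zero) entry

# run with the device ID taken from the listing
GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf \
  -p "Building a website can be done in 10 simple steps:\nStep 1:" \
  -n 400 -e -ngl 33 --no-mmap         # --no-mmap avoids the startup hang noted under Known Issue
```
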
common/common.cpp

Lines changed: 2 additions & 0 deletions
@@ -1521,7 +1521,9 @@ void dump_non_result_info_yaml(FILE * stream, const gpt_params & params, const l
     fprintf(stream, "cpu_has_avx512_vbmi: %s\n", ggml_cpu_has_avx512_vbmi() ? "true" : "false");
     fprintf(stream, "cpu_has_avx512_vnni: %s\n", ggml_cpu_has_avx512_vnni() ? "true" : "false");
     fprintf(stream, "cpu_has_cublas: %s\n", ggml_cpu_has_cublas() ? "true" : "false");
+    fprintf(stream, "cpu_has_vulkan: %s\n", ggml_cpu_has_vulkan() ? "true" : "false");
     fprintf(stream, "cpu_has_clblast: %s\n", ggml_cpu_has_clblast() ? "true" : "false");
+    fprintf(stream, "cpu_has_kompute: %s\n", ggml_cpu_has_kompute() ? "true" : "false");
     fprintf(stream, "cpu_has_fma: %s\n", ggml_cpu_has_fma() ? "true" : "false");
     fprintf(stream, "cpu_has_gpublas: %s\n", ggml_cpu_has_gpublas() ? "true" : "false");
     fprintf(stream, "cpu_has_neon: %s\n", ggml_cpu_has_neon() ? "true" : "false");
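
`dump_non_result_info_yaml()` is what writes the per-run YAML log, so with this change those logs also record whether the binary was built with Vulkan and Kompute support. A quick way to see the new keys, assuming a binary built from this commit and the mainline `--logdir` YAML-dump option (file naming may differ by version):

```
# Sketch: dump run info to a log directory and grep for the new keys.
./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "hi" -n 8 --logdir ./run-logs
grep -E 'cpu_has_(vulkan|kompute)' ./run-logs/*.yml
# illustrative output:
#   cpu_has_vulkan: false
#   cpu_has_kompute: false
```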

examples/llama-bench/llama-bench.cpp

Lines changed: 11 additions & 4 deletions
@@ -564,6 +564,7 @@ struct test {
     static const bool cuda;
     static const bool opencl;
     static const bool vulkan;
+    static const bool kompute;
     static const bool metal;
     static const bool gpu_blas;
     static const bool blas;
@@ -648,6 +649,9 @@ struct test {
         if (vulkan) {
             return "Vulkan";
         }
+        if (kompute) {
+            return "Kompute";
+        }
         if (metal) {
             return "Metal";
         }
@@ -663,7 +667,7 @@ struct test {
     static const std::vector<std::string> & get_fields() {
         static const std::vector<std::string> fields = {
             "build_commit", "build_number",
-            "cuda", "opencl", "vulkan", "metal", "gpu_blas", "blas",
+            "cuda", "opencl", "vulkan", "kompute", "metal", "gpu_blas", "blas",
             "cpu_info", "gpu_info",
             "model_filename", "model_type", "model_size", "model_n_params",
             "n_batch", "n_threads", "type_k", "type_v",
@@ -687,8 +691,9 @@ struct test {
             field == "avg_ns" || field == "stddev_ns") {
             return INT;
         }
-        if (field == "cuda" || field == "opencl" || field == "vulkan"|| field == "metal" || field == "gpu_blas" || field == "blas" ||
-            field == "f16_kv" || field == "no_kv_offload" || field == "mul_mat_q") {
+        if (field == "cuda" || field == "opencl" || field == "vulkan" || field == "kompute" || field == "metal" ||
+            field == "gpu_blas" || field == "blas" || field == "f16_kv" || field == "no_kv_offload" ||
+            field == "mul_mat_q") {
             return BOOL;
         }
         if (field == "avg_ts" || field == "stddev_ts") {
@@ -715,7 +720,8 @@ struct test {
         }
         std::vector<std::string> values = {
             build_commit, std::to_string(build_number),
-            std::to_string(cuda), std::to_string(opencl), std::to_string(vulkan), std::to_string(metal), std::to_string(gpu_blas), std::to_string(blas),
+            std::to_string(cuda), std::to_string(opencl), std::to_string(vulkan), std::to_string(kompute),
+            std::to_string(metal), std::to_string(gpu_blas), std::to_string(blas),
             cpu_info, gpu_info,
             model_filename, model_type, std::to_string(model_size), std::to_string(model_n_params),
             std::to_string(n_batch), std::to_string(n_threads), ggml_type_name(type_k), ggml_type_name(type_v),
@@ -744,6 +750,7 @@ const int test::build_number = LLAMA_BUILD_NUMBER;
 const bool test::cuda = !!ggml_cpu_has_cublas();
 const bool test::opencl = !!ggml_cpu_has_clblast();
 const bool test::vulkan = !!ggml_cpu_has_vulkan();
+const bool test::kompute = !!ggml_cpu_has_kompute();
 const bool test::metal = !!ggml_cpu_has_metal();
 const bool test::gpu_blas = !!ggml_cpu_has_gpublas();
 const bool test::blas = !!ggml_cpu_has_blas();
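
With `kompute` added to `get_fields()` and to the `values` vector in matching positions, the new column flows into every llama-bench output format. A minimal check, assuming a built `llama-bench` and the mainline `-o csv` output option (`-p`/`-n` are just kept small to shorten the run):

```
# Sketch: the CSV header should now list "kompute" between "vulkan" and "metal".
./build/bin/llama-bench -m models/llama-2-7b.Q4_0.gguf -p 16 -n 16 -o csv | head -n 1
# illustrative header: build_commit,build_number,cuda,opencl,vulkan,kompute,metal,gpu_blas,blas,...
```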

examples/server/chat.sh

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ chat_completion() {
         top_p: 0.9,
         n_keep: $n_keep,
         n_predict: 256,
+        cache_prompt: true,
         stop: ["\n### Human:"],
         stream: true
     }')"
