Commit 2a1eb0f

s-Nick authored and infil00p committed
sycl : Overcoming workaround for mmap() allocation on Windows (ggml-org#13482)
* Remove mmap workaround on windows

After some testing I found that mmap is supported on windows and for many GPUs on Linux. Therefore I remove the workaround for windows since it is not necessary.

* Update llama-bench README

SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag
1 parent eddc6ee commit 2a1eb0f

2 files changed, 7 additions and 10 deletions

ggml/src/ggml-sycl/ggml-sycl.cpp

Lines changed: 7 additions & 6 deletions
@@ -385,16 +385,17 @@ static void ggml_backend_sycl_buffer_set_tensor(ggml_backend_buffer_t buffer,
     ggml_backend_sycl_buffer_context * ctx = ( ggml_backend_sycl_buffer_context *)buffer->context;
     ggml_sycl_set_device(ctx->device);
     auto stream = &(dpct::dev_mgr::instance().get_device(ctx->device).default_queue());
-    SYCL_CHECK(
-        CHECK_TRY_ERROR(dpct::dev_mgr::instance().get_device(ctx->device).queues_wait_and_throw()));
+    SYCL_CHECK(CHECK_TRY_ERROR(dpct::dev_mgr::instance().get_device(ctx->device).queues_wait_and_throw()));
+#ifndef _WIN32
     // Note: Use host buffer to save the data from mmap(), then copy to device. It's workaround for mmap() issue on PVC GPU.
     // This function will be called during load model from disk. Use memory buffer replace dynamic won't save more time and brings potential memory leak risk here.
-    char* host_buf = (char*)malloc(size);
+    char * host_buf = (char *) malloc(size);
     memcpy(host_buf, data, size);
-    SYCL_CHECK(
-        CHECK_TRY_ERROR((*stream).memcpy((char *)tensor->data + offset, host_buf, size)
-                            .wait()));
+    SYCL_CHECK(CHECK_TRY_ERROR((*stream).memcpy((char *) tensor->data + offset, host_buf, size).wait()));
     free(host_buf);
+#else
+    SYCL_CHECK(CHECK_TRY_ERROR((*stream).memcpy((char *) tensor->data + offset, data, size).wait()));
+#endif
 }
 catch (sycl::exception const &exc) {
     std::cerr << exc.what() << "Exception caught at file:" << __FILE__
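
The branch structure of this hunk can be sketched outside SYCL to make the two copy paths easier to compare. This is a minimal illustration, not the backend's code: `device_memcpy_blocking` and `set_tensor_sketch` are hypothetical names, and the "device" copy is a plain `std::memcpy` standing in for `(*stream).memcpy(...).wait()`. The sketch also swaps `malloc`/`free` for a `std::vector`, one possible way to address the leak risk the in-code comment mentions; the patch itself keeps `malloc`/`free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for `(*stream).memcpy(dst, src, size).wait()`.
// A real SYCL build would enqueue the copy on a sycl::queue instead.
static void device_memcpy_blocking(void * dst, const void * src, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Copies `size` bytes from `data` (possibly an mmap()-backed region) into
// `tensor_data + offset`, mirroring the #ifndef _WIN32 / #else split of the patch.
static void set_tensor_sketch(char * tensor_data, const void * data, std::size_t offset, std::size_t size) {
    assert(tensor_data != nullptr && data != nullptr);
    if (size == 0) {
        return; // nothing to copy
    }
#ifndef _WIN32
    // Non-Windows: stage the bytes in an ordinary host buffer first.
    // std::vector frees the buffer on every exit path, including exceptions.
    const char * src = static_cast<const char *>(data);
    std::vector<char> host_buf(src, src + size);
    device_memcpy_blocking(tensor_data + offset, host_buf.data(), size);
#else
    // Windows: copy straight from the source pointer, no staging buffer.
    device_memcpy_blocking(tensor_data + offset, data, size);
#endif
}
```

Calling `set_tensor_sketch(dst, src, 2, 4)` writes four bytes of `src` starting at `dst + 2` on either platform; only the intermediate staging step differs.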

tools/llama-bench/README.md

Lines changed: 0 additions & 4 deletions
@@ -80,10 +80,6 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 
 For a description of the other options, see the [main example](../main/README.md).
 
-Note:
-
-- When using SYCL backend, there would be hang issue in some cases. Please set `--mmp 0`.
-
 ## Examples
 
 ### Text generation with different models
