sycl : Overcoming workaround for mmap() allocation on Windows #13482
@s-Nick Could you remove the other code changes from this PR?
0e1009f to 083f56b
All wait() calls in the SYCL backend have been confirmed to have value.
Thank you for your review, @NeoZhangJianyu. I modified the description, adding many logs of …
I'm not confident we can remove that many waits unfortunately. Hopefully reverting them will not add back the waits that you saw being removed in the models used?
I tested every wait() when I handled an earlier issue.
After some testing I found that mmap is supported on Windows and on many GPUs on Linux. Therefore I removed the workaround on Windows, since it is not necessary there.
The SYCL backend introduced a workaround that allows llama-bench to run without specifying the `--mmp 0` flag.
88252ac to a2afcb3
LGTM, you can edit the PR title since it's not removing waits anymore.
Thank you for your work to make the code better!
This PR stops using, on Windows, a workaround for an mmap bug that affects some Intel GPUs on Linux. The bug is not present on Windows, so there is no point in keeping the workaround in place there.
This introduces a small OS-dependent split in the codebase, but it yields good performance improvements.
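The OS-dependent split described above can be pictured with a minimal C++ sketch. It assumes the workaround consists of staging mmap-backed data through a temporary host buffer before the device copy. The names `upload_tensor_data` and `device_copy` are hypothetical, and a plain `std::memcpy` stands in for the SYCL copy. This is an illustration of the idea, not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the device copy. In a real SYCL backend this
// would be a queue memcpy followed by a wait; here it is a host memcpy so
// the sketch stays self-contained.
static void device_copy(void * dst, const void * src, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Hypothetical upload helper. `src` may point into an mmap()-ed model file.
static void upload_tensor_data(void * dst, const void * src, std::size_t size) {
    assert(dst != nullptr && src != nullptr);
    if (size == 0) {
        return;
    }
#ifndef _WIN32
    // Assumed workaround path (non-Windows): copy the mmap-backed data into
    // an ordinary host buffer first, then copy from that buffer.
    std::vector<unsigned char> staging(size);
    std::memcpy(staging.data(), src, size);
    device_copy(dst, staging.data(), size);
#else
    // Windows path: copy directly from the source, skipping the extra
    // allocation and host-side copy.
    device_copy(dst, src, size);
#endif
}
```

Both branches produce the same result. The Windows branch simply avoids one allocation and one extra copy per upload, which is where a load-time gain would come from under these assumptions.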
The work introduced here is based on #13109
N.B. All numbers were measured with `GGML_SYCL_DISABLE_OPT=0`.
Lunar Lake's performance (this PR)
build: 0e1009f (5334)
Lunar Lake's performance (#13109)
build: f7e7d2a (5331)
Battlemage (B580) performance (this PR)
build: 0e1009f (5334)
Battlemage (B580) performance (#13109)
build: f7e7d2a (5331)
Logs for different GPUs on Linux
This section contains logs showing that this patch works on Linux without affecting performance or correctness.
Lunar Lake
lnl-test.txt
lnl_bench.txt
master_lnl.txt
Battlemage B580
bmg-test.txt
bmg_bench.txt
master_bmg.txt
PVC
pvc-test.txt
pvc_bench.txt
master_pvc.txt
ARC A770
arc-test.txt
arc_bench.txt
master_arc.txt
llama-cli output
bmg_cli_output.txt
lnl_cli_output.txt
pvc_cli_output.txt
arc_cli_output.txt