-
Notifications
You must be signed in to change notification settings - Fork 171
pragma unroll not always respected #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hello Axel, We have a fix for this in our internal code review. With this fix the user will be able force unroll (with #pragma unroll ) even in the case where a spill occurs. I will update this thread when the fix comes in. |
Thank you for the fix. When the fix is pushed, I'll close the issue. |
Hello Axel, An update on this issue filed, for it has been stale for some time. We have seen in some internal and external workloads performance issues when changing how our compiler handles pragma unroll. It has been adopted the assumption that when we spill we disregard the pragma unrolls because of memory space concerns. Thus there will be no change in behavior of this pragma, but if this is still a concern we can possibly work on another solution. Please feel free to reopen issue if you would like to further discuss this. Thanks, |
Given the low register count of intel gpus, some very expensive shaders have to spill some values. #pragma unroll is totally neccessary when accessing __private tables to have them stored in registers. It is unfortunate all the pragma unrolls stop working when hitting the first spill, which causes __private tables in a loop to get stored in global memory and destroy performance. I don't plan to do manual spilling to prevent the compiler from detecting spills and will just recommand in my code's README to try another vendor if the case is hit. |
With recent version of the drivers, our code cannot run anymore at all (timeout), even with trying to use the lowest settings. I suspect this issue again, combined with probably some driver changes that increased the register usage and made the problem easier to trigger. Would it be possible to consider not disabling loop unrolling when the driver detects that spilling is increased else ? Since the unrolling doesn't happen, the __private tables are stored in memory, and should be counted as spilled registers. |
Going to take another stab at this. I have another idea to try to honor the unroll pragmas |
Hello Axel, Could you send me (email in profile) the current kernel you are compiling that contains the pragma unrolls. I want to make sure my solution is working for your case. Thanks, |
…rnel by adding a new pass titled "DisableLoopUnrollOnRetry". This fixes issue #16 Change-Id: Iac50a17a08629e7d0324b0574fd9f4bef295ac84
Thanks, I will try. I had sent you my code, I guess based on your reply it must have been in the spam box... |
Hey Axel, I tried it out on your code and it was a success for me. It honored the existing pragma unrolls. Just wanted you to try it out in case i missed anything |
Any update on checking if the new commit above satisfies your request? |
Hi, I couldn't test yet as I am at a conference this week. I will test when I am back. |
Hello @axeldavy, I am going to close this issue. If you have any issues when testing it please reopen and add a comment, or contact me via my email in my profile. Thanks, |
I'm working on some code, and a small change triggered the shader to take much more time.
I found out adding '-cl-std=CL2.0' to the build options would solve the issue.
I looked at the difference in shader generation with and without the option and found that some loops were no longer unrolled (even with my #pragma unroll), which caused some __private tables to be stored in memory instead of registers.
This is not the first time I notice '-cl-std=CL2.0' generates better code.
I use the driver version 18.26.10987
The text was updated successfully, but these errors were encountered: