Over generation of 'mov' instruction for some kernels #18
Comments
Thanks Axel for posting this. I will have someone on our team get to this shortly.
Hi Axel, I have managed to generate the assembly file using your code.cl file.
I'll continue the investigation and keep you posted.
Comparing the 'mov' instructions against the 'phi' operations in the LLVM file does not conclusively show that excessive 'mov' instructions are generated. In one experiment I disabled loop unrolling in IGC, and the number of 'mov's dropped significantly, as expected.
If I compare the csv I sent you and the code, I can guess which block corresponds to what. In pseudo code: On the generated csv: I counted: Thus the 19 items that are shifted inside a table (which account for the 19 unavoidable movs) are somehow moved into many different registers across all these blocks. Only the movs in block 17 seem necessary; all the others seem avoidable.
Hi Axel, can you share sample code for which CL2.0 worked around the issue?
I'm not able to reproduce any issue I had when removing '-cl-std=CL2.0'. I used to have some code that would over-generate movs when the flag was removed. Either my code no longer reproduces the issue because of the modifications I have made since, or the issue was fixed in the driver. I will bisect to find out which.
I have managed to reproduce the issue by adapting an older version of my code. For some reason my more recent code does not trigger the issue. I will send the code to paigeale.
Hello Axel. Thank you for the simple reproducer. I have identified the extra movs you reported and their source. During our DeSSA pass we evaluate phi instructions using congruence classes to determine whether we can coalesce the operands of each phi. In the case you sent me, we see a lot of interference when trying to combine the phi operands, which is why these extra movs appear in the asm. I am working on a way to improve our DeSSA pass so that it constructs these congruence classes better. Thank you for your patience.
Hello Axel. After investigating our DeSSA coalescing algorithm further, there is not much we can do in this case to improve the overall mov count. Phi elimination is NP-complete, which means we cannot guarantee the best outcome for every individual program, but our algorithm works well in most cases. What is unique in your case is the phi looping: the phi chosen as the lead node of the congruence class ends up interfering with many different phis, so we end up isolating each of those phis, which creates the additional movs. We cannot remove the additional movs after the fact because we do not do any global coalescing, due to the structure of our compiler (there is no intermediate representation between LLVM and virtual asm). I would advise changing the kernel code at this point in time. If you need any recommendations on what to change, feel free to contact me via email. Thank you again for posting this issue, we look forward to hearing more from you!
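To make the congruence-class explanation above more concrete, here is a toy sketch in C. It is not IGC's DeSSA implementation. The names `InterferenceGraph`, `add_interference` and `coalesce_phi`, the `MAX_VALUES` limit, and the greedy strategy are all assumptions made for illustration. The phi result acts as the lead node, and any operand that interferes with a value already in the class is isolated, which is counted as one extra mov.

```c
#include <stdbool.h>

/* Toy limit on the number of SSA values; an assumption for this sketch. */
#define MAX_VALUES 8

/* interf[a][b] is true when values a and b are live at the same time
 * and therefore cannot share a register. */
typedef struct {
    bool interf[MAX_VALUES][MAX_VALUES];
} InterferenceGraph;

/* Record a symmetric interference between values a and b. */
static void add_interference(InterferenceGraph *g, int a, int b) {
    g->interf[a][b] = true;
    g->interf[b][a] = true;
}

/* Greedily build the congruence class of one phi, led by its result `dest`.
 * An operand joins the class only if it interferes with no current member;
 * otherwise it is isolated. Returns the number of isolated operands, i.e.
 * the number of movs this toy model would insert.
 * Assumes num_operands < MAX_VALUES and all value ids are < MAX_VALUES. */
static int coalesce_phi(const InterferenceGraph *g, int dest,
                        const int *operands, int num_operands) {
    int members[MAX_VALUES];
    int num_members = 0;
    int movs = 0;

    members[num_members++] = dest;
    for (int i = 0; i < num_operands; ++i) {
        bool conflict = false;
        for (int m = 0; m < num_members; ++m) {
            if (g->interf[operands[i]][members[m]]) {
                conflict = true;
                break;
            }
        }
        if (conflict) {
            ++movs;                              /* isolated: needs a copy */
        } else {
            members[num_members++] = operands[i]; /* coalesced: no copy */
        }
    }
    return movs;
}
```

In this model, a phi whose lead node interferes with every operand gets one mov per operand, while a phi with no interference gets none. That matches the qualitative behaviour described above, but says nothing about the actual counts IGC produces.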
Co-authored-by: Jacek Jankowski <[email protected]>
Hi,
For some kernels, a lot of useless mov operations are generated (usually after an if conditional). These movs could be avoided by writing directly to the correct registers inside the if block, or equivalent.
I noticed that specifying '-cl-std=CL2.0' helps work around the issue in some cases, but not in all of them.
I've sent an example by mail to Alexander Paige, but I am opening this bug report to keep track of the issue.
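For readers unfamiliar with where a mov can appear after an if conditional, here is a minimal sketch in plain C. It is a hypothetical example, not the kernel from this report, and the function name `update_after_if` is made up. The comments show the usual SSA view of the code: the value merged after the `if` is a phi, and a mov is only needed when the values feeding the phi are not placed in the same register.

```c
/* Hypothetical illustration: x is updated on only one path. */
static int update_after_if(int a, int b, int cond) {
    int x = a;          /* SSA: x0 = a */
    if (cond) {
        x = a + b;      /* SSA: x1 = a + b */
    }
    /* SSA: x2 = phi(x0, x1).
     * If x0, x1 and x2 all share one register, the phi costs nothing.
     * If they do not, the compiler inserts a mov on an incoming edge. */
    return x * 2;
}
```

Writing `x1` straight into the register that holds `x2` is what "writing to the correct registers in the if condition" amounts to in this small case.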