Over generation of 'mov' instruction for some kernels #18

axeldavy · 2018-07-16T23:39:49Z

Hi,

For some kernels, many useless mov operations are generated (usually after an if conditional). The movs could be avoided by writing directly to the correct registers inside the if branch, or something equivalent.
I noticed that specifying '-cl-std=CL2.0' helps work around the issue in some cases, but not in all of them.
I've sent an example by mail to Alexander Paige, but I am opening this bug report to keep track of the issue.

paigeale · 2018-07-20T17:02:44Z

Thanks Axel for posting this. I will have someone on our team get to this shortly.

iwwu · 2018-08-08T18:37:44Z

Hi Axel,

I have managed to generate the assembly file using your code.cl file.

My .asm file matches your .csv file visually. I specifically compared blocks 1 to 17, and they match your description. Blocks 14, 15, and 16 contain mostly movs.
I also confirmed that -cl-std=CL2.0 doesn't alter the assembly output. You mentioned that CL2.0 worked around the issue for some kernels, but it does not for this sample code.

I'll continue the investigation and keep you posted.

iwwu · 2018-08-15T00:21:07Z

After comparing with the 'phi' operations in the LLVM file, it is not conclusive that excessive 'mov' instructions are generated. One experiment I did was to disable loop unrolling in IGC; as expected, the number of 'mov's is reduced significantly.

axeldavy · 2018-08-15T08:08:17Z

If I compare the csv I sent you and the code, I can guess which block corresponds to what.

In pseudo code:
1: loop on dx
2: if{}
3: loop on dy
4: if{} with 19 unavoidable mov operations
5: barrier
6: if{}
7: barrier
8: if {if {}}
9: end of loop dy
10: end of loop dx

On the generated csv:
1: block 9 (for the init) then 10
2: blocks 11, 12, 13
3: block 14 (for the init) then 15
4: end of block 15, blocks 16, 17, 18, 19
5: middle of block 19
6: end of block 19, block 20 and block 21
7: block 22
8: blocks 23, 24, 25 and 26
9: blocks 27 28 and 29
10: blocks 30 31 and 32

I counted:
18 movs: block 19
19 movs: blocks 10, 11, 12, 15, 16, 28, 31
20 movs: block 17
21 movs: block 14
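A per-block count like the one above can be automated. The following is a minimal sketch, not the tooling used in this thread: it assumes a hypothetical text listing in which each block starts with a label line such as `BB14:` and each instruction line begins with its opcode. The real .asm and .csv layouts are not shown here, so the label pattern and the sample listing are illustrative only.

```python
import re
from collections import Counter

# Assumed (hypothetical) listing format: a block starts at a line such as
# "BB14:" and every following line holds one instruction whose first token
# is the opcode.
LABEL = re.compile(r"^\s*(\w+):\s*$")


def count_movs_per_block(asm_text):
    """Return a Counter mapping block label -> number of mov instructions."""
    counts = Counter()
    block = None
    for line in asm_text.splitlines():
        label = LABEL.match(line)
        if label:
            block = label.group(1)
            counts[block] += 0  # list blocks even if they contain no mov
            continue
        tokens = line.split()
        if block is None or not tokens:
            continue
        opcode = tokens[0]
        # Count plain "mov" and dotted variants such as "mov.sat".
        if opcode == "mov" or opcode.startswith("mov."):
            counts[block] += 1
    return counts


# Made-up sample listing, only to exercise the function.
sample = """\
BB16:
  mov r10 r11
  mov r11 r12
  add r1 r2 r3
BB17:
  mov r20 r21
  cmp r1 r2
"""
mov_counts = count_movs_per_block(sample)
for name, n in sorted(mov_counts.items()):
    print(name, n)
```

The label regular expression and the opcode test would need adjusting to whatever the actual disassembly looks like.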

Thus the 19 items that are shifted inside a table (which are the 19 unavoidable movs) are somehow moved into many different registers in all these different blocks. Only the movs in block 17 seem necessary; all the others seem avoidable.
If you disable loop unrolling, the table will no longer be stored in registers (it is declared private), which removes the movs, but for performance it should be stored in registers.
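The kernel itself is not included in this thread, so the following is only a hypothetical model of the "items shifted inside a table" pattern. It assumes a table of 20 entries, chosen purely so that one shift performs 19 element copies; the real table size and element type are unknown. When such a table lives in registers, each copy in the shift would plausibly correspond to one mov, which is the unavoidable part; the complaint above concerns the additional movs in the other blocks.

```python
def shift_in(table, new_value):
    """Shift every entry one slot toward index 0, then store new_value last.

    Returns the number of element copies performed by the shift itself.
    """
    copies = 0
    for i in range(len(table) - 1):
        table[i] = table[i + 1]  # one element-to-element copy per entry
        copies += 1
    table[-1] = new_value  # the fresh value is produced, not copied
    return copies


TABLE_SIZE = 20  # assumption: picked so that one shift performs 19 copies
table = list(range(TABLE_SIZE))
copies = shift_in(table, 99)
print(copies, table[0], table[-1])
```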

iwwu · 2018-08-15T23:52:56Z

Hi Axel, can you share sample code for which CL2.0 works around the issue?

axeldavy · 2018-08-16T09:01:33Z

I'm not able to reproduce any of the issues I had when removing '-cl-std=CL2.0'. I used to have some code that would over-generate movs when the flag was removed.

Either my code no longer reproduces the issue because of the modifications I have made since, or it was fixed in the driver. I will bisect to find out which.

axeldavy · 2018-08-16T09:27:13Z

I have managed to reproduce the issue by adapting an older version of my code. For some reason my more recent code doesn't trigger the issue. I will send the code to paigeale.

paigeale · 2018-08-29T18:06:14Z

Hello Axel. Thank you for the simple reproducer. I have found the extra movs you reported and identified their source. During our DeSSA pass we evaluate phi instructions using congruence classes to determine whether we can coalesce the operands of each phi. In the case you sent me, we are seeing a lot of interference when trying to combine the phi operands, which is why these extra movs are created in the asm. I am working on a way to improve our DeSSA pass so that it constructs these congruence classes better. Thank you for your patience.

paigeale · 2018-09-20T23:15:46Z

Hello Axel. After investigating our DeSSA coalescing algorithm further, there is not much we can do in this case to improve the overall mov count. Phi elimination is NP-complete, which means we cannot guarantee the best outcome for each individual program, but our algorithm works well in most cases. What is unique in your case is the phi looping: the phi chosen as the lead node of the congruence class ends up interfering with many different phis, so we end up isolating each of those phis, which creates the additional movs. We cannot clean up the additional movs after the fact because we do not do any global coalescing, due to the structure of our compiler (there is no intermediate representation between LLVM and virtual asm). I would advise changing the kernel code at this point in time. If you need any recommendations on what to change, feel free to contact me via email. Thank you again for posting this issue, we look forward to hearing more from you!
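To illustrate how interference inside a congruence class turns into movs, here is a toy model. It is not IGC's DeSSA pass: the function `coalesce_phis`, the greedy merge order, and the value names are all assumptions made for illustration. Each phi tries to merge its operands into the class of its destination; an operand that interferes with any member of that class is left isolated and costs one copy.

```python
def coalesce_phis(phis, interferes):
    """Greedy toy phi coalescing.

    phis:       list of (dest, [operands]) tuples.
    interferes: set of frozenset({a, b}) pairs that are live simultaneously.
    Returns the list of (dest, operand) pairs that still need a copy.
    """
    parent = {}   # union-find forest over value names
    members = {}  # class root -> set of values in that congruence class

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def cls(x):
        root = find(x)
        return members.setdefault(root, {root})

    def clash(a, b):
        # Two classes clash if any pair of their members interferes.
        return any(frozenset((x, y)) in interferes
                   for x in cls(a) for y in cls(b))

    movs = []
    for dest, operands in phis:
        for op in operands:
            if find(op) == find(dest):
                continue  # already in the same class, no copy needed
            if clash(dest, op):
                movs.append((dest, op))  # operand stays isolated: one mov
            else:
                root_dest, root_op = find(dest), find(op)
                merged = cls(dest) | cls(op)
                parent[root_op] = root_dest
                members[root_dest] = merged
                members.pop(root_op, None)
    return movs


# Loop-carried rotation (a <- b, b <- c, c <- n): the phi destinations are
# all live at the loop header, so neighbouring ones interfere.
phis = [("a", ["a0", "b"]), ("b", ["b0", "c"]), ("c", ["c0", "n"])]
interferes = {frozenset(("a", "b")), frozenset(("b", "c"))}
movs = coalesce_phis(phis, interferes)
print(movs)
```

In this toy the two remaining copies are the ones the rotation genuinely needs. The situation described above is worse because the lead node of the class interferes with many phis, so far more operands end up isolated.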

paigeale closed this as completed Sep 25, 2018

VPG-SWE-Github pushed a commit that referenced this issue Nov 2, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

60d0715

Co-authored-by: Jacek Jankowski <[email protected]>

VPG-SWE-Github pushed a commit that referenced this issue Nov 3, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

4d782ed

Co-authored-by: Jacek Jankowski <[email protected]>

VPG-SWE-Github pushed a commit that referenced this issue Nov 3, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

434fadf

VPG-SWE-Github pushed a commit that referenced this issue Nov 3, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

1dfa352

VPG-SWE-Github pushed a commit that referenced this issue Nov 3, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

465ee8a

VPG-SWE-Github pushed a commit that referenced this issue Nov 5, 2020

IMF LA open-sourcing. FP64 erfc. (#18)

cc21d75

haonanya mentioned this issue Jan 25, 2022

Compiler crashing on my OpenCL kernel #207

Closed
