Skip to content

Certain loops do not get recorded in optLoopTable #43713

Closed
@kunalspathak

Description

@kunalspathak

Today, we fail to capture certain loops inside optLoopTable. optLoopTable is a data structure that should ideally contain all the loops so we can make further loop optimizations like loop unrolling, loop cloning or hoist loop code.

Below is the code taken from QuickSortSpan benchmark.

for (i = lo, j = hi, pivot = data[hi]; i < j;)
{
    while (i < j && data[i] <= pivot)
    {
        ++i;
    }
    while (j > i && data[j] >= pivot)
    {
        --j;
    }
    if (i < j)
    {
        temp = data[i];
        data[i] = data[j];
        data[j] = temp;
    }
}

In the assembly code, the 2 inner while loops have this kind of representation and hence do not make it to optLoopTable.

Assembly code
; Assembly listing for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments

; Lcl frame size = 56

G_M37982_IG01:              ;; offset=0000H
 00007ff8`a20ebd20        57                   push     rdi
 00007ff8`a20ebd21        56                   push     rsi
 00007ff8`a20ebd22        55                   push     rbp
 00007ff8`a20ebd23        53                   push     rbx
 00007ff8`a20ebd24        4883EC38             sub      rsp, 56
 00007ff8`a20ebd28        33C0                 xor      rax, rax
 00007ff8`a20ebd2a        4889442428           mov      qword ptr [rsp+28H], rax
 00007ff8`a20ebd2f        4889442430           mov      qword ptr [rsp+30H], rax
						;; bbWeight=1    PerfScore 6.50
G_M37982_IG02:              ;; offset=0014H
 00007ff8`a20ebd34        488B31               mov      rsi, bword ptr [rcx]
 00007ff8`a20ebd37        8B7908               mov      edi, dword ptr [rcx+8]
						;; bbWeight=1    PerfScore 4.00
G_M37982_IG03:              ;; offset=001AH
 00007ff8`a20ebd3a        83FF01               cmp      edi, 1
 00007ff8`a20ebd3d        7F09                 jg       SHORT G_M37982_IG05
						;; bbWeight=1    PerfScore 1.25
G_M37982_IG04:              ;; offset=001FH
 00007ff8`a20ebd3f        4883C438             add      rsp, 56
; =========================== 32B boundary ===========================
 00007ff8`a20ebd43        5B                   pop      rbx
 00007ff8`a20ebd44        5D                   pop      rbp
 00007ff8`a20ebd45        5E                   pop      rsi
 00007ff8`a20ebd46        5F                   pop      rdi
 00007ff8`a20ebd47        C3                   ret      
						;; bbWeight=0.50 PerfScore 1.63
G_M37982_IG05:              ;; offset=0028H
 00007ff8`a20ebd48        8D4FFF               lea      ecx, [rdi-1]
 00007ff8`a20ebd4b        33DB                 xor      ebx, ebx
 00007ff8`a20ebd4d        8BD1                 mov      edx, ecx
 00007ff8`a20ebd4f        3BD7                 cmp      edx, edi
 00007ff8`a20ebd51        0F8315010000         jae      G_M37982_IG23
 00007ff8`a20ebd57        4863C2               movsxd   rax, edx
 00007ff8`a20ebd5a        8B0486               mov      eax, dword ptr [rsi+4*rax]
 00007ff8`a20ebd5d        EB60                 jmp      SHORT G_M37982_IG12
						;; bbWeight=0.50 PerfScore 3.25
G_M37982_IG06:              ;; offset=003FH
 00007ff8`a20ebd5f        FFC3                 inc      ebx
; =========================== 32B boundary ===========================
						;; bbWeight=2    PerfScore 0.50
G_M37982_IG07:              ;; offset=0041H
 00007ff8`a20ebd61        3BDA                 cmp      ebx, edx
 00007ff8`a20ebd63        7D15                 jge      SHORT G_M37982_IG10
 00007ff8`a20ebd65        3BDF                 cmp      ebx, edi
 00007ff8`a20ebd67        0F83FF000000         jae      G_M37982_IG23
 00007ff8`a20ebd6d        4C63C3               movsxd   r8, ebx
 00007ff8`a20ebd70        42390486             cmp      dword ptr [rsi+4*r8], eax
 00007ff8`a20ebd74        7EE9                 jle      SHORT G_M37982_IG06
						;; bbWeight=16    PerfScore 92.00
G_M37982_IG08:              ;; offset=0056H
 00007ff8`a20ebd76        EB02                 jmp      SHORT G_M37982_IG10
						;; bbWeight=2    PerfScore 4.00
G_M37982_IG09:              ;; offset=0058H
 00007ff8`a20ebd78        FFCA                 dec      edx
						;; bbWeight=8    PerfScore 2.00
G_M37982_IG10:              ;; offset=005AH
 00007ff8`a20ebd7a        3BD3                 cmp      edx, ebx
 00007ff8`a20ebd7c        7E11                 jle      SHORT G_M37982_IG11
 00007ff8`a20ebd7e        3BD7                 cmp      edx, edi
; =========================== 32B boundary ===========================
 00007ff8`a20ebd80        0F83E6000000         jae      G_M37982_IG23
 00007ff8`a20ebd86        4C63C2               movsxd   r8, edx
 00007ff8`a20ebd89        42390486             cmp      dword ptr [rsi+4*r8], eax
 00007ff8`a20ebd8d        7DE9                 jge      SHORT G_M37982_IG09
						;; bbWeight=16    PerfScore 92.00
G_M37982_IG11:              ;; offset=006FH
 00007ff8`a20ebd8f        3BDA                 cmp      ebx, edx
 00007ff8`a20ebd91        7D2C                 jge      SHORT G_M37982_IG12
 00007ff8`a20ebd93        3BDF                 cmp      ebx, edi
 00007ff8`a20ebd95        0F83D1000000         jae      G_M37982_IG23
 00007ff8`a20ebd9b        4C63C3               movsxd   r8, ebx
 00007ff8`a20ebd9e        468B0486             mov      r8d, dword ptr [rsi+4*r8]
; =========================== 32B boundary ===========================
 00007ff8`a20ebda2        4C63CB               movsxd   r9, ebx
 00007ff8`a20ebda5        3BD7                 cmp      edx, edi
 00007ff8`a20ebda7        0F83BF000000         jae      G_M37982_IG23
 00007ff8`a20ebdad        4C63D2               movsxd   r10, edx
 00007ff8`a20ebdb0        468B1496             mov      r10d, dword ptr [rsi+4*r10]
 00007ff8`a20ebdb4        4689148E             mov      dword ptr [rsi+4*r9], r10d
 00007ff8`a20ebdb8        4C63CA               movsxd   r9, edx
 00007ff8`a20ebdbb        4689048E             mov      dword ptr [rsi+4*r9], r8d
						;; bbWeight=2    PerfScore 21.50
G_M37982_IG12:              ;; offset=009FH
 00007ff8`a20ebdbf        3BDA                 cmp      ebx, edx
; =========================== 32B boundary ===========================
 00007ff8`a20ebdc1        7C9E                 jl       SHORT G_M37982_IG07
						;; bbWeight=4    PerfScore 5.00
G_M37982_IG13:              ;; offset=00A3H
 00007ff8`a20ebdc3        3BD9                 cmp      ebx, ecx
 00007ff8`a20ebdc5        741C                 je       SHORT G_M37982_IG14
 00007ff8`a20ebdc7        3BDF                 cmp      ebx, edi
 00007ff8`a20ebdc9        0F839D000000         jae      G_M37982_IG23
 00007ff8`a20ebdcf        4C63C3               movsxd   r8, ebx
 00007ff8`a20ebdd2        468B0486             mov      r8d, dword ptr [rsi+4*r8]
 00007ff8`a20ebdd6        4863D3               movsxd   rdx, ebx
 00007ff8`a20ebdd9        890496               mov      dword ptr [rsi+4*rdx], eax
 00007ff8`a20ebddc        4863C9               movsxd   rcx, ecx
 00007ff8`a20ebddf        4489048E             mov      dword ptr [rsi+4*rcx], r8d
; =========================== 32B boundary ===========================
						;; bbWeight=0.50 PerfScore 3.63
G_M37982_IG14:              ;; offset=00C3H
 00007ff8`a20ebde3        8BCB                 mov      ecx, ebx
 00007ff8`a20ebde5        8BD7                 mov      edx, edi
 00007ff8`a20ebde7        483BCA               cmp      rcx, rdx
 00007ff8`a20ebdea        777A                 ja       SHORT G_M37982_IG22
						;; bbWeight=0.50 PerfScore 0.88
G_M37982_IG15:              ;; offset=00CCH
 00007ff8`a20ebdec        48B960300EB402020000 mov      rcx, 0x202B40E3060
 00007ff8`a20ebdf6        488B29               mov      rbp, gword ptr [rcx]
 00007ff8`a20ebdf9        488BD5               mov      rdx, rbp
 00007ff8`a20ebdfc        4889542420           mov      gword ptr [rsp+20H], rdx
; =========================== 32B boundary ===========================
 00007ff8`a20ebe01        488BCA               mov      rcx, rdx
 00007ff8`a20ebe04        85DB                 test     ebx, ebx
 00007ff8`a20ebe06        7D08                 jge      SHORT G_M37982_IG17
						;; bbWeight=0.50 PerfScore 2.50
G_M37982_IG16:              ;; offset=00E8H
 00007ff8`a20ebe08        488BD1               mov      rdx, rcx
 00007ff8`a20ebe0b        E8B8FE8DFF           call     System.Diagnostics.Debug:Fail(System.String,System.String)
						;; bbWeight=0.25 PerfScore 0.31
G_M37982_IG17:              ;; offset=00F0H
 00007ff8`a20ebe10        8BCB                 mov      ecx, ebx
 00007ff8`a20ebe12        488D442428           lea      rax, bword ptr [rsp+28H]
 00007ff8`a20ebe17        488930               mov      bword ptr [rax], rsi
 00007ff8`a20ebe1a        894808               mov      dword ptr [rax+8], ecx
 00007ff8`a20ebe1d        488D4C2428           lea      rcx, bword ptr [rsp+28H]
; =========================== 32B boundary ===========================
 00007ff8`a20ebe22        E889CA8FFF           call     Span.Sorting:TestQuickSortSpan(System.Span`1[Int32])
 00007ff8`a20ebe27        FFC3                 inc      ebx
 00007ff8`a20ebe29        3BDF                 cmp      ebx, edi
 00007ff8`a20ebe2b        7739                 ja       SHORT G_M37982_IG22
						;; bbWeight=0.50 PerfScore 2.88
G_M37982_IG18:              ;; offset=010DH
 00007ff8`a20ebe2d        2BFB                 sub      edi, ebx
 00007ff8`a20ebe2f        488B542420           mov      rdx, gword ptr [rsp+20H]
 00007ff8`a20ebe34        85FF                 test     edi, edi
 00007ff8`a20ebe36        7D08                 jge      SHORT G_M37982_IG20
						;; bbWeight=0.50 PerfScore 1.25
G_M37982_IG19:              ;; offset=0118H
 00007ff8`a20ebe38        488BCA               mov      rcx, rdx
 00007ff8`a20ebe3b        E888FE8DFF           call     System.Diagnostics.Debug:Fail(System.String,System.String)
; =========================== 32B boundary ===========================
						;; bbWeight=0.25 PerfScore 0.31
G_M37982_IG20:              ;; offset=0120H
 00007ff8`a20ebe40        4863CB               movsxd   rcx, ebx
 00007ff8`a20ebe43        488D0C8E             lea      rcx, bword ptr [rsi+4*rcx]
 00007ff8`a20ebe47        488D442428           lea      rax, bword ptr [rsp+28H]
 00007ff8`a20ebe4c        488908               mov      bword ptr [rax], rcx
 00007ff8`a20ebe4f        897808               mov      dword ptr [rax+8], edi
 00007ff8`a20ebe52        488D4C2428           lea      rcx, bword ptr [rsp+28H]
 00007ff8`a20ebe57        E854CA8FFF           call     Span.Sorting:TestQuickSortSpan(System.Span`1[Int32])
 00007ff8`a20ebe5c        90                   nop      
						;; bbWeight=0.50 PerfScore 2.50
G_M37982_IG21:              ;; offset=013DH
 00007ff8`a20ebe5d        4883C438             add      rsp, 56
; =========================== 32B boundary ===========================
 00007ff8`a20ebe61        5B                   pop      rbx
 00007ff8`a20ebe62        5D                   pop      rbp
 00007ff8`a20ebe63        5E                   pop      rsi
 00007ff8`a20ebe64        5F                   pop      rdi
 00007ff8`a20ebe65        C3                   ret      
						;; bbWeight=0.50 PerfScore 1.63
G_M37982_IG22:              ;; offset=0146H
 00007ff8`a20ebe66        E83DEB8EFF           call     System.ThrowHelper:ThrowArgumentOutOfRangeException()
 00007ff8`a20ebe6b        CC                   int3     
						;; bbWeight=0    PerfScore 0.00
G_M37982_IG23:              ;; offset=014CH
 00007ff8`a20ebe6c        E8AF87065F           call     CORINFO_HELP_RNGCHKFAIL
 00007ff8`a20ebe71        CC                   int3     
						;; bbWeight=0    PerfScore 0.00

; Total bytes of code 338, prolog size 20, PerfScore 283.30, instruction count 115 (MethodHash=bb696ba1) for method Span.Sorting:TestQuickSortSpan(System.Span`1[Int32])
; ============================================================

Here, none of the 3 loops (outer for and 2 inner while) loops are present in optLoopTable. On investigation, @BruceForstall and I found out that it happens because the entry to the loop is below the bottom of the loop and from the entry it jumps up to the top block, however the bottom's backedge jumps to the first block of the loop.

Here is the pictorial representation. Ideally, we favor below loops. The diagram is taken from optimizer.cpp.

        |
        v
      head
        |
        |  top/first <--+
        |       |       |
        |      ...      |
        |       |       |
        |       v       |
        +---> entry     |
                |       |
               ...      |
                |       |
                v       |
         +-- exit/tail  |
         |      |       |
         |     ...      |
         |      |       |
         |      v       |
         |    bottom ---+
         |
         +------+
                |
                v

However, in this case, the loop is little different and hence we reject it in FindNaturalLoops() because the code FindEntry returns nullptr and hence we don't recognize it as a loop.

        |
        v
      head
        |
        |   +--> first
        |   |      |
        |   |      |
        |   |      v
        |   |     top <---+
        |   |     ...     |
        |   |      |      |
        |   |      v      |
        |   +--- bottom   |
        |          |      |
        |         ...     |
        |          |      |
        |          v      |
        +------> entry ---+
        |
        +------+
                |
                v

We should fix it to identify such loops or at least make an entry to optLoopTable to take advantage of other loop optimizations.

This should be one of the part of #43549.

category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:medium

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions