Skip to content

JIT: should be able to remove redundant bounds-checks even if array contents change #6518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
JosephTremoulet opened this issue Aug 18, 2016 · 4 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@JosephTremoulet
Copy link
Contributor

JosephTremoulet commented Aug 18, 2016

Consider this program:

using System.Runtime.CompilerServices;

namespace N
{
    public class C
    {
        int[] a;
        int x;

        [MethodImpl(MethodImplOptions.NoInlining)]
        void array_instance(int s, int d1, int d2)
        {
            a[d1] = a[s] + 1;
            a[d2] = a[s];
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        void field(int s, int d1, int d2)
        {
            x = a[s] + 1;
            a[d2] = a[s];
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void array_static(int[] a, int s, int d1, int d2)
        {
            a[d1] = a[s] + 1;
            a[d2] = a[s];
        }

        public static int Main(string[] args)
        {
            C c = new C();
            var a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };
            c.a = a;
            C.array_static(a, 3, 5, 7);
            c.array_instance(5, 3, 7);
            c.field(5, 3, 7);
            return 100;
        }
    }
}

In array_instance, we currently generate bounds-checks for both ldelems of a[s], even though we CSE the load of array a. The bounds-checks themselves are not CSE candidates (because they have type void, and even if that checked passed they're not in the list of CSE-candidate opcodes), so CSE doesn't remove them. If we could CSE the ldelems wholesale, we'd include the bounds-check in that CSE, as we do in method field, but we can't do that in array_instance because the store to a[d1] might overwrite a[s]. Assertion prop doesn't help either, because the to loads of array a have different conservative value numbers, and assertion prop uses the conservative numbers for fear of breaking safety by removing a redundant bounds-check but redundantly fetching the array (not that the fear is warranted in this case, as CSE removes the redundant load). By contrast, in method array_static, where the array a is an argument, the array portion of the two ldelems has the same conservative value number, so assertion prop removes the second a[s] bounds check even though it has to leave in the second a[s] load due to the possibility that the store to a[d1] overwrote a[s].

This same effect prevents hoisting invariant bounds-checks out of loops where the loop mutates the array. That happens in this code from benchmark JIT/Performance/CodeQuality/BenchI/BubbleSort2's inner loop:

            for (int j = i; j <= Bound; j++) {
                if (x[i] > x[j]) {
                    int temp = x[j];
                    x[j] = x[i];
                    x[i] = temp;
                }
            }

Here, x and i are invariant w.r.t. this loop, but x[i] gets stored to and so the loads of x[i] can't be hoisted out of the loop. The bounds check on the load x[i], however, should be hoistable. Loop hoisting doesn't hoist it because it's a CSE candidate, but even if it did, neither CSE nor assertion prop would remove the check in the loop (because of the same reasons as in array_instance above).

category:cq
theme:bounds-checks
skill-level:expert
cost:medium
impact:small

@russellhadley
Copy link
Contributor

@JosephTremoulet Reading this I'm thinking that we should instrument the compiler to not insert any bounds checks and see what the win is in the benchmarks, that would give us an upper bound on the cost bounds checks and let us prioritize this better.

@briansull
Copy link
Contributor

briansull commented Aug 22, 2016

We already have the ability to generate code that omit all of the array bounds checks:

set ComPlus_JitNoRngChks=1

#ifdef FEATURE_ENABLE_NO_RANGE_CHECKS
RETAIL_CONFIG_DWORD_INFO(PRIVATE_JitNoRangeChks, W("JitNoRngChks"), 0, "If 1, don't generate range checks")
#endif

@JosephTremoulet
Copy link
Contributor Author

Currently we do remove the bounds check on the second a[s] in array_instance (due presumably to dotnet/coreclr#9451). However, we allocate three separate registers for array a, and we redundantly sign-extend s at both a[s] accesses, so the codegen still isn't as good as array_static. To wit, array_static currently looks like this:

G_M9659_IG01:
       4883EC28             sub      rsp, 40

G_M9659_IG02:
       8B4108               mov      eax, dword ptr [rcx+8]
       3BD0                 cmp      edx, eax
       732F                 jae      SHORT G_M9659_IG04
       4863D2               movsxd   rdx, edx
       448B549110           mov      r10d, dword ptr [rcx+4*rdx+16]
       41FFC2               inc      r10d
       443BC0               cmp      r8d, eax
       731F                 jae      SHORT G_M9659_IG04
       4D63C0               movsxd   r8, r8d
       4689548110           mov      dword ptr [rcx+4*r8+16], r10d
       8B549110             mov      edx, dword ptr [rcx+4*rdx+16]
       443BC8               cmp      r9d, eax
       730E                 jae      SHORT G_M9659_IG04
       4963C1               movsxd   rax, r9d
       89548110             mov      dword ptr [rcx+4*rax+16], edx

G_M9659_IG03:
       4883C428             add      rsp, 40
       C3                   ret

G_M9659_IG04:
       E8D376265F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

whilst array_instance currently looks like this:

G_M18826_IG01:
       56                   push     rsi
       4883EC20             sub      rsp, 32

G_M18826_IG02:
       488B4108             mov      rax, gword ptr [rcx+8]
       488BC8               mov      rcx, rax
       4C8BD0               mov      r10, rax
       458B5A08             mov      r11d, dword ptr [r10+8]
       413BD3               cmp      edx, r11d
       7336                 jae      SHORT G_M18826_IG04
       4863F2               movsxd   rsi, edx
       458B54B210           mov      r10d, dword ptr [r10+4*rsi+16]
       41FFC2               inc      r10d
       453BC3               cmp      r8d, r11d
       7326                 jae      SHORT G_M18826_IG04
       4D63C0               movsxd   r8, r8d
       4689548110           mov      dword ptr [rcx+4*r8+16], r10d
       488BC8               mov      rcx, rax
       4863D2               movsxd   rdx, edx
       8B449010             mov      eax, dword ptr [rax+4*rdx+16]
       453BCB               cmp      r9d, r11d
       730F                 jae      SHORT G_M18826_IG04
       4963D1               movsxd   rdx, r9d
       89449110             mov      dword ptr [rcx+4*rdx+16], eax

G_M18826_IG03:
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret

G_M18826_IG04:
       E85F76275F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

@mikedn
Copy link
Contributor

mikedn commented Apr 14, 2017

and we redundantly sign-extend s at both a[s] accesses

Funny, I've just ran into something that looks like the opposite. Dictionary's FindEntry generates something like this:

G_M16822_IG05:
       mov      rcx, gword ptr [rsi+16]
; range check
       cmp      r15d, dword ptr [rcx+8]
       jae      G_M16822_IG13
; partial CSE def (r12) of array element address 
       movsxd   rdx, r15d
       lea      r12, [rdx+2*rdx]
; entry hashCode load
       cmp      dword ptr [rcx+8*r12+32], ebx
       jne      SHORT G_M16822_IG09
       mov      r13, gword ptr [rsi+24]
       mov      rcx, gword ptr [rsi+16]
; range check again!
       cmp      r15d, dword ptr [rcx+8]
       jae      G_M16822_IG13
; CSE use (r12)
       mov      rax, gword ptr [rcx+8*r12+16]

Not sure if it's related, in FindEntry the array doesn't change between the 2 range checks.

In your example AFAIR the reason why the sign extend cast isn't CSEd is that it's not considered worthwhile.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants