[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets #71166
Conversation
There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this.
The most significant SPEC2017 score changes on neoverse-v1 were:
500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%
and the rest saw changes < 0.5%.
I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel Author: David Sherwood (david-arm) ChangesThere are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this. The most significant SPEC2017 score changes on neoverse-v1 were: 500.perlbench_r: +1.6% and the rest saw changes < 0.5%. I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour. Patch is 23.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71166.diff 17 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 291f0c8c5d991c6..002b95c68272f5c 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -144,6 +144,10 @@ static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
static cl::opt<unsigned> MaxXors("aarch64-max-xors", cl::init(16), cl::Hidden,
cl::desc("Maximum of xors"));
+static cl::opt<unsigned> AArch64MinimumJumpTableEntries(
+ "aarch64-min-jump-table-entries", cl::init(13), cl::Hidden,
+ cl::desc("Set minimum number of entries to use a jump table on AArch64"));
+
/// Value type used for condition codes.
static const MVT MVT_CC = MVT::i32;
@@ -1653,6 +1657,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
IsStrictFPEnabled = true;
+
+ setMinimumJumpTableEntries(AArch64MinimumJumpTableEntries);
}
void AArch64TargetLowering::addTypeForNEON(MVT VT) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
index 4e4297b7a5e22f8..9371edd439fe49c 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-; RUN: llc -global-isel -mtriple aarch64 -O0 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -simplify-mir -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
+; RUN: llc -global-isel -mtriple aarch64 -aarch64-min-jump-table-entries=4 -O0 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -simplify-mir -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
define i32 @switch(i32 %argc) {
; CHECK-LABEL: name: switch
diff --git a/llvm/test/CodeGen/AArch64/arm64-jumptable.ll b/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
index d4ac9e72b28ff62..7d9adf92a6a95c5 100644
--- a/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s
-; RUN: llc -mtriple=arm64-linux-gnu < %s | FileCheck %s --check-prefix=CHECK-LINUX
+; RUN: llc -mtriple=arm64-apple-ios -aarch64-min-jump-table-entries=4 < %s | FileCheck %s
+; RUN: llc -mtriple=arm64-linux-gnu -aarch64-min-jump-table-entries=4 < %s | FileCheck %s --check-prefix=CHECK-LINUX
; <rdar://11417675>
define void @sum(i32 %a, ptr %to, i32 %c) {
diff --git a/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll b/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
index d9ed54fb86bf151..e3e7f42526f785c 100644
--- a/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
+++ b/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
@@ -1,4 +1,4 @@
-; RUN: llc %s -o - | FileCheck %s
+; RUN: llc %s -aarch64-min-jump-table-entries=4 -o - | FileCheck %s
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64"
diff --git a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
index 73b180dc4ae765f..0f208f8ed905241 100644
--- a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
+++ b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
-; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s
+; RUN: llc -aarch64-min-jump-table-entries=4 -mtriple=arm64-apple-ios < %s | FileCheck %s
; Check there's no assert in spilling from implicit-def operands on an
; IMPLICIT_DEF.
diff --git a/llvm/test/CodeGen/AArch64/jump-table-32.ll b/llvm/test/CodeGen/AArch64/jump-table-32.ll
index d8572e901af29ec..e65813de8943e3b 100644
--- a/llvm/test/CodeGen/AArch64/jump-table-32.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table-32.ll
@@ -1,4 +1,4 @@
-; RUN: llc -verify-machineinstrs -o - %s -mtriple=arm64_32-apple-ios7.0 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=arm64_32-apple-ios7.0 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
define i32 @test_jumptable(i32 %in) {
; CHECK: test_jumptable
diff --git a/llvm/test/CodeGen/AArch64/jump-table-exynos.ll b/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
index b5b400ecfbffcac..61b0df5de2af3a4 100644
--- a/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
@@ -1,5 +1,5 @@
-; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mattr=+force-32bit-jump-tables -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
-; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mcpu=exynos-m3 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mattr=+force-32bit-jump-tables -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mcpu=exynos-m3 -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
; Exynos doesn't want jump tables to be compressed for now.
diff --git a/llvm/test/CodeGen/AArch64/jump-table.ll b/llvm/test/CodeGen/AArch64/jump-table.ll
index 6c32444657542db..bb6c74b3b9df6c7 100644
--- a/llvm/test/CodeGen/AArch64/jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table.ll
@@ -1,9 +1,9 @@
-; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
-; RUN: llc -no-integrated-as -code-model=large -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-LARGE %s
-; RUN: llc -no-integrated-as -code-model=large -relocation-model=pic -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-PIC %s
-; RUN: llc -no-integrated-as -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic -aarch64-enable-atomic-cfg-tidy=0 -o - %s | FileCheck --check-prefix=CHECK-PIC %s
-; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-IOS %s
-; RUN: llc -no-integrated-as -code-model=tiny -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-TINY %s
+; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -no-integrated-as -code-model=large -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-LARGE %s
+; RUN: llc -no-integrated-as -code-model=large -relocation-model=pic -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-PIC %s
+; RUN: llc -no-integrated-as -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 -o - %s | FileCheck --check-prefix=CHECK-PIC %s
+; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-IOS %s
+; RUN: llc -no-integrated-as -code-model=tiny -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-TINY %s
define i32 @test_jumptable(i32 %in) {
; CHECK: test_jumptable
diff --git a/llvm/test/CodeGen/AArch64/max-jump-table.ll b/llvm/test/CodeGen/AArch64/max-jump-table.ll
index d01924a9a54272e..1268f5e6167ad8d 100644
--- a/llvm/test/CodeGen/AArch64/max-jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/max-jump-table.ll
@@ -1,8 +1,8 @@
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m3 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -mcpu=exynos-m3 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t
declare void @ext(i32, i32)
diff --git a/llvm/test/CodeGen/AArch64/min-jump-table.ll b/llvm/test/CodeGen/AArch64/min-jump-table.ll
index bf6bddf7b9c4f73..3b3f79746de0ab7 100644
--- a/llvm/test/CodeGen/AArch64/min-jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/min-jump-table.ll
@@ -1,7 +1,9 @@
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=0 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=2 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK2 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=0 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=2 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK2 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=12 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK12 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK-DEFAULT < %t
declare void @ext(i32, i32)
@@ -16,6 +18,8 @@ entry:
; CHECK2-NEXT: Jump Tables:
; CHECK4-NOT: {{^}}Jump Tables:
; CHECK8-NOT: {{^}}Jump Tables:
+; CHECK12-NOT: {{^}}Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
bb1: tail call void @ext(i32 1, i32 0) br label %return
bb2: tail call void @ext(i32 2, i32 2) br label %return
@@ -36,6 +40,8 @@ entry:
; CHECK2-NEXT: Jump Tables:
; CHECK4-NEXT: Jump Tables:
; CHECK8-NOT: {{^}}Jump Tables:
+; CHECK12-NOT: {{^}}Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
bb1: tail call void @ext(i32 1, i32 0) br label %return
bb2: tail call void @ext(i32 3, i32 2) br label %return
@@ -58,6 +64,83 @@ entry:
i32 9, label %bb8
]
; CHECK-LABEL: function jt8:
+; CHECK0-NEXT: Jump Tables:
+; CHECK2-NEXT: Jump Tables:
+; CHECK4-NEXT: Jump Tables:
+; CHECK8-NEXT: Jump Tables:
+; CHECK12-NOT: Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
+
+bb1: tail call void @ext(i32 1, i32 0) br label %return
+bb2: tail call void @ext(i32 2, i32 2) br label %return
+bb3: tail call void @ext(i32 3, i32 4) br label %return
+bb4: tail call void @ext(i32 4, i32 6) br label %return
+bb5: tail call void @ext(i32 5, i32 8) br label %return
+bb6: tail call void @ext(i32 6, i32 10) br label %return
+bb7: tail call void @ext(i32 7, i32 12) br label %return
+bb8: tail call void @ext(i32 8, i32 14) br label %return
+
+return: ret i32 %b
+}
+
+define i32 @jt12(i32 %a, i32 %b) {
+entry:
+ switch i32 %a, label %return [
+ i32 1, label %bb1
+ i32 2, label %bb2
+ i32 3, label %bb3
+ i32 4, label %bb4
+ i32 5, label %bb5
+ i32 6, label %bb6
+ i32 7, label %bb7
+ i32 8, label %bb8
+ i32 9, label %bb9
+ i32 10, label %bb10
+ i32 11, label %bb11
+ i32 12, label %bb12
+ ]
+; CHECK-LABEL: function jt12:
+; CHECK0-NEXT: Jump Tables:
+; CHECK2-NEXT: Jump Tables:
+; CHECK4-NEXT: Jump Tables:
+; CHECK8-NEXT: Jump Tables:
+; CHECK12-NEXT: Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
+
+bb1: tail call void @ext(i32 1, i32 0) br label %return
+bb2: tail call void @ext(i32 2, i32 2) br label %return
+bb3: tail call void @ext(i32 3, i32 4) br label %return
+bb4: tail call void @ext(i32 4, i32 6) br label %return
+bb5: tail call void @ext(i32 5, i32 8) br label %return
+bb6: tail call void @ext(i32 6, i32 10) br label %return
+bb7: tail call void @ext(i32 7, i32 12) br label %return
+bb8: tail call void @ext(i32 8, i32 14) br label %return
+bb9: tail call void @ext(i32 9, i32 16) br label %return
+bb10: tail call void @ext(i32 10, i32 18) br label %return
+bb11: tail call void @ext(i32 11, i32 20) br label %return
+bb12: tail call void @ext(i32 12, i32 22) br label %return
+
+return: ret i32 %b
+}
+
+define i32 @jt13(i32 %a, i32 %b) {
+entry:
+ switch i32 %a, label %return [
+ i32 1, label %bb1
+ i32 2, label %bb2
+ i32 3, label %bb3
+ i32 4, label %bb4
+ i32 5, label %bb5
+ i32 6, label %bb6
+ i32 7, label %bb7
+ i32 8, label %bb8
+ i32 9, label %bb9
+ i32 10, label %bb10
+ i32 11, label %bb11
+ i32 12, label %bb12
+ i32 13, label %bb13
+ ]
+; CHECK-LABEL: function jt13:
; CHECK-NEXT: Jump Tables:
bb1: tail call void @ext(i32 1, i32 0) br label %return
@@ -68,6 +151,11 @@ bb5: tail call void @ext(i32 5, i32 8) br label %return
bb6: tail call void @ext(i32 6, i32 10) br label %return
bb7: tail call void @ext(i32 7, i32 12) br label %return
bb8: tail call void @ext(i32 8, i32 14) br label %return
+bb9: tail call void @ext(i32 9, i32 16) br label %return
+bb10: tail call void @ext(i32 10, i32 18) br label %return
+bb11: tail call void @ext(i32 11, i32 20) br label %return
+bb12: tail call void @ext(i32 12, i32 22) br label %return
+bb13: tail call void @ext(i32 13, i32 24) br label %return
return: ret i32 %b
}
diff --git a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
index 15657730c2cdcb6..106f6bb856b6b8a 100644
--- a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
+++ b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=aarch64 %s -o - | FileCheck %s
+; RUN: llc -mtriple=aarch64 -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s
define void @f0() "patchable-function-entry"="0" "branch-target-enforcement"="true" {
; CHECK-LABEL: f0:
diff --git a/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll b/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
index c150cb889313ac9..caa28b31a6f6539 100644
--- a/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
+++ b/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
@@ -11,33 +11,37 @@ define i32 @test(i32 %input, i32 %n, i32 %a) {
; CHECK-NEXT: .LBB0_2: // %bb.0
; CHECK-NEXT: add w8, w0, w1
; CHECK-NEXT: mov w0, #100 // =0x64
-; CHECK-NEXT: cmp w8, #4
-; CHECK-NEXT: b.hi .LBB0_5
+; CHECK-NEXT: cmp w8, #1
+; CHECK-NEXT: b.le .LBB0_7
; CHECK-NEXT: // %bb.3: // %bb.0
-; CHECK-NEXT: adrp x9, .LJTI0_0
-; CHECK-NEXT: add x9, x9, :lo12:.LJTI0_0
-; CHECK-NEXT: adr x10, .LBB0_4
-; CHECK-NEXT: ldrb w11, [x9, x8]
-; CHECK-NEXT: add x10, x10, x11, lsl #2
-; CHECK-NEXT: br x10
-; CHECK-NEXT: .LBB0_4: // %sw.bb
-; CHECK-NEXT: add w0, w2, #1
-; CHECK-NEXT: ret
-; CHECK-NEXT: .LBB0_5: // %bb.0
+; CHECK-NEXT: cmp w8, #2
+; CHECK-NEXT: b.eq .LBB0_10
+; CHECK-NEXT: // %bb.4: // %bb.0
+; CHECK-NEXT: cmp w8, #4
+; CHECK-NEXT: b.eq .LBB0_11
+; CHECK-NEXT: // %bb.5: // %bb.0
; CHECK-NEXT: cmp w8, #200
-; CHECK-NEXT: b.ne .LBB0_9
+; CHECK-NEXT: b.ne .LBB0_12
; CHECK-NEXT: // %bb.6: // %sw.bb7
; CHECK-NEXT: add w0, w2, #7
; CHECK-NEXT: ret
-; CHECK-NEXT: .LBB0_7: // %sw.bb3
+; CHECK-NEXT: .LBB0_7: // %bb.0
+; CHECK-NEXT: cbz w8, .LBB0_13
+; CHECK-NEXT: // %bb.8: // %bb.0
+; CHECK-NEXT: cmp w8, #1
+; CHECK-NEXT: b.ne .LBB0_12
+; CHECK-NEXT: // %bb.9: // %sw.bb1
+; CHECK-NEXT: add w0, w2, #3
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB0_10: // %sw.bb3
; CHECK-NEXT: add w0, w2, #4
; CHECK-NEXT: ret
-; CHECK-NEXT: .LBB0_8: // %sw.bb5
+; CHECK-NEXT: .LBB0_11: // %sw.bb5
; CHECK-NEXT: add w0, w2, #5
-; CHECK-NEXT: .LBB0_9: // %return
+; CHECK-NEXT: .LBB0_12: // %return
; CHECK-NEXT: ret
-; CHECK-NEXT: .LBB0_10: // %sw.bb1
-; CHECK-NEXT: add w0, w2, #3
+; CHECK-NEXT: .LBB0_13: // %sw.bb
+; CHECK-NEXT: add w0, w2, #1
; CHECK-NEXT: ret
entry:
%b = add nsw i32 %input, %n
@@ -77,3 +81,4 @@ return:
%retval.0 = phi i32 [ %add8, %sw.bb7 ], [ %add6, %sw.bb5 ], [ %add4, %sw.bb3 ], [ %add2, %sw.bb1 ], [ %add, %sw.bb ], [ 100, %bb.0 ], [ 0, %entry ]
ret i32 %retval.0
}
+
diff --git a/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll b/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
index 9acc5a150b630f2..949485bf52f2e8d 100644
--- a/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
+++ b/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
@@ -1,4 +1,4 @@
-; RUN: llc -O3 -o - %s | FileCheck %s
+; RUN: llc -O3 -aarch64-min-jump-table-entries=4 -o - %s | FileCheck %s
; Test that the output in the presence of an unreachable default does not have
; a compare and branch at the top of the switch to handle the default case.
diff --git a/llvm/test/CodeGen/AArch64/win64-jumptable.ll b/llvm/test/CodeGen/AArch64/win64-jumptable.ll
index 0b9b7deceae1138..f9f2b0bf0ca5cf5 100644
--- a/llvm/test/CodeGen/AArch64/win64-jumptable.ll
+++ b/llvm/test/CodeGen/AArch64/win64-jumptable.ll
@@ -1,5 +1,5 @@
-; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-enable-compress-jump-tables=0 | FileCheck %s
-; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-enable-compress-jump-tables=0 -filetype=obj | llvm-readobj --unwind - | FileCheck %s -check-prefix=UNWIND
+; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-min-jump-table-entries=4 -aarch64-enable-compress-jump-tables=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-min-jump-table-entries=4 -aarch64-enable-compress-jump-tables=0 -filetype=obj | llvm-readobj --...
[truncated]
As with last time this came up I have no strong opinion on whether one value is better than another, so this sounds OK. What happens with minsize? Should that have a lower limit?
I've changed getMinimumJumpTableEntries to now take a Function parameter so that we can query the minsize attribute and only increase the threshold if minsize is absent.
Now that you're making MinimumJumpTableEntries a per-function property, perhaps it's better stored in AArch64Subtarget; then you don't need to change the existing getMinimumJumpTableEntries interface. This will be more in keeping with how other per-function settings are handled. Instead of querying the function attributes every time, we can let the subtarget decide once at construction.
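To make the suggestion concrete, here is a small self-contained C++ toy of that pattern (purely illustrative: the Function and Subtarget types below are stand-ins, only the names hasMinSize/getMinimumJumpTableEntries and the thresholds 4 and 13 are borrowed from this PR, and this is not the actual backend code). The per-function decision is taken once when the subtarget object is constructed, so later queries from the lowering code are just a cheap member read:

#include <iostream>

struct Function {
  bool MinSize; // stands in for the IR "minsize" attribute
  bool hasMinSize() const { return MinSize; }
};

class Subtarget {
  unsigned MinimumJumpTableEntries;

public:
  // Decide once at construction instead of re-checking attributes on
  // every query from the lowering code.
  explicit Subtarget(const Function &F)
      : MinimumJumpTableEntries(F.hasMinSize() ? 4 : 13) {}

  unsigned getMinimumJumpTableEntries() const {
    return MinimumJumpTableEntries;
  }
};

int main() {
  Function Hot{/*MinSize=*/false};
  Function Small{/*MinSize=*/true};
  std::cout << Subtarget(Hot).getMinimumJumpTableEntries() << "\n";   // 13
  std::cout << Subtarget(Small).getMinimumJumpTableEntries() << "\n"; // 4
}

This mirrors how other per-function settings are threaded through a per-function subtarget rather than through extra parameters on the TargetLowering hooks.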
Done!
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks. It looks like some of the formatting could be redone? The failing test at least looks unrelated. I see the Arm backend does the same thing with adding MinSize to the subtarget. This sounds OK to me if the formatting is happy.
I ran make check-all using HEAD of LLVM and tests pass. |
How does this affect performance in cases where the last switch case is usually taken?
Honestly speaking I don't know! As mentioned in the commit message, I just changed this number until I got the best performance results, so it's quite possible that this makes things worse in some benchmarks I've not tested. If we discover issues later on we can revisit this and perhaps refine the algorithm.
If I understand what you are saying, it should generate a tree of branches, so the positions of the cases shouldn't matter a huge amount. It can depend on the differences between the cases and the branch targets, though; they might not match. Let us know if you have reason to think this would be worse in some way; it's not a very easy thing to measure accurately. A load from a constant pool into an indirect jump can be worse than a series of branches that can each be independently branch predicted.
I was mainly just wondering about what the worst case impact here would be for an unlucky switch. If I understand correctly, previously the switch (without jump table) would need at most two branches and now it needs at most four. Does that sound correct? (I was thinking in terms of a linear chain previously -- with a binary tree this doesn't sound so bad.)
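As a rough sanity check (back-of-the-envelope arithmetic, not a measurement): a balanced compare-and-branch tree over k cases is about ceil(log2 k) conditional branches deep, so the old threshold of 4 cases means at most 2 branches on the worst path, while up to 12 cases under the new threshold means at most ceil(log2 12) = 4, which matches the two-versus-four figures above.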
FWIW: this gives < 0.5% on the Neoverse V2 for Perlbench. Might even be noise because I haven't done a lot of runs.
That's interesting - and an improvement, even if minor, is still good news! It is also possible to tweak the flag per CPU if a higher or lower number would help.
I think it's actually noise; I don't think I see improvements on the V2.
I ran this quite a few times on neoverse-v1 hardware and consistently saw an improvement, particularly with #72132 applied as well, although I appreciate there is a lot of noise in perlbench. From observation it seems that GCC also takes a similar approach when deciding to create jump tables for perlbench.
What I also saw was a gradual improvement in performance as you increased the threshold for creating jump tables. For example, I'd get consistently better results with a minimum threshold of 6, then even better with 10, then the best around 13. When I saw that trend of improving performance with a higher threshold it made me think this wasn't just luck or coincidence, and that there is a genuine correlation.
Ok, that sounds good.
Check the instruction count, cache and branch misses. Much of the gain comes from a huge reduction in instruction count; the rest comes from avoiding cache misses from loading the table entries and indirect branch misses. As indirect branch prediction improves, a lower value may be slightly better in future cores; however, I don't believe it ever makes sense to use a table with fewer than 8 cases.
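For anyone wanting to reproduce that kind of analysis, one option (assuming Linux perf is available; this is not something the patch itself prescribes) is a command along the lines of perf stat -e instructions,branch-misses,cache-misses on the benchmark binary, which reports exactly those three counters for a run.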
…ts (llvm#71166) There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this. The most significant SPEC2017 score changes on neoverse-v1 were: 500.perlbench_r: +1.6% 520.omnetpp_r: +0.6% and the rest saw changes < 0.5%. I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.
This is like what AArch64 has done in llvm#71166 except that we don't handle `HasMinSize` case now.
This is like what AArch64 has done in #71166 except that we don't handle `HasMinSize` case now.
The previous experiment in llvm#71166 set it to 13 but the results weren't significantly better. Setting to 9