[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments #118242
Conversation
In some backends, the i32 type is illegal and will be promoted. This causes the exponent type check to fail when the ISD::FPOWI node generates a libcall.
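A minimal reproducer, based on the tests added in this patch (loongarch64 with +lsx is one target where the i32 exponent gets promoted to i64):

declare <4 x float> @llvm.powi.v4f32.i32(<4 x float>, i32)

; Before this patch, lowering each element's powi to a __powisf2 libcall
; failed the exponent-size check in ConvertNodeToLibcall, because the i32
; exponent had already been promoted to i64 during type legalization.
define <4 x float> @powi_v4f32(<4 x float> %va, i32 %b) nounwind {
entry:
  %res = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> %va, i32 %b)
  ret <4 x float> %res
}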
@llvm/pr-subscribers-backend-loongarch Author: Zhaoxin Yang (ylzsx) Changes: In some backends, the i32 type is illegal and will be promoted. This causes the exponent type check to fail when the ISD::FPOWI node generates a libcall. Fix #118079 Patch is 66.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118242.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 63536336e96228..2829bbaef83100 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -4648,6 +4648,24 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
bool ExponentHasSizeOfInt =
DAG.getLibInfo().getIntSize() ==
Node->getOperand(1 + Offset).getValueType().getSizeInBits();
+ if (!ExponentHasSizeOfInt) {
+ // In some backends, such as RISCV64 and LoongArch64, the i32 type is
+ // illegal and is promoted by previous process. For such cases, the
+ // exponent actually matches with sizeof(int) and a libcall should be
+ // generated.
+ SDNode *ExponentNode = Node->getOperand(1 + Offset).getNode();
+ unsigned LibIntSize = DAG.getLibInfo().getIntSize();
+ if (ExponentNode->getOpcode() == ISD::SIGN_EXTEND_INREG ||
+ ExponentNode->getOpcode() == ISD::AssertSext ||
+ ExponentNode->getOpcode() == ISD::AssertZext) {
+ EVT InnerType = cast<VTSDNode>(ExponentNode->getOperand(1))->getVT();
+ ExponentHasSizeOfInt = LibIntSize == InnerType.getSizeInBits();
+ } else if (ISD::isExtOpcode(ExponentNode->getOpcode())) {
+ ExponentHasSizeOfInt =
+ LibIntSize ==
+ ExponentNode->getOperand(0).getValueType().getSizeInBits();
+ }
+ }
if (!ExponentHasSizeOfInt) {
// If the exponent does not match with sizeof(int) a libcall to
// RTLIB::POWI would use the wrong type for the argument.
diff --git a/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll b/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll
new file mode 100644
index 00000000000000..f6b14a9bb000fd
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll
@@ -0,0 +1,142 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc --mtriple=loongarch64 --mattr=+lasx < %s | FileCheck %s
+
+declare <8 x float> @llvm.powi.v8f32.i32(<8 x float>, i32)
+
+define <8 x float> @powi_v8f32(<8 x float> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v8f32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: addi.d $sp, $sp, -80
+; CHECK-NEXT: st.d $ra, $sp, 72 # 8-byte Folded Spill
+; CHECK-NEXT: st.d $fp, $sp, 64 # 8-byte Folded Spill
+; CHECK-NEXT: xvst $xr0, $sp, 0 # 32-byte Folded Spill
+; CHECK-NEXT: addi.w $fp, $a0, 0
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 0
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 0
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 1
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 1
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 2
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 2
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 3
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 3
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 4
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 4
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 5
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 5
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 6
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 6
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.w $a0, $xr0, 7
+; CHECK-NEXT: movgr2fr.w $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.w $xr0, $a0, 7
+; CHECK-NEXT: ld.d $fp, $sp, 64 # 8-byte Folded Reload
+; CHECK-NEXT: ld.d $ra, $sp, 72 # 8-byte Folded Reload
+; CHECK-NEXT: addi.d $sp, $sp, 80
+; CHECK-NEXT: ret
+entry:
+ %res = call <8 x float> @llvm.powi.v8f32.i32(<8 x float> %va, i32 %b)
+ ret <8 x float> %res
+}
+
+declare <4 x double> @llvm.powi.v4f64.i32(<4 x double>, i32)
+
+define <4 x double> @powi_v4f64(<4 x double> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v4f64:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: addi.d $sp, $sp, -80
+; CHECK-NEXT: st.d $ra, $sp, 72 # 8-byte Folded Spill
+; CHECK-NEXT: st.d $fp, $sp, 64 # 8-byte Folded Spill
+; CHECK-NEXT: xvst $xr0, $sp, 0 # 32-byte Folded Spill
+; CHECK-NEXT: addi.w $fp, $a0, 0
+; CHECK-NEXT: xvpickve2gr.d $a0, $xr0, 0
+; CHECK-NEXT: movgr2fr.d $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: xvinsgr2vr.d $xr0, $a0, 0
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.d $a0, $xr0, 1
+; CHECK-NEXT: movgr2fr.d $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.d $xr0, $a0, 1
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.d $a0, $xr0, 2
+; CHECK-NEXT: movgr2fr.d $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.d $xr0, $a0, 2
+; CHECK-NEXT: xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT: xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT: xvpickve2gr.d $a0, $xr0, 3
+; CHECK-NEXT: movgr2fr.d $fa0, $a0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT: xvinsgr2vr.d $xr0, $a0, 3
+; CHECK-NEXT: ld.d $fp, $sp, 64 # 8-byte Folded Reload
+; CHECK-NEXT: ld.d $ra, $sp, 72 # 8-byte Folded Reload
+; CHECK-NEXT: addi.d $sp, $sp, 80
+; CHECK-NEXT: ret
+entry:
+ %res = call <4 x double> @llvm.powi.v4f64.i32(<4 x double> %va, i32 %b)
+ ret <4 x double> %res
+}
diff --git a/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll b/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll
new file mode 100644
index 00000000000000..b0f54e78c7a442
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc --mtriple=loongarch64 --mattr=+lsx < %s | FileCheck %s
+
+declare <4 x float> @llvm.powi.v4f32.i32(<4 x float>, i32)
+
+define <4 x float> @powi_v4f32(<4 x float> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v4f32:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: addi.d $sp, $sp, -48
+; CHECK-NEXT: st.d $ra, $sp, 40 # 8-byte Folded Spill
+; CHECK-NEXT: st.d $fp, $sp, 32 # 8-byte Folded Spill
+; CHECK-NEXT: vst $vr0, $sp, 0 # 16-byte Folded Spill
+; CHECK-NEXT: addi.w $fp, $a0, 0
+; CHECK-NEXT: vreplvei.w $vr0, $vr0, 0
+; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: vinsgr2vr.w $vr0, $a0, 0
+; CHECK-NEXT: vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT: vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT: vreplvei.w $vr0, $vr0, 1
+; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT: vinsgr2vr.w $vr0, $a0, 1
+; CHECK-NEXT: vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT: vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT: vreplvei.w $vr0, $vr0, 2
+; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT: vinsgr2vr.w $vr0, $a0, 2
+; CHECK-NEXT: vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT: vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT: vreplvei.w $vr0, $vr0, 3
+; CHECK-NEXT: # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powisf2)
+; CHECK-NEXT: movfr2gr.s $a0, $fa0
+; CHECK-NEXT: vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT: vinsgr2vr.w $vr0, $a0, 3
+; CHECK-NEXT: ld.d $fp, $sp, 32 # 8-byte Folded Reload
+; CHECK-NEXT: ld.d $ra, $sp, 40 # 8-byte Folded Reload
+; CHECK-NEXT: addi.d $sp, $sp, 48
+; CHECK-NEXT: ret
+entry:
+ %res = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> %va, i32 %b)
+ ret <4 x float> %res
+}
+
+declare <2 x double> @llvm.powi.v2f64.i32(<2 x double>, i32)
+
+define <2 x double> @powi_v2f64(<2 x double> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v2f64:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: addi.d $sp, $sp, -48
+; CHECK-NEXT: st.d $ra, $sp, 40 # 8-byte Folded Spill
+; CHECK-NEXT: st.d $fp, $sp, 32 # 8-byte Folded Spill
+; CHECK-NEXT: vst $vr0, $sp, 0 # 16-byte Folded Spill
+; CHECK-NEXT: addi.w $fp, $a0, 0
+; CHECK-NEXT: vreplvei.d $vr0, $vr0, 0
+; CHECK-NEXT: # kill: def $f0_64 killed $f0_64 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: vinsgr2vr.d $vr0, $a0, 0
+; CHECK-NEXT: vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT: vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT: vreplvei.d $vr0, $vr0, 1
+; CHECK-NEXT: # kill: def $f0_64 killed $f0_64 killed $vr0
+; CHECK-NEXT: move $a0, $fp
+; CHECK-NEXT: bl %plt(__powidf2)
+; CHECK-NEXT: movfr2gr.d $a0, $fa0
+; CHECK-NEXT: vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT: vinsgr2vr.d $vr0, $a0, 1
+; CHECK-NEXT: ld.d $fp, $sp, 32 # 8-byte Folded Reload
+; CHECK-NEXT: ld.d $ra, $sp, 40 # 8-byte Folded Reload
+; CHECK-NEXT: addi.d $sp, $sp, 48
+; CHECK-NEXT: ret
+entry:
+ %res = call <2 x double> @llvm.powi.v2f64.i32(<2 x double> %va, i32 %b)
+ ret <2 x double> %res
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll
new file mode 100644
index 00000000000000..d99feb5fdd921c
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll
@@ -0,0 +1,1427 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+v,+f,+d -target-abi=ilp32d -verify-machineinstrs < %s \
+; RUN: | FileCheck %s --check-prefix=RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v,+f,+d -target-abi=lp64d -verify-machineinstrs < %s \
+; RUN: | FileCheck %s --check-prefix=RV64
+
+define <1 x float> @powi_v1f32(<1 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v1f32:
+; RV32: # %bb.0:
+; RV32-NEXT: addi sp, sp, -16
+; RV32-NEXT: .cfi_def_cfa_offset 16
+; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32-NEXT: .cfi_offset ra, -4
+; RV32-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT: vfmv.f.s fa0, v8
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT: vfmv.s.f v8, fa0
+; RV32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32-NEXT: .cfi_restore ra
+; RV32-NEXT: addi sp, sp, 16
+; RV32-NEXT: .cfi_def_cfa_offset 0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: powi_v1f32:
+; RV64: # %bb.0:
+; RV64-NEXT: addi sp, sp, -16
+; RV64-NEXT: .cfi_def_cfa_offset 16
+; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
+; RV64-NEXT: .cfi_offset ra, -8
+; RV64-NEXT: sext.w a0, a0
+; RV64-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV64-NEXT: vfmv.f.s fa0, v8
+; RV64-NEXT: call __powisf2
+; RV64-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV64-NEXT: vfmv.s.f v8, fa0
+; RV64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
+; RV64-NEXT: .cfi_restore ra
+; RV64-NEXT: addi sp, sp, 16
+; RV64-NEXT: .cfi_def_cfa_offset 0
+; RV64-NEXT: ret
+ %a = call <1 x float> @llvm.powi.v1f32.i32(<1 x float> %x, i32 %y)
+ ret <1 x float> %a
+}
+declare <1 x float> @llvm.powi.v1f32.i32(<1 x float>, i32)
+
+define <2 x float> @powi_v2f32(<2 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v2f32:
+; RV32: # %bb.0:
+; RV32-NEXT: addi sp, sp, -32
+; RV32-NEXT: .cfi_def_cfa_offset 32
+; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
+; RV32-NEXT: fsd fs0, 16(sp) # 8-byte Folded Spill
+; RV32-NEXT: .cfi_offset ra, -4
+; RV32-NEXT: .cfi_offset s0, -8
+; RV32-NEXT: .cfi_offset fs0, -16
+; RV32-NEXT: csrr a1, vlenb
+; RV32-NEXT: sub sp, sp, a1
+; RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x20, 0x22, 0x11, 0x01, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 32 + 1 * vlenb
+; RV32-NEXT: mv s0, a0
+; RV32-NEXT: addi a1, sp, 16
+; RV32-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV32-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV32-NEXT: vslidedown.vi v9, v8, 1
+; RV32-NEXT: vfmv.f.s fa0, v9
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: fmv.s fs0, fa0
+; RV32-NEXT: flw fa0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT: mv a0, s0
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; RV32-NEXT: vfmv.v.f v8, fa0
+; RV32-NEXT: vfslide1down.vf v8, v8, fs0
+; RV32-NEXT: csrr a0, vlenb
+; RV32-NEXT: add sp, sp, a0
+; RV32-NEXT: .cfi_def_cfa sp, 32
+; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
+; RV32-NEXT: fld fs0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT: .cfi_restore ra
+; RV32-NEXT: .cfi_restore s0
+; RV32-NEXT: .cfi_restore fs0
+; RV32-NEXT: addi sp, sp, 32
+; RV32-NEXT: .cfi_def_cfa_offset 0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: powi_v2f32:
+; RV64: # %bb.0:
+; RV64-NEXT: addi sp, sp, -64
+; RV64-NEXT: .cfi_def_cfa_offset 64
+; RV64-NEXT: sd ra, 56(sp) # 8-byte Folded Spill
+; RV64-NEXT: sd s0, 48(sp) # 8-byte Folded Spill
+; RV64-NEXT: fsd fs0, 40(sp) # 8-byte Folded Spill
+; RV64-NEXT: .cfi_offset ra, -8
+; RV64-NEXT: .cfi_offset s0, -16
+; RV64-NEXT: .cfi_offset fs0, -24
+; RV64-NEXT: csrr a1, vlenb
+; RV64-NEXT: sub sp, sp, a1
+; RV64-NEXT: .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0xc0, 0x00, 0x22, 0x11, 0x01, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 64 + 1 * vlenb
+; RV64-NEXT: addi a1, sp, 32
+; RV64-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV64-NEXT: sext.w s0, a0
+; RV64-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV64-NEXT: vslidedown.vi v9, v8, 1
+; RV64-NEXT: vfmv.f.s fa0, v9
+; RV64-NEXT: mv a0, s0
+; RV64-NEXT: call __powisf2
+; RV64-NEXT: fmv.s fs0, fa0
+; RV64-NEXT: flw fa0, 32(sp) # 8-byte Folded Reload
+; RV64-NEXT: mv a0, s0
+; RV64-NEXT: call __powisf2
+; RV64-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; RV64-NEXT: vfmv.v.f v8, fa0
+; RV64-NEXT: vfslide1down.vf v8, v8, fs0
+; RV64-NEXT: csrr a0, vlenb
+; RV64-NEXT: add sp, sp, a0
+; RV64-NEXT: .cfi_def_cfa sp, 64
+; RV64-NEXT: ld ra, 56(sp) # 8-byte Folded Reload
+; RV64-NEXT: ld s0, 48(sp) # 8-byte Folded Reload
+; RV64-NEXT: fld fs0, 40(sp) # 8-byte Folded Reload
+; RV64-NEXT: .cfi_restore ra
+; RV64-NEXT: .cfi_restore s0
+; RV64-NEXT: .cfi_restore fs0
+; RV64-NEXT: addi sp, sp, 64
+; RV64-NEXT: .cfi_def_cfa_offset 0
+; RV64-NEXT: ret
+ %a = call <2 x float> @llvm.powi.v2f32.i32(<2 x float> %x, i32 %y)
+ ret <2 x float> %a
+}
+declare <2 x float> @llvm.powi.v2f32.i32(<2 x float>, i32)
+
+define <3 x float> @powi_v3f32(<3 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v3f32:
+; RV32: # %bb.0:
+; RV32-NEXT: addi sp, sp, -32
+; RV32-NEXT: .cfi_def_cfa_offset 32
+; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
+; RV32-NEXT: fsd fs0, 16(sp) # 8-byte Folded Spill
+; RV32-NEXT: .cfi_offset ra, -4
+; RV32-NEXT: .cfi_offset s0, -8
+; RV32-NEXT: .cfi_offset fs0, -16
+; RV32-NEXT: csrr a1, vlenb
+; RV32-NEXT: slli a1, a1, 1
+; RV32-NEXT: sub sp, sp, a1
+; RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x20, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 32 + 2 * vlenb
+; RV32-NEXT: mv s0, a0
+; RV32-NEXT: csrr a1, vlenb
+; RV32-NEXT: add a1, sp, a1
+; RV32-NEXT: addi a1, a1, 16
+; RV32-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV32-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT: vslidedown.vi v9, v8, 1
+; RV32-NEXT: vfmv.f.s fa0, v9
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: fmv.s fs0, fa0
+; RV32-NEXT: csrr a0, vlenb
+; RV32-NEXT: add a0, sp, a0
+; RV32-NEXT: flw fa0, 16(a0) # 8-byte Folded Reload
+; RV32-NEXT: mv a0, s0
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT: vfmv.v.f v8, fa0
+; RV32-NEXT: vfslide1down.vf v8, v8, fs0
+; RV32-NEXT: addi a0, sp, 16
+; RV32-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
+; RV32-NEXT: csrr a0, vlenb
+; RV32-NEXT: add a0, sp, a0
+; RV32-NEXT: addi a0, a0, 16
+; RV32-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
+; RV32-NEXT: vslidedown.vi v8, v8, 2
+; RV32-NEXT: vfmv.f.s fa0, v8
+; RV32-NEXT: mv a0, s0
+; RV32-NEXT: call __powisf2
+; RV32-NEXT: addi a0, sp, 16
+; RV32-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
+; RV32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT: vfslide1down.vf v8, v8, fa0
+; RV32-NEXT: vslidedown.vi v8, v8, 1
+; RV32-NEXT: csrr a0, vlenb
+; RV32-NEXT: slli a0, a0, 1
+; RV32-NEXT: add sp, sp, a0
+; RV32-NEXT: .cfi_def_cfa sp, 32
+; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
+; RV32-NEXT: fld fs0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT: .cfi_restore ra
+; RV32-NEXT: .cfi_restore s0
+; RV32-NEXT: .cfi_restore fs0
+; RV32-NEXT: addi sp, sp, 32
+; RV32-NEXT: .cfi_def_cfa_offset 0
+; RV32-NEXT: ret
+;
+; RV64-LABEL: powi_v3f32:
+; RV64: # %bb.0:
+; RV64-NEXT: addi sp, sp, -64
+; RV64-NEXT: .cfi_def_cfa_offset 64
+; R...
[truncated]
@llvm/pr-subscribers-llvm-selectiondag Author: Zhaoxin Yang (ylzsx) Changes: (same summary and patch as in the backend-loongarch subscriber notification above.)
// In some backends, such as RISCV64 and LoongArch64, the i32 type is
// illegal and is promoted by previous process. For such cases, the
// exponent actually matches with sizeof(int) and a libcall should be
So DAG.getLibInfo().getIntSize() should be 8?
Why would it be 8? getIntSize is in bits, and int on RISCV64 is 32 bits.
I would assume getIntSize() would refer to the legalized parameter type for a libcall using int. Is there a second version for this?
Not that I know of.
I don't think we should let the vector powi get this far. The promotion of a libcall argument is the responsibility of call lowering and the information in TargetLowering::MakeLibCallOptions. We shouldn't promote and then try to fix it.
if (ExponentNode->getOpcode() == ISD::SIGN_EXTEND_INREG ||
    ExponentNode->getOpcode() == ISD::AssertSext ||
    ExponentNode->getOpcode() == ISD::AssertZext) {
  EVT InnerType = cast<VTSDNode>(ExponentNode->getOperand(1))->getVT();
  ExponentHasSizeOfInt = LibIntSize == InnerType.getSizeInBits();
} else if (ISD::isExtOpcode(ExponentNode->getOpcode())) {
  ExponentHasSizeOfInt =
      LibIntSize ==
      ExponentNode->getOperand(0).getValueType().getSizeInBits();
}
This code should not be trying to look through extensions (nor SIGN_EXTEND_INREG or the Assert* nodes; these aren't really extensions). I'd expect this to just insert the sext to match the libcall integer type.
I kind of think this should be handled by unrolling the vector in PromoteIntOp_ExpOp. This should work:
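A minimal sketch of the suggested unrolling, assuming it sits at the top of DAGTypeLegalizer::PromoteIntOp_ExpOp (the code quoted later in this thread shows essentially this form):

// The powi/ldexp libcalls only have scalar signatures, so split the vector
// here, before the i32 exponent is promoted past sizeof(int); the resulting
// scalar FPOWI/FLDEXP nodes then go through the normal scalar libcall path.
if (N->getValueType(0).isVector())
  return DAG.UnrollVectorOp(N);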
Thanks. Your approach is simpler and more reasonable. I will modify it to adopt this solution.
For nodes such as `ISD::FPOWI` and `ISD::FLDEXP`, if the first operand is a vector, the vector is unrolled during type legalization (since the corresponding library functions do not have vector-type signatures), without promoting the second operand.
The PR title was changed from "llvm.powi.* code gen for vector arguments" to "llvm.powi/ldexp.* code gen for vector arguments".
if (N->getValueType(0).isVector())
  return DAG.UnrollVectorOp(N);
This probably shouldn't be the first thing tried. If a vector libcall happens to be available, that would be preferable. Can you move this down before the scalar call is introduced?
Thanks, I will move it into this if statement to ensure that it won't promote the second operand. Do you think it's reasonable?
if (LC == RTLIB::UNKNOWN_LIBCALL || !TLI.getLibcallName(LC)) {
if (N->getValueType(0).isVector())
return DAG.UnrollVectorOp(N);
SmallVector<SDValue, 3> NewOps(N->ops());
NewOps[1 + OpOffset] = SExtPromotedInteger(N->getOperand(1 + OpOffset));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}
✅ With the latest revision this PR passed the C/C++ code formatter.
if (N->getValueType(0).isVector())
  return DAG.UnrollVectorOp(N);
This whole function is actually the wrong place to split the vector (see how there are no other UnrollVectorOps uses in DAGTypeLegalizer). The description also says your problem is when the libcall is used, so you'd want to change the other path? Does the libcall emission below need to directly handle the vector case?
Sorry, I'm not entirely sure I understand what you mean. Could you please clarify?
Are you referring to the approach used in my first commit 7f5a128? If so, in that commit, you mentioned not to check for the SIGN_EXTEND_INREG node, but this node is generated in this if statement here (SExtPromotedInteger will generate a SIGN_EXTEND_INREG node).
if (LC == RTLIB::UNKNOWN_LIBCALL || !TLI.getLibcallName(LC)) {
SmallVector<SDValue, 3> NewOps(N->ops());
NewOps[1 + OpOffset] = SExtPromotedInteger(N->getOperand(1 + OpOffset));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}
Other info:
- RISCV64 and LoongArch64 will enter the function PromoteIntOp_ExpOp and generate a SIGN_EXTEND_INREG node, so ExponentHasSizeOfInt = false (32 != 64) and an error is reported:
bool ExponentHasSizeOfInt =
DAG.getLibInfo().getIntSize() ==
Node->getOperand(1 + Offset).getValueType().getSizeInBits();
Additionally, AArch64 and X86 will get ExponentHasSizeOfInt = true, because in these backends i32 is legal and was not previously promoted (no SIGN_EXTEND_INREG to i64), so they generate the right code sequences.
- In the entire LLVM, only fpowi and fldexp will call the PromoteIntOp_ExpOp function. (Flow: llvm::DAGTypeLegalizer::PromoteIntegerOperand -> llvm::DAGTypeLegalizer::PromoteIntOp_ExpOp)
case ISD::FPOWI:
case ISD::STRICT_FPOWI:
case ISD::FLDEXP:
case ISD::STRICT_FLDEXP: Res = PromoteIntOp_ExpOp(N); break;
If not, could you provide more specific modification suggestions?
This whole function is actually the wrong place to split the vector (see how there are no other UnrollVectorOps uses in DAGTypeLegalizer). The description also says your problem is when the libcall is used, so you'd want to change the other path? Does the libcall emission below need to directly handle the vector case?
@topperc what are your thoughts on this?
see how there are no other UnrollVectorOps uses in DAGTypeLegalizer
There are calls to UnrollVectorOps in several places in LegalizeVectorTypes.cpp.
This comment where the libcall below is created seems relevant:
// We can't just promote the exponent type in FPOWI, since we want to lower
// the node to a libcall and we if we promote to a type larger than
// sizeof(int) the libcall might not be according to the targets ABI.
My suggestion to unroll here was so that we wouldn't promote past sizeof(int). If we wait until LegalizeDAG.cpp to unroll the operation, the damage to the integer type has already been done. For RISC-V it's harmless because signed int is supposed to be passed sign-extended to 64 bits according to the ABI.
Hypothetically, if powi took an unsigned int as an argument, then type legalization would use zero extend, but the RISC-V ABI wants unsigned int to be passed sign extended. So LegalizeDAG would need to insert a SIGN_EXTEND_INREG to fix it. I guess it would need to use getIntSize() and shouldSignExtendTypeInLibCall to know what it needs to do in that case.
If we don't unroll here, I guess the best fix in LegalizeDAG would also be to use getIntSize() and shouldSignExtendTypeInLibCall, and use computeNumSignBits to know if a SIGN_EXTEND_INREG needs to be inserted?
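A rough sketch of that alternative in SelectionDAGLegalize::ConvertNodeToLibcall (an assumption of how it could look, not code from this PR; the shouldSignExtendTypeInLibCall decision is only indicated in a comment):

// Sketch: the exponent was promoted (e.g. i32 -> i64 on RV64/LA64). If its
// high bits are already a sign-extension of the low sizeof(int) bits, the
// value fits in `int`; make that explicit and let the libcall proceed.
// (shouldSignExtendTypeInLibCall would decide the exact ABI extension.)
unsigned LibIntSize = DAG.getLibInfo().getIntSize();
SDValue Exponent = Node->getOperand(1 + Offset);
EVT ExpVT = Exponent.getValueType();
if (!ExponentHasSizeOfInt && ExpVT.isInteger() &&
    ExpVT.getSizeInBits() > LibIntSize &&
    DAG.ComputeNumSignBits(Exponent) > ExpVT.getSizeInBits() - LibIntSize) {
  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), LibIntSize);
  // The narrowed exponent would then be handed to ExpandFPLibCall/makeLibCall.
  Exponent = DAG.getNode(ISD::SIGN_EXTEND_INREG, SDLoc(Node), ExpVT, Exponent,
                         DAG.getValueType(IntVT));
  ExponentHasSizeOfInt = true;
}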
It makes more sense to me to handle this when emitting the call, where ABI constraints would naturally be handled. This can be sign extended here, and the call emission can truncate or sext_inreg as required
It makes more sense to me to handle this when emitting the call, where ABI constraints would naturally be handled. This can be sign extended here, and the call emission can truncate or sext_inreg as required
The POWI code in LegalizeDAG calls SelectionDAGLegalize::ExpandFPLibCall, which will call TargetLowering::makeLibCall using the promoted type. makeLibCall calls getTypeForEVT, which will return i64 due to the promotion. That's what will be used by call lowering, but that's the wrong type for it to do the right thing. We need to get an i32 Type* into call lowering, but we no longer have it. We'll need to call getIntSize() to get the size and pass it along somehow. That requires refactoring several interfaces or adding new ones. Not sure if call lowering would also expect the SDValue for the argument to have i32 type.
My unrolling proposal avoided that by scalarizing it and letting the newly created scalar powi calls get converted to libcalls while we're still in type legalization. That's how we currently handle the scalar powi case. Are you also suggesting we should move the scalar handling to LegalizeDAG as well?
makeLibCall should really be looking at the signature of the underlying call, but RuntimeLibcalls currently does not record this information. TargetLibraryInfo does, which is separate for some reason. This keeps coming up as a problem; these really need to be merged in some way (cc @jhuber6).
Taking the type from the DAG node isn't strictly correct, it's just where we've ended up. This came up recently for the special case in ExpandFPLibCall to sign extend the integer argument for FLDEXP.
Practically speaking, I don't think any targets will have a vector powi implementation (I at least don't see any in RuntimeLibcalls), so unrolling works out. I guess this could get a fixme and go ahead for now, but it's still a hack
I just noticed you are using vector ldexp with a scalar i32 argument. I'm surprised this is considered valid IR, and works as well as it does. It wasn't my intention when I added the intrinsic to handle that case. I expected the number of elements of both arguments would match. Do you only have this issue when using the implicit splat behavior of the scalar operand? powi is weird because it only takes the scalar argument, unlike ldexp.
I suppose we could support the scalar second argument for ldexp, but I'm not sure we handle implicit splats like that in any other operation.
I didn't notice earlier that the second argument of ldexp can accept vector arguments. In fact,
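For reference, the form being discussed is a vector ldexp whose exponent is a single scalar i32 rather than a matching vector of exponents (an assumed minimal example, not taken from this patch):

declare <2 x float> @llvm.ldexp.v2f32.i32(<2 x float>, i32)

define <2 x float> @ldexp_v2f32(<2 x float> %x, i32 %exp) {
  ; One scalar exponent is implicitly applied to every vector element.
  %r = call <2 x float> @llvm.ldexp.v2f32.i32(<2 x float> %x, i32 %exp)
  ret <2 x float> %r
}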
The PR title was changed back from "llvm.powi/ldexp.* code gen for vector arguments" to "llvm.powi.* code gen for vector arguments".
ping
LGTM
LGTM
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/3208
Here is the relevant piece of the build log for reference.
Scalarize vector FPOWI instead of promoting the type. This allows the scalar FPOWIs to be visited and converted to libcalls before promoting the type.
FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call lowering needs the unpromoted EVT.
Without this patch, in some backends, such as RISCV64 and LoongArch64, the i32 type is illegal and will be promoted. This causes the exponent type check to fail when the ISD::FPOWI node generates a libcall.
Fix #118079
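For illustration (an assumed equivalent, not compiler output), the scalarization amounts to the following per-element form, after which each scalar powi is lowered to a __powisf2 libcall with an i32 exponent:

declare float @llvm.powi.f32.i32(float, i32)

define <2 x float> @powi_v2f32_unrolled(<2 x float> %v, i32 %b) {
  ; Extract each element, apply the scalar powi, and rebuild the vector.
  %e0 = extractelement <2 x float> %v, i32 0
  %e1 = extractelement <2 x float> %v, i32 1
  %r0 = call float @llvm.powi.f32.i32(float %e0, i32 %b)
  %r1 = call float @llvm.powi.f32.i32(float %e1, i32 %b)
  %t0 = insertelement <2 x float> poison, float %r0, i32 0
  %t1 = insertelement <2 x float> %t0, float %r1, i32 1
  ret <2 x float> %t1
}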