
[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments #118242


Merged · 7 commits into llvm:main · Dec 19, 2024

Conversation

ylzsx
Contributor

@ylzsx ylzsx commented Dec 2, 2024

Scalarize vector FPOWI instead of promoting the type. This allows the scalar FPOWIs to be visited and converted to libcalls before promoting the type.

FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call lowering needs the unpromoted EVT.

Without this patch, in some backends such as RISCV64 and LoongArch64, the i32 type is illegal and gets promoted. This causes the exponent type check to fail when the ISD::FPOWI node is converted to a libcall.

Fix #118079
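
For reference, a minimal sketch of the scalarization described above, condensed from the patches discussed later in this thread and placed in DAGTypeLegalizer::PromoteIntOp_ExpOp; the exact guard and placement in the merged commit may differ:

  // Sketch only. If no libcall is known for this node, unroll a vector
  // FPOWI/FLDEXP into scalar nodes *before* the i32 exponent is promoted,
  // so each scalar FPOWI can later be converted to __powisf2/__powidf2
  // with an int-sized exponent.
  if (LC == RTLIB::UNKNOWN_LIBCALL || !TLI.getLibcallName(LC)) {
    if (N->getValueType(0).isVector())
      return DAG.UnrollVectorOp(N);
    SmallVector<SDValue, 3> NewOps(N->ops());
    NewOps[1 + OpOffset] = SExtPromotedInteger(N->getOperand(1 + OpOffset));
    return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
  }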

In some backends, the i32 type is illegal and will be promoted. This causes the exponent type check to fail when the ISD::FPOWI node generates a libcall.
@llvmbot added the backend:loongarch and llvm:SelectionDAG (SelectionDAGISel as well) labels on Dec 2, 2024
@llvmbot
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-backend-loongarch

Author: Zhaoxin Yang (ylzsx)

Changes

In some backends, the i32 type is illegal and will be promoted. This causes the exponent type check to fail when the ISD::FPOWI node generates a libcall.

Fix #118079


Patch is 66.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118242.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+18)
  • (added) llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll (+142)
  • (added) llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll (+88)
  • (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll (+1427)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 63536336e96228..2829bbaef83100 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -4648,6 +4648,24 @@ void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) {
     bool ExponentHasSizeOfInt =
         DAG.getLibInfo().getIntSize() ==
         Node->getOperand(1 + Offset).getValueType().getSizeInBits();
+    if (!ExponentHasSizeOfInt) {
+      // In some backends, such as RISCV64 and LoongArch64, the i32 type is
+      // illegal and is promoted by previous process. For such cases, the
+      // exponent actually matches with sizeof(int) and a libcall should be
+      // generated.
+      SDNode *ExponentNode = Node->getOperand(1 + Offset).getNode();
+      unsigned LibIntSize = DAG.getLibInfo().getIntSize();
+      if (ExponentNode->getOpcode() == ISD::SIGN_EXTEND_INREG ||
+          ExponentNode->getOpcode() == ISD::AssertSext ||
+          ExponentNode->getOpcode() == ISD::AssertZext) {
+        EVT InnerType = cast<VTSDNode>(ExponentNode->getOperand(1))->getVT();
+        ExponentHasSizeOfInt = LibIntSize == InnerType.getSizeInBits();
+      } else if (ISD::isExtOpcode(ExponentNode->getOpcode())) {
+        ExponentHasSizeOfInt =
+            LibIntSize ==
+            ExponentNode->getOperand(0).getValueType().getSizeInBits();
+      }
+    }
     if (!ExponentHasSizeOfInt) {
       // If the exponent does not match with sizeof(int) a libcall to
       // RTLIB::POWI would use the wrong type for the argument.
diff --git a/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll b/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll
new file mode 100644
index 00000000000000..f6b14a9bb000fd
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/lasx/intrinsic-fpowi.ll
@@ -0,0 +1,142 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc --mtriple=loongarch64 --mattr=+lasx < %s | FileCheck %s
+
+declare <8 x float> @llvm.powi.v8f32.i32(<8 x float>, i32)
+
+define <8 x float> @powi_v8f32(<8 x float> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v8f32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi.d $sp, $sp, -80
+; CHECK-NEXT:    st.d $ra, $sp, 72 # 8-byte Folded Spill
+; CHECK-NEXT:    st.d $fp, $sp, 64 # 8-byte Folded Spill
+; CHECK-NEXT:    xvst $xr0, $sp, 0 # 32-byte Folded Spill
+; CHECK-NEXT:    addi.w $fp, $a0, 0
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 0
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 0
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 1
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 1
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 2
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 2
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 3
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 3
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 4
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 4
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 5
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 5
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 6
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 6
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.w $a0, $xr0, 7
+; CHECK-NEXT:    movgr2fr.w $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.w $xr0, $a0, 7
+; CHECK-NEXT:    ld.d $fp, $sp, 64 # 8-byte Folded Reload
+; CHECK-NEXT:    ld.d $ra, $sp, 72 # 8-byte Folded Reload
+; CHECK-NEXT:    addi.d $sp, $sp, 80
+; CHECK-NEXT:    ret
+entry:
+  %res = call <8 x float> @llvm.powi.v8f32.i32(<8 x float> %va, i32 %b)
+  ret <8 x float> %res
+}
+
+declare <4 x double> @llvm.powi.v4f64.i32(<4 x double>, i32)
+
+define <4 x double> @powi_v4f64(<4 x double> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v4f64:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi.d $sp, $sp, -80
+; CHECK-NEXT:    st.d $ra, $sp, 72 # 8-byte Folded Spill
+; CHECK-NEXT:    st.d $fp, $sp, 64 # 8-byte Folded Spill
+; CHECK-NEXT:    xvst $xr0, $sp, 0 # 32-byte Folded Spill
+; CHECK-NEXT:    addi.w $fp, $a0, 0
+; CHECK-NEXT:    xvpickve2gr.d $a0, $xr0, 0
+; CHECK-NEXT:    movgr2fr.d $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    xvinsgr2vr.d $xr0, $a0, 0
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.d $a0, $xr0, 1
+; CHECK-NEXT:    movgr2fr.d $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.d $xr0, $a0, 1
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.d $a0, $xr0, 2
+; CHECK-NEXT:    movgr2fr.d $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.d $xr0, $a0, 2
+; CHECK-NEXT:    xvst $xr0, $sp, 32 # 32-byte Folded Spill
+; CHECK-NEXT:    xvld $xr0, $sp, 0 # 32-byte Folded Reload
+; CHECK-NEXT:    xvpickve2gr.d $a0, $xr0, 3
+; CHECK-NEXT:    movgr2fr.d $fa0, $a0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    xvld $xr0, $sp, 32 # 32-byte Folded Reload
+; CHECK-NEXT:    xvinsgr2vr.d $xr0, $a0, 3
+; CHECK-NEXT:    ld.d $fp, $sp, 64 # 8-byte Folded Reload
+; CHECK-NEXT:    ld.d $ra, $sp, 72 # 8-byte Folded Reload
+; CHECK-NEXT:    addi.d $sp, $sp, 80
+; CHECK-NEXT:    ret
+entry:
+  %res = call <4 x double> @llvm.powi.v4f64.i32(<4 x double> %va, i32 %b)
+  ret <4 x double> %res
+}
diff --git a/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll b/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll
new file mode 100644
index 00000000000000..b0f54e78c7a442
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/lsx/intrinsic-fpowi.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc --mtriple=loongarch64 --mattr=+lsx < %s | FileCheck %s
+
+declare <4 x float> @llvm.powi.v4f32.i32(<4 x float>, i32)
+
+define <4 x float> @powi_v4f32(<4 x float> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v4f32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi.d $sp, $sp, -48
+; CHECK-NEXT:    st.d $ra, $sp, 40 # 8-byte Folded Spill
+; CHECK-NEXT:    st.d $fp, $sp, 32 # 8-byte Folded Spill
+; CHECK-NEXT:    vst $vr0, $sp, 0 # 16-byte Folded Spill
+; CHECK-NEXT:    addi.w $fp, $a0, 0
+; CHECK-NEXT:    vreplvei.w $vr0, $vr0, 0
+; CHECK-NEXT:    # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    vinsgr2vr.w $vr0, $a0, 0
+; CHECK-NEXT:    vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT:    vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT:    vreplvei.w $vr0, $vr0, 1
+; CHECK-NEXT:    # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT:    vinsgr2vr.w $vr0, $a0, 1
+; CHECK-NEXT:    vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT:    vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT:    vreplvei.w $vr0, $vr0, 2
+; CHECK-NEXT:    # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT:    vinsgr2vr.w $vr0, $a0, 2
+; CHECK-NEXT:    vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT:    vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT:    vreplvei.w $vr0, $vr0, 3
+; CHECK-NEXT:    # kill: def $f0 killed $f0 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powisf2)
+; CHECK-NEXT:    movfr2gr.s $a0, $fa0
+; CHECK-NEXT:    vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT:    vinsgr2vr.w $vr0, $a0, 3
+; CHECK-NEXT:    ld.d $fp, $sp, 32 # 8-byte Folded Reload
+; CHECK-NEXT:    ld.d $ra, $sp, 40 # 8-byte Folded Reload
+; CHECK-NEXT:    addi.d $sp, $sp, 48
+; CHECK-NEXT:    ret
+entry:
+  %res = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> %va, i32 %b)
+  ret <4 x float> %res
+}
+
+declare <2 x double> @llvm.powi.v2f64.i32(<2 x double>, i32)
+
+define <2 x double> @powi_v2f64(<2 x double> %va, i32 %b) nounwind {
+; CHECK-LABEL: powi_v2f64:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addi.d $sp, $sp, -48
+; CHECK-NEXT:    st.d $ra, $sp, 40 # 8-byte Folded Spill
+; CHECK-NEXT:    st.d $fp, $sp, 32 # 8-byte Folded Spill
+; CHECK-NEXT:    vst $vr0, $sp, 0 # 16-byte Folded Spill
+; CHECK-NEXT:    addi.w $fp, $a0, 0
+; CHECK-NEXT:    vreplvei.d $vr0, $vr0, 0
+; CHECK-NEXT:    # kill: def $f0_64 killed $f0_64 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    vinsgr2vr.d $vr0, $a0, 0
+; CHECK-NEXT:    vst $vr0, $sp, 16 # 16-byte Folded Spill
+; CHECK-NEXT:    vld $vr0, $sp, 0 # 16-byte Folded Reload
+; CHECK-NEXT:    vreplvei.d $vr0, $vr0, 1
+; CHECK-NEXT:    # kill: def $f0_64 killed $f0_64 killed $vr0
+; CHECK-NEXT:    move $a0, $fp
+; CHECK-NEXT:    bl %plt(__powidf2)
+; CHECK-NEXT:    movfr2gr.d $a0, $fa0
+; CHECK-NEXT:    vld $vr0, $sp, 16 # 16-byte Folded Reload
+; CHECK-NEXT:    vinsgr2vr.d $vr0, $a0, 1
+; CHECK-NEXT:    ld.d $fp, $sp, 32 # 8-byte Folded Reload
+; CHECK-NEXT:    ld.d $ra, $sp, 40 # 8-byte Folded Reload
+; CHECK-NEXT:    addi.d $sp, $sp, 48
+; CHECK-NEXT:    ret
+entry:
+  %res = call <2 x double> @llvm.powi.v2f64.i32(<2 x double> %va, i32 %b)
+  ret <2 x double> %res
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll
new file mode 100644
index 00000000000000..d99feb5fdd921c
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpowi.ll
@@ -0,0 +1,1427 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+v,+f,+d -target-abi=ilp32d -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefix=RV32
+; RUN: llc -mtriple=riscv64 -mattr=+v,+f,+d -target-abi=lp64d -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s --check-prefix=RV64
+
+define <1 x float> @powi_v1f32(<1 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v1f32:
+; RV32:       # %bb.0:
+; RV32-NEXT:    addi sp, sp, -16
+; RV32-NEXT:    .cfi_def_cfa_offset 16
+; RV32-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; RV32-NEXT:    .cfi_offset ra, -4
+; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT:    vfmv.f.s fa0, v8
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT:    vfmv.s.f v8, fa0
+; RV32-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; RV32-NEXT:    .cfi_restore ra
+; RV32-NEXT:    addi sp, sp, 16
+; RV32-NEXT:    .cfi_def_cfa_offset 0
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: powi_v1f32:
+; RV64:       # %bb.0:
+; RV64-NEXT:    addi sp, sp, -16
+; RV64-NEXT:    .cfi_def_cfa_offset 16
+; RV64-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64-NEXT:    .cfi_offset ra, -8
+; RV64-NEXT:    sext.w a0, a0
+; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV64-NEXT:    vfmv.f.s fa0, v8
+; RV64-NEXT:    call __powisf2
+; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV64-NEXT:    vfmv.s.f v8, fa0
+; RV64-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64-NEXT:    .cfi_restore ra
+; RV64-NEXT:    addi sp, sp, 16
+; RV64-NEXT:    .cfi_def_cfa_offset 0
+; RV64-NEXT:    ret
+  %a = call <1 x float> @llvm.powi.v1f32.i32(<1 x float> %x, i32 %y)
+  ret <1 x float> %a
+}
+declare <1 x float> @llvm.powi.v1f32.i32(<1 x float>, i32)
+
+define <2 x float> @powi_v2f32(<2 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v2f32:
+; RV32:       # %bb.0:
+; RV32-NEXT:    addi sp, sp, -32
+; RV32-NEXT:    .cfi_def_cfa_offset 32
+; RV32-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32-NEXT:    fsd fs0, 16(sp) # 8-byte Folded Spill
+; RV32-NEXT:    .cfi_offset ra, -4
+; RV32-NEXT:    .cfi_offset s0, -8
+; RV32-NEXT:    .cfi_offset fs0, -16
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    sub sp, sp, a1
+; RV32-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x20, 0x22, 0x11, 0x01, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 32 + 1 * vlenb
+; RV32-NEXT:    mv s0, a0
+; RV32-NEXT:    addi a1, sp, 16
+; RV32-NEXT:    vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV32-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV32-NEXT:    vslidedown.vi v9, v8, 1
+; RV32-NEXT:    vfmv.f.s fa0, v9
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    fmv.s fs0, fa0
+; RV32-NEXT:    flw fa0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT:    mv a0, s0
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV32-NEXT:    vfmv.v.f v8, fa0
+; RV32-NEXT:    vfslide1down.vf v8, v8, fs0
+; RV32-NEXT:    csrr a0, vlenb
+; RV32-NEXT:    add sp, sp, a0
+; RV32-NEXT:    .cfi_def_cfa sp, 32
+; RV32-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32-NEXT:    fld fs0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT:    .cfi_restore ra
+; RV32-NEXT:    .cfi_restore s0
+; RV32-NEXT:    .cfi_restore fs0
+; RV32-NEXT:    addi sp, sp, 32
+; RV32-NEXT:    .cfi_def_cfa_offset 0
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: powi_v2f32:
+; RV64:       # %bb.0:
+; RV64-NEXT:    addi sp, sp, -64
+; RV64-NEXT:    .cfi_def_cfa_offset 64
+; RV64-NEXT:    sd ra, 56(sp) # 8-byte Folded Spill
+; RV64-NEXT:    sd s0, 48(sp) # 8-byte Folded Spill
+; RV64-NEXT:    fsd fs0, 40(sp) # 8-byte Folded Spill
+; RV64-NEXT:    .cfi_offset ra, -8
+; RV64-NEXT:    .cfi_offset s0, -16
+; RV64-NEXT:    .cfi_offset fs0, -24
+; RV64-NEXT:    csrr a1, vlenb
+; RV64-NEXT:    sub sp, sp, a1
+; RV64-NEXT:    .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0xc0, 0x00, 0x22, 0x11, 0x01, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 64 + 1 * vlenb
+; RV64-NEXT:    addi a1, sp, 32
+; RV64-NEXT:    vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV64-NEXT:    sext.w s0, a0
+; RV64-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vi v9, v8, 1
+; RV64-NEXT:    vfmv.f.s fa0, v9
+; RV64-NEXT:    mv a0, s0
+; RV64-NEXT:    call __powisf2
+; RV64-NEXT:    fmv.s fs0, fa0
+; RV64-NEXT:    flw fa0, 32(sp) # 8-byte Folded Reload
+; RV64-NEXT:    mv a0, s0
+; RV64-NEXT:    call __powisf2
+; RV64-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV64-NEXT:    vfmv.v.f v8, fa0
+; RV64-NEXT:    vfslide1down.vf v8, v8, fs0
+; RV64-NEXT:    csrr a0, vlenb
+; RV64-NEXT:    add sp, sp, a0
+; RV64-NEXT:    .cfi_def_cfa sp, 64
+; RV64-NEXT:    ld ra, 56(sp) # 8-byte Folded Reload
+; RV64-NEXT:    ld s0, 48(sp) # 8-byte Folded Reload
+; RV64-NEXT:    fld fs0, 40(sp) # 8-byte Folded Reload
+; RV64-NEXT:    .cfi_restore ra
+; RV64-NEXT:    .cfi_restore s0
+; RV64-NEXT:    .cfi_restore fs0
+; RV64-NEXT:    addi sp, sp, 64
+; RV64-NEXT:    .cfi_def_cfa_offset 0
+; RV64-NEXT:    ret
+  %a = call <2 x float> @llvm.powi.v2f32.i32(<2 x float> %x, i32 %y)
+  ret <2 x float> %a
+}
+declare <2 x float> @llvm.powi.v2f32.i32(<2 x float>, i32)
+
+define <3 x float> @powi_v3f32(<3 x float> %x, i32 %y) {
+; RV32-LABEL: powi_v3f32:
+; RV32:       # %bb.0:
+; RV32-NEXT:    addi sp, sp, -32
+; RV32-NEXT:    .cfi_def_cfa_offset 32
+; RV32-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32-NEXT:    fsd fs0, 16(sp) # 8-byte Folded Spill
+; RV32-NEXT:    .cfi_offset ra, -4
+; RV32-NEXT:    .cfi_offset s0, -8
+; RV32-NEXT:    .cfi_offset fs0, -16
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    slli a1, a1, 1
+; RV32-NEXT:    sub sp, sp, a1
+; RV32-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x20, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 32 + 2 * vlenb
+; RV32-NEXT:    mv s0, a0
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    add a1, sp, a1
+; RV32-NEXT:    addi a1, a1, 16
+; RV32-NEXT:    vs1r.v v8, (a1) # Unknown-size Folded Spill
+; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT:    vslidedown.vi v9, v8, 1
+; RV32-NEXT:    vfmv.f.s fa0, v9
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    fmv.s fs0, fa0
+; RV32-NEXT:    csrr a0, vlenb
+; RV32-NEXT:    add a0, sp, a0
+; RV32-NEXT:    flw fa0, 16(a0) # 8-byte Folded Reload
+; RV32-NEXT:    mv a0, s0
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT:    vfmv.v.f v8, fa0
+; RV32-NEXT:    vfslide1down.vf v8, v8, fs0
+; RV32-NEXT:    addi a0, sp, 16
+; RV32-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
+; RV32-NEXT:    csrr a0, vlenb
+; RV32-NEXT:    add a0, sp, a0
+; RV32-NEXT:    addi a0, a0, 16
+; RV32-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vslidedown.vi v8, v8, 2
+; RV32-NEXT:    vfmv.f.s fa0, v8
+; RV32-NEXT:    mv a0, s0
+; RV32-NEXT:    call __powisf2
+; RV32-NEXT:    addi a0, sp, 16
+; RV32-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
+; RV32-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT:    vfslide1down.vf v8, v8, fa0
+; RV32-NEXT:    vslidedown.vi v8, v8, 1
+; RV32-NEXT:    csrr a0, vlenb
+; RV32-NEXT:    slli a0, a0, 1
+; RV32-NEXT:    add sp, sp, a0
+; RV32-NEXT:    .cfi_def_cfa sp, 32
+; RV32-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32-NEXT:    fld fs0, 16(sp) # 8-byte Folded Reload
+; RV32-NEXT:    .cfi_restore ra
+; RV32-NEXT:    .cfi_restore s0
+; RV32-NEXT:    .cfi_restore fs0
+; RV32-NEXT:    addi sp, sp, 32
+; RV32-NEXT:    .cfi_def_cfa_offset 0
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: powi_v3f32:
+; RV64:       # %bb.0:
+; RV64-NEXT:    addi sp, sp, -64
+; RV64-NEXT:    .cfi_def_cfa_offset 64
+; R...
[truncated]

@llvmbot
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-llvm-selectiondag


Comment on lines 4652 to 4654
// In some backends, such as RISCV64 and LoongArch64, the i32 type is
// illegal and is promoted by previous process. For such cases, the
// exponent actually matches with sizeof(int) and a libcall should be
Contributor

So DAG.getLibInfo().getIntSize() should be 8?

Collaborator

Why would it be 8? getIntSize is in bits and int on RISCV64 is 32 bits.

Contributor

I would assume getIntSize() would refer to the legalized parameter type for a libcall using int. Is there a second version for this?

Collaborator

Not that I know of.

I don't think we should let the vector powi get this far. The promotion of a libcall argument is the responsibility of call lowering and the information in TargetLowering::MakeLibCallOptions. We shouldn't promote and then try to fix it.

Comment on lines 4658 to 4667
if (ExponentNode->getOpcode() == ISD::SIGN_EXTEND_INREG ||
ExponentNode->getOpcode() == ISD::AssertSext ||
ExponentNode->getOpcode() == ISD::AssertZext) {
EVT InnerType = cast<VTSDNode>(ExponentNode->getOperand(1))->getVT();
ExponentHasSizeOfInt = LibIntSize == InnerType.getSizeInBits();
} else if (ISD::isExtOpcode(ExponentNode->getOpcode())) {
ExponentHasSizeOfInt =
LibIntSize ==
ExponentNode->getOperand(0).getValueType().getSizeInBits();
}
Contributor

This code should not be trying to look through extensions (nor SIGN_EXTEND_INREG or the Assert* nodes; those aren't really extensions).

I'd expect this to just insert the sext to match the libcall integer type.
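
A minimal sketch of that suggestion, assuming it would sit near the ExponentHasSizeOfInt check in SelectionDAGLegalize::ConvertNodeToLibcall; the local names IntVT and Exp are illustrative, not from the patch:

  // Sketch only: rather than looking through AssertSext/SIGN_EXTEND_INREG,
  // force the promoted exponent back into the libcall's int width.
  EVT IntVT =
      EVT::getIntegerVT(*DAG.getContext(), DAG.getLibInfo().getIntSize());
  SDValue Exp = Node->getOperand(1 + Offset);
  if (Exp.getValueType() != IntVT)
    Exp = DAG.getNode(ISD::SIGN_EXTEND_INREG, SDLoc(Node), Exp.getValueType(),
                      Exp, DAG.getValueType(IntVT));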

@topperc
Collaborator

topperc commented Dec 4, 2024

I kind of think this should be handled by unrolling the vector in DAGTypeLegalizer::PromoteIntOp_ExpOp.

This should work

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 493abfde148c..e386fd510419 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2579,6 +2579,9 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ExpOp(SDNode *N) {
       N->getOpcode() == ISD::FPOWI || N->getOpcode() == ISD::STRICT_FPOWI;
   unsigned OpOffset = IsStrict ? 1 : 0;

+  if (N->getValueType(0).isVector())
+    return DAG.UnrollVectorOp(N);
+
   // The integer operand is the last operand in FPOWI (or FLDEXP) (so the result
   // and floating point operand is already type legalized).
   RTLIB::Libcall LC = IsPowI ? RTLIB::getPOWI(N->getValueType(0))

@ylzsx
Contributor Author

ylzsx commented Dec 4, 2024

I kind of think this should be handled by unrolling the vector in DAGTypeLegalizer::PromoteIntOp_ExpOp.

This should work

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 493abfde148c..e386fd510419 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2579,6 +2579,9 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ExpOp(SDNode *N) {
       N->getOpcode() == ISD::FPOWI || N->getOpcode() == ISD::STRICT_FPOWI;
   unsigned OpOffset = IsStrict ? 1 : 0;

+  if (N->getValueType(0).isVector())
+    return DAG.UnrollVectorOp(N);
+
   // The integer operand is the last operand in FPOWI (or FLDEXP) (so the result
   // and floating point operand is already type legalized).
   RTLIB::Libcall LC = IsPowI ? RTLIB::getPOWI(N->getValueType(0))

Thanks. Your approach is simpler and more reasonable. I will modify it to adopt this solution.

For nodes such as `ISD::FPOWI` and `ISD::FLDEXP`, if the first operand is
a vector, the node is unrolled during type legalization instead of
promoting the second operand, since the corresponding library functions
do not have vector-type signatures.
@ylzsx ylzsx changed the title from "[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments" to "[llvm][CodeGen] Intrinsic llvm.powi/ldexp.* code gen for vector arguments" on Dec 4, 2024
Comment on lines 2575 to 2576
if (N->getValueType(0).isVector())
return DAG.UnrollVectorOp(N);
Contributor

This probably shouldn't be the first thing tried. If a vector libcall happens to be available, that would be preferable. Can you move this down before the scalar call is introduced?

Contributor Author

Thanks, I will move it into this if statement to ensure that it won't promote the second operand. Do you think it's reasonable?

  if (LC == RTLIB::UNKNOWN_LIBCALL || !TLI.getLibcallName(LC)) {
    if (N->getValueType(0).isVector())
      return DAG.UnrollVectorOp(N);
    SmallVector<SDValue, 3> NewOps(N->ops());
    NewOps[1 + OpOffset] = SExtPromotedInteger(N->getOperand(1 + OpOffset));
    return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
  }


github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 2588 to 2589
if (N->getValueType(0).isVector())
return DAG.UnrollVectorOp(N);
Contributor

This whole function is actually the wrong place to split the vector (see how there are no other UnrollVectorOps uses in DAGTypeLegalizer). The description also says your problem is when the libcall is used, so you'd want to change the other path? Does the libcall emission below need to directly handle the vector case?

Contributor Author

Sorry, I'm not entirely sure I understand what you mean. Could you please clarify?
Are you referring to the approach used in my first commit 7f5a128? If so, in that commit you mentioned not to check for the SIGN_EXTEND_INREG node, but this node is generated in the if statement here (SExtPromotedInteger will generate a SIGN_EXTEND_INREG node):

  if (LC == RTLIB::UNKNOWN_LIBCALL || !TLI.getLibcallName(LC)) {
    SmallVector<SDValue, 3> NewOps(N->ops());
    NewOps[1 + OpOffset] = SExtPromotedInteger(N->getOperand(1 + OpOffset));
    return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
  }

Other info:

  1. RISCV64 and LoongArch64 enter PromoteIntOp_ExpOp and generate a SIGN_EXTEND_INREG node, so ExponentHasSizeOfInt is false (32 != 64) and an error is reported:
bool ExponentHasSizeOfInt =
        DAG.getLibInfo().getIntSize() ==
        Node->getOperand(1 + Offset).getValueType().getSizeInBits();

Additionally, AArch64 and X86 get ExponentHasSizeOfInt=true, because in these backends i32 is legal and is not promoted via SIGN_EXTEND_INREG(i64) beforehand, so they generate the right code sequences.

  2. Throughout LLVM, only fpowi and fldexp call the PromoteIntOp_ExpOp function (flow: llvm::DAGTypeLegalizer::PromoteIntegerOperand -> llvm::DAGTypeLegalizer::PromoteIntOp_ExpOp):
  case ISD::FPOWI:
  case ISD::STRICT_FPOWI:
  case ISD::FLDEXP:
  case ISD::STRICT_FLDEXP: Res = PromoteIntOp_ExpOp(N); break;

If not, could you provide more specific modification suggestions?

Contributor Author

This whole function is actually the wrong place to split the vector (see how there are no other UnrollVectorOps uses in DAGTypeLegalizer). The description also says your problem is when the libcall is used, so you'd want to change the other path? Does the libcall emission below need to directly handle the vector case?

@topperc what are your thoughts on this?

Collaborator

see how there are no other UnrollVectorOps uses in DAGTypeLegalizer

There are calls to UnrollVectorOps in several places in LegalizeVectorTypes.cpp

This comment where the libcall below is created seems relevant

  // We can't just promote the exponent type in FPOWI, since we want to lower
  // the node to a libcall and we if we promote to a type larger than
  // sizeof(int) the libcall might not be according to the targets ABI.

My suggestion to unroll here was so that we wouldn't promote past sizeof(int). If we wait until LegalizeDAG.cpp to unroll the operation, the damage to the integer type has already been done. For RISC-V it's harmless because signed int is supposed to be passed sign-extended to 64 bits according to the ABI.

Hypothetically, if powi took an unsigned int as an argument, then type legalization would use zero extend, but the RISC-V ABI wants unsigned int to be passed sign extended. So LegalizeDAG would need to insert a SIGN_EXTEND_INREG to fix. I guess it would need to use the getIntSize() and shouldSignExtendTypeInLibCall to know what it needs to do in that case.

If we don't unroll here I guess the best fix in LegalizeDAG would also be to use getIntSize() and shouldSignExtendTypeInLibCall and use computeNumSignBits to know if a SIGN_EXTEND_INREG needs to be inserted?
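
A rough sketch of that LegalizeDAG-side fix-up, assuming the current signatures of shouldSignExtendTypeInLibCall and ComputeNumSignBits still apply; this is hypothetical illustration, not part of this patch:

  // Sketch: only insert SIGN_EXTEND_INREG when the promoted exponent is not
  // already sign-extended from the libcall's int width.
  unsigned IntBits = DAG.getLibInfo().getIntSize();
  EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), IntBits);
  SDValue Exp = Node->getOperand(1 + Offset);
  unsigned ExpBits = Exp.getScalarValueSizeInBits();
  if (TLI.shouldSignExtendTypeInLibCall(IntVT, /*IsSigned=*/true) &&
      DAG.ComputeNumSignBits(Exp) < ExpBits - IntBits + 1)
    Exp = DAG.getNode(ISD::SIGN_EXTEND_INREG, SDLoc(Node), Exp.getValueType(),
                      Exp, DAG.getValueType(IntVT));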

Contributor

It makes more sense to me to handle this when emitting the call, where ABI constraints would naturally be handled. This can be sign extended here, and the call emission can truncate or sext_inreg as required

Collaborator

It makes more sense to me to handle this when emitting the call, where ABI constraints would naturally be handled. This can be sign extended here, and the call emission can truncate or sext_inreg as required

The POWI code in LegalizeDAG calls SelectionDAGLegalize::ExpandFPLibCall which will call TargetLowering::makeLibCall using the promoted type. makeLibCall calls getTypeForEVT which will return i64 due to the promotion. That's what will be used by call lowering, but that's the wrong type for it to do the right thing. We need to get an i32 Type* into call lowering, but we no longer have it. We'll need to call getIntSize() to get the size and pass it along somehow. That requires refactoring several interfaces or adding new ones. Not sure if call lowering would also expect the SDValue for the argument to have i32 type.

My unrolling proposal avoided that by scalarizing it and letting the newly created scalar powi calls get converted to libcalls while we're still in type legalization. That's how we currently handle the scalar powi case. Are you also suggesting we should move the scalar handling to LegalizeDAG as well?

Contributor

makeLibCall should really be looking at the signature of the underlying call, but RuntimeLibcalls currently does not record this information. TargetLibraryInfo does, which is separate for some reason. This keeps coming up as a problem; these really need to be merged in some way (cc @jhuber6).

Taking the type from the DAG node isn't strictly correct, it's just where we've ended up. This came up recently for the special case in ExpandFPLibCall to sign extend the integer argument for FLDEXP.

Practically speaking, I don't think any targets will have a vector powi implementation (I at least don't see any in RuntimeLibcalls), so unrolling works out. I guess this could get a fixme and go ahead for now, but it's still a hack

@arsenm arsenm (Contributor) left a comment

I just noticed you are using vector ldexp with a scalar i32 argument. I'm surprised this is considered valid IR, and works as well as it does. It wasn't my intention when I added the intrinsic to handle that case. I expected the number of elements of both arguments would match. Do you only have this issue when using the implicit splat behavior of the scalar operand? powi is weird because it only takes the scalar argument, unlike ldexp.

I suppose we could support the scalar second argument for ldexp, but I'm not sure we handle implicit splats like that in any other operation.

@ylzsx
Contributor Author

ylzsx commented Dec 6, 2024

I just noticed you are using vector ldexp with a scalar i32 argument. I'm surprised this is considered valid IR, and works as well as it does. It wasn't my intention when I added the intrinsic to handle that case. I expected the number of elements of both arguments would match. Do you only have this issue when using the implicit splat behavior of the scalar operand? powi is weird because it only takes the scalar argument, unlike ldexp.

I suppose we could support the scalar second argument for ldexp, but I'm not sure we handle implicit splats like that in any other operation.

I didn't notice earlier that the second argument of ldexp can accept vector arguments. In fact, ldexp generates working code sequences whether its second argument is a scalar or a vector. The initial issue was only related to powi (#118079).

@ylzsx ylzsx changed the title from "[llvm][CodeGen] Intrinsic llvm.powi/ldexp.* code gen for vector arguments" to "[llvm][CodeGen] Intrinsic llvm.powi.* code gen for vector arguments" on Dec 9, 2024
@ylzsx
Contributor Author

ylzsx commented Dec 16, 2024

ping

@topperc topperc (Collaborator) left a comment

LGTM

@SixWeining SixWeining (Contributor) left a comment

LGTM

@ylzsx ylzsx merged commit f334db9 into llvm:main Dec 19, 2024
8 checks passed
@llvm-ci
Collaborator

llvm-ci commented Dec 19, 2024

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-win running on avx512-intel64-win while building llvm at step 4 "cmake stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/3208

Here is the relevant piece of the build log for the reference
Step 4 (cmake stage 1) failure: 'cmake -G ...' (failure)
'cmake' is not recognized as an internal or external command,
operable program or batch file.


Successfully merging this pull request may close these issues.

[CodeGen] Intrinsic @llvm.powi.v2f64.i32 causes errors in LoongArch64 and RISCV64 when vector features are enabled.
6 participants