Skip to content

[Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs #113637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 9, 2024

Conversation

tex3d
Copy link
Contributor

@tex3d tex3d commented Oct 25, 2024

This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

  • Return true for atan2 from isTriviallyVectorizable
  • Add atan2 to VecFuncs.def for massv and accelerate libraries.
  • Add atan2 to hasOptimizedCodeGen
  • Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp llvm::getIntrinsicForCallSite and update vectorization tests
  • Add atan2 name check to isLoweredToCall in llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Thanks to @jroelofs for the atan2 accelerate veclib and associated test additions, plus the hasOptimizedCodeGen addition.

Part of: Implement the atan2 HLSL Function #70096.

@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Tex Riddell (tex3d)

Changes

This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This returns true from isTriviallyVectorizable for llvm.atan2 intrinsic. It also adds llvm atan2 intrinsic equivalents to VecFuncs.def for massv.

Part of: Implement the atan2 HLSL Function #70096.


Full diff: https://github.com/llvm/llvm-project/pull/113637.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/VecFuncs.def (+2)
  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll (+46)
diff --git a/llvm/include/llvm/Analysis/VecFuncs.def b/llvm/include/llvm/Analysis/VecFuncs.def
index c4586894e3e490..6f0ba3f9e406c4 100644
--- a/llvm/include/llvm/Analysis/VecFuncs.def
+++ b/llvm/include/llvm/Analysis/VecFuncs.def
@@ -289,7 +289,9 @@ TLI_DEFINE_VECFUNC("acosf", "__acosf4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("atan", "__atand2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("atanf", "__atanf4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("atan2", "__atan2d2", FIXED(2), "_ZGV_LLVM_N2vv")
+TLI_DEFINE_VECFUNC("llvm.atan2.f64", "__atan2d2", FIXED(2), "_ZGV_LLVM_N2vv")
 TLI_DEFINE_VECFUNC("atan2f", "__atan2f4", FIXED(4), "_ZGV_LLVM_N4vv")
+TLI_DEFINE_VECFUNC("llvm.atan2.f32", "__atan2f4", FIXED(4), "_ZGV_LLVM_N4vv")
 
 // Hyperbolic Functions
 TLI_DEFINE_VECFUNC("sinh", "__sinhd2", FIXED(2), "_ZGV_LLVM_N2v")
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 37c443011719b6..f3ffddeb40d4b4 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -69,6 +69,7 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
   case Intrinsic::asin:
   case Intrinsic::acos:
   case Intrinsic::atan:
+  case Intrinsic::atan2:
   case Intrinsic::sin:
   case Intrinsic::cos:
   case Intrinsic::tan:
diff --git a/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll b/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
index ba6e85e2852952..badc67866d2bad 100644
--- a/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
+++ b/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
@@ -1244,6 +1244,52 @@ for.end:
   ret void
 }
 
+define void @atan2_f64_intrinsic(ptr nocapture %varray) {
+; CHECK-LABEL: @atan2_f64_intrinsic(
+; CHECK: __atan2d2{{.*}}<2 x double>
+; CHECK: ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %tmp = trunc i64 %iv to i32
+  %conv = sitofp i32 %tmp to double
+  %call = tail call double @llvm.atan2.f64(double %conv, double %conv)
+  %arrayidx = getelementptr inbounds double, ptr %varray, i64 %iv
+  store double %call, ptr %arrayidx, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv.next, 1000
+  br i1 %exitcond, label %for.end, label %for.body
+
+for.end:
+  ret void
+}
+
+define void @atan2_f32_intrinsic(ptr nocapture %varray) {
+; CHECK-LABEL: @atan2_f32_intrinsic(
+; CHECK: __atan2f4{{.*}}<4 x float>
+; CHECK: ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %tmp = trunc i64 %iv to i32
+  %conv = sitofp i32 %tmp to float
+  %call = tail call float @llvm.atan2.f32(float %conv, float %conv)
+  %arrayidx = getelementptr inbounds float, ptr %varray, i64 %iv
+  store float %call, ptr %arrayidx, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv.next, 1000
+  br i1 %exitcond, label %for.end, label %for.body
+
+for.end:
+  ret void
+}
+
 define void @sinh_f64(ptr nocapture %varray) {
 ; CHECK-LABEL: @sinh_f64(
 ; CHECK: __sinhd2{{.*}}<2 x double>

@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Tex Riddell (tex3d)

Changes

This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This returns true from isTriviallyVectorizable for llvm.atan2 intrinsic. It also adds llvm atan2 intrinsic equivalents to VecFuncs.def for massv.

Part of: Implement the atan2 HLSL Function #70096.


Full diff: https://github.com/llvm/llvm-project/pull/113637.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/VecFuncs.def (+2)
  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll (+46)
diff --git a/llvm/include/llvm/Analysis/VecFuncs.def b/llvm/include/llvm/Analysis/VecFuncs.def
index c4586894e3e490..6f0ba3f9e406c4 100644
--- a/llvm/include/llvm/Analysis/VecFuncs.def
+++ b/llvm/include/llvm/Analysis/VecFuncs.def
@@ -289,7 +289,9 @@ TLI_DEFINE_VECFUNC("acosf", "__acosf4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("atan", "__atand2", FIXED(2), "_ZGV_LLVM_N2v")
 TLI_DEFINE_VECFUNC("atanf", "__atanf4", FIXED(4), "_ZGV_LLVM_N4v")
 TLI_DEFINE_VECFUNC("atan2", "__atan2d2", FIXED(2), "_ZGV_LLVM_N2vv")
+TLI_DEFINE_VECFUNC("llvm.atan2.f64", "__atan2d2", FIXED(2), "_ZGV_LLVM_N2vv")
 TLI_DEFINE_VECFUNC("atan2f", "__atan2f4", FIXED(4), "_ZGV_LLVM_N4vv")
+TLI_DEFINE_VECFUNC("llvm.atan2.f32", "__atan2f4", FIXED(4), "_ZGV_LLVM_N4vv")
 
 // Hyperbolic Functions
 TLI_DEFINE_VECFUNC("sinh", "__sinhd2", FIXED(2), "_ZGV_LLVM_N2v")
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 37c443011719b6..f3ffddeb40d4b4 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -69,6 +69,7 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
   case Intrinsic::asin:
   case Intrinsic::acos:
   case Intrinsic::atan:
+  case Intrinsic::atan2:
   case Intrinsic::sin:
   case Intrinsic::cos:
   case Intrinsic::tan:
diff --git a/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll b/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
index ba6e85e2852952..badc67866d2bad 100644
--- a/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
+++ b/llvm/test/Transforms/LoopVectorize/PowerPC/massv-calls.ll
@@ -1244,6 +1244,52 @@ for.end:
   ret void
 }
 
+define void @atan2_f64_intrinsic(ptr nocapture %varray) {
+; CHECK-LABEL: @atan2_f64_intrinsic(
+; CHECK: __atan2d2{{.*}}<2 x double>
+; CHECK: ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %tmp = trunc i64 %iv to i32
+  %conv = sitofp i32 %tmp to double
+  %call = tail call double @llvm.atan2.f64(double %conv, double %conv)
+  %arrayidx = getelementptr inbounds double, ptr %varray, i64 %iv
+  store double %call, ptr %arrayidx, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv.next, 1000
+  br i1 %exitcond, label %for.end, label %for.body
+
+for.end:
+  ret void
+}
+
+define void @atan2_f32_intrinsic(ptr nocapture %varray) {
+; CHECK-LABEL: @atan2_f32_intrinsic(
+; CHECK: __atan2f4{{.*}}<4 x float>
+; CHECK: ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %tmp = trunc i64 %iv to i32
+  %conv = sitofp i32 %tmp to float
+  %call = tail call float @llvm.atan2.f32(float %conv, float %conv)
+  %arrayidx = getelementptr inbounds float, ptr %varray, i64 %iv
+  store float %call, ptr %arrayidx, align 4
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv.next, 1000
+  br i1 %exitcond, label %for.end, label %for.body
+
+for.end:
+  ret void
+}
+
 define void @sinh_f64(ptr nocapture %varray) {
 ; CHECK-LABEL: @sinh_f64(
 ; CHECK: __sinhd2{{.*}}<2 x double>

Copy link
Contributor

@jroelofs jroelofs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tex3d tex3d changed the title [Analysis] isTriviallyVectorizable add vectorization for atan2 intrinsic [Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs Oct 29, 2024
@farzonl
Copy link
Member

farzonl commented Nov 6, 2024

  1. Add libfunc mapping to intrinsic in llvm/lib/Analysis/ValueTracking.cpp llvm::getIntrinsicForCallSite
  2. Add atan2 name check in llvm/include/llvm/Analysis/TargetTransformInfoImpl.h isLoweredToCall
  3. I noticed there were some examples that use a amdlibm vector form below. Should we do a follow on PR that adds atan2 amdlibm and updates llvm/test/Transforms/LoopVectorize/X86/amdlibm-calls.ll?

amd_vrs4_atan2f
amd_vrs8_atan2f
amd_vrd2_atan2
amd_vrd4_atan2

tex3d added 4 commits November 6, 2024 12:19
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This returns true from isTriviallyVectorizable for llvm.atan2 intrinsic.
It also adds llvm atan2 intrinsic equivalents to VecFuncs.def for massv.

Part of: Implement the atan2 HLSL Function llvm#70096.
…sis/TargetTransformInfoImpl.h`

TBD: Add testing for this change.
@tex3d tex3d requested a review from nikic as a code owner November 7, 2024 01:28
@tex3d
Copy link
Contributor Author

tex3d commented Nov 7, 2024

@farzonl

  • Add libfunc mapping to intrinsic in llvm/lib/Analysis/ValueTracking.cpp llvm::getIntrinsicForCallSite

Added.

  • Add atan2 name check in llvm/include/llvm/Analysis/TargetTransformInfoImpl.h isLoweredToCall

Added, but it had no test impact, so I'm looking for feedback on how this can be tested.

  • I noticed there were some examples that use a amdlibm vector form below. Should we do a follow on PR that adds atan2 amdlibm and updates llvm/test/Transforms/LoopVectorize/X86/amdlibm-calls.ll?

It turns out these example files have all the code under #if 0, and there doesn't appear to be any declarations of these vector operations in the header. I'll leave this for a later update when the operations are added to amdlibm.

@farzonl
Copy link
Member

farzonl commented Nov 7, 2024

See this pr: https://github.com/llvm/llvm-project/pull/95517/files

tests like this should show that the libcall name to intrinsic mapping in the check line

llvm/test/Transforms/SLPVectorizer/X86/call.ll

@tex3d
Copy link
Contributor Author

tex3d commented Nov 8, 2024

@farzonl

See this pr: https://github.com/llvm/llvm-project/pull/95517/files
tests like this should show that the libcall name to intrinsic mapping in the check line
llvm/test/Transforms/SLPVectorizer/X86/call.ll

This still passes without the mapping for tan. There are other places where libs calls are mapped, which is why I'm unsure of what actually depends on this code.

I commented out the whole block to see if any tests would fail, and found exactly one:

; Don't turn this into an infinite loop, this is probably the implementation
; of fabs and we expect the codegen to lower fabs.
; CHECK: @fabs(double %f)
; CHECK: call
; CHECK: ret
define double @fabs(double %f) {
entry:
%tmp2 = call double @fabs( double %f ) ; <double> [#uses=1]
ret double %tmp2
}

If isLoweredToCall returns true for fabs, this results in:

define double @fabs(double %f) {
entry:
 br label %tailrecurse
tailrecurse: ; preds = %tailrecurse, %entry
 br label %tailrecurse
}

This doesn't exactly seem like the primary effect this code is meant for. From call graph inspection, it looks like this could impact some loop unrolling if it contains a matching call. It doesn't look like any of these were explicitly tested though.

Architecturally, this is ugly, as is pointed out by the comment from 10 years ago in that code:

// FIXME: These should almost certainly not be handled here, and instead
// handled with the help of TLI or the target itself. This was largely
// ported from existing analysis heuristics here so that such refactorings
// can take place in the future.

It seems like we should just have tablegen definitions with clear properties somewhere, then all these random spots in code that depend on specific properties would be handled in a uniform way.

I guess I'll just merge it as-is for now.

@tex3d tex3d merged commit 818d715 into llvm:main Nov 9, 2024
8 checks passed
@tex3d tex3d deleted the atan2-vectorize branch November 9, 2024 00:08
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
…e veclibs (llvm#113637)

This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- Return true for atan2 from isTriviallyVectorizable
- Add atan2 to VecFuncs.def for massv and accelerate libraries.
- Add atan2 to hasOptimizedCodeGen
- Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp
llvm::getIntrinsicForCallSite and update vectorization tests
- Add atan2 name check to isLoweredToCall in
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
- Note: there's no test coverage for these names in isLoweredToCall, except that Transforms/TailCallElim/inf-recursion.ll is impacted by the "fabs" case

Thanks to @jroelofs for the atan2 accelerate veclib and associated test
additions, plus the hasOptimizedCodeGen addition.

Part of: Implement the atan2 HLSL Function llvm#70096.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants