Skip to content

[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic #94559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 11, 2024

Conversation

farzonl
Copy link
Member

@farzonl farzonl commented Jun 6, 2024

Relanding this PR now that #90503 has merged. with FTAN landing in TargetLoweringBase.cpp:L1021 There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends.

In LLVM, the llvm.experimental.constrained.cos and llvm.experimental.constrained.sin intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the tan intrinsic.

  • Builtins.td - replace TanF128 with F16F128MathTemplate
  • CGBuiltin.cpp - map existing tan builtins to tan and constrained_tan intrinsic
  • ConstrainedOps.def map tan and constrained_tan to an ISDOpcode.

resolves #91421

@farzonl farzonl requested review from ilovepi and efriedma-quic June 6, 2024 02:44
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir labels Jun 6, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 6, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Farzon Lotfi (farzonl)

Changes

Relanding this PR now that #90503 has merged. with FTAN landing in TargetLoweringBase.cpp:L1021 There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends.

In LLVM, the llvm.experimental.constrained.cos and llvm.experimental.constrained.sin intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the tan intrinsic.

  • Builtins.td - replace TanF128 with F16F128MathTemplate
  • CGBuiltin.cpp - map existing tan builtins to tan and constrained_tan intrinsic
  • ConstrainedOps.def map tan and constrained_tan to an ISDOpcode.
  • ISDOpcodes.h - define tan and strict tan opcodes

resolves #91421


Full diff: https://github.com/llvm/llvm-project/pull/94559.diff

11 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+3-3)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+12)
  • (modified) clang/test/CodeGen/X86/math-builtins.c (+4-4)
  • (modified) clang/test/CodeGen/constrained-math-builtins.c (+13)
  • (modified) clang/test/CodeGen/math-libcalls.c (+6-6)
  • (modified) clang/test/CodeGenOpenCL/builtins-f16.cl (+3)
  • (modified) llvm/docs/LangRef.rst (+36)
  • (modified) llvm/include/llvm/IR/ConstrainedOps.def (+1)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+4)
  • (modified) llvm/test/Assembler/fp-intrinsics-attr.ll (+8)
  • (modified) llvm/test/Feature/fp-intrinsics.ll (+11)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 11982af3fa609..7bef5fd7ad40f 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -482,11 +482,11 @@ def SqrtF16F128 : Builtin, F16F128MathTemplate {
   let Prototype = "T(T)";
 }
 
-def TanF128 : Builtin {
-  let Spellings = ["__builtin_tanf128"];
+def TanF16F128 : Builtin, F16F128MathTemplate {
+  let Spellings = ["__builtin_tan"];
   let Attributes = [FunctionWithBuiltinPrefix, NoThrow,
                     ConstIgnoringErrnoAndExceptions];
-  let Prototype = "__float128(__float128)";
+  let Prototype = "T(T)";
 }
 
 def TanhF128 : Builtin {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 37d0c478e0330..a76ce82830d8f 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2923,6 +2923,18 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       SetSqrtFPAccuracy(Call);
       return RValue::get(Call);
     }
+
+    case Builtin::BItan:
+    case Builtin::BItanf:
+    case Builtin::BItanl:
+    case Builtin::BI__builtin_tan:
+    case Builtin::BI__builtin_tanf:
+    case Builtin::BI__builtin_tanf16:
+    case Builtin::BI__builtin_tanl:
+    case Builtin::BI__builtin_tanf128:
+      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
+          *this, E, Intrinsic::tan, Intrinsic::experimental_constrained_tan));
+
     case Builtin::BItrunc:
     case Builtin::BItruncf:
     case Builtin::BItruncl:
diff --git a/clang/test/CodeGen/X86/math-builtins.c b/clang/test/CodeGen/X86/math-builtins.c
index 093239b448260..1e0f129b98610 100644
--- a/clang/test/CodeGen/X86/math-builtins.c
+++ b/clang/test/CodeGen/X86/math-builtins.c
@@ -674,10 +674,10 @@ __builtin_sqrt(f);       __builtin_sqrtf(f);      __builtin_sqrtl(f); __builtin_
 
 __builtin_tan(f);        __builtin_tanf(f);       __builtin_tanl(f); __builtin_tanf128(f);
 
-// NO__ERRNO: declare double @tan(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @tanf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[READNONE]]
-// NO__ERRNO: declare fp128 @tanf128(fp128 noundef) [[READNONE]]
+// NO__ERRNO: declare double @llvm.tan.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.tan.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.tan.f80(x86_fp80) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare fp128 @llvm.tan.f128(fp128) [[READNONE_INTRINSIC]]
 // HAS_ERRNO: declare double @tan(double noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare float @tanf(float noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
diff --git a/clang/test/CodeGen/constrained-math-builtins.c b/clang/test/CodeGen/constrained-math-builtins.c
index 2de832dd2b6ca..6cc3a10a1e794 100644
--- a/clang/test/CodeGen/constrained-math-builtins.c
+++ b/clang/test/CodeGen/constrained-math-builtins.c
@@ -183,6 +183,14 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c, _
 // CHECK: call x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 // CHECK: call fp128 @llvm.experimental.constrained.sqrt.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
+  __builtin_tan(f);        __builtin_tanf(f);       __builtin_tanl(f); __builtin_tanf128(f);
+
+// CHECK: call double @llvm.experimental.constrained.tan.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call float @llvm.experimental.constrained.tan.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call fp128 @llvm.experimental.constrained.tan.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+
+
   __builtin_trunc(f);      __builtin_truncf(f);     __builtin_truncl(f); __builtin_truncf128(f);
 
 // CHECK: call double @llvm.experimental.constrained.trunc.f64(double %{{.*}}, metadata !"fpexcept.strict")
@@ -315,6 +323,11 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c, _
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80, metadata, metadata)
 // CHECK: declare fp128 @llvm.experimental.constrained.sqrt.f128(fp128, metadata, metadata)
 
+// CHECK: declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
+// CHECK: declare float @llvm.experimental.constrained.tan.f32(float, metadata, metadata)
+// CHECK: declare x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80, metadata, metadata)
+// CHECK: declare fp128 @llvm.experimental.constrained.tan.f128(fp128, metadata, metadata)
+
 // CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
 // CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
diff --git a/clang/test/CodeGen/math-libcalls.c b/clang/test/CodeGen/math-libcalls.c
index 29c312ba0ecac..a249182692762 100644
--- a/clang/test/CodeGen/math-libcalls.c
+++ b/clang/test/CodeGen/math-libcalls.c
@@ -662,15 +662,15 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) {
 
   tan(f);        tanf(f);       tanl(f);
 
-// NO__ERRNO: declare double @tan(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @tanf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[READNONE]]
+// NO__ERRNO: declare double @llvm.tan.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.tan.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.tan.f80(x86_fp80) [[READNONE_INTRINSIC]]
 // HAS_ERRNO: declare double @tan(double noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare float @tanf(float noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @tan(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @tanf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
+// HAS_MAYTRAP: declare double @llvm.experimental.constrained.tan.f64(
+// HAS_MAYTRAP: declare float @llvm.experimental.constrained.tan.f32(
+// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.tan.f80(
 
   tanh(f);       tanhf(f);      tanhl(f);
 
diff --git a/clang/test/CodeGenOpenCL/builtins-f16.cl b/clang/test/CodeGenOpenCL/builtins-f16.cl
index adf7cdde154f5..d7bffdad5c548 100644
--- a/clang/test/CodeGenOpenCL/builtins-f16.cl
+++ b/clang/test/CodeGenOpenCL/builtins-f16.cl
@@ -66,6 +66,9 @@ void test_half_builtins(half h0, half h1, half h2, int i0) {
   // CHECK: call half @llvm.sqrt.f16(half %h0)
   res = __builtin_sqrtf16(h0);
 
+  // CHECK: call half @llvm.tan.f16(half %h0)
+  res = __builtin_tanf16(h0);
+
   // CHECK: call half @llvm.trunc.f16(half %h0)
   res = __builtin_truncf16(h0);
 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9d7ade8eb523b..08a41170d9b9b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26231,6 +26231,42 @@ same values as the libm ``cos`` functions would, and handles error
 conditions in the same way.
 
 
+'``llvm.experimental.constrained.tan``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <type>
+      @llvm.experimental.constrained.tan(<type> <op1>,
+                                         metadata <rounding mode>,
+                                         metadata <exception behavior>)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.tan``' intrinsic returns the tangent of the
+first operand.
+
+Arguments:
+""""""""""
+
+The first argument and the return type are floating-point numbers of the same
+type.
+
+The second and third arguments specify the rounding mode and exception
+behavior as described above.
+
+Semantics:
+""""""""""
+
+This function returns the tangent of the specified operand, returning the
+same values as the libm ``tan`` functions would, and handles error
+conditions in the same way.
+
+
 '``llvm.experimental.constrained.exp``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/include/llvm/IR/ConstrainedOps.def b/llvm/include/llvm/IR/ConstrainedOps.def
index 41aa44de957f9..a7b37c5cb204d 100644
--- a/llvm/include/llvm/IR/ConstrainedOps.def
+++ b/llvm/include/llvm/IR/ConstrainedOps.def
@@ -95,6 +95,7 @@ DAG_FUNCTION(round,           1, 0, experimental_constrained_round,      FROUND)
 DAG_FUNCTION(roundeven,       1, 0, experimental_constrained_roundeven,  FROUNDEVEN)
 DAG_FUNCTION(sin,             1, 1, experimental_constrained_sin,        FSIN)
 DAG_FUNCTION(sqrt,            1, 1, experimental_constrained_sqrt,       FSQRT)
+DAG_FUNCTION(tan,             1, 1, experimental_constrained_tan,        FTAN)
 DAG_FUNCTION(trunc,           1, 0, experimental_constrained_trunc,      FTRUNC)
 
 // This is definition for fmuladd intrinsic function, that is converted into
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 107442623ab7b..4c506a6ace23e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1218,6 +1218,10 @@ let IntrProperties = [IntrInaccessibleMemOnly, IntrWillReturn, IntrStrictFP] in
                                                     [ LLVMMatchType<0>,
                                                       llvm_metadata_ty,
                                                       llvm_metadata_ty ]>;
+  def int_experimental_constrained_tan  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
+                                                    [ LLVMMatchType<0>,
+                                                      llvm_metadata_ty,
+                                                      llvm_metadata_ty ]>;
   def int_experimental_constrained_pow  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
                                                     [ LLVMMatchType<0>,
                                                       LLVMMatchType<0>,
diff --git a/llvm/test/Assembler/fp-intrinsics-attr.ll b/llvm/test/Assembler/fp-intrinsics-attr.ll
index 6546d1a275c99..613630e1a2b4d 100644
--- a/llvm/test/Assembler/fp-intrinsics-attr.ll
+++ b/llvm/test/Assembler/fp-intrinsics-attr.ll
@@ -85,6 +85,11 @@ define void @func(double %a, double %b, double %c, i32 %i) strictfp {
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict")
 
+  %tan = call double @llvm.experimental.constrained.tan.f64(
+                                               double %a,
+                                               metadata !"round.dynamic",
+                                               metadata !"fpexcept.strict")
+
   %pow = call double @llvm.experimental.constrained.pow.f64(
                                                double %a, double %b,
                                                metadata !"round.dynamic",
@@ -244,6 +249,9 @@ declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata
 declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
 ; CHECK: @llvm.experimental.constrained.cos.f64({{.*}}) #[[ATTR1]]
 
+declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
+; CHECK: @llvm.experimental.constrained.tan.f64({{.*}}) #[[ATTR1]]
+
 declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
 ; CHECK: @llvm.experimental.constrained.pow.f64({{.*}}) #[[ATTR1]]
 
diff --git a/llvm/test/Feature/fp-intrinsics.ll b/llvm/test/Feature/fp-intrinsics.ll
index b92408a1bf1cd..7759813dc2e11 100644
--- a/llvm/test/Feature/fp-intrinsics.ll
+++ b/llvm/test/Feature/fp-intrinsics.ll
@@ -151,6 +151,17 @@ entry:
   ret double %result
 }
 
+; Verify that tan(42.0) isn't simplified when the rounding mode is unknown.
+; CHECK-LABEL: ftan
+; CHECK: call double @llvm.experimental.constrained.tan
+define double @ftan() #0 {
+entry:
+  %result = call double @llvm.experimental.constrained.tan.f64(double 42.0,
+                                               metadata !"round.dynamic",
+                                               metadata !"fpexcept.strict") #0
+  ret double %result
+}
+
 ; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.
 ; CHECK-LABEL: f10
 ; CHECK: call double @llvm.experimental.constrained.exp

@llvmbot
Copy link
Member

llvmbot commented Jun 6, 2024

@llvm/pr-subscribers-llvm-ir

Author: Farzon Lotfi (farzonl)

Changes

Relanding this PR now that #90503 has merged. with FTAN landing in TargetLoweringBase.cpp:L1021 There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends.

In LLVM, the llvm.experimental.constrained.cos and llvm.experimental.constrained.sin intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the tan intrinsic.

  • Builtins.td - replace TanF128 with F16F128MathTemplate
  • CGBuiltin.cpp - map existing tan builtins to tan and constrained_tan intrinsic
  • ConstrainedOps.def map tan and constrained_tan to an ISDOpcode.
  • ISDOpcodes.h - define tan and strict tan opcodes

resolves #91421


Full diff: https://github.com/llvm/llvm-project/pull/94559.diff

11 Files Affected:

  • (modified) clang/include/clang/Basic/Builtins.td (+3-3)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+12)
  • (modified) clang/test/CodeGen/X86/math-builtins.c (+4-4)
  • (modified) clang/test/CodeGen/constrained-math-builtins.c (+13)
  • (modified) clang/test/CodeGen/math-libcalls.c (+6-6)
  • (modified) clang/test/CodeGenOpenCL/builtins-f16.cl (+3)
  • (modified) llvm/docs/LangRef.rst (+36)
  • (modified) llvm/include/llvm/IR/ConstrainedOps.def (+1)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+4)
  • (modified) llvm/test/Assembler/fp-intrinsics-attr.ll (+8)
  • (modified) llvm/test/Feature/fp-intrinsics.ll (+11)
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 11982af3fa609..7bef5fd7ad40f 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -482,11 +482,11 @@ def SqrtF16F128 : Builtin, F16F128MathTemplate {
   let Prototype = "T(T)";
 }
 
-def TanF128 : Builtin {
-  let Spellings = ["__builtin_tanf128"];
+def TanF16F128 : Builtin, F16F128MathTemplate {
+  let Spellings = ["__builtin_tan"];
   let Attributes = [FunctionWithBuiltinPrefix, NoThrow,
                     ConstIgnoringErrnoAndExceptions];
-  let Prototype = "__float128(__float128)";
+  let Prototype = "T(T)";
 }
 
 def TanhF128 : Builtin {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 37d0c478e0330..a76ce82830d8f 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2923,6 +2923,18 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       SetSqrtFPAccuracy(Call);
       return RValue::get(Call);
     }
+
+    case Builtin::BItan:
+    case Builtin::BItanf:
+    case Builtin::BItanl:
+    case Builtin::BI__builtin_tan:
+    case Builtin::BI__builtin_tanf:
+    case Builtin::BI__builtin_tanf16:
+    case Builtin::BI__builtin_tanl:
+    case Builtin::BI__builtin_tanf128:
+      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
+          *this, E, Intrinsic::tan, Intrinsic::experimental_constrained_tan));
+
     case Builtin::BItrunc:
     case Builtin::BItruncf:
     case Builtin::BItruncl:
diff --git a/clang/test/CodeGen/X86/math-builtins.c b/clang/test/CodeGen/X86/math-builtins.c
index 093239b448260..1e0f129b98610 100644
--- a/clang/test/CodeGen/X86/math-builtins.c
+++ b/clang/test/CodeGen/X86/math-builtins.c
@@ -674,10 +674,10 @@ __builtin_sqrt(f);       __builtin_sqrtf(f);      __builtin_sqrtl(f); __builtin_
 
 __builtin_tan(f);        __builtin_tanf(f);       __builtin_tanl(f); __builtin_tanf128(f);
 
-// NO__ERRNO: declare double @tan(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @tanf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[READNONE]]
-// NO__ERRNO: declare fp128 @tanf128(fp128 noundef) [[READNONE]]
+// NO__ERRNO: declare double @llvm.tan.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.tan.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.tan.f80(x86_fp80) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare fp128 @llvm.tan.f128(fp128) [[READNONE_INTRINSIC]]
 // HAS_ERRNO: declare double @tan(double noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare float @tanf(float noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
diff --git a/clang/test/CodeGen/constrained-math-builtins.c b/clang/test/CodeGen/constrained-math-builtins.c
index 2de832dd2b6ca..6cc3a10a1e794 100644
--- a/clang/test/CodeGen/constrained-math-builtins.c
+++ b/clang/test/CodeGen/constrained-math-builtins.c
@@ -183,6 +183,14 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c, _
 // CHECK: call x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 // CHECK: call fp128 @llvm.experimental.constrained.sqrt.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
+  __builtin_tan(f);        __builtin_tanf(f);       __builtin_tanl(f); __builtin_tanf128(f);
+
+// CHECK: call double @llvm.experimental.constrained.tan.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call float @llvm.experimental.constrained.tan.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CHECK: call fp128 @llvm.experimental.constrained.tan.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+
+
   __builtin_trunc(f);      __builtin_truncf(f);     __builtin_truncl(f); __builtin_truncf128(f);
 
 // CHECK: call double @llvm.experimental.constrained.trunc.f64(double %{{.*}}, metadata !"fpexcept.strict")
@@ -315,6 +323,11 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c, _
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80, metadata, metadata)
 // CHECK: declare fp128 @llvm.experimental.constrained.sqrt.f128(fp128, metadata, metadata)
 
+// CHECK: declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
+// CHECK: declare float @llvm.experimental.constrained.tan.f32(float, metadata, metadata)
+// CHECK: declare x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80, metadata, metadata)
+// CHECK: declare fp128 @llvm.experimental.constrained.tan.f128(fp128, metadata, metadata)
+
 // CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
 // CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
diff --git a/clang/test/CodeGen/math-libcalls.c b/clang/test/CodeGen/math-libcalls.c
index 29c312ba0ecac..a249182692762 100644
--- a/clang/test/CodeGen/math-libcalls.c
+++ b/clang/test/CodeGen/math-libcalls.c
@@ -662,15 +662,15 @@ void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) {
 
   tan(f);        tanf(f);       tanl(f);
 
-// NO__ERRNO: declare double @tan(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @tanf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[READNONE]]
+// NO__ERRNO: declare double @llvm.tan.f64(double) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare float @llvm.tan.f32(float) [[READNONE_INTRINSIC]]
+// NO__ERRNO: declare x86_fp80 @llvm.tan.f80(x86_fp80) [[READNONE_INTRINSIC]]
 // HAS_ERRNO: declare double @tan(double noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare float @tanf(float noundef) [[NOT_READNONE]]
 // HAS_ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @tan(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @tanf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
+// HAS_MAYTRAP: declare double @llvm.experimental.constrained.tan.f64(
+// HAS_MAYTRAP: declare float @llvm.experimental.constrained.tan.f32(
+// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.tan.f80(
 
   tanh(f);       tanhf(f);      tanhl(f);
 
diff --git a/clang/test/CodeGenOpenCL/builtins-f16.cl b/clang/test/CodeGenOpenCL/builtins-f16.cl
index adf7cdde154f5..d7bffdad5c548 100644
--- a/clang/test/CodeGenOpenCL/builtins-f16.cl
+++ b/clang/test/CodeGenOpenCL/builtins-f16.cl
@@ -66,6 +66,9 @@ void test_half_builtins(half h0, half h1, half h2, int i0) {
   // CHECK: call half @llvm.sqrt.f16(half %h0)
   res = __builtin_sqrtf16(h0);
 
+  // CHECK: call half @llvm.tan.f16(half %h0)
+  res = __builtin_tanf16(h0);
+
   // CHECK: call half @llvm.trunc.f16(half %h0)
   res = __builtin_truncf16(h0);
 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9d7ade8eb523b..08a41170d9b9b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26231,6 +26231,42 @@ same values as the libm ``cos`` functions would, and handles error
 conditions in the same way.
 
 
+'``llvm.experimental.constrained.tan``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <type>
+      @llvm.experimental.constrained.tan(<type> <op1>,
+                                         metadata <rounding mode>,
+                                         metadata <exception behavior>)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.tan``' intrinsic returns the tangent of the
+first operand.
+
+Arguments:
+""""""""""
+
+The first argument and the return type are floating-point numbers of the same
+type.
+
+The second and third arguments specify the rounding mode and exception
+behavior as described above.
+
+Semantics:
+""""""""""
+
+This function returns the tangent of the specified operand, returning the
+same values as the libm ``tan`` functions would, and handles error
+conditions in the same way.
+
+
 '``llvm.experimental.constrained.exp``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/include/llvm/IR/ConstrainedOps.def b/llvm/include/llvm/IR/ConstrainedOps.def
index 41aa44de957f9..a7b37c5cb204d 100644
--- a/llvm/include/llvm/IR/ConstrainedOps.def
+++ b/llvm/include/llvm/IR/ConstrainedOps.def
@@ -95,6 +95,7 @@ DAG_FUNCTION(round,           1, 0, experimental_constrained_round,      FROUND)
 DAG_FUNCTION(roundeven,       1, 0, experimental_constrained_roundeven,  FROUNDEVEN)
 DAG_FUNCTION(sin,             1, 1, experimental_constrained_sin,        FSIN)
 DAG_FUNCTION(sqrt,            1, 1, experimental_constrained_sqrt,       FSQRT)
+DAG_FUNCTION(tan,             1, 1, experimental_constrained_tan,        FTAN)
 DAG_FUNCTION(trunc,           1, 0, experimental_constrained_trunc,      FTRUNC)
 
 // This is definition for fmuladd intrinsic function, that is converted into
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 107442623ab7b..4c506a6ace23e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1218,6 +1218,10 @@ let IntrProperties = [IntrInaccessibleMemOnly, IntrWillReturn, IntrStrictFP] in
                                                     [ LLVMMatchType<0>,
                                                       llvm_metadata_ty,
                                                       llvm_metadata_ty ]>;
+  def int_experimental_constrained_tan  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
+                                                    [ LLVMMatchType<0>,
+                                                      llvm_metadata_ty,
+                                                      llvm_metadata_ty ]>;
   def int_experimental_constrained_pow  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
                                                     [ LLVMMatchType<0>,
                                                       LLVMMatchType<0>,
diff --git a/llvm/test/Assembler/fp-intrinsics-attr.ll b/llvm/test/Assembler/fp-intrinsics-attr.ll
index 6546d1a275c99..613630e1a2b4d 100644
--- a/llvm/test/Assembler/fp-intrinsics-attr.ll
+++ b/llvm/test/Assembler/fp-intrinsics-attr.ll
@@ -85,6 +85,11 @@ define void @func(double %a, double %b, double %c, i32 %i) strictfp {
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict")
 
+  %tan = call double @llvm.experimental.constrained.tan.f64(
+                                               double %a,
+                                               metadata !"round.dynamic",
+                                               metadata !"fpexcept.strict")
+
   %pow = call double @llvm.experimental.constrained.pow.f64(
                                                double %a, double %b,
                                                metadata !"round.dynamic",
@@ -244,6 +249,9 @@ declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata
 declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
 ; CHECK: @llvm.experimental.constrained.cos.f64({{.*}}) #[[ATTR1]]
 
+declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
+; CHECK: @llvm.experimental.constrained.tan.f64({{.*}}) #[[ATTR1]]
+
 declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
 ; CHECK: @llvm.experimental.constrained.pow.f64({{.*}}) #[[ATTR1]]
 
diff --git a/llvm/test/Feature/fp-intrinsics.ll b/llvm/test/Feature/fp-intrinsics.ll
index b92408a1bf1cd..7759813dc2e11 100644
--- a/llvm/test/Feature/fp-intrinsics.ll
+++ b/llvm/test/Feature/fp-intrinsics.ll
@@ -151,6 +151,17 @@ entry:
   ret double %result
 }
 
+; Verify that tan(42.0) isn't simplified when the rounding mode is unknown.
+; CHECK-LABEL: ftan
+; CHECK: call double @llvm.experimental.constrained.tan
+define double @ftan() #0 {
+entry:
+  %result = call double @llvm.experimental.constrained.tan.f64(double 42.0,
+                                               metadata !"round.dynamic",
+                                               metadata !"fpexcept.strict") #0
+  ret double %result
+}
+
 ; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.
 ; CHECK-LABEL: f10
 ; CHECK: call double @llvm.experimental.constrained.exp

@farzonl
Copy link
Member Author

farzonl commented Jun 6, 2024

@ilovepi I minimised the failing issue from: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/src/graphics/lib/compute/spinel/ext/transform_stack/transform_stack.c
to:

#include <math.h>
#include <stdio.h>

#define SPN_TRANSFORM_STACK_TAN(x_)  tanf(x_)

float
spinel_transform_stack_push_skew_x(float theta)
{
  float const tan_theta = SPN_TRANSFORM_STACK_TAN(theta);
  return tan_theta;
}

int main() {
    printf("%f\n", spinel_transform_stack_push_skew_x(1.05f));
    return 0;
}

run like so:

farzon@devbox: projects/llvm-project$ <build_dir>/bin/clang  <test_path>/test.c -lm -o  <test_out_path>/test.out
farzon@devbox: projects/llvm-project$ ./<test_out_path>/test.out 
1.743315

The intrinsic should be able to be lowered now.

@bob80905
Copy link
Contributor

bob80905 commented Jun 7, 2024

Should a test be added when the return type is a different float type than the first arg type?

@ilovepi
Copy link
Contributor

ilovepi commented Jun 7, 2024

@ilovepi I minimised the failing issue from: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/src/graphics/lib/compute/spinel/ext/transform_stack/transform_stack.c to:

#include <math.h>
#include <stdio.h>

#define SPN_TRANSFORM_STACK_TAN(x_)  tanf(x_)

float
spinel_transform_stack_push_skew_x(float theta)
{
  float const tan_theta = SPN_TRANSFORM_STACK_TAN(theta);
  return tan_theta;
}

int main() {
    printf("%f\n", spinel_transform_stack_push_skew_x(1.05f));
    return 0;
}

run like so:

farzon@devbox: projects/llvm-project$ <build_dir>/bin/clang  <test_path>/test.c -lm -o  <test_out_path>/test.out
farzon@devbox: projects/llvm-project$ ./<test_out_path>/test.out 
1.743315

The intrinsic should be able to be lowered now.

Thanks. This seems fine to me. I'll defer to the other reviewers on everything else, but I'm satisfied with the new test. so LGTM from my perspective.

@farzonl farzonl force-pushed the add-tan-constrained-intrinsic branch from ce4e2a0 to 35bbeed Compare June 7, 2024 20:19
@farzonl
Copy link
Member Author

farzonl commented Jun 7, 2024

Should a test be added when the return type is a different float type than the first arg type?

So if our function that calls the builtin has a different return type than the input that we pass in to the builtin then that will be handled by other casting code that has its own tests so no.

Second way this could happen if tan had different return type than from input type but in this case input dictates return type for tan. So the answer again would be no.

2. add arm64 and x86 tests.
3. fix missing ISD::STRICT_FTAN in aarch64
@farzonl farzonl force-pushed the add-tan-constrained-intrinsic branch from 35bbeed to af44e78 Compare June 7, 2024 23:11
@farzonl
Copy link
Member Author

farzonl commented Jun 10, 2024

@efriedma-quic If you have time could you take a look at this pr. It is the same as the one you review just with more tests across more targets. Problem from the first merge was because the target base changes were tied up in the x86 backend change. Since those have merged this pr should work just fine now.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@farzonl farzonl merged commit 189d471 into llvm:main Jun 11, 2024
10 checks passed
@farzonl
Copy link
Member Author

farzonl commented Jun 11, 2024

This broke building wasi-libc, with a crash:

Stack dump:
0.	Program arguments: /builds/worker/fetches/clang/bin/clang-19 -cc1 -triple wasm32-unknown-wasi -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name ctanh.c -mrelocation-model static -mthread-model single -mframe-pointer=none -ffp-contract=on -fno-rounding-math -ffp-exception-behavior=ignore -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debugger-tuning=gdb -fdebug-compilation-dir=/builds/worker/fetches/wasi-sdk/src/wasi-libc -fcoverage-compilation-dir=/builds/worker/fetches/wasi-sdk/src/wasi-libc -sys-header-deps -D NDEBUG -O2 -Wall -Wextra -Wno-null-pointer-arithmetic -Wno-unused-parameter -Wno-sign-compare -Wno-unused-variable -Wno-unused-function -Wno-ignored-attributes -Wno-missing-braces -Wno-ignored-pragmas -Wno-unused-but-set-variable -Wno-unknown-warning-option -Wno-parentheses -Wno-shift-op-parentheses -Wno-bitwise-op-parentheses -Wno-logical-op-parentheses -Wno-string-plus-int -Wno-dangling-else -Wno-unknown-pragmas -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -vectorize-loops -vectorize-slp -x c ctanh-46a6c8.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'ctanh-46a6c8.c'.
4.	Running pass 'WebAssembly Assembly Printer' on function '@ctanh'
 #0 0x00007f2205ba35cd llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /tmp/llvm/llvm/lib/Support/Unix/Signals.inc:723:11
 #1 0x00007f2205ba3abb PrintStackTraceSignalHandler(void*) /tmp/llvm/llvm/lib/Support/Unix/Signals.inc:798:1
 #2 0x00007f2205ba1b26 llvm::sys::RunSignalHandlers() /tmp/llvm/llvm/lib/Support/Signals.cpp:105:5
 #3 0x00007f2205ba4275 SignalHandler(int) /tmp/llvm/llvm/lib/Support/Unix/Signals.inc:413:1
 #4 0x00007f2203ff4f90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #5 0x00007f2204043ccc (/lib/x86_64-linux-gnu/libc.so.6+0x8accc)
 #6 0x00007f2203ff4ef2 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3bef2)
 #7 0x00007f2203fdf472 abort (/lib/x86_64-linux-gnu/libc.so.6+0x26472)
 #8 0x00007f2205a7f300 llvm::install_out_of_memory_new_handler() /tmp/llvm/llvm/lib/Support/ErrorHandling.cpp:194:0
 #9 0x00007f220ba5052d llvm::WebAssembly::getLibcallSignature(llvm::WebAssemblySubtarget const&, llvm::StringRef, llvm::SmallVectorImpl<llvm::wasm::ValType>&, llvm::SmallVectorImpl<llvm::wasm::ValType>&) /tmp/llvm/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp:907:30
#10 0x00007f220b9c4e0f llvm::WebAssemblyAsmPrinter::getOrCreateWasmSymbol(llvm::StringRef) /tmp/llvm/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:0:5
#11 0x00007f220b9c4f46 llvm::WebAssemblyAsmPrinter::emitDecls(llvm::Module const&) /tmp/llvm/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:310:40
#12 0x00007f220b9c7179 llvm::WebAssemblyAsmPrinter::emitConstantPool() /tmp/llvm/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:604:3
#13 0x00007f2206ee9526 llvm::AsmPrinter::emitFunctionHeader() /tmp/llvm/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:966:3

This file contains the script and preprocessed source that reproduce the problem: ctanh.zip

This seems to be because the WASM backend does some non standard things setting up a libcall to type signature map. I put up a fix #95082. I typically would just revert, but the fix is so compact i think it makes more sense to merge a fix to unblock. That said i'm not a regular WASM contributor so i'm going to go back to bed and wait till i get some guidance from those folks.

farzonl added a commit that referenced this pull request Jun 11, 2024
…reTable (#95082)

The wasm backend fetches the tan runtime lib call in
`llvm/include/llvm/IR/RuntimeLibcalls.def` via `StaticLibcallNameMap()`,
but ignores the runtime function because a function sinature mapping is
not specified in RuntimeLibcallSignatureTable(). The fix is to specify
the function signatures for float32-128.

This is a fix for a build break reported on PR
#94559 (comment).
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Jun 12, 2024
…rinsic (llvm#94559)

Relanding this PR now that
llvm#90503 has merged. with `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63
) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.

In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.

-  `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
-   `ConstrainedOps.def` map tan and constrained_tan  to an ISDOpcode.

resolves  llvm#91421

---------

Co-authored-by: Farzon Lotfi <[email protected]>
@aeubanks
Copy link
Contributor

I believe this has exposed a preexisting issue in isel with tan:

$ cat /tmp/b.ll
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7-unknown-linux-android26"

define <2 x float> @g() {
entry:
  %0 = call <2 x float> @llvm.tan.v2f32(<2 x float> zeroinitializer)
  ret <2 x float> %0
}
$ llc /tmp/b.ll -o /dev/null
LLVM ERROR: Cannot select: t3: v2f32 = ftan t15                                                                                                                                                                                                                                                                    
  t15: v2f32 = bitcast t14                                                                                                                                                                                                                                                                                         
    t14: v2i32 = ARMISD::VMOVIMM TargetConstant:i32<0>                                                                                                   
      t13: i32 = TargetConstant<0>                                                                                                                                                                                                                                                                                 

@farzonl
Copy link
Member Author

farzonl commented Jun 12, 2024

@aeubanks The issue you are seeing here is because only aarch64 and x86 backends have tan intrinsic lowering support.
changing one line in your example from "thumbv7-unknown-linux-android26" to "aarch64-unknown-linux-android26" or "x86_64-unknown-linux-gnu" will show this to be the case.

This shouldn't be a regression because there never was a llvm.tan.v2f32 for ARMv7. That still needs to be added. Can you show me how you could have generated llvm.tan.v2f32 for ARMv7 from clang?

By default TargetLoweringBase.cpp:1016-1022 only supports lowering for f32, f64, and f128 scalars across all backends.

The builtins that were added (Builtin::BItanf, Builtin::BItanl, Builtin::BI__builtin_tan, Builtin::BI__builtin_tanf, Builtin::BI__builtin_tanf16, Builtin::BI__builtin_tanl, and Builtin::BI__builtin_tanf128) should only cover scalar cases.

For alll other intrinsic types from f16 to the vectors types, those need to be explicitly supported.

So i'm not sure how this pr could have generate the intrinsic you have reported unless you used __builtin_elementwise_tan, but that builtin is only being used by HLSL.

@aeubanks
Copy link
Contributor

SLPVectorizer can introduce llvm.tan.v2f32. For example, running opt -O3 on the following introduces llvm.tan.v2f32

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7-unknown-linux-android26"

; Function Attrs: null_pointer_is_valid
define void @foo() #0 {
bb:
  %call = call float @pluto(float 1.000000e+00)
  %call1 = call i1 @ham(float %call)
  %call2 = call float @pluto(float 0.000000e+00)
  %call3 = call i1 @ham(float %call2)
  %select = select i1 %call3, float 0.000000e+00, float 1.000000e+00
  %select4 = select i1 %call1, float 0.000000e+00, float 1.000000e+00
  store float %select, ptr null, align 4
  %call5 = call ptr null(ptr null, float %select4, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00)
  ret void
}

define float @pluto(float %arg) {
bb:
  %call = call float @llvm.tan.f32(float %arg)
  ret float %call
}

define i1 @ham(float %arg) {
bb:
  %fcmp = fcmp ole float %arg, 0.000000e+00
  ret i1 %fcmp
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.tan.f32(float) #1

attributes #0 = { null_pointer_is_valid }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

@farzonl
Copy link
Member Author

farzonl commented Jun 13, 2024

It seems like we have four options here. We can drop the def Tan : FPMathTemplate, LibBuiltin<"math.h"> builtins so no Builtin::BItanf or Builtin::BItanl in cgbuitlin switch case. That wouldn't solve the SLPVectorizer case but doesn't expose it either unless you use a clang_builtin function which should limit the impact. The tan library functions would behave as before.

Option 2 we land an arm7 backend for Tan. That could work for you, but the issue you raised with the SLPVectorizer case likely means this same issue exists for powerPC, RISCV, and potentially other backends.
current estimates are:
backends that need f16 support that don't have it

  • AMDGPU
  • ARM
  • NVPTX
  • RISCV
    -Maybe MipsSEI It has a f16 but there are no tests for it.

backends that need vector support that don't have it

  • RISCV
  • AMDGPU
  • ARM/Thumb2
  • PowerPC
  • WebAssembly

Option 3. we guard the emitter with a target check for wasm (maybe), x86, and aarch64

    case Builtin::BItanf:
    case Builtin::BItanl:
    case Builtin::BI__builtin_tan:
    case Builtin::BI__builtin_tanf:
    case Builtin::BI__builtin_tanf16:
    case Builtin::BI__builtin_tanl:
    case Builtin::BI__builtin_tanf128: {
      switch(CGF.getTarget().getTriple().getArch()) {
           case llvm::Triple::aarch64:
           case llvm::Triple::x86:
           case llvm::Triple::x86_64:
           case llvm::Triple::wasm32:
           case llvm::Triple::wasm64:
               return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                      *this, E, Intrinsic::tan, Intrinsic::experimental_constrained_tan));
      }
}

Option 4 we revert this change, The test cases would go stale and it also means our plans for landing constraint intrinsics needs to go in the backlog, because it means constraint intrinsics don't make sense until tan has full support across all backends.

@efriedma-quic
Copy link
Collaborator

Can we change the target-independent bits of the tan() implementation in the backend so it doesn't require each target to explicitly request that tan() needs to be expanded? It should be possible to adjust the code in TargetLoweringBase.cpp a bit so FTAN defaults to being expanded for all types.

@davemgreen
Copy link
Collaborator

If you remove tan from isTriviallyVectorizable it should prevent vectorization in the short term.

It might be better to default FTAN to expand in

, which seems to only be done for f32/f64/f128 at the moment.

@farzonl
Copy link
Member Author

farzonl commented Jun 13, 2024

Can we change the target-independent bits of the tan() implementation in the backend so it doesn't require each target to explicitly request that tan() needs to be expanded? It should be possible to adjust the code in TargetLoweringBase.cpp a bit so FTAN defaults to being expanded for all types.

It might be better to default FTAN to expand in

, which seems to only be done for f32/f64/f128 at the moment.

I don't think we can default on vector types to Expand as it would expand on vector that are not supported for tan. For example VT in this case is MVT::all_valuetypes() That includes integer vectors.

for (MVT VT : MVT::all_valuetypes()) {

Even if it only included floats it would not necessarily be correct For example in ARM

const MVT FloatTypes[] = { MVT::v8f16, MVT::v4f32 };
for (auto VT : FloatTypes) {

some cases like MVE it is MVT::v8f16, MVT::v4f32
but for NEON MVT::v2f64, MVT::v4f32, and MVT::v2f32 are supported. Doing it per target seems intentional so as to have more precision so only intrinsics that are valid to lower get expanded.

@davemgreen
Copy link
Collaborator

Usually when new ISD nodes are added they are expanded for all types, so that every backend will get at least working code even if it is not optimal. The targets can then come along and override the defaults for the types they are interested in, to get better results.

For tan I would expect most vector types would want to scalarize, so marking them as expand would make sense. If more types than are necessary get marked as Expand that shouldn't be an issue, it looks like we already do that for a number of other nodes.

@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
@aeubanks
Copy link
Contributor

@alexey-bataev

@farzonl
Copy link
Member Author

farzonl commented Jun 13, 2024

Usually when new ISD nodes are added they are expanded for all types, so that every backend will get at least working code even if it is not optimal. The targets can then come along and override the defaults for the types they are interested in, to get better results.

For tan I would expect most vector types would want to scalarize, so marking them as expand would make sense. If more types than are necessary get marked as Expand that shouldn't be an issue, it looks like we already do that for a number of other nodes.

Should this be considered a termporary change until their is parity across alll backends? I'm trying to understand why no on else has done this for sin\cos\log\exp\etc.

@davemgreen
Copy link
Collaborator

I believe they were added so long ago that the default Expanding wasn't done at the time. @efriedma-quic do you have more of an idea than that?

@efriedma-quic
Copy link
Collaborator

Yes, I think it's just a historical mistake; sin/cos/log/exp were added a very long time ago, and we weren't as careful about that sort of thing. And nobody has taken the time to try to cleanup the current defaults.

@farzonl
Copy link
Member Author

farzonl commented Jun 14, 2024

PR #95518 should address all backends.

Also SLPVectorizer was disabled for tan so I put out PR #95517

farzonl added a commit that referenced this pull request Jun 27, 2024
#95517)

This PR is intended to address the limited SLPVectorizer support of tan
raised in the comments of this PR:
#94559.

Right now emitting the tan intrinsisic allows you to vectorize tan, but
emitting the libfunc does not. to address this the libcall needs to be
mapped to the intrinsic. and the libcall and function name need to be
marked approriately so they can be optimized or defined as a call
lowering.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
llvm#95517)

This PR is intended to address the limited SLPVectorizer support of tan
raised in the comments of this PR:
llvm#94559.

Right now emitting the tan intrinsisic allows you to vectorize tan, but
emitting the libfunc does not. to address this the libcall needs to be
mapped to the intrinsic. and the libcall and function name need to be
marked approriately so they can be optimized or defined as a call
lowering.
@farzonl farzonl deleted the add-tan-constrained-intrinsic branch July 14, 2024 08:51
farzonl added a commit that referenced this pull request Jul 19, 2024
…#98755)

## Change:
- WebAssemblyRuntimeLibcallSignatures.cpp: Expose the RTLIB's for use by
WASM
-  Add trig specific test cases

## History
This change is part of an implementation of
#87367 investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.

#70079
#70080
#70081
#70083
#70084
#95966

## Why Web Assembly?
From past changes to try and support constraint intrinsics the changes
to the trig builtins to emit intrinsics\constraint intrinsics broke the
WASM build. This is an attempt to preempt any such build break.

- #95082
-
#94559 (comment)
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…#98755)

## Change:
- WebAssemblyRuntimeLibcallSignatures.cpp: Expose the RTLIB's for use by
WASM
-  Add trig specific test cases

## History
This change is part of an implementation of
#87367 investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.

#70079
#70080
#70081
#70083
#70084
#95966

## Why Web Assembly?
From past changes to try and support constraint intrinsics the changes
to the trig builtins to emit intrinsics\constraint intrinsics broke the
WASM build. This is an attempt to preempt any such build break.

- #95082
-
#94559 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 backend:ARM backend:PowerPC backend:SystemZ backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:ir
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[HLSL] Add experimental_constrained_tan intrinsic
8 participants