-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[clang][CodeGen] Additional fixes for #114062 #128166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ace it with later AS cast.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Alex Voicu (AlexVlx) ChangesThis addresses two issues introduced by moving indirect args into an explicit AS (please see <#114062 (comment)> and <#114062 (comment)>):
The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part). Full diff: https://github.com/llvm/llvm-project/pull/128166.diff 5 Files Affected:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 47bfd470dbafb..b5a4435e75ea0 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5163,6 +5163,20 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
}
}
if (IRFunctionArgs.hasSRetArg()) {
+ // A mismatch between the allocated return value's AS and the target's
+ // chosen IndirectAS can happen e.g. when passing the this pointer through
+ // a chain involving stores to / loads from the DefaultAS; we address this
+ // here, symmetrically with the handling we have for normal pointer args.
+ if (SRetPtr.getAddressSpace() != RetAI.getIndirectAddrSpace())
+ SRetPtr = SRetPtr.withPointer(
+ getTargetHooks().performAddrSpaceCast(
+ *this, SRetPtr.getBasePointer(),
+ getLangASFromTargetAS(SRetPtr.getAddressSpace()),
+ getLangASFromTargetAS(RetAI.getIndirectAddrSpace()),
+ llvm::PointerType::get(getLLVMContext(),
+ RetAI.getIndirectAddrSpace()),
+ true),
+ SRetPtr.isKnownNonNull());
IRCallArgs[IRFunctionArgs.getSRetArgNo()] =
getAsNaturalPointerTo(SRetPtr, RetTy);
} else if (RetAI.isInAlloca()) {
@@ -5394,9 +5408,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
V->getType()->isIntegerTy())
V = Builder.CreateZExt(V, ArgInfo.getCoerceToType());
- // The only plausible mismatch here would be for pointer address spaces,
- // which can happen e.g. when passing a sret arg that is in the AllocaAS
- // to a function that takes a pointer to and argument in the DefaultAS.
+ // The only plausible mismatch here would be for pointer address spaces.
// We assume that the target has a reasonable mapping for the DefaultAS
// (it can be casted to from incoming specific ASes), and insert an AS
// cast to address the mismatch.
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index 625ca363d9019..200fb7fd756fd 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -302,15 +302,8 @@ void AggExprEmitter::withReturnValueSlot(
llvm::Value *LifetimeSizePtr = nullptr;
llvm::IntrinsicInst *LifetimeStartInst = nullptr;
if (!UseTemp) {
- // It is possible for the existing slot we are using directly to have been
- // allocated in the correct AS for an indirect return, and then cast to
- // the default AS (this is the behaviour of CreateMemTemp), however we know
- // that the return address is expected to point to the uncasted AS, hence we
- // strip possible pointer casts here.
- if (Dest.getAddress().isValid())
- RetAddr = Dest.getAddress().withPointer(
- Dest.getAddress().getBasePointer()->stripPointerCasts(),
- Dest.getAddress().isKnownNonNull());
+ RetAddr = Dest.getAddress();
+ RawAddress RetAllocaAddr = RawAddress::invalid();
} else {
RetAddr = CGF.CreateMemTempWithoutCast(RetTy, "tmp");
llvm::TypeSize Size =
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 5ee8a1bfa8175..7f8d438e36d6a 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -30,6 +30,7 @@
#include "clang/Basic/CodeGenOptions.h"
#include "clang/Basic/TargetInfo.h"
#include "llvm/ADT/APFixedPoint.h"
+#include "llvm/IR/Argument.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
@@ -2352,6 +2353,22 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
Value *Src = Visit(const_cast<Expr*>(E));
llvm::Type *SrcTy = Src->getType();
llvm::Type *DstTy = ConvertType(DestTy);
+
+ // FIXME: this is a gross but seemingly necessary workaround for an issue
+ // manifesting when a target uses a non-default AS for indirect sret args,
+ // but the source HLL is generic, wherein a valid C-cast or reinterpret_cast
+ // on the address of a local struct that gets returned by value yields an
+ // invalid bitcast from the a pointer to the IndirectAS to a pointer to the
+ // DefaultAS. We can only do this subversive thing because sret args are
+ // manufactured and them residing in the IndirectAS is a target specific
+ // detail, and doing an AS cast here still retains the semantics the user
+ // expects. It is desirable to remove this iff a better solution is found.
+ if (SrcTy != DstTy && isa<llvm::Argument>(Src) &&
+ cast<llvm::Argument>(Src)->hasStructRetAttr())
+ return CGF.CGM.getTargetCodeGenInfo().performAddrSpaceCast(
+ CGF, Src, E->getType().getAddressSpace(), DestTy.getAddressSpace(),
+ DstTy);
+
assert(
(!SrcTy->isPtrOrPtrVectorTy() || !DstTy->isPtrOrPtrVectorTy() ||
SrcTy->getPointerAddressSpace() == DstTy->getPointerAddressSpace()) &&
diff --git a/clang/test/CodeGenCXX/sret_cast_with_nonzero_alloca_as.cpp b/clang/test/CodeGenCXX/sret_cast_with_nonzero_alloca_as.cpp
new file mode 100644
index 0000000000000..320c712b665de
--- /dev/null
+++ b/clang/test/CodeGenCXX/sret_cast_with_nonzero_alloca_as.cpp
@@ -0,0 +1,32 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa-gnu -target-cpu gfx900 -emit-llvm -o - %s | FileCheck %s
+
+struct X { int z[17]; };
+
+// CHECK-LABEL: define dso_local void @_Z3foocc(
+// CHECK-SAME: ptr addrspace(5) dead_on_unwind noalias writable sret([[STRUCT_X:%.*]]) align 4 [[AGG_RESULT:%.*]], i8 noundef signext [[X:%.*]], i8 noundef signext [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[X_ADDR:%.*]] = alloca i8, align 1, addrspace(5)
+// CHECK-NEXT: [[Y_ADDR:%.*]] = alloca i8, align 1, addrspace(5)
+// CHECK-NEXT: [[X_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[X_ADDR]] to ptr
+// CHECK-NEXT: [[Y_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[Y_ADDR]] to ptr
+// CHECK-NEXT: store i8 [[X]], ptr [[X_ADDR_ASCAST]], align 1
+// CHECK-NEXT: store i8 [[Y]], ptr [[Y_ADDR_ASCAST]], align 1
+// CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[X_ADDR_ASCAST]], align 1
+// CHECK-NEXT: [[AGG_RESULT_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_RESULT]] to ptr
+// CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[AGG_RESULT_ASCAST]], i64 1
+// CHECK-NEXT: store i8 [[TMP0]], ptr [[ADD_PTR]], align 1
+// CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[Y_ADDR_ASCAST]], align 1
+// CHECK-NEXT: [[AGG_RESULT_ASCAST1:%.*]] = addrspacecast ptr addrspace(5) [[AGG_RESULT]] to ptr
+// CHECK-NEXT: [[ADD_PTR2:%.*]] = getelementptr inbounds i8, ptr [[AGG_RESULT_ASCAST1]], i64 2
+// CHECK-NEXT: store i8 [[TMP1]], ptr [[ADD_PTR2]], align 1
+// CHECK-NEXT: ret void
+//
+X foo(char x, char y) {
+ X r;
+
+ *(((char*)&r) + 1) = x;
+ *(reinterpret_cast<char*>(&r) + 2) = y;
+
+ return r;
+}
diff --git a/clang/test/OpenMP/amdgcn_sret_ctor.cpp b/clang/test/OpenMP/amdgcn_sret_ctor.cpp
new file mode 100644
index 0000000000000..0eb4748571d56
--- /dev/null
+++ b/clang/test/OpenMP/amdgcn_sret_ctor.cpp
@@ -0,0 +1,38 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --version 5
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -o - | FileCheck %s
+
+#pragma omp begin declare target
+struct S {
+ ~S();
+};
+S s();
+struct E {
+ S foo;
+ E() noexcept;
+};
+E::E() noexcept : foo(s()) {}
+#pragma omp end declare target
+// CHECK-LABEL: define hidden void @_ZN1EC2Ev(
+// CHECK-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]]) unnamed_addr #[[ATTR0:[0-9]+]] align 2 {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[THIS_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT: [[THIS_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[THIS_ADDR]] to ptr
+// CHECK-NEXT: store ptr [[THIS]], ptr [[THIS_ADDR_ASCAST]], align 8
+// CHECK-NEXT: [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR_ASCAST]], align 8
+// CHECK-NEXT: [[THIS1_ASCAST:%.*]] = addrspacecast ptr [[THIS1]] to ptr addrspace(5)
+// CHECK-NEXT: call void @_Z1sv(ptr addrspace(5) dead_on_unwind writable sret([[STRUCT_S:%.*]]) align 1 [[THIS1_ASCAST]]) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT: ret void
+//
+// CHECK-LABEL: declare void @_Z1sv(
+// CHECK-SAME: ptr addrspace(5) dead_on_unwind writable sret([[STRUCT_S]]) align 1) #[[ATTR1:[0-9]+]]
+//
+// CHECK-LABEL: define hidden void @_ZN1EC1Ev(
+// CHECK-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] align 2 {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[THIS_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// CHECK-NEXT: [[THIS_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[THIS_ADDR]] to ptr
+// CHECK-NEXT: store ptr [[THIS]], ptr [[THIS_ADDR_ASCAST]], align 8
+// CHECK-NEXT: [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR_ASCAST]], align 8
+// CHECK-NEXT: call void @_ZN1EC2Ev(ptr noundef nonnull align 1 dereferenceable(1) [[THIS1]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT: ret void
+//
|
clang/lib/CodeGen/CGExprScalar.cpp
Outdated
if (SrcTy != DstTy && isa<llvm::Argument>(Src) && | ||
cast<llvm::Argument>(Src)->hasStructRetAttr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dyn_cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd nest an extra if
(maybe a pair of if
s) unless we use C++17's if statement with initializer stuff (if (auto A = dyn_cast<llvm::Argument>(Src); A->hasStructRetAttr())
) which I've not seen much of / is a bit surprising for those unaware of the feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++17 thing is slowly but surely spreading to be the canonical way to do this (needs A && A->hasStructRetAttr())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++17 thing is slowly but surely spreading to be the canonical way to do this (needs A && A->hasStructRetAttr())
Oh, that's cool to hear, I'll rewrite it then, cheers.
@@ -0,0 +1,38 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --version 5 | |||
// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need both triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa but I never remember how these interact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, shouldn't be necessary. Could add nvptx64-nvidia-cuda
instead to be thorough since I'm assuming it'd use the same AS handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTX is 0 all over the place at the moment so it'd not be impacted; I only copy&pastaed the cc1 invocation from an existing OMP test since I also have no idea how these interact:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally gave an OpenMP version just to indicate that the failure wasn't restricted to my weird use-cases as someone could wrap their code in begin / end declare target stuff and have it fail.
return CGF.CGM.getTargetCodeGenInfo().performAddrSpaceCast( | ||
CGF, Src, E->getType().getAddressSpace(), DestTy.getAddressSpace(), | ||
DstTy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does performAddrSpaceCast take a source value, and the source address space? The source value's type already carries the address space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It carries the LLVM AS, not the AST AS. Since this is a target hook, I assume the intention is to allow a target to override this interface and take into account language intended AS mappings, rather than how the values being cast got codegen-ed? For example, it is plausible, if not currently done (?) that LangAS::Foo
and LangAS::Bar
are aliases / map to the same target AS etc., so it'd be possible to simply NOP this and return the source. AFAICS no target does this / overrides the interface though.
*(reinterpret_cast<char*>(&r) + 2) = y; | ||
|
||
return r; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the analogous byval / byref case need to be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe those are fine since they're not implicit / manufactured and they already get AS-casted to generic as part of normal handling: https://gcc.godbolt.org/z/df3TE8d19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's reasonable and we should probably get this in before the release window.
The release branch reverted the original offending patch, so we don't need this or the other follow up patches |
Hello, Some of our targets in Google are broken by tho original patch and we need this fix patch to unblock our internal releases. Can we land the fix soonish ? |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/11572 Here is the relevant piece of the build log for the reference
|
This addresses two issues introduced by moving indirect args into an explicit AS (please see <llvm#114062 (comment)> and <llvm#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
Remove accidental leftover unused variable.
This addresses two issues introduced by moving indirect args into an explicit AS (please see <llvm#114062 (comment)> and <llvm#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
Remove accidental leftover unused variable.
It will enforce address space to the return value helping to avoid invalid AS cast from local to private for sret argument that would appear after llvm/llvm-project#128166 and the related work. Signed-off-by: Sidorov, Dmitry <[email protected]>
This addresses two issues introduced by moving indirect args into an explicit AS (please see #114062 (comment) and #114062 (comment)):
amdgcn_sret_ctor.cpp
test);The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).