[NVPTX] Add Volta Load/Store Atomics (.relaxed, .acquire, .release) and Volatile (.mmio/.volatile) support #98022
@llvm/pr-subscribers-backend-nvptx
Author: None (gonzalobg)
Changes
This PR adds initial support for some of Volta's (sm_70) load/store atomic and volatile/MMIO operations, hopefully without breaking any preexisting code. Only relaxed, acquire, and release operations w/ volatile are handled.
This PR does not aim to add support for any of the following:
- syncscope support
- read atomic ops to const, param, grid param
- local memory atomics
- sequentially consistent atomics
- atomicrmw
- ...
Patch is 105.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98022.diff 6 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
index b7a20c351f5ff6..c96e5a8878fcba 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
@@ -224,9 +224,32 @@ void NVPTXInstPrinter::printLdStCode(const MCInst *MI, int OpNum,
if (Modifier) {
const MCOperand &MO = MI->getOperand(OpNum);
int Imm = (int) MO.getImm();
- if (!strcmp(Modifier, "volatile")) {
- if (Imm)
- O << ".volatile";
+ if (!strcmp(Modifier, "sem")) {
+ switch (Imm) {
+ case NVPTX::PTXLdStInstCode::NotAtomic:
+ break;
+ case NVPTX::PTXLdStInstCode::Volatile:
+ O << ".volatile";
+ break;
+ case NVPTX::PTXLdStInstCode::Relaxed:
+ O << ".relaxed.sys";
+ break;
+ case NVPTX::PTXLdStInstCode::Acquire:
+ O << ".acquire.sys";
+ break;
+ case NVPTX::PTXLdStInstCode::Release:
+ O << ".release.sys";
+ break;
+ case NVPTX::PTXLdStInstCode::RelaxedMMIO:
+ O << ".mmio.relaxed.sys";
+ break;
+ default:
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "NVPTX LdStCode Printer does not support \"" << Imm << "\" sem modifier.";
+ report_fatal_error(OS.str());
+ break;
+ }
} else if (!strcmp(Modifier, "addsp")) {
switch (Imm) {
case NVPTX::PTXLdStInstCode::GLOBAL:
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index b0cb24c63c3ceb..3c7167b1570254 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -107,6 +107,14 @@ enum LoadStore {
};
namespace PTXLdStInstCode {
+enum MemorySemantic {
+ NotAtomic = 0, // PTX calls these: "Weak"
+ Volatile = 1,
+ Relaxed = 2,
+ Acquire = 3,
+ Release = 4,
+ RelaxedMMIO = 5
+};
enum AddressSpace {
GENERIC = 0,
GLOBAL = 1,
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index 1e1cbb15e33d4a..8c25c79db41ebd 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -700,6 +700,109 @@ static unsigned int getCodeAddrSpace(MemSDNode *N) {
return NVPTX::PTXLdStInstCode::GENERIC;
}
+static unsigned int getCodeMemorySemantic(MemSDNode *N, const NVPTXSubtarget *Subtarget) {
+ AtomicOrdering Ordering = N->getSuccessOrdering();
+ auto CodeAddrSpace = getCodeAddrSpace(N);
+
+ // Supports relaxed, acquire, release, weak:
+ bool hasAtomics = Subtarget->getPTXVersion() >= 60 && Subtarget->getSmVersion() >= 70;
+ // Supports mmio:
+ bool hasRelaxedMMIO = Subtarget->getPTXVersion() >= 82 && Subtarget->getSmVersion() >= 70;
+
+ // TODO: lowering for SequentiallyConsistent Operations: for now, we error.
+ // TODO: lowering for AcquireRelease Operations: for now, we error.
+ //
+ // Lowering for non-SequentiallyConsistent Operations
+ //
+ // | Atomic | Volatile | Statespace | Lowering sm_60- | Lowering sm_70+ |
+ // |---------|----------|-------------------------------|-----------------|------------------------------------------------------|
+ // | No | No | All | plain | .weak |
+ // | No | Yes | Generic / Shared / Global [0] | .volatile | .volatile |
+ // | No | Yes | Local / Const / Param | plain [1] | .weak [1] |
+ // | Relaxed | No | Generic / Shared / Global [0] | .volatile | <atomic sem> |
+ // | Other | No | Generic / Shared / Global [0] | Error [2] | <atomic sem> |
+ // | Yes | No | Local / Const / Param | plain [1] | .weak [1] |
+ // | Relaxed | Yes | Generic / Shared [0] | .volatile | .volatile |
+ // | Relaxed | Yes | Global [0] | .volatile | .mmio.relaxed.sys (PTX 8.2+) or .volatile (PTX 8.1-) |
+ // | Relaxed | Yes | Local / Const / Param | plain [1] | .weak [1] |
+ // | Other | Yes | Generic / Shared / Global [0] | Error [4] | <atomic sem> [3] |
+ //
+ // [0]: volatile and atomics are only supported on generic addressing to shared or global, or shared, or global.
+ // MMIO requires generic addressing to global or global, but (TODO) we only implement it for global.
+ // [1]: TODO: this implementation exhibits PTX Undefined Behavior; it fails to preserve the side-effects of atomics and volatile
+ // accesses in LLVM IR to local / const / param, causing well-formed LLVM-IR & CUDA C++ programs to be miscompiled in sm_70+.
+ if (CodeAddrSpace == NVPTX::PTXLdStInstCode::LOCAL || CodeAddrSpace == NVPTX::PTXLdStInstCode::CONSTANT || CodeAddrSpace == NVPTX::PTXLdStInstCode::PARAM) {
+ return NVPTX::PTXLdStInstCode::NotAtomic;
+ }
+
+ // [2]: Atomics with Ordering different than Relaxed are not supported on sm_60 and older.
+ if (!(Ordering == AtomicOrdering::NotAtomic || Ordering == AtomicOrdering::Monotonic) && !hasAtomics) {
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "PTX does not support \"atomic\" for orderings different than \"NotAtomic\" or \"Monotonic\" for sm_60 or older, but order is: \"" << toIRString(Ordering) << "\".";
+ report_fatal_error(OS.str());
+ }
+
+ // [3]: TODO: these should eventually use .mmio<.atomic sem>; for now we drop the volatile semantics and preserve the atomic ones.
+ // [4]: TODO: volatile atomics with order stronger than relaxed are currently unimplemented in sm_60 and older.
+ if (!hasAtomics && N->isVolatile() && !(Ordering == AtomicOrdering::NotAtomic || Ordering == AtomicOrdering::Monotonic)) {
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "PTX does not support \"volatile atomic\" for orderings different than \"NotAtomic\" or \"Monotonic\" for sm_60 and older, but order is: \"" << toIRString(Ordering) << "\".";
+ report_fatal_error(OS.str());
+ }
+
+ // PTX volatile and PTX atomics are not available for statespace that differ from .generic, .global, or .shared.
+ // The behavior of PTX volatile and PTX atomics is undefined if the generic address does not refer to a .global or .shared memory location.
+ bool addrGenericOrGlobalOrShared = (CodeAddrSpace == NVPTX::PTXLdStInstCode::GENERIC
+ || CodeAddrSpace == NVPTX::PTXLdStInstCode::GLOBAL
+ || CodeAddrSpace == NVPTX::PTXLdStInstCode::SHARED);
+ bool useRelaxedMMIO = hasRelaxedMMIO && CodeAddrSpace == NVPTX::PTXLdStInstCode::GLOBAL;
+
+ switch (Ordering) {
+ case AtomicOrdering::NotAtomic: return N->isVolatile() && addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+ case AtomicOrdering::Monotonic:
+ if (N->isVolatile()) return useRelaxedMMIO? NVPTX::PTXLdStInstCode::RelaxedMMIO: addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+ else return hasAtomics? NVPTX::PTXLdStInstCode::Relaxed: addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Volatile: NVPTX::PTXLdStInstCode::NotAtomic;
+ case AtomicOrdering::Acquire:
+ if (!N->readMem()) {
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "PTX only supports Acquire Ordering on reads: " << N->getOperationName();
+ N->print(OS);
+ report_fatal_error(OS.str());
+ }
+ return addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Acquire: NVPTX::PTXLdStInstCode::NotAtomic;
+ case AtomicOrdering::Release:
+ if (!N->writeMem()) {
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "PTX only supports Release Ordering on writes: " << N->getOperationName();
+ N->print(OS);
+ report_fatal_error(OS.str());
+ }
+ return addrGenericOrGlobalOrShared? NVPTX::PTXLdStInstCode::Release: NVPTX::PTXLdStInstCode::NotAtomic;
+ case AtomicOrdering::AcquireRelease: {
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "PTX only supports AcquireRelease Ordering on read-modify-write: " << N->getOperationName();
+ N->print(OS);
+ report_fatal_error(OS.str());
+ }
+ case AtomicOrdering::SequentiallyConsistent:
+ case AtomicOrdering::Unordered:
+ default: {
+ // TODO: support AcquireRelease and SequentiallyConsistent
+ SmallString<256> Msg;
+ raw_svector_ostream OS(Msg);
+ OS << "NVPTX backend does not support AtomicOrdering \"" << toIRString(Ordering) << "\" yet.";
+ report_fatal_error(OS.str());
+ }
+ }
+
+ report_fatal_error("unreachable");
+}
+
static bool canLowerToLDG(MemSDNode *N, const NVPTXSubtarget &Subtarget,
unsigned CodeAddrSpace, MachineFunction *F) {
// We use ldg (i.e. ld.global.nc) for invariant loads from the global address
@@ -902,32 +1005,18 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
if (!LoadedVT.isSimple())
return false;
- AtomicOrdering Ordering = LD->getSuccessOrdering();
- // In order to lower atomic loads with stronger guarantees we would need to
- // use load.acquire or insert fences. However these features were only added
- // with PTX ISA 6.0 / sm_70.
- // TODO: Check if we can actually use the new instructions and implement them.
- if (isStrongerThanMonotonic(Ordering))
- return false;
-
// Address Space Setting
unsigned int CodeAddrSpace = getCodeAddrSpace(LD);
if (canLowerToLDG(LD, *Subtarget, CodeAddrSpace, MF)) {
return tryLDGLDU(N);
}
+ // Memory Semantic Setting
+ unsigned int CodeMemorySem = getCodeMemorySemantic(LD, Subtarget);
+
unsigned int PointerSize =
CurDAG->getDataLayout().getPointerSizeInBits(LD->getAddressSpace());
- // Volatile Setting
- // - .volatile is only available for .global and .shared
- // - .volatile has the same memory synchronization semantics as .relaxed.sys
- bool isVolatile = LD->isVolatile() || Ordering == AtomicOrdering::Monotonic;
- if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
- isVolatile = false;
-
// Type Setting: fromType + fromTypeWidth
//
// Sign : ISD::SEXTLOAD
@@ -968,7 +1057,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
NVPTX::LD_f32_avar, NVPTX::LD_f64_avar);
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl), getI32Imm(fromType, dl),
getI32Imm(fromTypeWidth, dl), Addr, Chain };
NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -979,7 +1068,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
NVPTX::LD_f32_asi, NVPTX::LD_f64_asi);
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl), getI32Imm(fromType, dl),
getI32Imm(fromTypeWidth, dl), Base, Offset, Chain };
NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -996,7 +1085,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
NVPTX::LD_f32_ari, NVPTX::LD_f64_ari);
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl), getI32Imm(fromType, dl),
getI32Imm(fromTypeWidth, dl), Base, Offset, Chain };
NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -1012,7 +1101,7 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
NVPTX::LD_f32_areg, NVPTX::LD_f64_areg);
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(isVolatile, dl), getI32Imm(CodeAddrSpace, dl),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, dl), getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl), getI32Imm(fromType, dl),
getI32Imm(fromTypeWidth, dl), N1, Chain };
NVPTXLD = CurDAG->getMachineNode(*Opcode, dl, TargetVT, MVT::Other, Ops);
@@ -1051,13 +1140,8 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
unsigned int PointerSize =
CurDAG->getDataLayout().getPointerSizeInBits(MemSD->getAddressSpace());
- // Volatile Setting
- // - .volatile is only availalble for .global and .shared
- bool IsVolatile = MemSD->isVolatile();
- if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
- IsVolatile = false;
+ // Memory Semantic Setting
+ unsigned int CodeMemorySem = getCodeMemorySemantic(MemSD, Subtarget);
// Vector Setting
MVT SimpleVT = LoadedVT.getSimpleVT();
@@ -1124,7 +1208,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
}
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
getI32Imm(VecType, DL), getI32Imm(FromType, DL),
getI32Imm(FromTypeWidth, DL), Addr, Chain };
LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1149,7 +1233,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
}
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
getI32Imm(VecType, DL), getI32Imm(FromType, DL),
getI32Imm(FromTypeWidth, DL), Base, Offset, Chain };
LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1194,7 +1278,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
}
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
getI32Imm(VecType, DL), getI32Imm(FromType, DL),
getI32Imm(FromTypeWidth, DL), Base, Offset, Chain };
@@ -1239,7 +1323,7 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
}
if (!Opcode)
return false;
- SDValue Ops[] = { getI32Imm(IsVolatile, DL), getI32Imm(CodeAddrSpace, DL),
+ SDValue Ops[] = { getI32Imm(CodeMemorySem, DL), getI32Imm(CodeAddrSpace, DL),
getI32Imm(VecType, DL), getI32Imm(FromType, DL),
getI32Imm(FromTypeWidth, DL), Op1, Chain };
LD = CurDAG->getMachineNode(*Opcode, DL, N->getVTList(), Ops);
@@ -1684,27 +1768,13 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
if (!StoreVT.isSimple())
return false;
- AtomicOrdering Ordering = ST->getSuccessOrdering();
- // In order to lower atomic loads with stronger guarantees we would need to
- // use store.release or insert fences. However these features were only added
- // with PTX ISA 6.0 / sm_70.
- // TODO: Check if we can actually use the new instructions and implement them.
- if (isStrongerThanMonotonic(Ordering))
- return false;
-
// Address Space Setting
unsigned int CodeAddrSpace = getCodeAddrSpace(ST);
unsigned int PointerSize =
CurDAG->getDataLayout().getPointerSizeInBits(ST->getAddressSpace());
- // Volatile Setting
- // - .volatile is only available for .global and .shared
- // - .volatile has the same memory synchronization semantics as .relaxed.sys
- bool isVolatile = ST->isVolatile() || Ordering == AtomicOrdering::Monotonic;
- if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
- isVolatile = false;
+ // Memory Semantic Setting
+ unsigned int CodeMemorySem = getCodeMemorySemantic(ST, Subtarget);
// Vector Setting
MVT SimpleVT = StoreVT.getSimpleVT();
@@ -1741,7 +1811,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
if (!Opcode)
return false;
SDValue Ops[] = {Value,
- getI32Imm(isVolatile, dl),
+ getI32Imm(CodeMemorySem, dl),
getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl),
getI32Imm(toType, dl),
@@ -1758,7 +1828,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
if (!Opcode)
return false;
SDValue Ops[] = {Value,
- getI32Imm(isVolatile, dl),
+ getI32Imm(CodeMemorySem, dl),
getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl),
getI32Imm(toType, dl),
@@ -1783,7 +1853,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
return false;
SDValue Ops[] = {Value,
- getI32Imm(isVolatile, dl),
+ getI32Imm(CodeMemorySem, dl),
getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl),
getI32Imm(toType, dl),
@@ -1805,7 +1875,7 @@ bool NVPTXDAGToDAGISel::tryStore(SDNode *N) {
if (!Opcode)
return false;
SDValue Ops[] = {Value,
- getI32Imm(isVolatile, dl),
+ getI32Imm(CodeMemorySem, dl),
getI32Imm(CodeAddrSpace, dl),
getI32Imm(vecType, dl),
getI32Imm(toType, dl),
@@ -1844,13 +1914,8 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
unsigned int PointerSize =
CurDAG->getDataLayout().getPointerSizeInBits(MemSD->getAddressSpace());
- // Volatile Setting
- // - .volatile is only availalble for .global and .shared
- bool IsVolatile = MemSD->isVolatile();
- if (CodeAddrSpace != NVPTX::PTXLdStInstCode::GLOBAL &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::SHARED &&
- CodeAddrSpace != NVPTX::PTXLdStInstCode::GENERIC)
- IsVolatile = false;
+ // Memory Semantic Setting
+ unsigned int CodeMemorySem = getCodeMemorySemantic(MemSD, Subtarget);
// Type Setting: toType + toTypeWidth
// - for integer type, always use 'u'
@@ -1892,7 +1957,7 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
ToTypeWidth = 32;
}
- StOps.push_back(getI32Imm(IsVolatile, DL));
+ StOps.push_back(getI32Imm(CodeMemorySem, DL));
StOps.push_back(getI32Imm(CodeAddrSpace, DL));
StOps.push_back(getI32Imm(VecType, DL));
StOps.push_back(getI32Imm(ToType, DL));
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index c4c35a1f74ba93..9be8fe5d4e6efb 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -2939,39 +2939,39 @@ foreach vt = [v2f16, v2bf16, v2i16, v4i8] in {
multiclass LD<NVPTXRegClass regclass> {
def _avar : NVPTXInst<
(outs regclass:$dst),
- (ins LdStCode:$isVol, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
+ (ins LdStCode:$sem, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
i32imm:$fromWidth, imem:$addr),
- "ld${isVol:volatile}${addsp:addsp}${Vec:vec}.${Sign:sign}$fromWidth "
+ "ld${sem:sem}${addsp:addsp}${Vec:vec}.${Sign:sign}$fromWidth "
"\t$dst, [$addr];", []>;
def _areg : NVPTXInst<
(outs regclass:$dst),
- (ins LdStCode:$isVol, LdStCode:$addsp, LdStCode:$Vec, LdStCode:$Sign,
+ (ins LdStCode:$sem, LdStCode:$addsp, Ld...
[truncated]
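The lowering table in `getCodeMemorySemantic` is easier to check when it is modelled outside of LLVM. The following is a minimal standalone sketch, not code from the patch: `Ordering`, `Space`, `Sem`, `Target`, `Access`, `lowerSemantic`, `modifier`, and `rejects` are hypothetical stand-ins for the LLVM/NVPTX types and functions, and exceptions stand in for `report_fatal_error`. It follows the decision order visible in the diff above, and, like the printer in the diff, it prints nothing for `NotAtomic` (the row the table labels `.weak` on sm_70+).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the LLVM/NVPTX types used by the patch.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, Release,
                      AcquireRelease, SequentiallyConsistent };
enum class Space { Generic, Global, Shared, Local, Const, Param };
enum class Sem { NotAtomic, Volatile, Relaxed, Acquire, Release, RelaxedMMIO };

struct Target { unsigned ptx; unsigned sm; };  // e.g. {82, 70}: PTX 8.2, sm_70
struct Access {
  Ordering order;
  bool isVolatile;
  Space space;
  bool reads;   // true for loads
  bool writes;  // true for stores
};

// Mirrors the decision order of getCodeMemorySemantic in the diff.
inline Sem lowerSemantic(const Access &a, const Target &t) {
  const bool hasAtomics = t.ptx >= 60 && t.sm >= 70;
  const bool hasRelaxedMMIO = t.ptx >= 82 && t.sm >= 70;

  // Local / const / param accesses are always emitted as plain accesses.
  if (a.space == Space::Local || a.space == Space::Const ||
      a.space == Space::Param)
    return Sem::NotAtomic;

  const bool weakOrRelaxed =
      a.order == Ordering::NotAtomic || a.order == Ordering::Monotonic;
  if (!weakOrRelaxed && !hasAtomics)
    throw std::runtime_error(
        "only NotAtomic and Monotonic are supported before sm_70 / PTX 6.0");

  // From here on the state space is generic, global, or shared.
  const bool useRelaxedMMIO = hasRelaxedMMIO && a.space == Space::Global;

  switch (a.order) {
  case Ordering::NotAtomic:
    return a.isVolatile ? Sem::Volatile : Sem::NotAtomic;
  case Ordering::Monotonic:
    if (a.isVolatile)
      return useRelaxedMMIO ? Sem::RelaxedMMIO : Sem::Volatile;
    return hasAtomics ? Sem::Relaxed : Sem::Volatile;
  case Ordering::Acquire:
    if (!a.reads)
      throw std::runtime_error("acquire is only supported on reads");
    return Sem::Acquire;  // volatile, if set, is dropped (note [3])
  case Ordering::Release:
    if (!a.writes)
      throw std::runtime_error("release is only supported on writes");
    return Sem::Release;  // volatile, if set, is dropped (note [3])
  default:
    // Unordered, AcquireRelease, SequentiallyConsistent: rejected here.
    throw std::runtime_error("ordering not supported by this lowering");
  }
}

// Mirrors the "sem" branch of NVPTXInstPrinter::printLdStCode in the diff.
inline const char *modifier(Sem s) {
  switch (s) {
  case Sem::NotAtomic:   return "";
  case Sem::Volatile:    return ".volatile";
  case Sem::Relaxed:     return ".relaxed.sys";
  case Sem::Acquire:     return ".acquire.sys";
  case Sem::Release:     return ".release.sys";
  case Sem::RelaxedMMIO: return ".mmio.relaxed.sys";
  }
  return "";
}

// Helper for experiments: true when the model rejects the access.
inline bool rejects(const Access &a, const Target &t) {
  try {
    lowerSemantic(a, t);
    return false;
  } catch (const std::runtime_error &) {
    return true;
  }
}
```

Under these assumptions, a volatile monotonic load from global memory maps to `.mmio.relaxed.sys` with PTX 8.2 and falls back to `.volatile` with PTX 8.1, while an acquire load on a pre-sm_70 target is rejected.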
✅ With the latest revision this PR passed the C/C++ code formatter.
LGTM in principle, with few comments and nits.
Looks like the patch is causing crashes for us. We need to revert it.
I will extract the reproducer a bit later.
Thank you, really appreciate it. Do you know if it fails in any of the clang tests or other tests in the LLVM repo? (if so I can try running those to fix it)
I believe it was triggered by some XLA tests, during JIT. That's why diffing out the IR which triggers the problem may take a bit of time.
…nd Volatile (.mmio/.volatile) support (#98022)
Summary: This PR adds initial support for some of Volta's (sm_70) load/store atomic and volatile/MMIO operations, hopefully without breaking any preexisting code. Only relaxed, acquire, and release operations w/ volatile are handled. This PR does not aim to add support for any of the following:
- syncscope support
- read atomic ops to const, param, grid param
- local memory atomics
- sequentially consistent atomics
- atomicrmw
- ...
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: https://phabricator.intern.facebook.com/D60251724
…nd Volatile (.mmio/.volatile) support (#99709)
Summary: Followup to #98022 which broke support for LLVM `unordered` atomic ordering. This PR lowers `atomic unordered` to PTX volatile operations to preserve atomicity, and adds a bunch of tests for this ordering to the NVPTX backend (we had none).
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: https://phabricator.intern.facebook.com/D60250591
This PR builds on top of #96436 which has not been reviewed yet (not sure if there is anything for me to do there; opened it 2 weeks ago and have heard nothing).
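For orientation, the orderings in scope correspond at the C++ source level to relaxed, acquire, and release loads and stores on `std::atomic` (C++ `memory_order_relaxed` is what LLVM IR calls `monotonic`). The sketch below is plain host C++ with hypothetical names (`Mailbox`, `publish`, `tryConsume`, `bumpRelaxed`); it is not taken from the PR and makes no claim about the PTX a device compiler would emit for it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example type: a one-slot mailbox plus a relaxed counter.
struct Mailbox {
  int payload = 0;
  std::atomic<bool> ready{false};
  std::atomic<unsigned> hits{0};
};

// Writer: plain store to the payload, then a release store to the flag
// (in LLVM IR terms: `store atomic ... release`).
inline void publish(Mailbox &m, int value) {
  m.payload = value;
  m.ready.store(true, std::memory_order_release);
}

// Reader: acquire load of the flag (`load atomic ... acquire`); the payload
// is read only after the flag has been observed as set.
inline bool tryConsume(Mailbox &m, int &out) {
  if (!m.ready.load(std::memory_order_acquire))
    return false;
  out = m.payload;
  return true;
}

// Relaxed load followed by a relaxed store (`monotonic` in LLVM IR).
// This is deliberately NOT an atomic read-modify-write: atomicrmw is
// listed as out of scope for the PR.
inline unsigned bumpRelaxed(Mailbox &m) {
  unsigned v = m.hits.load(std::memory_order_relaxed);
  m.hits.store(v + 1, std::memory_order_relaxed);
  return v + 1;
}
```

A sequentially consistent load or store (the default when no `memory_order` argument is passed) would fall under the "sequentially consistent atomics" item that the PR lists as unsupported.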