Crash during nvptx codegen/instruction selection #117606

wsmoses · 2024-11-25T18:44:01Z

LLVM ERROR: PTX does not support "atomic" for orderings different than"NotAtomic" or "Monotonic" for sm_60 or older, but order is: "seq_cst".
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/llc -o /app/output.s -x86-asm-syntax=intel <source>
1.	Running pass 'Function Pass Manager' on module '<source>'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@diffe_ZL9atomicAddPdd'
 #0 0x0000000003c15cd8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3c15cd8)
 #1 0x0000000003c136cc SignalHandler(int) Signals.cpp:0:0
 #2 0x00007cc624442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007cc6244969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007cc624442476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007cc6244287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x0000000000758499 (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x758499)
 #7 0x000000000196156b llvm::NVPTXDAGToDAGISel::insertMemoryInstructionFence(llvm::SDLoc, llvm::SDValue&, llvm::MemSDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x196156b)
 #8 0x00000000019631c5 llvm::NVPTXDAGToDAGISel::tryLoad(llvm::SDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19631c5)
 #9 0x00000000019670e3 llvm::NVPTXDAGToDAGISel::Select(llvm::SDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19670e3)
#10 0x00000000039bc70b llvm::SelectionDAGISel::DoInstructionSelection() (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39bc70b)
#11 0x00000000039cba3a llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39cba3a)
#12 0x00000000039cf022 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39cf022)
#13 0x00000000039d0340 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39d0340)
#14 0x00000000019681c3 llvm::NVPTXDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19681c3)
#15 0x00000000039c0c0f llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39c0c0f)
#16 0x0000000002b7a219 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#17 0x0000000003180250 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180250)
#18 0x0000000003180601 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180601)
#19 0x0000000003180eb7 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180eb7)
#20 0x000000000086bed8 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x000000000075fbbe main (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x75fbbe)
#22 0x00007cc624429d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#23 0x00007cc624429e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#24 0x000000000086281e _start (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x86281e)

However, the function in question is marked sm_80:


; Function Attrs: mustprogress nofree noinline norecurse nounwind willreturn
define internal fastcc double @diffe_ZL9atomicAddPdd(ptr nocapture noundef %address, ptr nocapture readonly %"address'", double noundef %val) unnamed_addr #143 {
entry:
  %0 = atomicrmw fadd ptr %address, double %val seq_cst, align 8
  %1 = load atomic double, ptr %"address'" seq_cst, align 8
  ret double %1
}

attributes #143 = { mustprogress nofree noinline norecurse nounwind willreturn "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_80" "target-features"="+ptx75,+sm_80" }


wsmoses · 2024-11-25T18:46:37Z

Looks like the relevant portion of the code doesn't actually check if the sm version is above 60:

llvm-project/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp

Line 836 in ed6749a

!HasMemoryOrdering) {
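
As a rough illustration of the kind of gate the error message describes, here is a small standalone model in C++. It is not the LLVM source: `Ordering`, `Subtarget`, `hasMemoryOrdering`, and `checkLoadOrdering` are hypothetical names for this sketch, and the version thresholds are an assumption that should be checked against the NVPTX subtarget code.

```cpp
#include <string>

// Hypothetical mirror of the atomic orderings named in the thread.
enum class Ordering { NotAtomic, Monotonic, Acquire, Release, AcquireRelease, SeqCst };

// Hypothetical stand-in for the subtarget the back end was configured with.
struct Subtarget {
    unsigned smVersion;   // e.g. 60 for sm_60, 80 for sm_80
    unsigned ptxVersion;  // e.g. 75 for ptx75

    // ASSUMPTION: illustrative thresholds only, not copied from LLVM.
    bool hasMemoryOrdering() const { return smVersion >= 70 && ptxVersion >= 60; }
};

// Returns an empty string when the load's ordering can be lowered,
// otherwise a short diagnostic in the spirit of the reported error.
std::string checkLoadOrdering(const Subtarget& st, Ordering ord) {
    const bool weakOrdering =
        ord == Ordering::NotAtomic || ord == Ordering::Monotonic;
    if (!st.hasMemoryOrdering() && !weakOrdering)
        return "ordering not supported without memory-ordering support";
    return "";
}
```

In this model, a `SeqCst` load is rejected for `Subtarget{60, 50}` and accepted for `Subtarget{80, 75}`, so the outcome depends entirely on which subtarget the check consults.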

Artem-B · 2024-11-25T19:41:02Z

I do not think per-function GPU targeting is ever going to work for NVPTX. Produced assembly must have the GPU arch set for the whole output.

So, if you're trying to generate a function that attempts to do something that relies on sm_80 while compiling for sm_60 that is not expected to work.

@gonzalobg : ^^^

Artem-B · 2024-11-25T19:48:40Z

@wsmoses Would it be possible for you to create a reduced reproducer based on CUDA source? My understanding is that originally the issue popped up on an attempt to use atomicAdd in CUDA code. I would like to see what exactly is going on.

wsmoses · 2024-11-25T22:29:32Z

So the original code came from a clang invocation set for sm_80, so ideally everything is available for sm_80.

The original source reproducer comes from a CUDA call being compiled with the Enzyme compiler plugin (demonstrated here in our compiler explorer instance: https://fwd.gymni.ch/fMbaMT). Note that while Enzyme supports llvm#main, we only have an LLVM 16 build of CUDA on the explorer. The original godbolt link above is for current main and has a slightly different error.

Artem-B · 2024-11-25T22:50:28Z

I suspect something odd is going on in the enzyme's compilation pipeline setup.

If I get your original reproducer and manually tell llc to target sm_80, it compiles that IR just fine: https://godbolt.org/z/5nMz9jnKP

So, whatever Enzyme does, it somehow fails to pass the target CPU to LLVM, though it apparently does generate IR for a newer GPU variant.

Assuming your compiler does use CUDA headers, if you had attempted to compile for a GPU older than sm_60, you would have seen a compilation error (https://godbolt.org/z/Kn5e6qP3M):

 `error: '__nvvm_atom_add_gen_d' needs target feature sm_60|sm_61|sm_62|sm_70|sm_72|sm_75|sm_80|sm_86|sm_87|sm_89|sm_90|sm_90a|sm_100`

Your compilation did not see that error, so the front-end was targeting a new GPU, but the back-end apparently didn't. I'm fairly sure it's enzyme's problem.
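
To make the suspected mismatch concrete, the sketch below compares the `"target-cpu"` string carried by a function attribute with the CPU the back end was actually told to use (the `llc` invocation in the stack dump passes no `-mcpu`). Both helper names are hypothetical and this is only a sanity check one might write, not something LLVM or Enzyme is known to do.

```cpp
#include <cctype>
#include <string>

// Parse "sm_80" -> 80 and "sm_90a" -> 90; return -1 for anything else.
// Hypothetical helper for this sketch.
int parseSmVersion(const std::string& cpu) {
    if (cpu.compare(0, 3, "sm_") != 0)
        return -1;
    int version = 0;
    std::size_t i = 3;
    while (i < cpu.size() && std::isdigit(static_cast<unsigned char>(cpu[i]))) {
        version = version * 10 + (cpu[i] - '0');
        ++i;
    }
    return i == 3 ? -1 : version;  // no digits after "sm_" means malformed
}

// True when the function attribute asks for a newer GPU than the one the
// back end is generating code for. Hypothetical helper for this sketch.
bool functionTargetsNewerGpu(const std::string& functionTargetCpu,
                             const std::string& backendCpu) {
    const int fn = parseSmVersion(functionTargetCpu);
    const int be = parseSmVersion(backendCpu);
    return fn >= 0 && be >= 0 && fn > be;
}
```

With the attribute from the reproducer, `functionTargetsNewerGpu("sm_80", "sm_60")` returns true, which is the situation described above: IR generated for a newer GPU than the one the back end is compiling for.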

wsmoses · 2024-11-26T02:00:50Z

So Enzyme doesn’t modify the clang pipeline and just emits additional LLVM functions, so I’d be surprised if the pipeline args were changed.

The particular code that Enzyme emits here takes an atomic add emitted by CUDA (marked seq_cst) and emits a new function with an atomic add and an atomic load (of the same ordering, in this case seq_cst). It's this additional load that is presently causing errors.
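
For readers less familiar with the IR, a host-side C++ analogy of the generated function's shape may help. This is only a sketch with a hypothetical name, it uses `std::atomic` rather than anything GPU-specific, and it emulates the floating-point add with a compare-exchange loop so it does not depend on C++20.

```cpp
#include <atomic>

// Host-side analogy of the two operations in the IR above:
//   %0 = atomicrmw fadd ptr %address, double %val seq_cst
//   %1 = load atomic double, ptr %"address'" seq_cst
// Hypothetical function name; not Enzyme's actual output.
double diffeAtomicAddSketch(std::atomic<double>& address,
                            std::atomic<double>& shadow,
                            double val) {
    // Sequentially consistent atomic add, emulated with a CAS loop.
    double old = address.load(std::memory_order_relaxed);
    while (!address.compare_exchange_weak(old, old + val,
                                          std::memory_order_seq_cst)) {
        // On failure, `old` is refreshed with the current value; retry.
    }
    // The additional sequentially consistent atomic load of the shadow.
    return shadow.load(std::memory_order_seq_cst);
}
```

The second operation is the one that matters for this issue: it is a plain atomic load with a seq_cst ordering, which goes through the load-lowering path rather than the read-modify-write path.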

In fairness, the original error report we had was from llvm 15 and was a different error message in instruction selection for the same instruction. The llc godbolt case was me taking that intermediate IR and running it through main llc and seeing an error at a similar place. Let me see if it reproduces end to end on current llvm.

Artem-B · 2024-11-26T05:08:23Z

> the original error report we had was from llvm 15

I would suggest updating to a very recent clang version -- atomic operations were largely unsupported by NVPTX until fairly recently.

Some details on the current state of support are here:

llvm-project/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp

Line 744 in eb5cda4

    
           // Lowering for Load/Store Operations (note: AcquireRelease Loads or Stores error).

See following pull requests for more:

[NVPTX] Add Volta Load/Store Atomics (.relaxed, .acquire, .release) and Volatile (.mmio/.volatile) support #98022
[NVPTX] Add Volta Atomic SequentiallyConsistent Load and Store Operations #98551
[NVPTX] Load/Store/Fence syncscope support #106101

gonzalobg · 2024-11-27T19:29:29Z

> So, if you're trying to generate a function that attempts to do something that relies on sm_80 while compiling for sm_60 that is not expected to work.

Agreed.

Will, let me know if I can help with this in any way.

minansys · 2024-12-02T20:38:01Z

@wsmoses @Artem-B After building the LLVM/clang using the main branch, and rebuilding the enzyme using the latest LLVM, the atomic add issue is resolved now. Thanks a lot for your help!

Artem-B · 2024-12-02T20:45:14Z

Closing as resolved.

github-actions bot added the new issue label Nov 25, 2024

wsmoses added llvm:codegen backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well labels Nov 25, 2024

minansys mentioned this issue Nov 25, 2024

Enzyme failed to support atomicAdd, atomicCAS, and assert for cuda code EnzymeAD/Enzyme#2053

Open

EugeneZelenko added crash Prefer [crash-on-valid] or [crash-on-invalid] and removed llvm:codegen new issue labels Nov 25, 2024

Artem-B closed this as completed Dec 2, 2024
