Crash during nvptx codegen/instruction selection #117606

Closed
wsmoses opened this issue Nov 25, 2024 · 10 comments
Labels: backend:NVPTX, crash, llvm:SelectionDAG

Comments

@wsmoses
Member

wsmoses commented Nov 25, 2024

https://godbolt.org/z/fGWzdfvM1

LLVM ERROR: PTX does not support "atomic" for orderings different than"NotAtomic" or "Monotonic" for sm_60 or older, but order is: "seq_cst".
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/llc -o /app/output.s -x86-asm-syntax=intel <source>
1.	Running pass 'Function Pass Manager' on module '<source>'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@diffe_ZL9atomicAddPdd'
 #0 0x0000000003c15cd8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3c15cd8)
 #1 0x0000000003c136cc SignalHandler(int) Signals.cpp:0:0
 #2 0x00007cc624442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007cc6244969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007cc624442476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007cc6244287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x0000000000758499 (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x758499)
 #7 0x000000000196156b llvm::NVPTXDAGToDAGISel::insertMemoryInstructionFence(llvm::SDLoc, llvm::SDValue&, llvm::MemSDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x196156b)
 #8 0x00000000019631c5 llvm::NVPTXDAGToDAGISel::tryLoad(llvm::SDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19631c5)
 #9 0x00000000019670e3 llvm::NVPTXDAGToDAGISel::Select(llvm::SDNode*) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19670e3)
#10 0x00000000039bc70b llvm::SelectionDAGISel::DoInstructionSelection() (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39bc70b)
#11 0x00000000039cba3a llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39cba3a)
#12 0x00000000039cf022 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39cf022)
#13 0x00000000039d0340 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39d0340)
#14 0x00000000019681c3 llvm::NVPTXDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x19681c3)
#15 0x00000000039c0c0f llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x39c0c0f)
#16 0x0000000002b7a219 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#17 0x0000000003180250 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180250)
#18 0x0000000003180601 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180601)
#19 0x0000000003180eb7 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x3180eb7)
#20 0x000000000086bed8 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x000000000075fbbe main (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x75fbbe)
#22 0x00007cc624429d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#23 0x00007cc624429e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#24 0x000000000086281e _start (/opt/compiler-explorer/clang-assertions-trunk/bin/llc+0x86281e)

However, the function in question is marked sm_80:


; Function Attrs: mustprogress nofree noinline norecurse nounwind willreturn
define internal fastcc double @diffe_ZL9atomicAddPdd(ptr nocapture noundef %address, ptr nocapture readonly %"address'", double noundef %val) unnamed_addr #143 {
entry:
  %0 = atomicrmw fadd ptr %address, double %val seq_cst, align 8
  %1 = load atomic double, ptr %"address'" seq_cst, align 8
  ret double %1
}

attributes #143 = { mustprogress nofree noinline norecurse nounwind willreturn "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="sm_80" "target-features"="+ptx75,+sm_80" }

@wsmoses
Member Author

wsmoses commented Nov 25, 2024

Looks like the relevant portion of the code doesn't actually check if the sm version is above 60:

@Artem-B
Member

Artem-B commented Nov 25, 2024

I do not think per-function GPU targeting is ever going to work for NVPTX. Produced assembly must have the GPU arch set for the whole output.

So, if you're trying to generate a function that attempts to do something that relies on sm_80 while compiling for sm_60 that is not expected to work.

@gonzalobg : ^^^
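
The point above can be illustrated with a small C++ model. It is only a sketch built from the wording of the error message and from the statement that the GPU arch is set for the whole output. The names `Ordering`, `orderingSupported`, and `canSelectAtomicLoad` are hypothetical and are not LLVM's actual code, and the sm_60 threshold is an assumption taken from the error text alone.

```cpp
// Hypothetical model of the behaviour described in this thread.
// None of these names exist in LLVM; they are for illustration only.
enum class Ordering {
    NotAtomic,
    Monotonic,
    Acquire,
    Release,
    AcquireRelease,
    SequentiallyConsistent
};

// Assumption (from the error text only): orderings stronger than
// Monotonic need a GPU newer than sm_60.
inline bool orderingSupported(unsigned smVersion, Ordering order) {
    if (order == Ordering::NotAtomic || order == Ordering::Monotonic)
        return true;
    return smVersion > 60;
}

// The whole output has a single GPU arch, so the per-function
// "target-cpu" attribute is deliberately ignored in this model.
inline bool canSelectAtomicLoad(unsigned moduleSm,
                                unsigned functionAttrSm,
                                Ordering order) {
    (void)functionAttrSm;  // not consulted: targeting is module-wide
    return orderingSupported(moduleSm, order);
}
```

In this model, a function tagged sm_80 inside a module compiled for an older arch still fails the check for a seq_cst load, which matches the symptom reported at the top of the issue.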

@Artem-B
Member

Artem-B commented Nov 25, 2024

@wsmoses Would it be possible for you to create a reduced reproducer based on CUDA source? My understanding is that the issue originally popped up in an attempt to use atomicAdd in CUDA code. I would like to see what exactly is going on.

@wsmoses
Member Author

wsmoses commented Nov 25, 2024

So the original code came from a clang invocation set for sm_80, so ideally everything is available for sm_80.

The original source reproducer comes from a CUDA call being compiled with the Enzyme compiler plugin (demonstrated here in our Compiler Explorer instance: https://fwd.gymni.ch/fMbaMT). Note that while Enzyme supports llvm#main, we only have an LLVM 16 build of CUDA on the explorer. The original godbolt link above is for current main and has a slightly different error.

@Artem-B
Member

Artem-B commented Nov 25, 2024

I suspect something odd is going on in Enzyme's compilation pipeline setup.

If I take your original reproducer and manually tell llc to target sm_80, it compiles that IR just fine: https://godbolt.org/z/5nMz9jnKP

So, whatever Enzyme does, it somehow fails to pass the target CPU to LLVM, even though it apparently does generate IR for a newer GPU variant.

Assuming your compiler does use the CUDA headers, an attempt to compile for a GPU older than sm_60 would have produced a compilation error (https://godbolt.org/z/Kn5e6qP3M):

 `error: '__nvvm_atom_add_gen_d' needs target feature sm_60|sm_61|sm_62|sm_70|sm_72|sm_75|sm_80|sm_86|sm_87|sm_89|sm_90|sm_90a|sm_100`

Your compilation did not see that error, so the front-end was targeting a new GPU, but the back-end apparently wasn't. I'm fairly sure it's Enzyme's problem.

@wsmoses
Member Author

wsmoses commented Nov 26, 2024

So Enzyme doesn’t modify the clang pipeline and just emits additional LLVM functions, so I’d be surprised if the pipeline args were changed.

The particular part of the code that Enzyme emits here takes an atomic add emitted by CUDA (marked seq_cst) and emits a new function with an atomic add and an atomic load (of the same ordering, in this case seq_cst). It is this additional load that is presently causing errors.
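
For readers less familiar with the IR, the pair of operations in the generated function can be sketched as a host-side C++ analogue. This is only an illustration using `std::atomic`, not the code Enzyme emits and not device code; the function name `atomicAddThenLoad` and the parameter name `shadow` (standing in for `%"address'"`) are hypothetical. The add is written as a compare-exchange loop so that the sketch builds as C++11.

```cpp
#include <atomic>

// Host-side analogue of the two operations in the IR above:
//   1. a sequentially consistent atomic add on `address`
//   2. a sequentially consistent atomic load from `shadow`
// Hypothetical helper for illustration; not Enzyme's output.
inline double atomicAddThenLoad(std::atomic<double>& address,
                                std::atomic<double>& shadow,
                                double val) {
    // Emulate an atomic floating-point add with a CAS loop.
    // On failure, `old` is refreshed with the current value.
    double old = address.load(std::memory_order_relaxed);
    while (!address.compare_exchange_weak(old, old + val,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed)) {
        // retry with the updated `old`
    }
    // The additional seq_cst load is the operation that the
    // backend rejected in the crash reported here.
    return shadow.load(std::memory_order_seq_cst);
}
```

The return value comes only from the second pointer, mirroring `ret double %1` in the IR, while the add's result is discarded.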

In fairness, the original error report we had was from llvm 15 and was a different error message in instruction selection for the same instruction. The llc godbolt case was me taking that intermediate IR and running it through main llc and seeing an error at a similar place. Let me see if it reproduces end to end on current llvm.

@Artem-B
Member

Artem-B commented Nov 26, 2024

> the original error report we had was from llvm 15

I would suggest updating to a very recent clang version -- atomic operations were largely unsupported by NVPTX until fairly recently.

Some details on the current state of support are here:

// Lowering for Load/Store Operations (note: AcquireRelease Loads or Stores error).

See following pull requests for more:

@gonzalobg
Contributor

> So, if you're trying to generate a function that attempts to do something that relies on sm_80 while compiling for sm_60 that is not expected to work.

Agreed.

Will, let me know if I can help with this in any way.

@minansys

minansys commented Dec 2, 2024

@wsmoses @Artem-B After building LLVM/clang from the main branch and rebuilding Enzyme against the latest LLVM, the atomic add issue is now resolved. Thanks a lot for your help!

@Artem-B
Member

Artem-B commented Dec 2, 2024

Closing as resolved.

Artem-B closed this as completed Dec 2, 2024