Crash during nvptx codegen/instruction selection #117606
Looks like the relevant portion of the code doesn't actually check whether the SM version is above 60:
I do not think per-function GPU targeting is ever going to work for NVPTX. The produced assembly must have the GPU arch set for the whole output. So if you're trying to generate a function that relies on sm_80 while compiling for sm_60, that is not expected to work. @gonzalobg: ^^^
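A minimal sketch of that constraint, using a hypothetical model in which each function records the minimum SM version its instructions need. Because a PTX module declares a single `.target` for everything it contains, the module can only be emitted if every function fits that one target. `FunctionInfo` and `incompatible_functions` are illustrative names, not LLVM APIs.

```cpp
#include <string>
#include <vector>

// Hypothetical model: each function records the minimum SM version
// its instructions require (e.g. 80 for a feature introduced in sm_80).
struct FunctionInfo {
    std::string name;
    int min_sm;
};

// A PTX module has one `.target` for the whole output, so every function
// must be lowerable for that single SM version. Returns the names of the
// functions that would not fit.
std::vector<std::string> incompatible_functions(
        const std::vector<FunctionInfo>& fns, int module_sm) {
    std::vector<std::string> bad;
    for (const auto& f : fns) {
        if (f.min_sm > module_sm) {
            bad.push_back(f.name);
        }
    }
    return bad;
}
```

With `{"kernel", 60}` and `{"helper", 80}` and a module target of 60, only `"helper"` is reported. Raising the module target to 80 clears both.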
@wsmoses Would it be possible for you to create a reduced reproducer based on CUDA source? My understanding is that the issue originally popped up on an attempt to use
The original code came from a clang invocation targeting sm_80, so ideally everything should be available for sm_80. The original source reproducer comes from a CUDA call compiled with the Enzyme compiler plugin (demonstrated in our Compiler Explorer instance: https://fwd.gymni.ch/fMbaMT). Note that while Enzyme supports llvm#main, we only have an LLVM 16 build of CUDA on the explorer. The original godbolt link above is for current main and shows a slightly different error.
I suspect something odd is going on in Enzyme's compilation pipeline setup. If I take your original reproducer and manually tell llc to target sm_80, it compiles that IR just fine: https://godbolt.org/z/5nMz9jnKP. So whatever Enzyme does, it somehow fails to pass the CPU info to LLVM, even though it apparently generates IR for a newer GPU variant. If your compiler uses the CUDA headers and you attempt to compile for a GPU older than sm_60, you would have seen a compilation error (https://godbolt.org/z/Kn5e6qP3M).
Your compilation did not hit that error, so the front-end was targeting a new GPU, but the back-end apparently wasn't. I'm fairly sure it's Enzyme's problem.
Enzyme doesn't modify the clang pipeline; it just emits additional LLVM functions, so I'd be surprised if the pipeline arguments were changed. The particular code Enzyme emits here takes an atomic add emitted by CUDA (marked seq_cst) and emits a new function containing an atomic add and an atomic load with the same ordering (in this case seq_cst). It's this additional load that is presently causing errors. In fairness, the original error report we had was from LLVM 15 and showed a different instruction-selection error for the same instruction. The llc godbolt case was me taking that intermediate IR, running it through llc from main, and seeing an error at a similar place. Let me see whether it reproduces end to end on current LLVM.
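To make the emitted pattern concrete, here is a host-side C++ analogue of the helper described above: an atomic add followed by an atomic load of the same location, both seq_cst. This is only an illustration of the shape, not Enzyme's actual output (which is LLVM IR for NVPTX). `atomic_add` and `add_then_load` are hypothetical names. The add is written as a compare-exchange loop so the sketch builds before C++20, where `std::atomic<double>::fetch_add` is not available.

```cpp
#include <atomic>

// Atomic add on a double via a compare-exchange loop.
// Returns the value held before the add.
double atomic_add(std::atomic<double>& x, double v) {
    double old = x.load(std::memory_order_seq_cst);
    // On failure, compare_exchange_weak writes the current value into
    // `old`, so the loop retries with fresh data until the swap succeeds.
    while (!x.compare_exchange_weak(old, old + v, std::memory_order_seq_cst)) {
    }
    return old;
}

// Hypothetical helper with the shape discussed in the thread: an atomic
// add, then an atomic load with the same (seq_cst) ordering.
double add_then_load(std::atomic<double>& x, double v) {
    atomic_add(x, v);
    return x.load(std::memory_order_seq_cst);
}
```

In the thread, it is the second half of this pair, the seq_cst atomic load, that trips NVPTX instruction selection.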
I would suggest updating to a very recent clang version; atomic operations were largely unsupported by NVPTX until fairly recently. Some details on the current state of support are here:
See the following pull requests for more:
Agreed. Will, let me know if I can help with this in any way.
Closing as resolved.
https://godbolt.org/z/fGWzdfvM1
However, the function in question is marked sm_80: