Skip to content

Unsupported calling convention: AMDGPU_KERNEL encountered when codegen some amdgpu (opencl) bitcode #60313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
littlewu2508 opened this issue Jan 26, 2023 · 12 comments · Fixed by #115821
Assignees
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema"

Comments

@littlewu2508
Copy link

I'm not sure whether this is a bug or a missing feature, but please look at https://github.com/littlewu2508/LLVMAMDGPUcodegenbug, which is the reproducing method.

The issue is, during codegen phase, some simple opencl program meets "Unsupported calling convention: AMDGPU_KERNEL", while using the ROCm-forked llvm, the issue does not exists. Using gdb, I found that with vanilla llvm, one of the instructions I in

has OpCode=56 and SubclassData=365, which leads to 91 (CallingConv::AMDGPU_KERNEL) in followed steps (LowerCallTo, LowerCall, CCAssignFnForCall, etc), and finally get Unsupported calling convention issue. Gdb trace:
llvm-14.0.5-dbg-cmdline.log.gz

For ROCm forked llvm, there's no issue. My guess is that source1.cl is calling amdgpu kernel functions, but somehow it no I with 91 (CallingConv::AMDGPU_KERNEL) appearing, only 0 (CallingConv::C) is seen, so no unsupported calling convention issues (gdb trace:
ROCm-llvm-5.1.3-dbg-cmdline.log.gz). But I failed to find out which change they use to get things worked, due to lack of compiler knowledge, and inability to bisect ROCm forked llvm (encountered many build issues when bisecting).

Forgive me of using llvm-14 because I built the debug version of llvm/clang to start investigating th issue half a year earlier (original issue: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45). I can confirm that this situation still remains now for both llvm-15.0.7 and llvm-16

@llvmbot
Copy link
Member

llvmbot commented Jan 26, 2023

@llvm/issue-subscribers-backend-amdgpu

@littlewu2508
Copy link
Author

BTW, replacing triple from amdgpu into x86-64, there's no issue of-course

--- a/run.sh
+++ b/run.sh
@@ -1,7 +1,7 @@
 #!/bin/bash -x

-AMDGPU_ARCH=gfx803
-TRIPLE=amdgcn-amd-amdhsa
+AMDGPU_ARCH=x86-64
+TRIPLE=x86_64-pc-linux-gnu

and execute ./run.sh

@arsenm
Copy link
Contributor

arsenm commented Jan 26, 2023

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

@littlewu2508
Copy link
Author

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Thanks for the explanation!

I'd like to know the hack refer to which commit(s)? Since I'm packaging ROCm stack, if users met such problems they can simply apply that patch upon common llvm releases as a temporary workaround.

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

Understood. So, is there any plan for ROCm stack to remove such kind of unsupported call in GPU codes? Because when distribution (Debian, Gentoo, Fedora) packages https://github.com/RadeonOpenCompute/RoCm-CompilerSupport/, there's always 4 failing tests due to this issue.

@arsenm
Copy link
Contributor

arsenm commented Jan 27, 2023

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Thanks for the explanation!

I'd like to know the hack refer to which commit(s)? Since I'm packaging ROCm stack, if users met such problems they can simply apply that patch upon common llvm releases as a temporary workaround.

It's this pass that's been copied around for years https://github.com/RadeonOpenCompute/llvm-project/blob/amd-stg-open/llvm/lib/Target/AMDGPU/AMDGPULowerKernelCalls.cpp

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

Understood. So, is there any plan for ROCm stack to remove such kind of unsupported call in GPU codes? Because when distribution (Debian, Gentoo, Fedora) packages https://github.com/RadeonOpenCompute/RoCm-CompilerSupport/, there's always 4 failing tests due to this issue.

The hope was it would be deleted and upstream would do something better in clang, which https://reviews.llvm.org/D120566 started towards

@littlewu2508
Copy link
Author

I'm not sure whether this is related to #62066, but seems that AMDGPU openmp is also bothered by unsupported libcall legalization in llvm::SITargetLowering::LowerCall

@arsenm
Copy link
Contributor

arsenm commented Aug 9, 2023

I'm not sure whether this is related to #62066, but seems that AMDGPU openmp is also bothered by unsupported libcall legalization in llvm::SITargetLowering::LowerCall

This would be unrelated

@cdevadas
Copy link
Collaborator

cdevadas commented Sep 5, 2024

lalaniket8 is working on bringing this feature in the upstream compiler at clang codegen.

lalaniket8 pushed a commit that referenced this issue Nov 12, 2024
OpenCL allows a kernel function to call another kernel function.
To facilitate this we emit a stub version of each kernel function
with different name mangling scheme, and replace the kernel
callsite appropriately.

#60313
https://ontrack-internal.amd.com/browse/SWDEV-245936
lalaniket8 pushed a commit that referenced this issue Nov 13, 2024
OpenCL allows a kernel function to call another kernel function.
To facilitate this we emit a stub version of each kernel function
with different name mangling scheme, and replace the kernel
callsite appropriately.

This commit fixes #60313
https://ontrack-internal.amd.com/browse/SWDEV-245936

D120566 was an earlier effort to upstream a fix for this issue.
lalaniket8 pushed a commit that referenced this issue Nov 13, 2024
OpenCL allows a kernel function to call another kernel function.
To facilitate this we emit a stub version of each kernel function
with different name mangling scheme, and replace the kernel
callsite appropriately.

This commit fixes #60313
https://ontrack-internal.amd.com/browse/SWDEV-245936

D120566 was an earlier effort to upstream a fix for this issue.
lalaniket8 pushed a commit that referenced this issue Nov 25, 2024
To facilitate this we emit a stub version of each kernel function
with different name mangling scheme, and replace the kernel
callsite appropriately.

#60313
https://ontrack-internal.amd.com/browse/SWDEV-245936
lalaniket8 pushed a commit that referenced this issue Nov 27, 2024
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Nov 29, 2024
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Jan 22, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Jan 29, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Mar 12, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Mar 13, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Mar 19, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
lalaniket8 pushed a commit that referenced this issue Apr 7, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes #60313

D120566 was an earlier attempt made to upstream a solution
for this issue.
@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. and removed backend:AMDGPU labels Apr 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/issue-subscribers-clang-codegen

Author: None (littlewu2508)

I'm not sure whether this is a bug or a missing feature, but please look at https://github.com/littlewu2508/LLVMAMDGPUcodegenbug, which is the reproducing method.

The issue is, during codegen phase, some simple opencl program meets "Unsupported calling convention: AMDGPU_KERNEL", while using the ROCm-forked llvm, the issue does not exists. Using gdb, I found that with vanilla llvm, one of the instructions I in

has OpCode=56 and SubclassData=365, which leads to 91 (CallingConv::AMDGPU_KERNEL) in followed steps (LowerCallTo, LowerCall, CCAssignFnForCall, etc), and finally get Unsupported calling convention issue. Gdb trace:
llvm-14.0.5-dbg-cmdline.log.gz

For ROCm forked llvm, there's no issue. My guess is that source1.cl is calling amdgpu kernel functions, but somehow it no I with 91 (CallingConv::AMDGPU_KERNEL) appearing, only 0 (CallingConv::C) is seen, so no unsupported calling convention issues (gdb trace:
ROCm-llvm-5.1.3-dbg-cmdline.log.gz). But I failed to find out which change they use to get things worked, due to lack of compiler knowledge, and inability to bisect ROCm forked llvm (encountered many build issues when bisecting).

Forgive me of using llvm-14 because I built the debug version of llvm/clang to start investigating th issue half a year earlier (original issue: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45). I can confirm that this situation still remains now for both llvm-15.0.7 and llvm-16

@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/issue-subscribers-clang-frontend

Author: None (littlewu2508)

I'm not sure whether this is a bug or a missing feature, but please look at https://github.com/littlewu2508/LLVMAMDGPUcodegenbug, which is the reproducing method.

The issue is, during codegen phase, some simple opencl program meets "Unsupported calling convention: AMDGPU_KERNEL", while using the ROCm-forked llvm, the issue does not exists. Using gdb, I found that with vanilla llvm, one of the instructions I in

has OpCode=56 and SubclassData=365, which leads to 91 (CallingConv::AMDGPU_KERNEL) in followed steps (LowerCallTo, LowerCall, CCAssignFnForCall, etc), and finally get Unsupported calling convention issue. Gdb trace:
llvm-14.0.5-dbg-cmdline.log.gz

For ROCm forked llvm, there's no issue. My guess is that source1.cl is calling amdgpu kernel functions, but somehow it no I with 91 (CallingConv::AMDGPU_KERNEL) appearing, only 0 (CallingConv::C) is seen, so no unsupported calling convention issues (gdb trace:
ROCm-llvm-5.1.3-dbg-cmdline.log.gz). But I failed to find out which change they use to get things worked, due to lack of compiler knowledge, and inability to bisect ROCm forked llvm (encountered many build issues when bisecting).

Forgive me of using llvm-14 because I built the debug version of llvm/clang to start investigating th issue half a year earlier (original issue: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45). I can confirm that this situation still remains now for both llvm-15.0.7 and llvm-16

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Apr 8, 2025
…nel (#115821)

This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes llvm/llvm-project#60313

D120566 was an earlier attempt made to upstream a solution
for this issue.

---------

Co-authored-by: anikelal <[email protected]>
@arsenm
Copy link
Contributor

arsenm commented Apr 8, 2025

Technically the bitcode will still fail, but we should upgrade this to an IR verifier error

searlmc1 pushed a commit to ROCm/llvm-project that referenced this issue Apr 8, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.

Fixes llvm#60313

D120566 was an earlier attempt made to upstream a solution
for this.
@lalaniket8
Copy link
Contributor

Technically the bitcode will still fail, but we should upgrade this to an IR verifier error

PR created #134987

jsji pushed a commit to intel/llvm that referenced this issue Apr 23, 2025
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.
    
Fixes llvm/llvm-project#60313
    
D120566 was an earlier attempt made to upstream a solution
for this issue.

---------

Co-authored-by: anikelal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema"
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants