Skip to content

[AMDGPU] Emit AMDHSA kernel descriptors to .amdhsa.kd section #122930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 14, 2025

Summary:
This patch emits all kernel descriptors to a new section rather than
mixing them in with the other symbols in .rodata or .text. The
advantage to this approach is that it will allow the runtime to memcpy
the whole section rather than copying individual symbols whose names end
in .kd.

Additionally, this fixes an issue where --gc-sections will trim these
symbols as they are not used by anything and simply exist as metadata.
Adding the SHF_GNU_RETAIN prevents this entirely.

Fixes: #119479

Summary:
This patch emits all kernel descriptors to a new section rather than
mixing  them in with the other symbols in `.rodata` or `.text`. The
advantage to this approach is that it will allow the runtime to memcpy
the whole section rather than copying individual symbols whose names end
in `.kd`.

Additionally, this fixes an issue where `--gc-sections` will trim these
symbols as they are not used by anything and simply exist as metadata.
Adding the `SHF_GNU_RETAIN` prevents this entirely.

Fixes: llvm#119479
@llvmbot
Copy link
Member

llvmbot commented Jan 14, 2025

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch emits all kernel descriptors to a new section rather than
mixing them in with the other symbols in .rodata or .text. The
advantage to this approach is that it will allow the runtime to memcpy
the whole section rather than copying individual symbols whose names end
in .kd.

Additionally, this fixes an issue where --gc-sections will trim these
symbols as they are not used by anything and simply exist as metadata.
Adding the SHF_GNU_RETAIN prevents this entirely.

Fixes: #119479


Patch is 63.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122930.diff

31 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/code-object-v3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll (+2-2)
  • (modified) llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s (+1-1)
  • (modified) llvm/test/MC/AMDGPU/hsa-gfx12-v4.s (+9-8)
  • (modified) llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-tg-split.s (+2-2)
  • (modified) llvm/test/MC/AMDGPU/hsa-v4.s (+9-8)
  • (modified) llvm/test/MC/AMDGPU/hsa-v5-uses-dynamic-stack.s (+10-9)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-cov5.s (+3-3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4-4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s (+4-4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s (+2-2)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3-3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx950.s (+1-1)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3-3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3-3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+3-3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s (+2-2)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s (+2-2)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll (+86-86)
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index eccd77d6c00f0b..e0528e17885b80 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -980,7 +980,12 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
   if (KernelCodeSymbol->getVisibility() == ELF::STV_DEFAULT)
     KernelCodeSymbol->setVisibility(ELF::STV_PROTECTED);
 
+  Streamer.pushSection();
+  Streamer.switchSection(Context.getELFSection(
+      ".amdhsa.kd", ELF::SHT_PROGBITS, ELF::SHF_ALLOC | ELF::SHF_GNU_RETAIN));
   Streamer.emitLabel(KernelDescriptorSymbol);
+  Streamer.emitValueToAlignment(Align(alignof(amdhsa::kernel_descriptor_t)), 0,
+                                1, 0);
   Streamer.emitValue(
       KernelDescriptor.group_segment_fixed_size,
       sizeof(amdhsa::kernel_descriptor_t::group_segment_fixed_size));
@@ -1020,4 +1025,5 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
                      sizeof(amdhsa::kernel_descriptor_t::kernarg_preload));
   for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
     Streamer.emitInt8(0u);
+  Streamer.popSection();
 }
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
index 835e5e5f06ef0f..70e085d863b609 100644
--- a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -1,7 +1,7 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefix=OBJDUMP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefix=OBJDUMP %s
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 < %s | FileCheck --check-prefix=ASM %s
 
-; OBJDUMP: Contents of section .rodata:
+; OBJDUMP: Contents of section .amdhsa.kd:
 ; OBJDUMP-NEXT: 0000 00000000 00000000 10010000 00000000  ................
 ; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000  ................
 ; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000  ................
diff --git a/llvm/test/CodeGen/AMDGPU/code-object-v3.ll b/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
index ee4a2ed883b638..b1ab87d8fe7272 100644
--- a/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
+++ b/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
@@ -51,7 +51,7 @@
 ; OSABI-AMDHSA-ELF: .text   PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
 ; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}}  A {{[0-9]+}} {{[0-9]+}} 64
 
-; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
+; OSABI-AMDHSA-ELF: Relocation section '.rela.amdhsa.kd' at offset
 ; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000000 fadd + 10
 ; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000100 fsub + 10
 ; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000200 empty + 10
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
index 19d633651fdd0d..eb4343509a6160 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
@@ -1,5 +1,5 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
 
 ; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
 ; ASM: .amdhsa_reserve_xnack_mask 1
 
 ; Verify that an extra SGPR block is reserved with XNACK "any" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
 ; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000  ................
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
index 2097579e0c9959..8336378287aa56 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
@@ -1,5 +1,5 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
 
 ; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
 ; ASM: .amdhsa_reserve_xnack_mask 0
 
 ; Verify that an extra SGPR block is not reserved with XNACK "off" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
 ; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000  ................
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
index 775c62e73261a9..492abc9edd5d3c 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
@@ -1,5 +1,5 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
 
 ; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
 ; ASM: .amdhsa_reserve_xnack_mask 1
 
 ; Verify that an extra SGPR block is reserved with XNACK "on" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
 ; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000  ................
 ; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000  ................
diff --git a/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s b/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
index f4ae23fc0aa7b9..b2f70edf156906 100644
--- a/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
+++ b/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
@@ -1,4 +1,4 @@
-// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s -o - | llvm-objdump -s -j .rodata - | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s -o - | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefix=OBJDUMP %s
 
 .amdgcn_target "amdgcn-amd-amdhsa--gfx940"
 
@@ -6,7 +6,7 @@
 
 // Account for preload kernarg SGPRs in KD field GRANULATED_WAVEFRONT_SGPR_COUNT.
 
-// OBJDUMP:      Contents of section .rodata:
+// OBJDUMP:      Contents of section .amdhsa.kd:
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000  ................
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000  ................
 // OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000  ................
diff --git a/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s b/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
index 4623500987be88..fd2010519b1495 100644
--- a/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
+++ b/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
@@ -1,6 +1,6 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // OBJDUMP:       0000 00000000 0f000000 00000000 00000000
 
diff --git a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
index ea649bc76116ad..75e32150d3a9c0 100644
--- a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
@@ -1,13 +1,14 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -filetype=obj < %s > %t
 // RUN: llvm-readelf -S -r -s %t | FileCheck --check-prefix=READOBJ %s
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // READOBJ: Section Headers
 // READOBJ: .text   PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
-// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000100 {{[0-9]+}}  A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000000 {{[0-9]+}}  A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .amdhsa.kd PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000100 {{[0-9]+}}  AR {{[0-9]+}} {{[0-9]+}} 8
 
-// READOBJ: Relocation section '.rela.rodata' at offset
+// READOBJ: Relocation section '.rela.amdhsa.kd' at offset
 // READOBJ: 0000000000000010 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 10
 // READOBJ: 0000000000000050 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 110
 // READOBJ: 0000000000000090 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 210
@@ -18,12 +19,12 @@
 // READOBJ-NEXT: 0000000000000100  0 FUNC    LOCAL  PROTECTED 2 complete
 // READOBJ-NEXT: 0000000000000200  0 FUNC    LOCAL  PROTECTED 2 special_sgpr
 // READOBJ-NEXT: 0000000000000300  0 FUNC    LOCAL  PROTECTED 2 disabled_user_sgpr
-// READOBJ-NEXT: 0000000000000000 64 OBJECT  LOCAL  DEFAULT   3 minimal.kd
-// READOBJ-NEXT: 0000000000000040 64 OBJECT  LOCAL  DEFAULT   3 complete.kd
-// READOBJ-NEXT: 0000000000000080 64 OBJECT  LOCAL  DEFAULT   3 special_sgpr.kd
-// READOBJ-NEXT: 00000000000000c0 64 OBJECT  LOCAL  DEFAULT   3 disabled_user_sgpr.kd
+// READOBJ-NEXT: 0000000000000000 64 OBJECT LOCAL DEFAULT 4 minimal.kd 
+// READOBJ-NEXT: 0000000000000040 64 OBJECT LOCAL DEFAULT 4 complete.kd 
+// READOBJ-NEXT: 0000000000000080 64 OBJECT LOCAL DEFAULT 4 special_sgpr.kd 
+// READOBJ-NEXT: 00000000000000c0 64 OBJECT LOCAL DEFAULT 4 disabled_user_sgpr.kd 
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // Note, relocation for KERNEL_CODE_ENTRY_BYTE_OFFSET is not resolved here.
 // minimal
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s b/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
index 2c83329df03543..8ebeb36d521ab3 100644
--- a/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
+++ b/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
@@ -1,10 +1,10 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // Check that SGPR init bug on gfx803 is corrected by the assembler, setting
 // GRANULATED_WAVEFRONT_SGPR_COUNT to 11.
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
index bec717e4137df2..f7282248e5aaee 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
@@ -1,11 +1,11 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1010 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
index 85a7ad05b00f48..d88b066413e808 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
@@ -1,11 +1,11 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1100 < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1100 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
index 51d0fb30b320c5..ce76b7370c8b8c 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
@@ -1,11 +1,11 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
index 485f48c695c4de..7123b2c7469ea0 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
@@ -1,11 +1,11 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
index 0d2e066113ee86..49d66004e46e3d 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
@@ -1,12 +1,12 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx801 < %s | FileCheck --check-prefix=ASM %s
 
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx801 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd 
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
index 88b5e23a6f2c5f..7ecda2bb0c08ee 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
@@ -1,11 +1,11 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
 // When going from asm -> obj, the expressions should get resolved (through fixups),
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // expr_defined_later
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-tg-split.s b/llvm/test/MC/AMDGPU/hsa-tg-split.s
index ca3de214a64a05..4d8a8abcb3f9ea 100644
--- a/llvm/test/MC/AMDGPU/hsa-tg-split.s
+++ b/llvm/test/MC/AMDGPU/hsa-tg-split.s
@@ -1,8 +1,8 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack,+tgsplit < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack,+tgsplit -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000100
diff --git a/llvm/test/MC/AMDGPU/hsa-v4.s b/llvm/test/MC/AMDGPU/hsa-v4.s
index 931b4e874630b4..ffbe644e222fd8 100644
--- a/llvm/test/MC/AMDGPU/hsa-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-v4.s
@@ -1,13 +1,14 @@
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack -filetype=obj < %s > %t
 // RUN: llvm-readelf -S -r -s %t | FileCheck --check-prefix=READOBJ %s
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
 
 // READOBJ: Section Headers
 // READOBJ: .text   PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
-// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000100 {{[0-9]+}}  A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000000 {{[0-9]+}}  A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .amdhsa.kd PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}}        000100 {{[0-9]+}}  AR {{[0-9]+}} {{[0-9]+}} 8
 
-// READOBJ: Relocation section '.rela.rodata' at offset
+// READOBJ: Relocation section '.rela.amdhsa.kd' at o...
[truncated]

@b-sumner
Copy link

This seems to be an ABI change and likely affect tools. It seems like we would need a new code object version.

@yxsamliu yxsamliu requested a review from kzhuravl January 14, 2025 17:01
@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 14, 2025

This seems to be an ABI change and likely affect tools. It seems like we would need a new code object version.

Unlikely, all the tools I've looked at did a brute force

foreach symbol in symtab:
  if symbol.ends_with(".kd")
     kernel_descriptor

Which is unaffected by this patch. However, once we have a new code object version that also contains this code, we can change the runtime to instead memcpy this section without scanning the whole symbol table, that'll save a handful of cycles.

@yxsamliu
Copy link
Collaborator

Do we need ROCm runtime change in place before this change?

@arsenm
Copy link
Contributor

arsenm commented Jan 14, 2025

This seems to be an ABI change and likely affect tools. It seems like we would need a new code object version.

We can advertise this in the next version to get tools to use the section instead of relying on the symbol names, but I don't think we need to leave LTO broken for the current versions

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 14, 2025

Do we need ROCm runtime change in place before this change?

I don't think so, but I'll do some more tests in the meantime.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 14, 2025

Hm, this does seem to cause some issues with libc at least, I'll need to look into it further.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 14, 2025

The call to hsa_executable_get_symbol_by_name works just fine it seems, but there's some unknown problem when actually running the image. Will need to figure out why.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also add this to the release notes

@@ -980,7 +980,12 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
if (KernelCodeSymbol->getVisibility() == ELF::STV_DEFAULT)
KernelCodeSymbol->setVisibility(ELF::STV_PROTECTED);

Streamer.pushSection();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions seems to be called during parser time. Otherwise, if it's only called at finish time`, pushSection/popSection would be unnecessary

@benvanik
Copy link
Contributor

Thanks @jhuber6! I patched this and confirmed it works for my usage (#119479). Looks like the test failures here were unrelated and it's ok to land?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 12, 2025

Thanks @jhuber6! I patched this and confirmed it works for my usage (#119479). Looks like the test failures here were unrelated and it's ok to land?

Sorry this one got away from me, it seems that this influences something weird inside of HSA. The actual metadata lookup that gets these sections works fine, but then it has issues when it tries to launch the kernel. I don't think there's a way to solve this in a reasonable amount of time with the cycles I have as it will likely require updating the runtime. We might need to do what was suggested prior and just emit some dummy relocations for these, or just mark the entire data section as GNU_RETAIN.

@benvanik
Copy link
Contributor

Darn, understandable. Dummy relocations or GNU_RETAIN sound reasonable and I'm sure would help me out. My workaround today is either to patch one of the changes in or glob all my files together into a single compilation unit so I don't need LTO - anything is better than that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AMDGPUTargetStreamer generates .kd symbols, breaking LTO requirement, may be discarded by --gc-sections
7 participants