-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[AMDGPU] Emit AMDHSA kernel descriptors to .amdhsa.kd
section
#122930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary: This patch emits all kernel descriptors to a new section rather than mixing them in with the other symbols in `.rodata` or `.text`. The advantage to this approach is that it will allow the runtime to memcpy the whole section rather than copying individual symbols whose names end in `.kd`. Additionally, this fixes an issue where `--gc-sections` will trim these symbols as they are not used by anything and simply exist as metadata. Adding the `SHF_GNU_RETAIN` prevents this entirely. Fixes: llvm#119479
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-amdgpu Author: Joseph Huber (jhuber6) ChangesSummary: Additionally, this fixes an issue where Fixes: #119479 Patch is 63.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122930.diff 31 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index eccd77d6c00f0b..e0528e17885b80 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -980,7 +980,12 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
if (KernelCodeSymbol->getVisibility() == ELF::STV_DEFAULT)
KernelCodeSymbol->setVisibility(ELF::STV_PROTECTED);
+ Streamer.pushSection();
+ Streamer.switchSection(Context.getELFSection(
+ ".amdhsa.kd", ELF::SHT_PROGBITS, ELF::SHF_ALLOC | ELF::SHF_GNU_RETAIN));
Streamer.emitLabel(KernelDescriptorSymbol);
+ Streamer.emitValueToAlignment(Align(alignof(amdhsa::kernel_descriptor_t)), 0,
+ 1, 0);
Streamer.emitValue(
KernelDescriptor.group_segment_fixed_size,
sizeof(amdhsa::kernel_descriptor_t::group_segment_fixed_size));
@@ -1020,4 +1025,5 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
sizeof(amdhsa::kernel_descriptor_t::kernarg_preload));
for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
Streamer.emitInt8(0u);
+ Streamer.popSection();
}
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
index 835e5e5f06ef0f..70e085d863b609 100644
--- a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -1,7 +1,7 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefix=OBJDUMP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefix=OBJDUMP %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 < %s | FileCheck --check-prefix=ASM %s
-; OBJDUMP: Contents of section .rodata:
+; OBJDUMP: Contents of section .amdhsa.kd:
; OBJDUMP-NEXT: 0000 00000000 00000000 10010000 00000000 ................
; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000 ................
; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000 ................
diff --git a/llvm/test/CodeGen/AMDGPU/code-object-v3.ll b/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
index ee4a2ed883b638..b1ab87d8fe7272 100644
--- a/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
+++ b/llvm/test/CodeGen/AMDGPU/code-object-v3.ll
@@ -51,7 +51,7 @@
; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
-; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
+; OSABI-AMDHSA-ELF: Relocation section '.rela.amdhsa.kd' at offset
; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000000 fadd + 10
; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000100 fsub + 10
; OSABI-AMDHSA-ELF: R_AMDGPU_REL64 0000000000000200 empty + 10
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
index 19d633651fdd0d..eb4343509a6160 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
@@ -1,5 +1,5 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
; ASM: .amdhsa_reserve_xnack_mask 1
; Verify that an extra SGPR block is reserved with XNACK "any" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000 ................
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
index 2097579e0c9959..8336378287aa56 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-off.ll
@@ -1,5 +1,5 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=-xnack --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
; ASM: .amdhsa_reserve_xnack_mask 0
; Verify that an extra SGPR block is not reserved with XNACK "off" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000 ................
diff --git a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
index 775c62e73261a9..492abc9edd5d3c 100644
--- a/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
+++ b/llvm/test/CodeGen/AMDGPU/tid-kd-xnack-on.ll
@@ -1,5 +1,5 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack < %s | FileCheck --check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-objdump -s -j .rodata - | FileCheck --check-prefixes=OBJ %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefixes=OBJ %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack --filetype=obj < %s | llvm-readelf --notes - | FileCheck --check-prefixes=ELF %s
; TODO: Update to check for granulated sgpr count directive once one is added.
@@ -10,7 +10,7 @@ define amdgpu_kernel void @kern() #0 {
; ASM: .amdhsa_reserve_xnack_mask 1
; Verify that an extra SGPR block is reserved with XNACK "on" tid setting.
-; OBJ: Contents of section .rodata:
+; OBJ: Contents of section .amdhsa.kd:
; OBJ-NEXT: 0000 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0010 00000000 00000000 00000000 00000000 ................
; OBJ-NEXT: 0020 00000000 00000000 00000000 00000000 ................
diff --git a/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s b/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
index f4ae23fc0aa7b9..b2f70edf156906 100644
--- a/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
+++ b/llvm/test/MC/AMDGPU/amdhsa-kd-kernarg-preload.s
@@ -1,4 +1,4 @@
-// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s -o - | llvm-objdump -s -j .rodata - | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s -o - | llvm-objdump -s -j .amdhsa.kd - | FileCheck --check-prefix=OBJDUMP %s
.amdgcn_target "amdgcn-amd-amdhsa--gfx940"
@@ -6,7 +6,7 @@
// Account for preload kernarg SGPRs in KD field GRANULATED_WAVEFRONT_SGPR_COUNT.
-// OBJDUMP: Contents of section .rodata:
+// OBJDUMP: Contents of section .amdhsa.kd:
// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000 ................
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000 ................
// OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000 ................
diff --git a/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s b/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
index 4623500987be88..fd2010519b1495 100644
--- a/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
+++ b/llvm/test/MC/AMDGPU/hsa-amdgpu-exprs.s
@@ -1,6 +1,6 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// OBJDUMP: 0000 00000000 0f000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
index ea649bc76116ad..75e32150d3a9c0 100644
--- a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
@@ -1,13 +1,14 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -filetype=obj < %s > %t
// RUN: llvm-readelf -S -r -s %t | FileCheck --check-prefix=READOBJ %s
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// READOBJ: Section Headers
// READOBJ: .text PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
-// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000000 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .amdhsa.kd PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} AR {{[0-9]+}} {{[0-9]+}} 8
-// READOBJ: Relocation section '.rela.rodata' at offset
+// READOBJ: Relocation section '.rela.amdhsa.kd' at offset
// READOBJ: 0000000000000010 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 10
// READOBJ: 0000000000000050 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 110
// READOBJ: 0000000000000090 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 210
@@ -18,12 +19,12 @@
// READOBJ-NEXT: 0000000000000100 0 FUNC LOCAL PROTECTED 2 complete
// READOBJ-NEXT: 0000000000000200 0 FUNC LOCAL PROTECTED 2 special_sgpr
// READOBJ-NEXT: 0000000000000300 0 FUNC LOCAL PROTECTED 2 disabled_user_sgpr
-// READOBJ-NEXT: 0000000000000000 64 OBJECT LOCAL DEFAULT 3 minimal.kd
-// READOBJ-NEXT: 0000000000000040 64 OBJECT LOCAL DEFAULT 3 complete.kd
-// READOBJ-NEXT: 0000000000000080 64 OBJECT LOCAL DEFAULT 3 special_sgpr.kd
-// READOBJ-NEXT: 00000000000000c0 64 OBJECT LOCAL DEFAULT 3 disabled_user_sgpr.kd
+// READOBJ-NEXT: 0000000000000000 64 OBJECT LOCAL DEFAULT 4 minimal.kd
+// READOBJ-NEXT: 0000000000000040 64 OBJECT LOCAL DEFAULT 4 complete.kd
+// READOBJ-NEXT: 0000000000000080 64 OBJECT LOCAL DEFAULT 4 special_sgpr.kd
+// READOBJ-NEXT: 00000000000000c0 64 OBJECT LOCAL DEFAULT 4 disabled_user_sgpr.kd
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// Note, relocation for KERNEL_CODE_ENTRY_BYTE_OFFSET is not resolved here.
// minimal
// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s b/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
index 2c83329df03543..8ebeb36d521ab3 100644
--- a/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
+++ b/llvm/test/MC/AMDGPU/hsa-sgpr-init-bug-v3.s
@@ -1,10 +1,10 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// Check that SGPR init bug on gfx803 is corrected by the assembler, setting
// GRANULATED_WAVEFRONT_SGPR_COUNT to 11.
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
// OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
index bec717e4137df2..f7282248e5aaee 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx10.s
@@ -1,11 +1,11 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1010 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
index 85a7ad05b00f48..d88b066413e808 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx11.s
@@ -1,11 +1,11 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1100 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1100 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
index 51d0fb30b320c5..ce76b7370c8b8c 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx12.s
@@ -1,11 +1,11 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
index 485f48c695c4de..7123b2c7469ea0 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx7.s
@@ -1,11 +1,11 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
index 0d2e066113ee86..49d66004e46e3d 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx8.s
@@ -1,12 +1,12 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx801 < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx801 -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 2b000000 2c000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
index 88b5e23a6f2c5f..7ecda2bb0c08ee 100644
--- a/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
+++ b/llvm/test/MC/AMDGPU/hsa-sym-exprs-gfx90a.s
@@ -1,11 +1,11 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// When going from asm -> asm, the expressions should remain the same (i.e., symbolic).
// When going from asm -> obj, the expressions should get resolved (through fixups),
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// expr_defined_later
// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
diff --git a/llvm/test/MC/AMDGPU/hsa-tg-split.s b/llvm/test/MC/AMDGPU/hsa-tg-split.s
index ca3de214a64a05..4d8a8abcb3f9ea 100644
--- a/llvm/test/MC/AMDGPU/hsa-tg-split.s
+++ b/llvm/test/MC/AMDGPU/hsa-tg-split.s
@@ -1,8 +1,8 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack,+tgsplit < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack,+tgsplit -filetype=obj < %s > %t
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
-// OBJDUMP: Contents of section .rodata
+// OBJDUMP: Contents of section .amdhsa.kd
// OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
// OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
// OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000100
diff --git a/llvm/test/MC/AMDGPU/hsa-v4.s b/llvm/test/MC/AMDGPU/hsa-v4.s
index 931b4e874630b4..ffbe644e222fd8 100644
--- a/llvm/test/MC/AMDGPU/hsa-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-v4.s
@@ -1,13 +1,14 @@
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack < %s | FileCheck --check-prefix=ASM %s
// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack -filetype=obj < %s > %t
// RUN: llvm-readelf -S -r -s %t | FileCheck --check-prefix=READOBJ %s
-// RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+// RUN: llvm-objdump -s -j .amdhsa.kd %t | FileCheck --check-prefix=OBJDUMP %s
// READOBJ: Section Headers
// READOBJ: .text PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
-// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000000 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .amdhsa.kd PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} AR {{[0-9]+}} {{[0-9]+}} 8
-// READOBJ: Relocation section '.rela.rodata' at offset
+// READOBJ: Relocation section '.rela.amdhsa.kd' at o...
[truncated]
|
This seems to be an ABI change and likely affect tools. It seems like we would need a new code object version. |
Unlikely, all the tools I've looked at did a brute force
Which is unaffected by this patch. However, once we have a new code object version that also contains this code, we can change the runtime to instead memcpy this section without scanning the whole symbol table, that'll save a handful of cycles. |
Do we need ROCm runtime change in place before this change? |
We can advertise this in the next version to get tools to use the section instead of relying on the symbol names, but I don't think we need to leave LTO broken for the current versions |
I don't think so, but I'll do some more tests in the meantime. |
Hm, this does seem to cause some issues with |
The call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also add this to the release notes
@@ -980,7 +980,12 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor( | |||
if (KernelCodeSymbol->getVisibility() == ELF::STV_DEFAULT) | |||
KernelCodeSymbol->setVisibility(ELF::STV_PROTECTED); | |||
|
|||
Streamer.pushSection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions seems to be called during parser time. Otherwise, if it's only called at finish
time`, pushSection/popSection would be unnecessary
Sorry this one got away from me, it seems that this influences something weird inside of HSA. The actual metadata lookup that gets these sections works fine, but then it has issues when it tries to launch the kernel. I don't think there's a way to solve this in a reasonable amount of time with the cycles I have as it will likely require updating the runtime. We might need to do what was suggested prior and just emit some dummy relocations for these, or just mark the entire data section as |
Darn, understandable. Dummy relocations or GNU_RETAIN sound reasonable and I'm sure would help me out. My workaround today is either to patch one of the changes in or glob all my files together into a single compilation unit so I don't need LTO - anything is better than that :) |
Summary:
This patch emits all kernel descriptors to a new section rather than
mixing them in with the other symbols in
.rodata
or.text
. Theadvantage to this approach is that it will allow the runtime to memcpy
the whole section rather than copying individual symbols whose names end
in
.kd
.Additionally, this fixes an issue where
--gc-sections
will trim thesesymbols as they are not used by anything and simply exist as metadata.
Adding the
SHF_GNU_RETAIN
prevents this entirely.Fixes: #119479