AMDGPUTargetStreamer generates .kd symbols, breaking LTO requirement, may be discarded by --gc-sections

The caching added in https://github.com/llvm/llvm-project/commit/3733ed6f1c6b0eef1e13e175ac81ad309fc0b080 by @MaskRay seems to have broken the combination of LTO and `--gc-sections` for certain use cases. Specifically, the change in `lld/ELF/MarkLive.cpp` that uses the cached `isExported` value instead of calling `includeInDynsym` appears to cause sections to be dropped that should be kept (or at least were kept before the change): https://github.com/llvm/llvm-project/commit/3733ed6f1c6b0eef1e13e175ac81ad309fc0b080#diff-3c88c62d912008cc04f796b330a035ecda925645264eaef43185ad43991cb8e9L224

The AMDGPU target emits special kernel descriptor object symbols that must be preserved in the final ELF so the runtime can load the kernels. Each one is named after an exported kernel with a `.kd` suffix and is emitted by `AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor`. Before the referenced commit these symbols were present in the linked output; after it they are missing.

Reverting the mentioned line in `MarkLive.cpp` restores the original behavior. I'm not familiar with the codebase, but I suspect `isExported` is either not yet initialized or not safe to cache at that point.

The following repro shows the issue (`lld_lto_bug.c`):
```c
[[clang::amdgpu_kernel, gnu::visibility("protected")]] void some_kernel(int n) {
  //
}
```
compiled using
```sh
$ clang \
  -x c -std=c23 \
  -target amdgcn-amd-amdhsa -march=gfx1100 \
  -nogpulib \
  -fgpu-rdc \
  -fno-ident \
  -fvisibility=hidden \
  -O3 \
  lld_lto_bug.c \
  -c -emit-llvm -o lld_lto_bug.bc
```

or, since `.bc` files cannot be attached, the equivalent textual IR:
```ll
; ModuleID = 'lld_lto_bug.bc'
source_filename = "lld_lto_bug.c"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

@__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define protected amdgpu_kernel void @some_kernel(i32 noundef %n) local_unnamed_addr #0 {
entry:
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx1100" "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32" "uniform-work-group-size"="false" }

!llvm.module.flags = !{!0, !1, !2}

!0 = !{i32 1, !"amdhsa_code_object_version", i32 500}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}
```

Linking with LTO and gc-sections:
```sh
lld \
  -flavor gnu \
  -m elf64_amdgpu \
  -shared \
  -plugin-opt=mcpu=gfx1100 \
  -plugin-opt=O3 \
  --lto-CGO3 \
  --gc-sections \
  --print-gc-sections \
  --strip-debug \
  --discard-all \
  --discard-locals \
  -o lld_lto_bug.so \
  lld_lto_bug.bc
```

Before the commit, this prints the expected output (`.rodata` is not removed):
```console
removing unused section lld_lto_bug_patched.so.lto.o:(.text)
```

After the commit, the regression shows up as `.rodata` being removed as well:
```console
removing unused section lld_lto_bug.so.lto.o:(.text)
removing unused section lld_lto_bug.so.lto.o:(.rodata)
```
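To script this comparison (for example, when testing a candidate fix), the `--print-gc-sections` output can be filtered for the unwanted removal. This is only a minimal sketch: the function name `rodata_was_dropped` is hypothetical, and it assumes the message format shown in the logs above.

```sh
# Hypothetical helper: reads linker output on stdin and succeeds (exit 0)
# if any line reports a .rodata section being garbage-collected.
# Assumes the "removing unused section <file>:(<section>)" format shown above.
rodata_was_dropped() {
  grep -q 'removing unused section .*:(\.rodata'
}
```

It could be used by appending `2>&1 | rodata_was_dropped && echo "regression present"` to the link command above; both streams are redirected because I have not checked which one lld writes these messages to.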

This can be verified with `llvm-readelf`. Before the commit, `.dynsym` contains both the kernel and its descriptor:
```console
Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000001500     4 FUNC    GLOBAL PROTECTED   7 some_kernel
     2: 0000000000000480    64 OBJECT  GLOBAL PROTECTED   6 some_kernel.kd
```
The `some_kernel.kd` `OBJECT` symbol is what the runtime requires in order to use the ELF.

And after:
```console
Symbol table '.dynsym' contains 2 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000001500     4 FUNC    GLOBAL PROTECTED   6 some_kernel
```
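The check on the two symbol tables above can be automated by pairing each exported `FUNC` symbol with a `<name>.kd` `OBJECT` symbol. The sketch below is not part of any LLVM tooling: `check_kd_symbols` is a hypothetical name, it assumes the `llvm-readelf` column layout shown above, and it assumes every exported `FUNC` is a kernel, which holds for this repro but not for code objects in general.

```sh
# Hypothetical helper: reads a .dynsym dump in llvm-readelf format on stdin.
# Columns assumed: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
# Exits non-zero and names the missing symbol if any FUNC symbol lacks a
# matching "<name>.kd" OBJECT symbol.
check_kd_symbols() {
  awk '
    $4 == "FUNC"   { funcs[$8] = 1 }   # exported kernel entry points
    $4 == "OBJECT" { objs[$8] = 1 }    # kernel descriptors and other objects
    END {
      bad = 0
      for (f in funcs) {
        kd = f ".kd"
        if (!(kd in objs)) { print "missing: " kd; bad = 1 }
      }
      exit bad
    }'
}
```

For example, `llvm-readelf --dyn-syms lld_lto_bug.so | check_kd_symbols` should succeed silently on the first table and report `missing: some_kernel.kd` on the second.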

AMDGPUTargetStreamer generates .kd symbols, breaking LTO requirement, may be discarded by --gc-sections #119479

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AMDGPUTargetStreamer generates .kd symbols, breaking LTO requirement, may be discarded by --gc-sections #119479

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions