lld produces broken executable with CUDA #30572

ismail · 2016-12-01T16:30:18Z


Bugzilla Link	31224
Resolution	WONTFIX
Resolved on	Nov 21, 2020 10:21
Version	unspecified
OS	Linux
Attachments	Crashing executable
CC	@hfinkel,@MaskRay,@orivej,@rui314

Extended Description

havana ~/Downloads > clang-4.0 -v
openSUSE Linux clang version 4.0.0 (trunk 288322) (based on LLVM 4.0.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /suse/idoenmez/bin
Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/6
Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/6
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

/opt/clang/bin/clang++ --cuda-gpu-arch=sm_50 --cuda-path=/havana/cuda-8.0 axpy.cu -fuse-ld=lld -L/havana/cuda-8.0/lib64 -lcudart_static -ldl -lrt -pthread -Wl,-rpath,/opt/clang/lib64 -Wl,-rpath,/havana/cuda-8.0/lib64

./a.out
zsh: segmentation fault (core dumped) ./a.out

The same command works with gold (or bfd):

/opt/clang/bin/clang++ --cuda-gpu-arch=sm_50 --cuda-path=/havana/cuda-8.0 axpy.cu -fuse-ld=gold -L/havana/cuda-8.0/lib64 -lcudart_static -ldl -lrt -pthread -Wl,-rpath,/opt/clang/lib64 -Wl,-rpath,/havana/cuda-8.0/lib64
./a.out
y[0] = 2
y[1] = 4
y[2] = 6
y[3] = 8

Attached is the produced binary.

rui314 · 2016-12-03T02:16:05Z

Can you attach a reproduce file? Add -Wl,--reproduce,repro to your command line, then the linker will create repro.cpio containing all input files. The cpio is an uncompressed archive format, so please gzip before attaching.

ismail · 2016-12-03T08:56:31Z

repro.cpio

rui314 · 2016-12-04T18:52:35Z

Realized that in order to see the bug, I had to run the executable, but I don't run an executable I downloaded from the internet. Could you attach source code?

ismail · 2016-12-04T18:54:03Z

> Realized that in order to see the bug, I had to run the executable, but I don't run an executable I downloaded from the internet. Could you attach source code?

Well, running random code off the internet is bad too :) Save https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4 as axpy.cu; it actually comes from http://llvm.org/docs/CompileCudaWithLLVM.html

rui314 · 2016-12-04T19:30:00Z

Thanks. I took a look at the source code as well as the object file but didn't find anything obviously wrong. Investigating further is probably hard for me because I don't have a build environment for CUDA. Do you think you can debug it?

ismail · 2016-12-04T19:44:00Z

It crashes in closed source cuda runtime code:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
(gdb) bt
#0 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
#1 0x0000000000217e58 in __cudaRegisterFatBinary ()
#2 0x0000000000216467 in __cuda_module_ctor () at /havana/cuda-8.0/include/cuda_runtime.h:545
#3 0x000000000026717d in __libc_csu_init (argc=1, argv=0x7ffe4d4bceb8, envp=0x7ffe4d4bcec8) at elf-init.c:88
#4 0x00007f78018f7220 in __libc_start_main () from /lib64/libc.so.6
#5 0x000000000021602a in _start () at ../sysdeps/x86_64/start.S:120

I am not sure how to debug that.

hfinkel · 2016-12-05T04:51:00Z

Justin, can you help with this?

llvmbot · 2016-12-05T15:28:43Z

> Justin, can you help with this?

I'm not familiar with cudart internals, but at first glance it looks like it may be related to global ctors that cudart uses for initialization.
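The backtrace shows the startup path that does run: `__libc_csu_init` (glibc's csu/elf-init.c, frame #3) iterates over the function pointers between the linker-defined symbols `__init_array_start` and `__init_array_end`, which is how clang's `__cuda_module_ctor` was reached. If cudart's own initializers sit in a separate `.ctors` output section, they fall outside that range. A minimal sketch of the range that loop covers, assuming a GNU/Linux toolchain whose linker defines those two symbols for executables; it only counts the entries (they have already been called before `main`), and the `example_*` names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Signature glibc's elf-init.c used for .init_array entries. */
typedef void (*init_fn)(int, char **, char **);

/* Bounds of the .init_array output section. Assumption: the linker defines
   them for executables, as glibc's elf-init.c expects. */
extern init_fn __init_array_start[] __attribute__((visibility("hidden")));
extern init_fn __init_array_end[] __attribute__((visibility("hidden")));

static int example_ctor_calls = 0;

/* On current Linux toolchains GCC and clang put this address in .init_array. */
__attribute__((constructor)) static void example_ctor(void) {
    example_ctor_calls++;
}

/* Number of entries the startup loop covers; roughly what __libc_csu_init did:
       for (size_t i = 0; i < size; i++)
           (*__init_array_start[i])(argc, argv, envp);
   Anything in a separate .ctors output section is not in this range. */
size_t init_array_entry_count(void) {
    const init_fn *start = __init_array_start;
    const init_fn *end = __init_array_end;
    assert(end >= start);
    return (size_t)(end - start);
}

int example_ctor_call_count(void) { return example_ctor_calls; }
```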

ismail · 2017-06-13T12:36:14Z

Not sure which revision fixed this, but r305278 works fine now. Thanks!

orivej · 2017-07-13T15:59:47Z

This bug is not fixed yet. Here is an example of a constructor like those used by CUDA that is not called when linked with LLD: https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f
In this example main.ld is stopped with "Trace/breakpoint trap", but main.lld finishes without calling ctor.
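The gist is external, so the following is not its code but a hypothetical sketch of the same pattern: one function registered the current way with `__attribute__((constructor))`, which GCC and clang emit into `.init_array`, and one registered the old way by placing a bare function pointer in a `.ctors` section. With ld.bfd (or gold's default `--ctors-in-init-array`), the `.ctors` entry is folded into `.init_array` and both functions run; with a linker that emits `.ctors` as its own output section, and a crtbegin.o that no longer walks `.ctors`, the legacy one silently never runs. All names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

static int modern_ran = 0;
static int legacy_ran = 0;

/* Current style: emitted into .init_array by GCC/clang on ELF. */
__attribute__((constructor)) static void modern_ctor(void) { modern_ran = 1; }

/* Old style, as in objects built for .ctors: a bare function pointer in a
   .ctors section. It only runs if the linker moves .ctors into .init_array
   (or if crtbegin.o/crtend.o still walk .ctors). */
static void legacy_ctor(void) { legacy_ran = 1; }
static void (*legacy_ctor_entry)(void) __attribute__((used, section(".ctors"))) = legacy_ctor;

int modern_ctor_ran(void) { return modern_ran; }

/* Hypothetical helper: report which initializers ran before main. */
void report_ctors(void) {
    printf("modern ctor: %s\n", modern_ran ? "ran" : "did not run");
    printf("legacy .ctors ctor: %s\n", legacy_ran ? "ran" : "did not run");
}
```

Calling `report_ctors()` from `main` shows the difference between linkers; only the `.init_array` entry is guaranteed to run everywhere.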

llvmbot · 2017-07-13T16:05:48Z

> This bug is not fixed yet. Here is an example of a constructor like those used by CUDA that is not called when linked with LLD: https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f
> In this example main.ld is stopped with "Trace/breakpoint trap", but main.lld finishes without calling ctor.

Can you provide a .tar created with --reproduce?

orivej · 2017-07-13T20:11:37Z

--reproduce archive with libc6=2.23-0ubuntu9 on Ubuntu 16: https://s3.amazonaws.com/orivej/bugs/llvm/31224/inputs.tar.xz

orivej · 2017-07-13T21:27:52Z

I have updated the example at https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f to be independent from libc.

orivej · 2017-07-14T04:12:29Z

Here is how ld decided to put .ctors into .init_array: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770
Here is their test case: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/testsuite/ld-elf/init-mixed.c;h=f401ded4d702be49669ecdc7893f65cc70e9fa7c;hb=HEAD
Here is where crtbegin.o dropped support for .ctors: gcc-mirror/gcc@ef1da80
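For background on why the crtbegin.o change matters: with `.ctors`, crtbegin.o contributed a `-1` marker at the start of the section, crtend.o a terminating `0`, and `__do_global_ctors_aux` in crtend.o walked the list backwards from the end until it reached the marker. `.init_array` has no markers and is walked forward between linker-defined bounds. The sketch below only simulates the two walks on ordinary arrays (`fake_ctors`, `fake_init_array` and `f1`..`f3` are illustrative); it is not the libgcc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*func_ptr)(void);

/* Call-order recorder so the two walks can be compared. */
static int order[8];
static int norder = 0;
static void record(int id) { if (norder < 8) order[norder++] = id; }
static void f1(void) { record(1); }
static void f2(void) { record(2); }
static void f3(void) { record(3); }

#define CTORS_MARKER ((func_ptr)(intptr_t)-1)

/* Simulated .ctors output section: the -1 marker (crtbegin.o), the entries
   in link order, then the 0 terminator (crtend.o). */
static func_ptr fake_ctors[] = { CTORS_MARKER, f1, f2, f3, (func_ptr)0 };

/* Old crtend.o-style walk: start just before the terminator and go
   backwards until the marker. */
void run_ctors_backward(void) {
    size_t n = sizeof fake_ctors / sizeof fake_ctors[0];
    for (func_ptr *p = &fake_ctors[n - 2]; *p != CTORS_MARKER; p--)
        (*p)();
}

/* .init_array-style walk: no markers, forward from start to end. */
static func_ptr fake_init_array[] = { f1, f2, f3 };
void run_init_array_forward(void) {
    size_t n = sizeof fake_init_array / sizeof fake_init_array[0];
    for (size_t i = 0; i < n; i++)
        fake_init_array[i]();
}

void reset_order(void) { norder = 0; }
int call_order(int i) { assert(i >= 0 && i < norder); return order[i]; }
```

The opposite walking directions are one reason a linker that folds `.ctors` into `.init_array` has to take care with the order of the moved entries.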

orivej · 2017-07-14T04:26:40Z

Here is the current version of cudart [1]; it uses .ctors in usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart_static.a

[1] http://deb.rug.nl/ppa/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-cudart-dev-8-0_8.0.61-1_amd64.deb
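One way to confirm whether an object, for example a member extracted from libcudart_static.a with `ar x`, still carries a `.ctors` section is `readelf -S`. The same check as a minimal C sketch, assuming a native-endian ELF64 file and ignoring extended section numbering; `elf_has_section` is a hypothetical helper, not part of any library:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the ELF64 file at `path` has a section named `want`,
   0 if it does not, and -1 on a read or format error.
   Assumes native byte order and no extended section numbering. */
int elf_has_section(const char *path, const char *want) {
    assert(path != NULL && want != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab = NULL;
    char *names = NULL;

    /* Validate the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1) goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        (size_t)eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum) goto out;

    /* Read the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != (size_t)eh.e_shnum) goto out;

    /* Load the section-name string table (.shstrtab). */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size) goto out;
    names[strtab->sh_size] = '\0';

    /* Compare every section's name against `want`. */
    result = 0;
    for (size_t i = 0; i < (size_t)eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, want) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

For the cudart case one would call `elf_has_section("member.o", ".ctors")` on each extracted member (the file name here is a placeholder).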

llvmbot · 2017-07-14T10:00:36Z

> I have updated the example at https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f to be independent from libc.

So the issue happens because the sample code assumes that .ctors will be placed into .init_array, as ld.bfd does, but LLD does not merge them and instead emits a .ctors output section separately from the .init_array section.
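The difference can be written down as a mapping from input section names to output sections. This is an illustrative model, not code from either linker: with `ctors_in_init_array` set it approximates ld.bfd and gold's default `--ctors-in-init-array` behavior, and with it cleared it approximates the LLD behavior described here. The real GNU linker script also excludes crtbegin.o/crtend.o from the move, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `name` is exactly `base` or `base` followed by ".<suffix>". */
static bool has_base(const char *name, const char *base) {
    size_t n = strlen(base);
    return strncmp(name, base, n) == 0 && (name[n] == '\0' || name[n] == '.');
}

/* Illustrative model: the output section an input section lands in. */
const char *output_section_for(const char *input, bool ctors_in_init_array) {
    assert(input != NULL);
    if (has_base(input, ".init_array")) return ".init_array";
    if (has_base(input, ".fini_array")) return ".fini_array";
    if (has_base(input, ".ctors"))
        return ctors_in_init_array ? ".init_array" : ".ctors";
    if (has_base(input, ".dtors"))
        return ctors_in_init_array ? ".fini_array" : ".dtors";
    return input; /* other sections are not modelled */
}
```

Only `.init_array` is walked by the startup code, so in the `false` case every `.ctors` entry is left out.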

I wouldn't call it a bug; I think LLD implemented this intentionally from the start. The sample code relies on a specific behavior of bfd that simply differs from LLD.

I would be happy to work on this one if we decide we want to mimic bfd's behavior here, though. Should we?

orivej · 2017-07-14T12:32:36Z

It is true that for LLD this is more a missing feature than a bug. However, it causes programs linked with LLD to fail, it is difficult to debug, and crtbegin.o normally assumes that the linker moves .ctors into .init_array, so it does not handle them itself.

When gcc was switching from .ctors to .init_array, they had to update the linker with the rationale that also binds future linkers such as LLD:

> My opinion is that we can't switch to .init_array unless we either (a) make the linker detect the problem and fix it, or (b) at least make the linker detect the problem and issue an error. I do not think a warning is sufficient.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770#c46

gold provides a toggle --ctors-in-init-array (default) / --no-ctors-in-init-array: http://manpages.ubuntu.com/manpages/precise/man1/ld.1.html

rui314 · 2017-07-14T19:48:31Z

I wonder why you are still using .ctors/.dtors. .{init,fini}_array were invented almost 20 years ago, and pretty much everybody uses them now instead of .ctors/.dtors. I do not see a reason to choose .ctors/.dtors when creating something new for CUDA.
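For code that can be rebuilt, the usual way to end up in `.init_array` is the GCC/clang `constructor` attribute. An optional priority selects an `.init_array.NNNNN` input section that the linker sorts, so lower numbers run earlier; GCC reserves priorities 0 to 100 for the implementation. A minimal sketch with illustrative names:

```c
#include <assert.h>

static int trace[4];
static int ntrace = 0;

static void record(int id) {
    if (ntrace < 4) trace[ntrace++] = id;
}

/* Both land in prioritized .init_array.NNNNN input sections; the linker
   sorts them by priority, so 101 runs before 200 regardless of source order. */
__attribute__((constructor(200))) static void init_late(void) { record(200); }
__attribute__((constructor(101))) static void init_early(void) { record(101); }

int init_trace_len(void) { return ntrace; }

int init_trace(int i) {
    assert(i >= 0 && i < ntrace);
    return trace[i];
}
```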

orivej · 2017-07-15T04:44:39Z

This is not about my code: Nvidia ships static libraries without source code that use .ctors, see #30572 #c15 for an example.

orivej · 2017-07-16T09:17:52Z

Implement --ctors-in-init-array
This patch implements --ctors-in-init-array and passes the init-mixed test [1]. I'm not sure how to rewrite this test to incorporate it into the LLD test suite.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/testsuite/ld-elf/init-mixed.c;h=f401ded4d702be49669ecdc7893f65cc70e9fa7c;hb=HEAD

orivej · 2017-07-16T10:59:11Z

Implement --ctors-in-init-array
Added a test based on init-mixed.c and ctors_dtors_priority.s

orivej · 2017-07-16T12:12:04Z

Implement --ctors-in-init-array
Updated based on gold source code

orivej · 2017-07-16T13:37:07Z

Implement --ctors-in-init-array
Refactored the patch to map .ctors to .init_array in ELF/Writer.

orivej · 2017-07-16T15:22:53Z

Implement --ctors-in-init-array
Ensure correct output section type (SHT_INIT_ARRAY or SHT_FINI_ARRAY).

llvmbot · 2017-07-17T15:25:59Z

I suggest representing .init_array/.fini_array sections as synthetic sections:
https://reviews.llvm.org/D35487
With that it should be easy to mix them together with .ctors/.dtors (after one more patch for those).

rui314 · 2017-07-17T21:31:20Z

Orivej,

What you are doing seems basically correct, but it wasn't written in an lld-ish way. We have the SyntheticSection data structure to represent virtual sections.

Please take a look at this: https://reviews.llvm.org/D35509

This patch is incomplete. If you want me to finish this up, I'll do that for you. Or you can take it over.

llvmbot · 2017-07-24T23:02:32Z

> This is not about my code: Nvidia ships static libraries without source code that use .ctors, see #30572 #c15 for an example.

Do you know how to report a bug to nvidia on this? Even if lld does get support for converting .ctors to .init_array it would be good to try to drop it some time in the future.

MaskRay · 2020-02-03T01:23:32Z

Looks like a wontfix.

FWIW, in https://reviews.llvm.org/D71434 I changed clang to not use .ctors/.dtors on generic ELF platforms.

orivej · 2020-11-21T18:19:57Z

Implement --ctors-in-init-array for LLD 11
CUDA 11 has finally switched from .ctors to .init_array!
Yet in order to support older CUDAs for a while I've ported my patch to LLD 11 (and fixed the sorting order of sections — the previous patch has sorted .ctors.64534 as .init_array.64534 rather than as .init_array.1001). (I do not propose to merge this patch into LLD due to the lack of interested users.)
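The sorting fix follows from how GCC encodes priorities in section names: a prioritized `.ctors`/`.dtors` section is named with 65535 - priority, because `.ctors` entries run back to front, while `.init_array.NNNNN`/`.fini_array.NNNNN` use the priority itself. So `.ctors.64534` holds priority 65535 - 64534 = 1001 and has to sort as `.init_array.01001`. A sketch of that renaming, assuming GCC's 65535-based encoding and five-digit zero padding; `ctors_sort_name` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rewrites a .ctors/.dtors input section name into the .init_array /
   .fini_array name it should sort as. Returns 0 on success, -1 if the
   name is not a .ctors/.dtors name or `out` is too small.
   Assumes GCC's encoding: .ctors.N holds priority 65535 - N. */
int ctors_sort_name(const char *in, char *out, size_t outlen) {
    assert(in != NULL && out != NULL);
    const char *target;
    if (strncmp(in, ".ctors", 6) == 0) target = ".init_array";
    else if (strncmp(in, ".dtors", 6) == 0) target = ".fini_array";
    else return -1;

    const char *rest = in + 6;
    int n;
    if (*rest == '\0') {
        /* Unprioritized: plain .init_array / .fini_array. */
        n = snprintf(out, outlen, "%s", target);
    } else if (*rest == '.') {
        char *end;
        unsigned long v = strtoul(rest + 1, &end, 10);
        if (end == rest + 1 || *end != '\0' || v > 65535UL) return -1;
        /* Undo the inversion: priority = 65535 - N, zero-padded to 5 digits. */
        n = snprintf(out, outlen, "%s.%05lu", target, 65535UL - v);
    } else {
        return -1;
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```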

orivej · 2020-11-21T18:21:43Z

Self-contained C test

orivej · 2021-11-27T02:49:05Z

mentioned in issue llvm/llvm-bugzilla-archive#44698

MaskRay · 2021-11-27T03:46:55Z

mentioned in issue llvm/llvm-bugzilla-archive#48096

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

Quuxplusone added the wontfix label (Issue is real, but we can't or won't fix it. Not invalid) Jan 20, 2022

christopherbate mentioned this issue Nov 8, 2023: lld silently creates an non-working executable if both .ctors and .init_array exist #68071 (Closed)

lqd mentioned this issue Jan 24, 2025: Linking with rust-lld causes SIGSEGV in FFI code rust-lang/rust#128286 (Open)

lqd mentioned this issue Apr 30, 2025: Use lld by default on x86_64-unknown-linux-gnu stable rust-lang/rust#140525 (Open)

This issue was closed.
