lld produces broken executable with CUDA #30572

Closed
ismail opened this issue Dec 1, 2016 · 32 comments
Labels
bugzilla (Issues migrated from bugzilla), lld, wontfix (Issue is real, but we can't or won't fix it. Not invalid)

Comments

@ismail
Contributor

ismail commented Dec 1, 2016

Bugzilla Link: 31224
Resolution: WONTFIX
Resolved on: Nov 21, 2020 10:21
Version: unspecified
OS: Linux
Attachments: Crashing executable
CC: @hfinkel, @MaskRay, @orivej, @rui314

Extended Description

havana ~/Downloads > clang-4.0 -v
openSUSE Linux clang version 4.0.0 (trunk 288322) (based on LLVM 4.0.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /suse/idoenmez/bin
Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/6
Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/6
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

/opt/clang/bin/clang++ --cuda-gpu-arch=sm_50 --cuda-path=/havana/cuda-8.0 axpy.cu -fuse-ld=lld -L/havana/cuda-8.0/lib64 -lcudart_static -ldl -lrt -pthread -Wl,-rpath,/opt/clang/lib64 -Wl,-rpath,/havana/cuda-8.0/lib64

./a.out
zsh: segmentation fault (core dumped) ./a.out

The same command works with gold (and with bfd too):

/opt/clang/bin/clang++ --cuda-gpu-arch=sm_50 --cuda-path=/havana/cuda-8.0 axpy.cu -fuse-ld=gold -L/havana/cuda-8.0/lib64 -lcudart_static -ldl -lrt -pthread -Wl,-rpath,/opt/clang/lib64 -Wl,-rpath,/havana/cuda-8.0/lib64
./a.out
y[0] = 2
y[1] = 4
y[2] = 6
y[3] = 8

Attached is the produced binary.

@rui314
Member

rui314 commented Dec 3, 2016

Can you attach a reproduce file? Add -Wl,--reproduce,repro to your command line, then the linker will create repro.cpio containing all input files. The cpio is an uncompressed archive format, so please gzip before attaching.

@ismail
Contributor Author

ismail commented Dec 3, 2016

repro.cpio

@rui314
Member

rui314 commented Dec 4, 2016

Realized that in order to see the bug, I had to run the executable, but I don't run an executable I downloaded from the internet. Could you attach source code?

@ismail
Contributor Author

ismail commented Dec 4, 2016

> Realized that in order to see the bug, I had to run the executable, but I
> don't run an executable I downloaded from the internet. Could you attach
> source code?

Well, running random code off the internet is bad too :) Save https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4 as axpy.cu; it actually comes from http://llvm.org/docs/CompileCudaWithLLVM.html

@rui314
Member

rui314 commented Dec 4, 2016

Thanks. I took a look at the source code as well as the object file but didn't find anything obviously wrong. Investigating it further is probably hard for me because I don't have a build environment for CUDA. Do you think you can debug it?

@ismail
Contributor Author

ismail commented Dec 4, 2016

It crashes in closed source cuda runtime code:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
(gdb) bt
#0 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
#1 0x0000000000217e58 in __cudaRegisterFatBinary ()
#2 0x0000000000216467 in __cuda_module_ctor () at /havana/cuda-8.0/include/cuda_runtime.h:545
#3 0x000000000026717d in __libc_csu_init (argc=1, argv=0x7ffe4d4bceb8, envp=0x7ffe4d4bcec8) at elf-init.c:88
#4 0x00007f78018f7220 in __libc_start_main () from /lib64/libc.so.6
#5 0x000000000021602a in _start () at ../sysdeps/x86_64/start.S:120

I am not sure how to debug that.

@hfinkel
Collaborator

hfinkel commented Dec 5, 2016

> It crashes in closed source cuda runtime code:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
> (gdb) bt
> #0 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
> #1 0x0000000000217e58 in __cudaRegisterFatBinary ()
> #2 0x0000000000216467 in __cuda_module_ctor () at /havana/cuda-8.0/include/cuda_runtime.h:545
> #3 0x000000000026717d in __libc_csu_init (argc=1, argv=0x7ffe4d4bceb8, envp=0x7ffe4d4bcec8) at elf-init.c:88
> #4 0x00007f78018f7220 in __libc_start_main () from /lib64/libc.so.6
> #5 0x000000000021602a in _start () at ../sysdeps/x86_64/start.S:120
>
> I am not sure how to debug that.

Justin, can you help with this?

@llvmbot
Member

llvmbot commented Dec 5, 2016

>> It crashes in closed source cuda runtime code:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
>> (gdb) bt
>> #0 0x0000000000232918 in cudart::globalState::registerFatBinary(void***, void*) ()
>> #1 0x0000000000217e58 in __cudaRegisterFatBinary ()
>> #2 0x0000000000216467 in __cuda_module_ctor () at /havana/cuda-8.0/include/cuda_runtime.h:545
>> #3 0x000000000026717d in __libc_csu_init (argc=1, argv=0x7ffe4d4bceb8, envp=0x7ffe4d4bcec8) at elf-init.c:88
>> #4 0x00007f78018f7220 in __libc_start_main () from /lib64/libc.so.6
>> #5 0x000000000021602a in _start () at ../sysdeps/x86_64/start.S:120
>>
>> I am not sure how to debug that.
>
> Justin, can you help with this?

I'm not familiar with cudart internals, but at first glance it looks like it may be related to global ctors that cudart uses for initialization.

@ismail
Contributor Author

ismail commented Jun 13, 2017

Not sure which revision fixed this but r305278 works fine now. Thanks!

@orivej
Contributor

orivej commented Jul 13, 2017

This bug is not fixed yet. Here is an example of a constructor like those used by CUDA that is not called when linked with LLD: https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f
In this example main.ld is stopped with "Trace/breakpoint trap", but main.lld finishes without calling ctor.

@llvmbot
Member

llvmbot commented Jul 13, 2017

> This bug is not fixed yet. Here is an example of a constructor like those
> used by CUDA that is not called when linked with LLD:
> https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f
> In this example main.ld is stopped with "Trace/breakpoint trap", but
> main.lld finishes without calling ctor.

Can you provide a .tar created with --reproduce?

@orivej
Contributor

orivej commented Jul 13, 2017

--reproduce archive with libc6=2.23-0ubuntu9 on Ubuntu 16: https://s3.amazonaws.com/orivej/bugs/llvm/31224/inputs.tar.xz

@orivej
Contributor

orivej commented Jul 13, 2017

I have updated the example at https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f to be independent from libc.

@orivej
Contributor

orivej commented Jul 14, 2017

Here is how ld decided to put .ctors into .init_array: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770
Here is their test case: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/testsuite/ld-elf/init-mixed.c;h=f401ded4d702be49669ecdc7893f65cc70e9fa7c;hb=HEAD
Here is where crtbegin.o dropped support for .ctors: gcc-mirror/gcc@ef1da80

@orivej
Contributor

orivej commented Jul 14, 2017

Here is the current version of cudart [1]; it uses .ctors in usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart_static.a

[1] http://deb.rug.nl/ppa/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-cudart-dev-8-0_8.0.61-1_amd64.deb

@llvmbot
Member

llvmbot commented Jul 14, 2017

> I have updated the example at
> https://gist.github.com/orivej/29b9834e4621f2c69bfddf0bfc1baa1f to be
> independent from libc.

So the issue happens because the sample code assumes that .ctors will be placed into .init_array, as ld.bfd does, but LLD does not merge them and emits a separate .ctors output section alongside the .init_array section.

I cannot call it a bug; I think that was implemented intentionally in LLD from the start. The sample code relies on behavior specific to bfd, which simply differs from LLD.

I would be happy to work on this one if we decide we want to mimic bfd behavior here, though. Should we?

@orivej
Contributor

orivej commented Jul 14, 2017

It is true that for LLD this is a missing feature rather than a bug. However, it causes programs linked with LLD to fail, it is difficult to debug, and crtbegin.o normally assumes that the linker moves .ctors into .init_array and does not handle them itself.

When gcc was switching from .ctors to .init_array, they had to update the linker, with a rationale that also applies to future linkers such as LLD:

> My opinion is that we can't switch to .init_array unless we either (a)
> make the linker detect the problem and fix it, or (b) at least make the
> linker detect the problem and issue an error. I do not think a
> warning is sufficient.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770#c46

gold provides a toggle --ctors-in-init-array (default) / --no-ctors-in-init-array: http://manpages.ubuntu.com/manpages/precise/man1/ld.1.html

@rui314
Member

rui314 commented Jul 14, 2017

I wonder why you are still using .ctors/.dtors. .{init,fini}_array were invented almost 20 years ago and pretty much everybody is using them now instead of .ctors/.dtors. I do not see a reason to choose .ctors/.dtors when creating something new for CUDA.

@orivej
Contributor

orivej commented Jul 15, 2017

This is not about my code: Nvidia ships static libraries without source code that use .ctors, see #30572 #c15 for an example.

@orivej
Contributor

orivej commented Jul 16, 2017

Implement --ctors-in-init-array
This patch implements --ctors-in-init-array and passes the init-mixed test [1]. I'm not sure how to rewrite this test so that it can be incorporated into the LLD test suite.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/testsuite/ld-elf/init-mixed.c;h=f401ded4d702be49669ecdc7893f65cc70e9fa7c;hb=HEAD

@orivej
Contributor

orivej commented Jul 16, 2017

Implement --ctors-in-init-array
Added a test based on init-mixed.c and ctors_dtors_priority.s

@orivej
Contributor

orivej commented Jul 16, 2017

Implement --ctors-in-init-array
Updated based on gold source code

@orivej
Contributor

orivej commented Jul 16, 2017

Implement --ctors-in-init-array
Refactored the patch to map .ctors to .init_array in ELF/Writer.

@orivej
Contributor

orivej commented Jul 16, 2017

Implement --ctors-in-init-array
Ensure correct output section type (SHT_INIT_ARRAY or SHT_FINI_ARRAY).

@llvmbot
Member

llvmbot commented Jul 17, 2017

I suggest representing .init_array/.fini_array sections as synthetic:
https://reviews.llvm.org/D35487
With that it should be easy to mix them together with .ctors/.dtors (after one more patch for those).

@rui314
Member

rui314 commented Jul 17, 2017

Orivej,

What you are doing seems basically correct, but it wasn't written in an lld-ish way. We have the SyntheticSection data structure to represent virtual sections.

Please take a look at this: https://reviews.llvm.org/D35509

This patch is incomplete. If you want me to finish this up, I'll do that for you. Or you can take it over.

@llvmbot
Member

llvmbot commented Jul 24, 2017

> This is not about my code: Nvidia ships static libraries without source code
> that use .ctors, see #30572 #c15 for an
> example.

Do you know how to report a bug to nvidia on this? Even if lld does get support for converting .ctors to .init_array it would be good to try to drop it some time in the future.

@MaskRay
Member

MaskRay commented Feb 3, 2020

Looks like a wontfix.

FWIW, in https://reviews.llvm.org/D71434 I changed clang to not use .ctors/.dtors on generic ELF platforms.

@orivej
Contributor

orivej commented Nov 21, 2020

Implement --ctors-in-init-array for LLD 11
CUDA 11 has finally switched from .ctors to .init_array!
Yet in order to support older CUDAs for a while, I've ported my patch to LLD 11 and fixed the sorting order of sections: the previous patch sorted .ctors.64534 as .init_array.64534 rather than as .init_array.1001. (I do not propose to merge this patch into LLD due to the lack of interested users.)

@orivej
Contributor

orivej commented Nov 21, 2020

Self-contained C test

@orivej
Contributor

orivej commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44698

@MaskRay
Member

MaskRay commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#48096

This issue was closed.