Skip to content

Linking with lld stopped working on linux/Android since lld 13 and this repo enabled --gc-sections #5698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
finagolfin opened this issue Jul 30, 2022 · 23 comments · Fixed by #5707
Labels

Comments

@finagolfin
Copy link
Member

finagolfin commented Jul 30, 2022

Description

After dead-stripping was enabled in #4135, various people reported problems with using lld, though it wouldn't always show up. It appears to be some change with lld 13, as lld 12 works fine, perhaps the commit @drodriguez linked (I'll try reverting that commit and report back later).

Obviously, this is really a regression in how Swift interacts with lld 13 and 14, but I'm reporting it here first as we'll probably want to amend SPM 5.7 quickly before the underlying lld issue is figured out. @keith, maybe we should disable stripping on ELF if the linker is lld for now?

@3405691582, can you reproduce on OpenBSD?

Expected behavior

Linking to work fine with lld, as it does with gold.

Actual behavior

Several linker errors like this show up instead:

ld.lld: error: undefined hidden symbol: __start_swift5_protocols
>>> referenced by SwiftRT-ELF.cpp
>>>               /home/butta/swift-DEVELOPMENT-SNAPSHOT-2022-07-25-a-ubuntu20.04/usr/lib/swift/linux/x86_64/swiftrt.o (swift_image_constructor())

Steps to reproduce

  1. Clone swift-argument-parser
  2. Remove a file because of some unrelated bug: rm Sources/ArgumentParserTestHelpers/TestHelpers.swift
  3. Build the package in release mode with lld: ../swift-DEVELOPMENT-SNAPSHOT-2022-07-25-a-ubuntu20.04/usr/bin/swift build -v -c release -Xswiftc -use-ld=lld

Swift Package Manager version/commit hash

latest Jul. 25 trunk snapshot and 5.7 Jul. 23 snapshot

Swift & OS version (output of swift --version && uname -a)

Swift 5.8 and 5.7 on Ubuntu 20.04 x86_64

@finagolfin finagolfin added the bug label Jul 30, 2022
@finagolfin
Copy link
Member Author

I can confirm that reverting llvm/llvm-project@6d2d3bd fixes this issue. We're going to have to adapt that lld ELF commit for Swift.

@keith
Copy link
Member

keith commented Aug 2, 2022

Here's a fix for this, it's not fully complete yet because the problem is the R directive I'm adding here wasn't supported until clang 13+, and binutils 2.36, so I think I need to conditionalize this somehow based on those versions. swiftlang/swift#60357

Can you test with your real world example to make sure this works there too?

@keith
Copy link
Member

keith commented Aug 2, 2022

I think disabling --gc-sections if we need to as a safer fix is fine. We can leave -dead_strip on darwin in that case still

@keith
Copy link
Member

keith commented Aug 2, 2022

For future reference for others the default change that introduced the issue here was highly controversial and I believe also broke rust, here is some further reading:

@finagolfin
Copy link
Member Author

Here's a fix for this, it's not fully complete yet because the problem is the R directive I'm adding here wasn't supported until clang 13+, and binutils 2.36, so I think I need to conditionalize this somehow based on those versions

Yikes, but since Swift 5.7 and trunk both use clang/lld 13+, maybe that's enough?

Can you test with your real world example to make sure this works there too?

Just tried it natively on Android, can confirm your pull fixes the problem with lld 14.

I think disabling --gc-sections if we need to as a safer fix is fine.

It all depends if we can get your fix in quickly before the 5.7 release. If we can, no problem. If not, probably best to disable --gc-sections for 5.7 till we can get your fix into 5.7, then maybe re-enable that flag in SPM later with a 5.7 patch release.

@compnerd and @tomerd, what do you think?

@keith
Copy link
Member

keith commented Aug 2, 2022

Yikes, but since Swift 5.7 and trunk both use clang/lld 13+, maybe that's enough?

The problem is that if the host compiler isn't up to date you can't build swift itself. I asked and it doesn't seem like there's an official matrix of supported versions here. This issue is really the worse case scenario for this IMO because you could theoretically use a GCC to build swift and then lld to link your swift program with the compiler, in which case this fix still wouldn't work.

@drodriguez
Copy link
Contributor

Around February I tried the "R" in our internal builds, and while it fixed many things, it didn't fix all (I commented about it in #4135 (comment)). I haven't tried since then, and maybe something in the environment made my testing not be successful, but I wanted to point it out again. I really hope for something like that to work, but it might not be enough.

@keith
Copy link
Member

keith commented Aug 2, 2022

Do you happen to have any of the example failures around still?

@finagolfin
Copy link
Member Author

The problem is that if the host compiler isn't up to date you can't build swift itself.

Your fix is in the stdlib, which already has to be built by the Swift-forked clang, so the host compiler should be irrelevant.

This issue is really the worse case scenario for this IMO because you could theoretically use a GCC to build swift and then lld to link your swift program with the compiler, in which case this fix still wouldn't work.

I don't think we can worry about all that, particularly since Swift concurrency already requires compilation by the Swift-forked clang.

I haven't tried since then, and maybe something in the environment made my testing not be successful, but I wanted to point it out again. I really hope for something like that to work, but it might not be enough.

@drodriguez, I haven't tried running the compiler validation suite on this fix, only tried my repro case above after recompiling the stdlib. It would be good to know what else failed for you and why, hopefully not a result of this fix.

@keith
Copy link
Member

keith commented Aug 2, 2022

Your fix is in the stdlib, which already has to be built by the Swift-forked clang, so the host compiler should be irrelevant.

Is this always the case? That you use the host compiler to build the forked clang and then that to build the stdlib? If so that would definitely solve that issue!

@finagolfin
Copy link
Member Author

Is this always the case? That you use the host compiler to build the forked clang and then that to build the stdlib? If so that would definitely solve that issue!

Yes, that is the default for how the Swift toolchain is currently built, because of changes like swiftlang/swift#39245. That change eventually made it into clang 14, llvm/llvm-project@f670c5a, so building the stdlib may work with a stock clang 14 now, or there may be other changes specific to the Swift fork too.

keith added a commit to keith/swift that referenced this issue Aug 2, 2022
@drodriguez
Copy link
Contributor

Do you happen to have any of the example failures around still?

Sadly no. I am recovering --gc-sections internally and I am building again. It is from a completely different branch, but I don't think it is very interesting to know if it was really failing with a branch from back in March.

It would be good to know what else failed for you and why, hopefully not a result of this fix.

I do not think it was a result of the fix. I think when I tried, the fix was very good, but not enough. Hopefully it will all be green and that's the fix.

@keith
Copy link
Member

keith commented Aug 2, 2022

I submitted #5707 in case we can't fix it for real, my Swift PR is blocked on unrelated build failures so I can't run tests. Let me know if you get some useful test data here.

@drodriguez
Copy link
Contributor

The results of my testing:

  • I need to pass SWIFTC_TEST_OPTIONS=" -use-ld=lld" SWIFT_DRIVER_TEST_OPTIONS=" -use-ld=lld" to the test suite to force the usage of lld. Otherwise, it seems that ld.gold is used, which fails with the "R" property (it might be too old, we normally don't use that linker, but it is there).
  • Before all this, in March, we used to compile our toolchain with --gc-sections (and also with -ffunction-sections -fdata-sections). This applies to all the toolchain pieces. We build unified, so LLVM/Clang/Swift build at the same time. Those flags are passed on CMAKE_<LANG>_FLAGS and CMAKE_[EXE|SHARED]_LINKER_FLAGS. This works fine for building and all check-llvm and check-clang tests pass. Because the usage of CMAKE_<LANG>_FLAGS and CMAKE_[EXE|SHARED]_LINKER_FLAGS, those are also applied to the Swift stdlib for the host. check-swift-validation fails hundreds of tests.
  • Removing those linking optimizations (but keeping the "R" and still using the TEST_OPTIONS environment variables) brings the failing tests to 10.
    • Quickly skimming through them, there are around 5 of them that seems to fail trying to match ld in some output, but since I have forced the usage of lld, the strings will never match.
    • There are 2 that are C++ Interop, which I am going to skip, since it might be unrelated or a bad interaction.
    • 2 are the using swift-reflection-dump (Reflection/capture_descriptors.sil and Reflection/typeref_decoding_imported.swift).
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.	Program arguments: <PATH>/bin/swift-reflection-dump -arch x86_64 -binary-filename <PATH>/tools/swift/test-linux-x86_64/Reflection/Output/typeref_decoding_imported.swift.tmp/libTypesToReflect.so
 #0 0x0000000000669273 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (<PATH>/bin/swift-reflection-dump+0x669273)
 #1 0x000000000066703e llvm::sys::RunSignalHandlers() (<PATH>/bin/swift-reflection-dump+0x66703e)
 #2 0x000000000066973a SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fa7ffd065a0 __restore_rt <PATH>/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
 #4 0x00007fa7ffe72f41 __strnlen_avx2_rtm <PATH>/glibc-2.34/string/../sysdeps/x86_64/multiarch/strlen-avx2.S:74:0
 #5 0x0000000000682f49 bool swift::reflection::ReflectionContext<swift::External<swift::NoObjCInterop<swift::RuntimeTarget<8u> > > >::readELFSections<(anonymous namespace)::ELFTraits<(unsigned char)2> >(swift::remote::RemoteAddress, llvm::Optional<llvm::sys::MemoryBlock>, llvm::SmallVector<llvm::StringRef, 1u>)::'lambda'(llvm::StringRef)::operator()(llvm::StringRef) const ObjectFileContext.cpp:0:0
 #6 0x00000000006818ed swift::reflection::ReflectionContext<swift::External<swift::NoObjCInterop<swift::RuntimeTarget<8u> > > >::readELF(swift::remote::RemoteAddress, llvm::Optional<llvm::sys::MemoryBlock>, llvm::SmallVector<llvm::StringRef, 1u>) ObjectFileContext.cpp:0:0
 #7 0x000000000067b93e swift::reflection::ReflectionContext<swift::External<swift::NoObjCInterop<swift::RuntimeTarget<8u> > > >::addImage(swift::remote::RemoteAddress, llvm::SmallVector<llvm::StringRef, 1u>) ObjectFileContext.cpp:0:0
 #8 0x000000000066f8d8 std::unique_ptr<swift::static_mirror::ReflectionContextHolder, std::default_delete<swift::static_mirror::ReflectionContextHolder> > swift::static_mirror::makeReflectionContextForMetadataReader<swift::External<swift::NoObjCInterop<swift::RuntimeTarget<8u> > > >(std::shared_ptr<swift::static_mirror::ObjectMemoryReader>, unsigned char) (<PATH>/bin/swift-reflection-dump+0x66f8d8)
 #9 0x000000000066f141 swift::static_mirror::makeReflectionContextForObjectFiles(std::vector<llvm::object::ObjectFile const*, std::allocator<llvm::object::ObjectFile const*> > const&) (<PATH>/bin/swift-reflection-dump+0x66f141)
#10 0x00000000004d5502 main (<PATH>/bin/swift-reflection-dump+0x4d5502)
#11 0x00007fa7ffcee657 __libc_start_call_main <PATH>/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#12 0x00007fa7ffcee718 call_init <PATH>/glibc-2.34/csu/../csu/libc-start.c:128:20
#13 0x00007fa7ffcee718 __libc_start_main@GLIBC_2.2.5 <PATH>/glibc-2.34/csu/../csu/libc-start.c:379:5
#14 0x00000000004d4ac1 _start <PATH>/glibc-2.34/csu/../sysdeps/x86_64/start.S:118:0
    • LinkerSections/function_sections.swift also fails:
<PATH>/swift/test/LinkerSections/function_sections.swift:7:11: error: CHECK: expected string not found in input
// CHECK: Discarded input sections
          ^
<stdin>:1:1: note: scanning from here
 VMA LMA Size Align Out In Symbol
^
<stdin>:50:19: note: possible intended match here
 2828 2828 0 1 swift5_accessible_functions
                  ^

Input file: <stdin>
Check file: <PATH>/swift/test/LinkerSections/function_sections.swift

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1:  VMA LMA Size Align Out In Symbol
check:7'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
           2:  350 350 28 1 .interp
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~
           3:  350 350 28 1 <internal>:(.interp)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           4:  378 378 20 4 .note.ABI-tag
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           5:  378 378 20 4 <PATH>/lib/Scrt1.o:(.note.ABI-tag)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           6:  378 378 20 1 __abi_tag
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
          45:  2828 2828 0 1 <PATH>/lib/swift/linux/x86_64/swiftrt.o:(swift5_builtin)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          46:  2828 2828 0 1 swift5_capture
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          47:  2828 2828 0 1 <PATH>/lib/swift/linux/x86_64/swiftrt.o:(swift5_capture)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          48:  2828 2828 0 1 swift5_mpenum
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          49:  2828 2828 0 1 <PATH>/lib/swift/linux/x86_64/swiftrt.o:(swift5_mpenum)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          50:  2828 2828 0 1 swift5_accessible_functions
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:7'1                       ?                         possible intended match
          51:  2828 2828 0 1 <PATH>/lib/swift/linux/x86_64/swiftrt.o:(swift5_accessible_functions)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          52:  2828 2828 2c 4 .eh_frame_hdr
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          53:  2828 2828 2c 4 <internal>:(.eh_frame_hdr)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          54:  2858 2858 a4 8 .eh_frame
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
          55:  2858 2858 2c 1 <PATH>/lib/Scrt1.o:(.eh_frame+0x0)
check:7'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .

Keith's tests in swiftlang/swift#60357 didn't seem to find the same problem, so it might be something in our environment, but that every failure relates to metadata does not put me at ease. I think the situation is slightly better than last time I checked, but there seems to be something still going on. If the solution works for upstream, please feel free to use it. We will need to find what's going on at our systems, and if we find something, we will see if it is something specific to us, or something that should be contributed back.

@finagolfin
Copy link
Member Author

@drodriguez, can you do a test run with --gc-sections -z nostart-stop-gc also? That will determine if this new flag is causing the 10 failures or other differences between lld and gold are the real culprit.

@finagolfin
Copy link
Member Author

Or maybe I misunderstood, the 10 failures are with this fix but with --gc-sections disabled?

@drodriguez
Copy link
Contributor

@drodriguez, can you do a test run with --gc-sections -z nostart-stop-gc also? That will determine if this new flag is causing the 10 failures or other differences between lld and gold are the real culprit.

I tried with SWIFTC_TEST_OPTIONS=" -use-ld=lld -Xlinker --gc-sections -Xlinker -z -Xlinker nostart-stop-gc" SWIFT_DRIVER_TEST_OPTIONS=" -use-ld=lld -Xlinker --gc-sections -Xlinker -z -Xlinker nostart-stop-gc" in the environment. Instead of less failures I got more. It seems to share the same 10 as before, but adds tests that use lldb-moduleimport-test (2 in DebugInfo/, 18 in TypeDecoder/). lldb-moduleimport-test doesn't seem to print anything, which makes the test fail (lldb-moduleimport-test should not have been modified from the previous run, since the environment variables should only affect the test compilation, not the testing tools).

Or maybe I misunderstood, the 10 failures are with this fix but with --gc-sections disabled?

Correct. Trying to run with --gc-sections enabled for the stdlib was not going to work (lots of extra failures), so I tried the test suite with LLD (if I understand correctly, the problem is using LLD, which if it creates problems when used in the test suite, it will create problems in real-life usage). The idea is that maybe fixing the things broken in the test suite might have fixed the rest of the problems.

@keith
Copy link
Member

keith commented Aug 3, 2022

For LinkerSections/function_sections.swift I investigated and the problem with that test specifically is it's validating the map file format from ld, which lld does not replicate. The behavior appears to still be "passing" if you adapt it to the lld output.

keith added a commit to keith/swift that referenced this issue Aug 3, 2022
@finagolfin
Copy link
Member Author

Correct. Trying to run with --gc-sections enabled for the stdlib was not going to work (lots of extra failures), so I tried the test suite with LLD

OK, so the original 10 failures were the baseline for lld without --gc-sections (well, the real baseline would be to revert the stdlib fix also, but I would bet that fix did not cause those 10 failures, that lld did) then these additional lldb failures are caused by --gc-sections even with nostart-stop-gc somehow presumably.

I suggest we get #5708 in for now, then we can spend more time validating the stdlib fix and get that in once we're sure of it. You may want to check a lld test run with no linker flags and no stdlib fix to get a true baseline. I will run those different flags combos with the validation suite natively on Android too, which only uses lld now, and report back my results.

@keith
Copy link
Member

keith commented Aug 3, 2022

The 2 swift-reflection-dump crashes reproduce for me locally as well.

Also the build failure on my swift PR is actually real, the issue appears to be that when not using lld the section that I'm forcing the be retained, and the sections that we actually want from the object files are not merged. So you end up with duplicate sections of metadata, which unsurprisingly causes issues.

@keith
Copy link
Member

keith commented Aug 3, 2022

I imagine that the difference in the section flags caused by adding R leads to the gold not merging them. I think that difference would probably be the next thing to investigate. It's probably safe to add that attribute to the actual object file sections as well 🤔

I agree merging the revert for 5.7 is safest. I'm not sure if I'm going to spend much more time on this.

@keith
Copy link
Member

keith commented Aug 3, 2022

For reference this post also covers many pieces of this change https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order

@finagolfin
Copy link
Member Author

Opened swiftlang/swift#60406 with my Android test info, let's take discussion of this underlying lld issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants