Skip to content

[RISCV][LLD] Unable to resolve R_RISCV_ALIGN when mixing +C and -C object files #63964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
preames opened this issue Jul 19, 2023 · 5 comments · Fixed by #73977
Closed

[RISCV][LLD] Unable to resolve R_RISCV_ALIGN when mixing +C and -C object files #63964

preames opened this issue Jul 19, 2023 · 5 comments · Fixed by #73977

Comments

@preames
Copy link
Collaborator

preames commented Jul 19, 2023

When linking object files which were generated with and without compressed instructions, we will sometimes fail to resolve a R_RISCV_ALIGN relocation.

In assert builds, this results in an assert failure with the following message: ld.lld: ../llvm-project/lld/ELF/Arch/RISCV.cpp:687: bool relax(lld::elf::InputSection&): Assertion static_cast<int32_t>(remove) >= 0 && "R_RISCV_ALIGN needs expanding the content"' failed.`

In non-assert builds, you sometimes get linker errors such as "section size decrease is too large: 4294967472" . This is likely not the only symptom of the broken internal invariant implied by the assert above.

Here's a small stand alone reproducer:

$ cat without-c.c

// Other is left undefined in the link.  Calls to this function are simply
// spacing to get the offsets just right
void other();

void  __attribute__((noinline, aligned(32))) foo() {
  other();
  other();
  other();
  // LLD uses a compressed instruction for relaxing this call
  foo();
  other();
  return;
}

// It's the alignment of this function that we can not satisfy.
void  __attribute__((noinline, aligned(32))) bar() {
  return;
}

$ cat with-c.c
// Contents of this file don't matter, we just need an object file
// with 'C' in the attributes

$ cat repo.sh 
CC="/home/preames/llvm-dev/build/bin/clang -target riscv64-unknown-linux-gnu"

rm *.o
$CC -c with-c.c -march=rv64gc -O3
#$CC -S with-c.c -march=rv64gc -O3

$CC -c without-c.c -march=rv64g -O3
#$CC -S without-c.c -march=rv64g -O3
/home/preames/llvm-dev/build/bin/ld.lld with-c.o without-c.o --shared

I believe the root issue here is that we're enabling call relaxation based on whether any object file in the link had compressed enabled, not whether the object file corresponding to the section being processed does. The following diff seems to fix at least my motivating case, but I don't know if this is a full fix or not.

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index d0d75118e30d..ebbc9601ebf3 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -589,7 +589,7 @@ static void initSymbolAnchors() {
 // Relax R_RISCV_CALL/R_RISCV_CALL_PLT auipc+jalr to c.j, c.jal, or jal.
 static void relaxCall(const InputSection &sec, size_t i, uint64_t loc,
                       Relocation &r, uint32_t &remove) {
-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;
   const Symbol &sym = *r.sym;
   const uint64_t insnPair = read64le(sec.content().data() + r.offset);
   const uint32_t rd = extractBits(insnPair, 32 + 11, 32 + 7);

Here's a related patch which turns the assertion into a recoverable error. In this test case, the problem was an internal invariant violation, but I don't think there's anything preventing an invalid object file from reaching here.

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 637e24b5ce2d..6cf21bc482dc 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -689,8 +689,16 @@ static bool relax(InputSection &sec) {
       const uint64_t align = PowerOf2Ceil(r.addend + 2);
       // All bytes beyond the alignment boundary should be removed.
       remove = nextLoc - ((loc + align - 1) & -align);
-      assert(static_cast<int32_t>(remove) >= 0 &&
-             "R_RISCV_ALIGN needs expanding the content");
+      if (LLVM_UNLIKELY(static_cast<int32_t>(remove) < 0)) {
+        // This is a malformed input file (or a broken internal invariant).  In
+        // case it's the former, report an error.  In either case, adjust the
+        // amount remove to avoid cascading errors.
+        error(getErrorLocation((const uint8_t*)loc) +
+              "insufficient padding bytes for " + lld::toString(r.type) +
+              ": " + Twine(r.addend) + " bytes available, but " +
+              Twine(r.addend - remove) + " bytes required");
+        remove = r.addend;
+      }
       break;
     }
     case R_RISCV_CALL:

@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2023

@llvm/issue-subscribers-backend-risc-v

@EugeneZelenko EugeneZelenko added lld:ELF and removed lld labels Jul 19, 2023
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2023

@llvm/issue-subscribers-lld-elf

@preames
Copy link
Collaborator Author

preames commented Jul 19, 2023

@MaskRay @jrtc27 - Am I barking up the right tree with my analysis here? Really not my area.

@MaskRay
Copy link
Member

MaskRay commented Jul 23, 2023

I haven't checked the case in detail but from the looking of it, this symptom is very likely the same as riscv-collab/riscv-gnu-toolchain#445
We will need RVC bearing semantics in R_RISCV_RELAX.

(I shall add more information to https://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation "Linker friendly R_RISCV_ALIGN")

On the lld side, an errorOrWarn should be reasonable...

@MaskRay
Copy link
Member

MaskRay commented Oct 30, 2023

I think your analysis is correct. We need the following change.

-  const bool rvc = config->eflags & EF_RISCV_RVC;
+  const bool rvc = getEFlags(sec.file) & EF_RISCV_RVC;

Without it, we have the reported crash:

cat > without-c.c <<e
// Other is left undefined in the link.  Calls to this function are simply
// spacing to get the offsets just right
void other();

void  __attribute__((noinline, aligned(32))) foo() {
  other();
  other();
  other();
  // LLD uses a compressed instruction for relaxing this call
  foo();
  other();
  return;
}

// It's the alignment of this function that we can not satisfy.
void  __attribute__((noinline, aligned(32))) bar() {
  return;
}
e
truncate -s0 with-c.c
clang --target=riscv64-unknown-linux-gnu -c -fpic with-c.c -march=rv64gc -O3
clang --target=riscv64-unknown-linux-gnu -c -fpic without-c.c -march=rv64g -O3
ld.lld -shared with-c.o without-c.o   # crash

preames added a commit to preames/llvm-project that referenced this issue Nov 30, 2023
This fixes a mis-link when mixing compressed and non-compressed input to
LLD.  When relaxing calls, we must respect the source file that the section
came from when deciding whether it's legal to use compressed instructions.
If the call in question comes from a non-rvc source, then it will not expect
2-byte alignments and cascading failures may result.

This fixes llvm#63964.  The symptom
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result.  (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced test
case and running check-lld on an assertions enabled build.)  However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc routines
(via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
preames added a commit that referenced this issue Dec 1, 2023
…cts (#73977)

This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the
section came from when deciding whether it's legal to use compressed
instructions. If the call in question comes from a non-rvc source, then it will not
expect 2-byte alignments and cascading failures may result.

This fixes #63964. The symptom 
seen there is that a latter RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced
test case and running check-lld on an assertions enabled build.)  However,
it's important to note this is just one possible symptom of the problem.

If the resulting binary has a runtime switch between rvc and non-rvc
routines (via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants