Skip to content

Boot hang with certain PowerPC configurations after da0e5b885b25cf4ded0fa89b965dc6979ac02ca9 #1581

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nathanchance opened this issue Feb 8, 2022 · 12 comments
Labels
[ARCH] powerpc This bug impacts ARCH=powerpc [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 [TOOL] lld The issue is relevant to LLD linker

Comments

@nathanchance
Copy link
Member

nathanchance commented Feb 8, 2022

After llvm/llvm-project@da0e5b8, both Fedora and OpenSUSE's powerpc64le configurations fail to boot for me in QEMU. On v5.17-rc3:

$ curl -LSso .config https://src.fedoraproject.org/rpms/kernel/raw/rawhide/f/kernel-ppc64le-fedora.config

$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 olddefconfig zImage.epapr

$ boot-qemu.sh -a ppc64le -k . -t 45s
...
[   10.042804807,5] INIT: Waiting for kernel...
[   10.042963994,5] INIT: platform wait for kernel load failed
[   10.043178236,5] INIT: Assuming kernel at 0x20000000
[   10.043400737,5] INIT: 64-bit LE kernel discovered
[   10.043816082,2] NVRAM: Failed to load
[   10.092415012,5] INIT: Starting kernel at 0x20010000, fdt at 0x3068dbd0 16441 bytes

zImage starting: loaded at 0x0000000020010000 (sp: 0x0000000020cdcda0)
Allocating 0x2c37501 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x0000000020023000:0x0000000020cdb275)...
Done! Decompressed 0x2a60000 bytes

Linux/PowerPC load:
Finalizing device tree... flat tree at 0x20cddca0
qemu-system-ppc64: terminating on signal 15 from pid 863670 (timeout)
+ RET=124
+ set +x

At the commit prior to that one, the kernel fires right up:

[   10.046432374,5] INIT: Waiting for kernel...
[   10.046600313,5] INIT: platform wait for kernel load failed
[   10.046828764,5] INIT: Assuming kernel at 0x20000000
[   10.047069394,5] INIT: 64-bit LE kernel discovered
[   10.047512051,2] NVRAM: Failed to load
[   10.097703665,5] INIT: Starting kernel at 0x20010000, fdt at 0x3068dbd0 16441 bytes

zImage starting: loaded at 0x0000000020010000 (sp: 0x0000000020cdcda0)
Allocating 0x2c37501 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x0000000020023000:0x0000000020cdb254)...
Done! Decompressed 0x2a60000 bytes

Linux/PowerPC load:
Finalizing device tree... flat tree at 0x20cddca0
[    0.000000] dt-cpu-ftrs: setup for ISA 2070
[    0.000000] dt-cpu-ftrs: not enabling: subcore (unknown and unsupported by kernel)
[    0.000000] dt-cpu-ftrs: final cpu/mmu features = 0x000000fb8f5db187 0x3c006001
[    0.000000] hash-mmu: Page sizes from device-tree:
[    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[    0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[    0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[    0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[    0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[    0.000000] Enabling pkeys with max key count 32
[    0.000000] Activating Kernel Userspace Access Prevention
[    0.000000] Activating Kernel Userspace Execution Prevention
[    0.000000] Using 1TB segments
[    0.000000] hash-mmu: Initializing hash mmu with SLB
[    0.000000] Linux version 5.17.0-rc3 ([email protected]) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 460830a9c664e8cce959c660648faa7747ad8bdc), LLD 14.0.0) #1 SMP Mon Feb 7 16:59:00 MST 2022
[    0.000000] Found initrd at 0xc000000028000000:0xc000000028856200
...

I have done no further triage, as I am currently testing a few different things. cc @MaskRay in case something obvious sticks out to you. I am happy to provide any information that might make figuring out what is going wrong any easier.

@nathanchance nathanchance added [BUG] Untriaged Something isn't working [TOOL] lld The issue is relevant to LLD linker labels Feb 8, 2022
@MaskRay
Copy link
Member

MaskRay commented Feb 9, 2022

I encounter a pahole (my apt info dwarves is 1.20-1+gl0) issue.

  LD      .tmp_vmlinux.btf
  BTF     .btf.vmlinux.bin.o
[750483] INT DW_ATE_unsigned_56 Error emitting BTF type
Encountered error while encoding BTF.
  LD      .tmp_vmlinux.kallsyms1
  KSYMS   .tmp_vmlinux.kallsyms1.S
  AS      .tmp_vmlinux.kallsyms1.S
  LD      .tmp_vmlinux.kallsyms2
  KSYMS   .tmp_vmlinux.kallsyms2.S
  AS      .tmp_vmlinux.kallsyms2.S
  LD      vmlinux
  BTFIDS  vmlinux
FAILED: load BTF from vmlinux: Invalid argument
make[2]: *** [/home/maskray/Dev/linux/Makefile:1155: vmlinux] Error 255
make[2]: *** Deleting file 'vmlinux'
make[1]: *** [/home/maskray/Dev/linux/Makefile:350: __build_one_by_one] Error 2
make[1]: Leaving directory '/tmp/linux/ppc64le'
make: *** [Makefile:219: __sub-make] Error 2

The change refactored how DT_RELACOUNT (shown as RELACOUNT in readelf -d output), which is defined for -z combreloc (default): the number of R_*_RELATIVE relocations in the beginning of .rela.dyn. The tag is optional, and when existent, serves as an optimization for glibc ld.so. Many ld.so implementations (FreeBSD rtld-elf, musl, Android bionic) just ignore DT_RELACOUNT.

I can find some RELACOUNT usage in arch/powerpc/ . Is the DT_RELACOUNT value now wrong?

% rg RELACOUNT
include/uapi/linux/elf.h
108:#define DT_RELACOUNT        0x6ffffff9

arch/powerpc/kernel/reloc_64.S
11:RELACOUNT = 0x6ffffff9
30:      * Scan the dynamic section for the RELA and RELACOUNT entries.
41:2:   addis   r6,r6,(-RELACOUNT)@ha
42:     cmpdi   r6,RELACOUNT@l
44:     ld      r8,8(r11)       /* get RELACOUNT value in r8 */
47:4:   cmpdi   r7,0            /* check we have both RELA and RELACOUNT */

arch/powerpc/boot/crt0.S
11:RELACOUNT = 0x6ffffff9
68:      * We need the RELA and RELACOUNT entries. */
78:11:  addis   r8,r8,(-RELACOUNT)@ha
79:     cmpwi   r8,RELACOUNT@l
81:     lwz     r0,4(r12)       /* get RELACOUNT value in r0 */
88:      * The R_PPC_RELATIVE ones come first and there are RELACOUNT
163:10: addis   r12,r12,(-RELACOUNT)@ha
164:    cmpdi   r12,RELACOUNT@l
166:    ld      r8,8(r11)       /* get RELACOUNT value in r8 */
170:    cmpdi   r13,0            /* check we have both RELA and RELACOUNT */

@nickdesaulniers
Copy link
Member

I encounter a pahole (my apt info dwarves is 1.20-1+gl0) issue.

./scripts/config -d CONFIG_DEBUG_INFO_BTF

Is the DT_RELACOUNT value now wrong?

cc @mpe

@nickdesaulniers nickdesaulniers added the [ARCH] powerpc This bug impacts ARCH=powerpc label Feb 9, 2022
@MaskRay
Copy link
Member

MaskRay commented Feb 9, 2022

vmlinux has some position-independent long branch thunks which need R_*_RELATIVE relocations. In such a case, ld.lld generated DT_RELACOUNT value is smaller than the actual value. In my build, the ld.lld produced value is 185685 and off by 27 (see below I think the small difference is acceptable).

I can make the value correct but.. well, it's just more code without many benefits. It's also a bit tricky to implement (I really want to get rid of if (reloc.type == target->relativeRel) ++numRelativeRelocs; in lld/ELF).

glibc logic is the following and a smaller DT_LACOUNT value will still work.

Assert the first DT_RELACOUNT relocations are R_PPC64_RELATIVE. Perform R_PPC64_RELATIVE.

For the rest relocations, check the type and perform one of R_PPC64_RELATIVE/R_PPC64_ADDR64/...

The arch/powerpc code just seems to assume that there is no R_PPC64_RELATIVE after the first DT_RELACOUNT relocations.

I'd hope that the arch/powerpc code just be benign and allow the DT_RELACOUNT value to be any of 0 <= DT_RELACOUNT <= actual.

The DT_RELACOUNT optimization just skips a ELF64_R_TYPE(r_info) == R_PPC64_RELATIVE test when applying relocations... The code will need to be updated anyway to support DT_RELR (https://maskray.me/blog/2021-10-31-relative-relocations-and-relr#glibc) and the DT_RELACOUNT optimization will be irrelevant anyway, then...

powerpc maintainers may search for RELR on https://sourceware.org/pipermail/binutils/2022-January/ and refactor the code to support DT_RELR.

@nickdesaulniers
Copy link
Member

There's probably some important context as why the number of that type of relocation is hard coded in the source...

@nathanchance
Copy link
Member Author

It appears that arch/powerpc/kernel/reloc_64.S is the code that handles these relocations. I am not sure how it should be reworked to handle ld.lld's imprecise DT_RELACOUNT value.

cc @aik since I noticed you opened llvm/llvm-project#53786 above.

nathanchance added a commit to nathanchance/continuous-integration2 that referenced this issue Mar 5, 2022
…4le-llvm-ias"

This reverts commit fdee477, reversing
changes made to 3aa7dcd.

An ld.lld change in main breaks booting CONFIG_RELOCATABLE kernels,
which includes Fedora and OpenSUSE's configurations.

Revert this change for now, until a solution or appropriate workaround
can be created.

Link: ClangBuiltLinux/linux#1581
Signed-off-by: Nathan Chancellor <[email protected]>
@MaskRay
Copy link
Member

MaskRay commented Mar 9, 2022

This patch can fix the issue

From 9822ba2edb1808a2cf3a4690178d20a0ea8dfd36 Mon Sep 17 00:00:00 2001
From: Fangrui Song <hidden>
Date: Tue, 8 Mar 2022 21:31:51 -0800
Subject: [PATCH] powerpc: Replace ppc64 DT_RELACOUNT usage with DT_RELASZ

DT_RELACOUNT is an ELF dynamic tag inherited from SunOS indicating the number
of R_*_RELATIVE relocations. It is optional but {ld.lld,ld.lld} -z combreloc
always creates it (if non-zero) to slightly speed up glibc ld.so relocation
resolving by avoiding R_*R_PPC64_RELATIVE type comparison. The tag is otherwise
nearly unused in the wild.

lld>=15 (since commit da0e5b885b25cf4ded0fa89b965dc6979ac02ca9) underestimates
DT_RELACOUNT for ppc64 when position-independent long branch thunks are used.
Since our code always compares the relocation type with R_PPC64_RELATIVE,
replacing every occurrence of DT_RELACOUNT with
DT_RELASZ/sizeof(Elf64_Rela)=DT_RELASZ/24 is correct. Make such changes to
avoid using the obscure DT_RELACOUNT.

For an unsigned 32-bit integer (the bound of DT_RELASZ), x divided by 24
can be implemented as x*0xaaaaaaab >> 4.

Link: https://github.com/ClangBuiltLinux/linux/issues/1581
Reported-by: Nathan Chancellor <hidden>
Signed-off-by: Fangrui Song <hidden>
---
 arch/powerpc/boot/crt0.S       | 28 +++++++++++++++++-----------
 arch/powerpc/kernel/reloc_64.S | 15 +++++++++------
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
index feadee18e271..1c96ebe7ef1a 100644
--- a/arch/powerpc/boot/crt0.S
+++ b/arch/powerpc/boot/crt0.S
@@ -8,7 +8,7 @@
 #include "ppc_asm.h"
 
 RELA = 7
-RELACOUNT = 0x6ffffff9
+RELASZ = 8
 
 	.data
 	/* A procedure descriptor used when booting this as a COFF file.
@@ -65,7 +65,7 @@ p_base:	mflr	r10		/* r10 now points to runtime addr of p_base */
 	subf	r11,r11,r12	/* runtime - linktime offset */
 
 	/* The dynamic section contains a series of tagged entries.
-	 * We need the RELA and RELACOUNT entries. */
+	 * We need the RELA and RELASZ entries. */
 	li	r9,0
 	li	r0,0
 9:	lwz	r8,0(r12)	/* get tag */
@@ -75,18 +75,21 @@ p_base:	mflr	r10		/* r10 now points to runtime addr of p_base */
 	bne	11f
 	lwz	r9,4(r12)	/* get RELA pointer in r9 */
 	b	12f
-11:	addis	r8,r8,(-RELACOUNT)@ha
-	cmpwi	r8,RELACOUNT@l
+11:	cmpwi	r8,RELASZ
 	bne	12f
-	lwz	r0,4(r12)	/* get RELACOUNT value in r0 */
+	lwz	r0,4(r12)	/* get RELASZ / 24 in r0 */
+	lis     r8,0xaaaa
+	ori     r8,r8,0xaaab
+	mulhwu  r0,r0,r8
+	srwi    r0,r0,4
 12:	addi	r12,r12,8
 	b	9b
 
 	/* The relocation section contains a list of relocations.
 	 * We now do the R_PPC_RELATIVE ones, which point to words
 	 * which need to be initialized with addend + offset.
-	 * The R_PPC_RELATIVE ones come first and there are RELACOUNT
-	 * of them. */
+	 * The R_PPC_RELATIVE ones come first and there are at most
+         * RELASZ/24 of them. */
 10:	/* skip relocation if we don't have both */
 	cmpwi	r0,0
 	beq	3f
@@ -160,14 +163,17 @@ p_base:	mflr	r10		/* r10 now points to runtime addr of p_base */
 	bne	10f
 	ld	r13,8(r11)       /* get RELA pointer in r13 */
 	b	11f
-10:	addis	r12,r12,(-RELACOUNT)@ha
-	cmpdi	r12,RELACOUNT@l
+10:	cmpdi	r12,RELASZ
 	bne	11f
-	ld	r8,8(r11)       /* get RELACOUNT value in r8 */
+	ld	r8,8(r11)       /* get RELASZ / 24 in r8 */
+	lis     r0,0xaaaa
+	ori     r0,r0,0xaaab
+	mulhwu  r8,r8,r0
+	srwi    r8,r8,4
 11:	addi	r11,r11,16
 	b	9b
 12:
-	cmpdi	r13,0            /* check we have both RELA and RELACOUNT */
+	cmpdi	r13,0            /* check we have both RELA and RELASZ */
 	cmpdi	cr1,r8,0
 	beq	3f
 	beq	cr1,3f
diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index 02d4719bf43a..362be759609f 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -8,7 +8,7 @@
 #include <asm/ppc_asm.h>
 
 RELA = 7
-RELACOUNT = 0x6ffffff9
+RELASZ = 8
 R_PPC64_RELATIVE = 22
 
 /*
@@ -27,7 +27,7 @@ _GLOBAL(relocate)
 	add	r10,r10,r12	/* r10 has runtime addr of _stext */
 
 	/*
-	 * Scan the dynamic section for the RELA and RELACOUNT entries.
+	 * Scan the dynamic section for the RELA and RELASZ entries.
 	 */
 	li	r7,0
 	li	r8,0
@@ -38,13 +38,16 @@ _GLOBAL(relocate)
 	bne	2f
 	ld	r7,8(r11)	/* get RELA pointer in r7 */
 	b	3f
-2:	addis	r6,r6,(-RELACOUNT)@ha
-	cmpdi	r6,RELACOUNT@l
+2:	cmpdi	r6,RELASZ
 	bne	3f
-	ld	r8,8(r11)	/* get RELACOUNT value in r8 */
+	ld	r8,8(r11)	/* get RELA / 24 in r8 */
+	lis     r0,0xaaaa
+	ori     r0,r0,0xaaab
+	mulhwu  r8,r8,r0
+	srwi    r8,r8,4
 3:	addi	r11,r11,16
 	b	1b
-4:	cmpdi	r7,0		/* check we have both RELA and RELACOUNT */
+4:	cmpdi	r7,0		/* check we have both RELA and RELASZ */
 	cmpdi	cr1,r8,0
 	beq	6f
 	beq	cr1,6f
-- 
2.35.1

@nathanchance nathanchance added [BUG] linux A bug that should be fixed in the mainline kernel. and removed [BUG] Untriaged Something isn't working labels Mar 9, 2022
@nathanchance
Copy link
Member Author

Thanks a lot Fangrui! That patch works for me, I have provided a tag upstream.

https://lore.kernel.org/r/[email protected]/

@nickdesaulniers
Copy link
Member

@MaskRay noted

I rebased the patch on
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
master and got a conflict.
Seems that https://lore.kernel.org/linuxppc-dev/[email protected]/T/#u
("[PATCH kernel v4] powerpc/64: Add UADDR64 relocation support") fixed
the issue.
It just doesn't change arch/powerpc/boot/crt0.S

So this seems implicitly fixed in -next.
commit d799769 ("powerpc/64: Add UADDR64 relocation support")

After llvm/llvm-project@da0e5b8, both Fedora and OpenSUSE's powerpc64le configurations fail to boot for me in QEMU.

fedora and suse CI green runs.

@nickdesaulniers
Copy link
Member

cc @aik @mpe on @MaskRay 's comment about arch/powerpc/boot/crt0.S. Should @MaskRay resend just his hunks against that patch?

@nathanchance
Copy link
Member Author

After llvm/llvm-project@da0e5b8, both Fedora and OpenSUSE's powerpc64le configurations fail to boot for me in QEMU.

fedora and suse CI green runs.

Do note that both of those runs did not use LD=ld.lld (see the CC=clang). Let me test next-20220310 with both of those configurations locally then I will update ClangBuiltLinux/continuous-integration2#326 by dropping Fangrui's patch and applying Alexey's patch to mainline.

@nathanchance
Copy link
Member Author

Merged into mainline: https://git.kernel.org/linus/d799769188529abc6cbf035a10087a51f7832b6b

I am not sure this would be an acceptable backport for stable, so I am just going to close this up for now.

@nathanchance nathanchance added [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 and removed [PATCH] Submitted A patch has been submitted for review labels Mar 30, 2022
@nathanchance
Copy link
Member Author

As d799769188529abc6cbf035a10087a51f7832b6b also resolves #1594, I have started a thread about whether or not applying that change to stable is appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[ARCH] powerpc This bug impacts ARCH=powerpc [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 [TOOL] lld The issue is relevant to LLD linker
Projects
None yet
Development

No branches or pull requests

3 participants