-
Notifications
You must be signed in to change notification settings - Fork 13.6k
powerpc64 parameter save area handling does not match PPC64 ELF ABI #13995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
On closer inspection, the code from the example shows arguments being stored to slots in the local variable area, not the caller's parameter save area. The 64-bit SVR ABI requires that, if any argument has its address taken by the callee, all arguments passed in registers shall be stored by the callee in the caller's parameter save area. This requirement is not implemented at all. This requirement only applies to the 64-bit SVR ABI. |
Correction, this also applies to the 32-bit SVR4 ABI. This is not currently well-documented in the ABI spec, but that will be corrected shortly. |
After discussing this with some colleagues, this ABI requirement is considered optional in practice. Its purpose is to allow the callee to avoid having to generate a stack frame when taking the address of a parameter. This makes it a performance optimization, rather than an interoperability issue of any kind. Therefore I'm planning to treat this as lower priority, and work on more urgent matters for now. Here are some notes on the issues I've encountered, and a partial patch that solves part of the problem. (1) The first difficulty is determining when the callee takes the address of one of its arguments. I discussed this briefly with Duncan Sands on IRC. Clang will generate code like this for each formal argument: %x.addr = alloca i32, align 4 The idea is that, if %x.addr is not used otherwise, the optimizers will remove the alloca. If the alloca remains in place, the argument's address has been used. Duncan believed this would occur at all optimization levels. Unfortunately this is not the case at -O0. Presumably no dead local store elimination is done, so the alloca remains in the code. I was able to work around this (see attached patch) by further walking the use-chain for %x.addr. If any use is not a memory operation, then the argument is considered to have its address taken. This seemed to work on simple tests that I put together, but would need review. (2) I've put together code in the attached patch that does the work of forcing the arguments passed in registers into the caller's parameter save area. However, this is only part of what needs to happen. LLVM still makes additional copies of the arguments and stores them into the callee's stack frame. For the use of the caller's frame to be useful, the extra copies need to be eradicated. It seems to me the best way to do this would be to make a pass over the entry block in processFunctionBeforeFrameFinalized to detect the duplications and map the positive-valued frame indices (callee slots) to the corresponding negative-valued frame indices (caller slots). It should be sufficient to look in the entry block for stores from the formal arguments into the stack. Once the mapping is known, a pass over the code can be done to replace the callee frame indices with caller frame indices, and remove the extra stores. For extra credit, the "holes" in the positive-valued frame indices should be compressed out, renumbering the remaining slots to be contiguous and thus reducing the size of the stack frame. (3) For testing purposes, the following provides good coverage of the possibilities: typedef int v4si attribute ((vector_size (16))); void addrtaken(float f, int i, double d, v4si v, long m, v4sf w) { Currently this test fails at lower optimization levels due to a couple of bugs I've filed patches for today. With those fixed, it runs cleanly and produces correct stores to the parameter save area. |
This has been idle for quite a while. Since it isn't a bug, and is very low priority compared to other correctness, completeness, and performance issues, I don't foresee it being looked at anytime soon. Marking as WONTFIX. It can be reopened someday if it seems important. |
$ cat test.s // llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -show-inst // llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -filetype=obj -o test.o // llvm-objdump -d --mattr=+Xsecure test.o .text .globl _start _start: # Test the new instructions: openstealth.rol x5, x6, x7 openstealth.ror x5, x6, x7 openstealth.rori x5, x6, 13 # Return or jump to an infinite loop to avoid crashing ret $ llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -show-inst .text .globl _start _start: openstealth.rol t0, t1, t2 # <MCInst llvm#13993 XSECURE_ROL # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Reg:50>> openstealth.ror t0, t1, t2 # <MCInst llvm#13994 XSECURE_ROR # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Reg:50>> openstealth.rori t0, t1, 13 # <MCInst llvm#13995 XSECURE_RORI # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Imm:13>> ret # <MCInst llvm#12777 JALR # <MCOperand Reg:43> # <MCOperand Reg:44> # <MCOperand Imm:0>> $ llvm-objdump -d --mattr=+Xsecure test.o test.o: file format elf32-littleriscv Disassembly of section .text: 00000000 <_start>: 0: 607312b3 openstealth.rol t0, t1, t2 4: 607352b3 openstealth.ror t0, t1, t2 8: 60d35293 openstealth.rori t0, t1, 0xd c: 00008067 ret
$ cat test.s // llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -show-inst // llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -filetype=obj -o test.o // llvm-objdump -d --mattr=+Xsecure test.o .text .globl _start _start: # Test the new instructions: openstealth.rol x5, x6, x7 openstealth.ror x5, x6, x7 openstealth.rori x5, x6, 13 # Return or jump to an infinite loop to avoid crashing ret $ llvm-mc -triple=riscv32 -mattr=+Xsecure test.s -show-inst .text .globl _start _start: openstealth.rol t0, t1, t2 # <MCInst llvm#13993 XSECURE_ROL # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Reg:50>> openstealth.ror t0, t1, t2 # <MCInst llvm#13994 XSECURE_ROR # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Reg:50>> openstealth.rori t0, t1, 13 # <MCInst llvm#13995 XSECURE_RORI # <MCOperand Reg:48> # <MCOperand Reg:49> # <MCOperand Imm:13>> ret # <MCInst llvm#12777 JALR # <MCOperand Reg:43> # <MCOperand Reg:44> # <MCOperand Imm:0>> $ llvm-objdump -d --mattr=+Xsecure test.o test.o: file format elf32-littleriscv Disassembly of section .text: 00000000 <_start>: 0: 607312b3 openstealth.rol t0, t1, t2 4: 607352b3 openstealth.ror t0, t1, t2 8: 60d35293 openstealth.rori t0, t1, 0xd c: 00008067 ret $ clang --target=riscv32 -mcpu=attorv32 test.s -c
Extended Description
On powerpc64-unknown-linux-gnu, the layout of the parameter save area is not compatible with the 64-bit PowerPC ELF Application Binary Interface Supplement (see http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html). In particular, simple integer types and single-precision floating-point types are supposed to each be mapped to a separate doubleword (see section 3.2.3). Instead, two adjacent "int" and/or "float" parameters share the same doubleword.
Consider:
static int
passem (int a, int b, int c, int d, int e, int f, int g, int h, int i, int j)
{
return ((a * b) + (c * d) - (e * f)) / (g + h - i + j);
}
The beginning of the generated assembly code shows the parameters being stored in adjacent four-byte fields:
.L.passem:
std 29, -24(1)
std 30, -16(1)
mr 11, 3
stw 11, -28(1)
mr 11, 4
stw 11, -32(1)
mr 11, 5
stw 11, -36(1)
mr 11, 6
stw 11, -40(1)
mr 11, 7
stw 11, -44(1)
mr 11, 8
stw 11, -48(1)
lwz 11, 116(1)
mr 12, 9
lwz 30, 124(1)
stw 12, -52(1)
mr 12, 10
stw 12, -56(1)
stw 11, -60(1)
stw 30, -64(1)
lwz 11, -40(1)
lwz 12, -36(1)
A similar bit of example code shows the same thing for floating-point:
static float
passem (float a, float b, float c, float d, float e,
float f, float g, float h, float i, float j)
{
return ((a * b) + (c * d) - (e * f)) / (g + h - i + j);
}
.L.passem:
stfd 28, -32(1)
stfd 29, -24(1)
stfd 30, -16(1)
stfd 31, -8(1)
stfs 1, -36(1)
stfs 2, -40(1)
stfs 3, -44(1)
stfs 4, -48(1)
stfs 5, -52(1)
stfs 6, -56(1)
stfs 7, -60(1)
stfs 8, -64(1)
stfs 9, -68(1)
stfs 10, -72(1)
In both cases, the addresses should be eight bytes apart, with ints sign-extended to an eight-byte value, and floats residing in the second word of the doubleword.
The text was updated successfully, but these errors were encountered: