-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix #82801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix #82801
Conversation
When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.
@llvm/pr-subscribers-backend-arm Author: James Westwood (jwestwood921) ChangesWhen code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other. Patch is 21.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82801.diff 6 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
index 9adf758b46c481..42e2b89260e16a 100644
--- a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -62,14 +62,14 @@ ARMBaseRegisterInfo::ARMBaseRegisterInfo()
const MCPhysReg*
ARMBaseRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
const ARMSubtarget &STI = MF->getSubtarget<ARMSubtarget>();
- bool UseSplitPush = STI.splitFramePushPop(*MF);
+ bool UseSplitPush = STI.splitFramePushPopR7(*MF);
const Function &F = MF->getFunction();
if (F.getCallingConv() == CallingConv::GHC) {
// GHC set of callee saved regs is empty as all those regs are
// used for passing STG regs around
return CSR_NoRegs_SaveList;
- } else if (STI.splitFramePointerPush(*MF)) {
+ } else if (STI.framePointerRequiredForSEHUnwind(*MF)) {
return CSR_Win_SplitFP_SaveList;
} else if (F.getCallingConv() == CallingConv::CFGuard_Check) {
return CSR_Win_AAPCS_CFGuard_Check_SaveList;
diff --git a/llvm/lib/Target/ARM/ARMFrameLowering.cpp b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
index eeb7f64aa5810e..f288e8be7d3816 100644
--- a/llvm/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -718,9 +718,13 @@ static int getMaxFPOffset(const ARMSubtarget &STI, const ARMFunctionInfo &AFI,
// This is a conservative estimation: Assume the frame pointer being r7 and
// pc("r15") up to r8 getting spilled before (= 8 registers).
int MaxRegBytes = 8 * 4;
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.r11AndLRNotAdjacent(MF) &&
+ STI.getRegisterInfo()->getFrameRegister(MF) == ARM::R11)
// Here, r11 can be stored below all of r4-r15 (3 registers more than
- // above), plus d8-d15.
+ // above).
+ MaxRegBytes = 11 * 4;
+ if (STI.framePointerRequiredForSEHUnwind(MF)) {
+ // Here, r11 can be stored below all of r4-r15 plus d8-d15.
MaxRegBytes = 11 * 4 + 8 * 8;
}
int FPCXTSaveSize =
@@ -788,7 +792,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
}
// Determine spill area sizes.
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.splitFramePushPopR11(MF)) {
for (const CalleeSavedInfo &I : CSI) {
Register Reg = I.getReg();
int FI = I.getFrameIdx();
@@ -834,7 +838,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
case ARM::R10:
case ARM::R11:
case ARM::R12:
- if (STI.splitFramePushPop(MF)) {
+ if (STI.splitFramePushPopR7(MF)) {
GPRCS2Size += 4;
break;
}
@@ -897,13 +901,13 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
unsigned GPRCS2Offset = GPRCS1Offset - GPRCS2Size;
Align DPRAlign = DPRCSSize ? std::min(Align(8), Alignment) : Align(4);
unsigned DPRGapSize = GPRCS1Size + FPCXTSaveSize + ArgRegsSaveSize;
- if (!STI.splitFramePointerPush(MF)) {
+ if (!STI.framePointerRequiredForSEHUnwind(MF)) {
DPRGapSize += GPRCS2Size;
}
DPRGapSize %= DPRAlign.value();
unsigned DPRCSOffset;
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.framePointerRequiredForSEHUnwind(MF)) {
DPRCSOffset = GPRCS1Offset - DPRGapSize - DPRCSSize;
GPRCS2Offset = DPRCSOffset - GPRCS2Size;
} else {
@@ -922,8 +926,8 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
AFI->setGPRCalleeSavedArea2Offset(GPRCS2Offset);
AFI->setDPRCalleeSavedAreaOffset(DPRCSOffset);
- // Move past area 2.
- if (GPRCS2Size > 0 && !STI.splitFramePointerPush(MF)) {
+ // Move past area 2, unless following Win_AAPCS_CFGuard calling convention.
+ if (GPRCS2Size > 0 && !STI.framePointerRequiredForSEHUnwind(MF)) {
GPRCS2Push = LastPush = MBBI++;
DefCFAOffsetCandidates.addInst(LastPush, GPRCS2Size);
}
@@ -963,13 +967,14 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
} else
NumBytes = DPRCSOffset;
- if (GPRCS2Size > 0 && STI.splitFramePointerPush(MF)) {
+ // Move past area 2 if following Win_AAPCS_CFGuard calling convention.
+ if (GPRCS2Size > 0 && STI.framePointerRequiredForSEHUnwind(MF)) {
GPRCS2Push = LastPush = MBBI++;
DefCFAOffsetCandidates.addInst(LastPush, GPRCS2Size);
}
bool NeedsWinCFIStackAlloc = NeedsWinCFI;
- if (STI.splitFramePointerPush(MF) && HasFP)
+ if (STI.framePointerRequiredForSEHUnwind(MF) && HasFP)
NeedsWinCFIStackAlloc = false;
if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, NumBytes)) {
@@ -1074,7 +1079,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
AfterPush = std::next(GPRCS1Push);
unsigned PushSize = sizeOfSPAdjustment(*GPRCS1Push);
int FPOffset = PushSize + FramePtrOffsetInPush;
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.splitFramePushPopR11(MF)) {
AfterPush = std::next(GPRCS2Push);
emitRegPlusImmediate(!AFI->isThumbFunction(), MBB, AfterPush, dl, TII,
FramePtr, ARM::SP, 0, MachineInstr::FrameSetup);
@@ -1106,7 +1111,7 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
// instructions below don't need to be replayed to unwind the stack.
if (NeedsWinCFI && MBBI != MBB.begin()) {
MachineBasicBlock::iterator End = MBBI;
- if (HasFP && STI.splitFramePointerPush(MF))
+ if (HasFP && STI.framePointerRequiredForSEHUnwind(MF))
End = AfterPush;
insertSEHRange(MBB, {}, End, TII, MachineInstr::FrameSetup);
BuildMI(MBB, End, dl, TII.get(ARM::SEH_PrologEnd))
@@ -1118,61 +1123,114 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF,
// the necessary DWARF cf instructions to describe the situation. Start by
// recording where each register ended up:
if (GPRCS1Size > 0 && !NeedsWinCFI) {
- MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
- int CFIIndex;
- for (const auto &Entry : CSI) {
- Register Reg = Entry.getReg();
- int FI = Entry.getFrameIdx();
- switch (Reg) {
- case ARM::R8:
- case ARM::R9:
- case ARM::R10:
- case ARM::R11:
- case ARM::R12:
- if (STI.splitFramePushPop(MF))
+ if (STI.r11AndLRNotAdjacent(MF) &&
+ RegInfo->getFrameRegister(MF) == ARM::R11) {
+ MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
+ int CFIIndex;
+ for (const auto &Entry : CSI) {
+ Register Reg = Entry.getReg();
+ int FI = Entry.getFrameIdx();
+ switch (Reg) {
+ case ARM::R0:
+ case ARM::R1:
+ case ARM::R2:
+ case ARM::R3:
+ case ARM::R4:
+ case ARM::R5:
+ case ARM::R6:
+ case ARM::R7:
+ case ARM::R8:
+ case ARM::R9:
+ case ARM::R10:
+ case ARM::R12:
+ CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
+ nullptr, MRI->getDwarfRegNum(Reg, true),
+ MFI.getObjectOffset(FI)));
+ BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+ .addCFIIndex(CFIIndex)
+ .setMIFlags(MachineInstr::FrameSetup);
break;
- [[fallthrough]];
- case ARM::R0:
- case ARM::R1:
- case ARM::R2:
- case ARM::R3:
- case ARM::R4:
- case ARM::R5:
- case ARM::R6:
- case ARM::R7:
- case ARM::LR:
- CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
- nullptr, MRI->getDwarfRegNum(Reg, true), MFI.getObjectOffset(FI)));
- BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
- .addCFIIndex(CFIIndex)
- .setMIFlags(MachineInstr::FrameSetup);
- break;
+ }
+ }
+ } else {
+ MachineBasicBlock::iterator Pos = std::next(GPRCS1Push);
+ int CFIIndex;
+ for (const auto &Entry : CSI) {
+ Register Reg = Entry.getReg();
+ int FI = Entry.getFrameIdx();
+ switch (Reg) {
+ case ARM::R8:
+ case ARM::R9:
+ case ARM::R10:
+ case ARM::R11:
+ case ARM::R12:
+ if (STI.splitFramePushPopR7(MF))
+ break;
+ [[fallthrough]];
+ case ARM::R0:
+ case ARM::R1:
+ case ARM::R2:
+ case ARM::R3:
+ case ARM::R4:
+ case ARM::R5:
+ case ARM::R6:
+ case ARM::R7:
+ case ARM::LR:
+ CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
+ nullptr, MRI->getDwarfRegNum(Reg, true),
+ MFI.getObjectOffset(FI)));
+ BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+ .addCFIIndex(CFIIndex)
+ .setMIFlags(MachineInstr::FrameSetup);
+ break;
+ }
}
}
}
if (GPRCS2Size > 0 && !NeedsWinCFI) {
MachineBasicBlock::iterator Pos = std::next(GPRCS2Push);
- for (const auto &Entry : CSI) {
- Register Reg = Entry.getReg();
- int FI = Entry.getFrameIdx();
- switch (Reg) {
- case ARM::R8:
- case ARM::R9:
- case ARM::R10:
- case ARM::R11:
- case ARM::R12:
- if (STI.splitFramePushPop(MF)) {
- unsigned DwarfReg = MRI->getDwarfRegNum(
- Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
+ if (STI.r11AndLRNotAdjacent(MF) &&
+ RegInfo->getFrameRegister(MF) == ARM::R11) {
+ for (const auto &Entry : CSI) {
+ Register Reg = Entry.getReg();
+ int FI = Entry.getFrameIdx();
+ switch (Reg) {
+ case ARM::R11:
+ case ARM::LR:
+ unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
unsigned Offset = MFI.getObjectOffset(FI);
unsigned CFIIndex = MF.addFrameInst(
MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)
.setMIFlags(MachineInstr::FrameSetup);
+ break;
+ }
+ }
+ } else {
+ MachineBasicBlock::iterator Pos = std::next(GPRCS2Push);
+ for (const auto &Entry : CSI) {
+ Register Reg = Entry.getReg();
+ int FI = Entry.getFrameIdx();
+ switch (Reg) {
+ case ARM::R8:
+ case ARM::R9:
+ case ARM::R10:
+ case ARM::R11:
+ case ARM::R12:
+ if (STI.splitFramePushPopR7(MF)) {
+ unsigned DwarfReg = MRI->getDwarfRegNum(
+ Reg == ARM::R12 ? ARM::RA_AUTH_CODE : Reg, true);
+ unsigned Offset = MFI.getObjectOffset(FI);
+ unsigned CFIIndex = MF.addFrameInst(
+ MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
+ BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
+ .addCFIIndex(CFIIndex)
+ .setMIFlags(MachineInstr::FrameSetup);
+ }
+ break;
}
- break;
}
}
}
@@ -1382,7 +1440,8 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
MachineInstr::FrameDestroy);
// Increment past our save areas.
- if (AFI->getGPRCalleeSavedArea2Size() && STI.splitFramePointerPush(MF))
+ if (AFI->getGPRCalleeSavedArea2Size() &&
+ STI.framePointerRequiredForSEHUnwind(MF))
MBBI++;
if (MBBI != MBB.end() && AFI->getDPRCalleeSavedAreaSize()) {
@@ -1399,7 +1458,8 @@ void ARMFrameLowering::emitEpilogue(MachineFunction &MF,
MachineInstr::FrameDestroy);
}
- if (AFI->getGPRCalleeSavedArea2Size() && !STI.splitFramePointerPush(MF))
+ if (AFI->getGPRCalleeSavedArea2Size() &&
+ !STI.framePointerRequiredForSEHUnwind(MF))
MBBI++;
if (AFI->getGPRCalleeSavedArea1Size()) MBBI++;
@@ -1539,7 +1599,8 @@ void ARMFrameLowering::emitPushInst(MachineBasicBlock &MBB,
unsigned LastReg = 0;
for (; i != 0; --i) {
Register Reg = CSI[i-1].getReg();
- if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue;
+ if (!(Func)(Reg, STI.splitFramePushPopR7(MF)))
+ continue;
// D-registers in the aligned area DPRCS2 are NOT spilled here.
if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs)
@@ -1632,7 +1693,8 @@ void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB,
for (; i != 0; --i) {
CalleeSavedInfo &Info = CSI[i-1];
Register Reg = Info.getReg();
- if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue;
+ if (!(Func)(Reg, STI.splitFramePushPopR7(MF)))
+ continue;
// The aligned reloads from area DPRCS2 are not inserted here.
if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs)
@@ -1640,7 +1702,7 @@ void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB,
if (Reg == ARM::LR && !isTailCall && !isVarArg && !isInterrupt &&
!isCmseEntry && !isTrap && AFI->getArgumentStackToRestore() == 0 &&
STI.hasV5TOps() && MBB.succ_empty() && !hasPAC &&
- !STI.splitFramePointerPush(MF)) {
+ !STI.splitFramePushPopR11(MF)) {
Reg = ARM::PC;
// Fold the return instruction into the LDM.
DeleteRet = true;
@@ -2001,7 +2063,7 @@ bool ARMFrameLowering::spillCalleeSavedRegisters(
.addImm(-4)
.add(predOps(ARMCC::AL));
}
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.splitFramePushPopR11(MF)) {
emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false,
&isSplitFPArea1Register, 0, MachineInstr::FrameSetup);
emitPushInst(MBB, MI, CSI, FltOpc, 0, true, &isARMArea3Register,
@@ -2046,7 +2108,7 @@ bool ARMFrameLowering::restoreCalleeSavedRegisters(
unsigned LdrOpc =
AFI->isThumbFunction() ? ARM::t2LDR_POST : ARM::LDR_POST_IMM;
unsigned FltOpc = ARM::VLDMDIA_UPD;
- if (STI.splitFramePointerPush(MF)) {
+ if (STI.splitFramePushPopR11(MF)) {
emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false,
&isSplitFPArea2Register, 0);
emitPopInst(MBB, MI, CSI, FltOpc, 0, isVarArg, true, &isARMArea3Register,
@@ -2362,7 +2424,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
if (Spilled) {
NumGPRSpills++;
- if (!STI.splitFramePushPop(MF)) {
+ if (!STI.splitFramePushPopR7(MF)) {
if (Reg == ARM::LR)
LRSpilled = true;
CS1Spilled = true;
@@ -2384,7 +2446,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
break;
}
} else {
- if (!STI.splitFramePushPop(MF)) {
+ if (!STI.splitFramePushPopR7(MF)) {
UnspilledCS1GPRs.push_back(Reg);
continue;
}
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 922fa93226f298..cc86afb835d0c4 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -494,11 +494,47 @@ bool ARMSubtarget::ignoreCSRForAllocationOrder(const MachineFunction &MF,
ARM::GPRRegClass.contains(PhysReg);
}
-bool ARMSubtarget::splitFramePointerPush(const MachineFunction &MF) const {
+bool ARMSubtarget::r11AndLRNotAdjacent(const MachineFunction &MF) const {
+ const std::vector<CalleeSavedInfo> CSI =
+ MF.getFrameInfo().getCalleeSavedInfo();
+
+ if (CSI.size() > 1 &&
+ MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress()) {
+
+ bool r11InCSI = false;
+ bool lrInCSI = false;
+ unsigned long r11Idx = 0;
+ unsigned long lrIdx = 0;
+
+ for (unsigned long i = 0; i < CSI.size(); i++) {
+ if (CSI[i].getReg() == ARM::LR) {
+ lrIdx = i;
+ lrInCSI = true;
+ } else if (CSI[i].getReg() == ARM::R11) {
+ r11Idx = i;
+ r11InCSI = true;
+ }
+ }
+ if (lrIdx + 1 != r11Idx && r11InCSI && lrInCSI)
+ return true;
+ }
+ return false;
+}
+
+bool ARMSubtarget::framePointerRequiredForSEHUnwind(
+ const MachineFunction &MF) const {
const Function &F = MF.getFunction();
+ const std::vector<CalleeSavedInfo> CSI =
+ MF.getFrameInfo().getCalleeSavedInfo();
+
if (!MF.getTarget().getMCAsmInfo()->usesWindowsCFI() ||
!F.needsUnwindTableEntry())
return false;
const MachineFrameInfo &MFI = MF.getFrameInfo();
return MFI.hasVarSizedObjects() || getRegisterInfo()->hasStackRealignment(MF);
}
+
+bool ARMSubtarget::splitFramePushPopR11(const MachineFunction &MF) const {
+ return (r11AndLRNotAdjacent(MF) && getFramePointerReg() == ARM::R11) ||
+ framePointerRequiredForSEHUnwind(MF);
+}
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.h b/llvm/lib/Target/ARM/ARMSubtarget.h
index 91f3978b041a3a..210d8fdb1440ee 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.h
+++ b/llvm/lib/Target/ARM/ARMSubtarget.h
@@ -441,18 +441,28 @@ class ARMSubtarget : public ARMGenSubtargetInfo {
}
/// Returns true if the frame setup is split into two separate pushes (first
- /// r0-r7,lr then r8-r11), principally so that the frame pointer is adjacent
- /// to lr. This is always required on Thumb1-only targets, as the push and
- /// pop instructions can't access the high registers.
- bool splitFramePushPop(const MachineFunction &MF) const {
- if (MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress())
+ /// r0-r7,lr then r8-r11), principally so that the frame pointer r7 is
+ /// adjacent to lr. This is always required on Thumb1-only targets, as the
+ /// push and pop instructions can't access the high registers.
+ bool splitFramePushPopR7(const MachineFunction &MF) const {
+ if (MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress() &&
+ !createAAPCSFrameChain())
return true;
return (getFramePointerReg() == ARM::R7 &&
MF.getTarget().Options.DisableFramePointerElim(MF)) ||
isThumb1Only();
}
- bool splitFramePointerPush(const MachineFunction &MF) const;
+ bool framePointerRequiredForSEHUnwind(const MachineFunction &MF) const;
+
+ // Returns true if R11 and lr are not adjacent to each other in the list of
+ // callee saved registers in a frame.
+ bool r11AndLRNotAdjacent(const MachineFunction &MF) const;
+
+ // Returns true if the frame setup is split into two separate pushes (first
+ // r0-r10,r12 then r11,lr), principally so that the frame pointer r11 is
+ // adjacent to lr.
+ bool splitFramePushPopR11(const MachineFunction &MF) const;
bool useStride4VFPs() const;
diff --git a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
index 0f4ece64bff532..0ad7b96453d1c5 100644
--- a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
+++ b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
@@ -222,7 +222,7 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
case ARM::R8:
case ARM::R9:
case ARM::R10:
- if (STI.splitFramePushPop(MF)) {
+ if (STI.splitFramePushPopR7(MF)) {
GPRCS2Size += 4;
break;
}
@@ -366,7 +366,7 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
case ARM::R10:
case ARM::R11:
case ARM::R12:
- if (STI.splitFramePushPop(MF))
+ if (STI.splitFramePushPopR7(MF))
break;
[[fallthrough]];
case ARM::R0:
diff --git a/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll b/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll
new file mode 100644
index 00000000000000..77759355e576a2
--- /dev/null
+++ b/llvm/test/CodeGen/Thumb2/pacbti-m-frame-chain.ll
@@ -0,0 +1,82 @@
+; RUN: llc -filetype asm -o - %s --frame-pointer=all -mattr=+aapcs-frame-chain -mattr=+aapcs-frame-chain-leaf -force-dwarf-frame-section | FileCheck %s
+target triple = "thumbv8m.main-none-none-eabi"
+
+; int f() {
+; return 0;
+; }
+;
+; int x(int, char *);
+; int y(int n) {
+; char a[n];
+; return 1 + x(n, a);
+; }
+
+define hidden i32 @f() local_unnamed_addr {
+entry:
+ ret i32 0;
+}
+
+define hidden i32 @x(i32 noundef %n) local_unnamed_addr {
+entry:
+ %vla = alloca i8, i32 %n, align 1
+ %call = call i32 @y(i32 noundef %n, ptr noundef nonnull %vla)
+ %add = add nsw i32 %call, 1
+ ret i32 %add
+}
+
+declare dso_local i32 @y(i32 noundef, ptr noundef) local_unnamed_addr
+
+; CHECK-LABEL: f:
+; CHECK: pac r12, lr, sp
+; CHECK-NEXT: .save {ra_auth_code}
+; CHECK-NEXT: str r12, [sp, #-4]!
+; CHECK-NEXT: .cfi_def_cfa_offset 4
+; CHECK-NEXT: .cfi_offset r12, -8
+; CHECK-NEXT: .save {r11, lr}
+; CHECK-NEXT: push.w {r11, lr}
+; CHECK-NEXT: .cfi_offset lr, -4
+; CHECK-NEXT: .cfi_offset r11, -12
+; CHECK-NEXT: .setfp r11, sp
+; CHECK-NEXT: mov r11, sp
+; CHECK-NEXT: .cfi_def_cfa r11, 12
+; CHECK-NEXT: m...
[truncated]
|
llvm/lib/Target/ARM/ARMSubtarget.h
Outdated
// Returns true if the frame setup is split into two separate pushes (first | ||
// r0-r10,r12 then r11,lr), principally so that the frame pointer r11 is | ||
// adjacent to lr. | ||
bool splitFramePushPopR11(const MachineFunction &MF) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to introduce an enum of some sort here? Having a bunch of separate boolean methods representing different variations is confusing.
As far as I can tell, the following variations exist:
- The normal push: push r4-r11+lr all at once
- The Thumb1 split push: push r4-r7+lr, then r8-r11
- The Thumb1 r11 frame chain split push: push lr, then r11, then r0-r7, then r8-r11.
- The branch signing unsplit push: push r4-r12+lr all at once.
- The branch signing split push: push r4-r7+lr, then r8-r12
- The branch signing r11 frame chain split push: push r4-r10+r12, then r11+lr
- The Windows SEH split push: push r4-r10, then r11+lr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that using an enum is a better option. I think having a separate enum value for each push variation may be overkill, as many push variations are implemented in the same way by the frame lowering files. For example, the normal push and branch signing unsplit push are handled the same way, there is just an extra register in the latter variation. It might therefore be a better option to have an enum with values based around which register, if any, causes the register push to be split, instead of a value for each specific push variation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine.
llvm/lib/Target/ARM/ARMSubtarget.h
Outdated
|
||
// Returns true if R11 and lr are not adjacent to each other in the list of | ||
// callee saved registers in a frame. | ||
bool r11AndLRNotAdjacent(const MachineFunction &MF) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't seem to do what the description says... it unconditionally returns false when signing is disabled, but there are other reasons they might not be adjacent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has been deleted, but functionality moved to the getPushPopSpiltVariation method. The description for this functionality has been updated to be more accurate.
llvm/lib/Target/ARM/ARMSubtarget.h
Outdated
return true; | ||
return (getFramePointerReg() == ARM::R7 && | ||
MF.getTarget().Options.DisableFramePointerElim(MF)) || | ||
isThumb1Only(); | ||
} | ||
|
||
bool splitFramePointerPush(const MachineFunction &MF) const; | ||
bool framePointerRequiredForSEHUnwind(const MachineFunction &MF) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"framePointerRequiredForSEHUnwind" doesn't describe what the actual effect is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method name has now been deleted.
…hes should be split with a single enum.
llvm/lib/Target/ARM/ARMSubtarget.h
Outdated
/// r4-r11+lr (+r12 if necessary) can be pushed in a single instruction. | ||
NoSplit, | ||
/// The registers need to be split into a push of r4-r7+lr and another | ||
/// containing r8-r11 (+r12 if necessary). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth noting the interaction between R7Split and the Thumb1 AAPCS split push (HasFrameRecordArea).
llvm/lib/Target/ARM/ARMSubtarget.cpp
Outdated
if ((MF.getInfo<ARMFunctionInfo>()->shouldSignReturnAddress() && | ||
!createAAPCSFrameChain()) || | ||
((getFramePointerReg() == ARM::R7 && | ||
MF.getTarget().Options.DisableFramePointerElim(MF)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MF.getTarget().Options.DisableFramePointerElim(MF)) || | |
MF.getTarget().Options.DisableFramePointerElim(MF))) || |
@@ -222,7 +222,8 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF, | |||
case ARM::R8: | |||
case ARM::R9: | |||
case ARM::R10: | |||
if (STI.splitFramePushPop(MF)) { | |||
if (STI.getPushPopSplitVariation(MF) == | |||
ARMSubtarget::PushPopSplitVariation::R7Split) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is Thumb1FrameLowering... isn't this always true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Thumb1FrameLowering is initialized only when isThumb1Only() is true, so the variation will always be R7Split. Therefore the if statement is unnecessary.
@@ -963,13 +976,18 @@ void ARMFrameLowering::emitPrologue(MachineFunction &MF, | |||
} else | |||
NumBytes = DPRCSOffset; | |||
|
|||
if (GPRCS2Size > 0 && STI.splitFramePointerPush(MF)) { | |||
// Move past area 2 if following Win_AAPCS_CFGuard calling convention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't look right? Functions defined with the CFGuard_Check should be extremely rare, if they show up at all; it's used to model the behavior of runtime routines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a mistake on my end, thanks for catching it. The comment should refer to the CSR_Win_SplitFP calling convention.
llvm/lib/Target/ARM/ARMSubtarget.cpp
Outdated
// Returns R11SplitAAPCSBranchSigning if R11 and lr are not adjacent to each | ||
// other in the list of callee saved registers in a frame, and branch | ||
// signing is enabled. | ||
if (CSI.size() > 1 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify this code? It seems sort of awkward to dig through the callee-saves here. And given the normal way this works, the only register that can show up between R11 and LR (R14), is R12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can probably be simplified. All that needs to be checked is if R11 is the frame pointer and R12 is in the callee saved register list. These are both already checked in the if statement, so I think the rest of the code is unnecessary. The rest of the code was there as if R11 and LR were next to each other, only one push instruction is needed. However, with the way the callee saved register list is constructed when R12 is required, I don't think this is possible.
case ARM::R10: | ||
case ARM::R11: | ||
case ARM::R12: | ||
if (STI.getPushPopSplitVariation(MF) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it actually possible for this condition to be false? We know GPRCS2Size is non-zero, so it's not unsplit. We're not on Windows, so it's not the SEH split. And the new code explicitly checks for the branch signing split. So by process of elimination, this must be R7Split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that reasoning is correct. I'll remove the if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
… AAPCS frame chain fix (llvm#82801)" This reverts commit 00e4a41. This patch was found to cause miscompilations and compilation failures.
We have two different ways of splitting the pushes of callee-saved registers onto the stack, controlled by the confusingly similar names STI.splitFramePushPop() and STI.splitFramePointerPush(). This removes those functions and replaces them with a single function which returns an enum. This is in preparation for adding another value to that enum. The original work of this patch was done by James Westwood, reviewed as llvm#82801 and llvm#81249, with some tidy-ups done by Mark Murray and myself.
When using AAPCS-compliant frame chains with PACBTI return address signing, there ware a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as llvm#82801 and llvm#81249, with some tidy-ups done by Mark Murray and myself.
We have two different ways of splitting the pushes of callee-saved registers onto the stack, controlled by the confusingly similar names STI.splitFramePushPop() and STI.splitFramePointerPush(). This removes those functions and replaces them with a single function which returns an enum. This is in preparation for adding another value to that enum. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
When using AAPCS-compliant frame chains with PACBTI return address signing, there ware a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as llvm#82801 and llvm#81249, with some tidy-ups done by Mark Murray and myself.
When using AAPCS-compliant frame chains with PACBTI return address signing, there ware a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
When using AAPCS-compliant frame chains with PACBTI return address signing, there ware a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as #82801 and #81249, with some tidy-ups done by Mark Murray and myself.
When using AAPCS-compliant frame chains with PACBTI return address signing, there ware a number of bugs in the generation of the frame pointer and function prologues. The most obvious was that we sometimes would modify r11 before pushing it to the stack, so it wasn't preserved as required by the PCS. We also sometimes did not push R11 and LR adjacent to one another on the stack, or used R11 as a frame pointer without pointing it at the saved value of R11, both of which are required to have an AAPCS compliant frame chain. The original work of this patch was done by James Westwood, reviewed as llvm#82801 and llvm#81249, with some tidy-ups done by Mark Murray and myself.
When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain paremeters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.