[lldb] Allow fetching of RA register when above fault handler #98566

jasonmolenda · 2024-07-12T00:14:06Z

In RegisterContextUnwind::SavedLocationForRegister we have special logic for retrieving the Return Address register when it has the caller's return address in it. An example would be the lr register on AArch64.

This register is never retrieved from a newer stack frame because it is necessarily overwritten by a normal ABI function call. We allow frame 0 to provide its lr value to get the caller's return address, if it has not been overwritten/saved to stack yet.

When a function is interrupted asynchronously by a POSIX signal (sigtramp), or a fault handler more generally, the sigtramp/fault handler has the entire register context available. In this situation, if the fault handler is frame 0, the function that was async interrupted is frame 1 and frame 2's return address may still be stored in lr. We need to get the lr value for frame 1 from the fault handler in frame 0, to get the return address for frame 2.

Without this fix, lldb cannot find the caller of a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly). Such a function hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full-register-context.

It's an unsurprising fix; all of the work was finding exactly where in RegisterContextUnwind we were only allowing RA register use for frame 0, when it should have been frame 0 or above a fault handler function.

rdar://127518945
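As a rough illustration of the condition described above, here is a toy model of "frame 0 or above a fault handler". The names `ToyFrame`, `ToyIsFrameZero` and `ToyBehavesLikeZerothFrame` are hypothetical stand-ins, not LLDB's actual classes or methods; the sketch only assumes that a frame directly above a trap handler was interrupted mid-function and so may still hold its caller's return address in lr.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a stack frame; this is NOT LLDB's RegisterContextUnwind.
struct ToyFrame {
  bool is_trap_handler = false; // true for a sigtramp / fault handler frame
};

// frames[0] is the youngest frame, as in an lldb backtrace.
inline bool ToyIsFrameZero(std::size_t idx) { return idx == 0; }

// Assumed rule: frame 0, or the frame directly above (older than) a trap
// handler, was stopped at an arbitrary instruction, so its return address
// register may not have been spilled to the stack yet.
inline bool ToyBehavesLikeZerothFrame(const std::vector<ToyFrame> &frames,
                                      std::size_t idx) {
  if (idx >= frames.size())
    return false;
  if (idx == 0)
    return true;
  return frames[idx - 1].is_trap_handler;
}
```

With a fault handler at index 0, the interrupted function at index 1 passes the second predicate but not the first, which is the case the one-line change targets.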

llvmbot · 2024-07-12T00:14:40Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/98566.diff

1 Files Affected:

(modified) lldb/source/Target/RegisterContextUnwind.cpp (+1-1)

diff --git a/lldb/source/Target/RegisterContextUnwind.cpp b/lldb/source/Target/RegisterContextUnwind.cpp
index 95e8abd763d53..bc8081f4e3b31 100644
--- a/lldb/source/Target/RegisterContextUnwind.cpp
+++ b/lldb/source/Target/RegisterContextUnwind.cpp
@@ -1401,7 +1401,7 @@ RegisterContextUnwind::SavedLocationForRegister(
       // it's still live in the actual register. Handle this specially.
 
       if (!have_unwindplan_regloc && return_address_reg.IsValid() &&
-          IsFrameZero()) {
+          BehavesLikeZerothFrame()) {
         if (return_address_reg.GetAsKind(eRegisterKindLLDB) !=
             LLDB_INVALID_REGNUM) {
           lldb_private::UnwindLLDB::RegisterLocation new_regloc;

DavidSpickett · 2024-07-12T11:07:41Z

> Without this fix, a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly) hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full-register-context to find the caller of the frameless function that faulted.

So the difference between being interrupted and making a function call is that the latter allows you to store the link register then make the call. A signal may come in at any time, so there may be no saved lr in the frame record at the time the interrupt happens.

And this fix means specifically that if you're inside the function that was interrupted, we will read its lr from the fault handler context?

Sounds good to me.

Testing this is in theory possible; the tricky bit is guaranteeing a frameless function. There is the naked attribute, but it's not portable: https://godbolt.org/z/s9117Gr7a. Or you could write the function in an assembly file, or define and call it inside an inline assembly block, inside a normal C function. That function would branch to itself, waiting for SIGALRM for example.

Maybe that has its own problems, I haven't tried it. Maybe it wouldn't generate enough debug info for us to know that the assembly function was there?
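A minimal sketch of the "spin until SIGALRM" idea, assuming a POSIX system and a GCC/Clang-style `noinline` attribute. All names here (`g_alarm_fired`, `on_alarm`, `spin_until_alarm`, `run_spin_demo`) are made up for illustration. Note that this does not guarantee a frameless function: `spin_until_alarm` is a leaf with no calls, which compilers often emit without a frame record, but that depends on codegen and is the portability problem mentioned above.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/time.h>

// Set by the signal handler; sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_alarm_fired = 0;

extern "C" void on_alarm(int) { g_alarm_fired = 1; }

// Leaf function that makes no calls. Whether it is actually frameless is up
// to the compiler; hand-written assembly would be needed to guarantee it.
__attribute__((noinline)) unsigned long spin_until_alarm() {
  unsigned long spins = 0;
  while (!g_alarm_fired)
    ++spins;
  return spins;
}

// Installs the handler, arms a one-shot 10 ms timer, then spins.
// Returns true if the handler ran while spin_until_alarm was executing.
bool run_spin_demo() {
  struct sigaction sa = {};
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGALRM, &sa, nullptr) != 0)
    return false;

  struct itimerval timer = {};
  timer.it_value.tv_usec = 10000; // 10 ms, one-shot
  if (setitimer(ITIMER_REAL, &timer, nullptr) != 0)
    return false;

  spin_until_alarm();
  return g_alarm_fired == 1;
}
```

In a debugger session one would presumably set a breakpoint in `on_alarm` and check that the backtrace shows the caller of `spin_until_alarm` above the signal trampoline.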

jasonmolenda · 2024-07-12T17:25:20Z

> Testing this is in theory possible, tricky bit is guaranteeing a frameless function. There is the naked attribute but it's not portable https://godbolt.org/z/s9117Gr7a. Or you could write the function in an assembly file, or define and call it inside an inline assembly block, inside a normal C function. That function would branch to self waiting for SIGALRM for example.
>
> Maybe that has its own problems, I haven't tried it. Maybe it wouldn't generate enough debug info for us to know that the assembly function was there?

There are (I'd argue) three parts to the unwind system. First is converting the different unwind info formats (eh_frame, debug_frame, compact unwind, arm idx, assembly instruction scanning) into the intermediate representation of UnwindPlans. Second is the unwind engine itself, which encodes rules about which type of unwind plan to use for a given stack frame, which registers can be passed up the stack, and rules about behavior on the 0th frame or above a fault handler/sigtramp. And third is correctly fetching the register value for a row in an UnwindPlan (often, dereferencing memory at an offset from the Canonical Frame Address, which is most often set in terms of another register) -- these often end up being dwarf expressions.

That middle bit, the unwind engine logic, is hard to test today without making hand-written assembly programs that set up specific unwind scenarios with metadata (.cfi directives) about what they've done. Source level tests are at the mercy of compiler codegen and are not stable, or they require capturing a corefile and object binary when the necessary conditions are achieved.

But here's the idea I had the other day. With a scripted process, an ObjectFileJSON to create a fake binary with function start addresses, and a way to specify UnwindPlans for those functions, where all of the register rules would be "and the value of fp is " instead of "read stack memory to get the value of fp", I bet there's a way we could write unwind engine tests entirely in these terms. And honestly, the unwind engine method has a lot of very tricky corner cases, and because it's not directly tested, it's easy to make mistakes - I am genuinely not thrilled about the state of it. And without strong test infrastructure, it's going to be very intimidating to try to rewrite if anyone wanted to do that some day.

This is only "shower thoughts" level detail, but it's the first time I can see a testing strategy that I actually think could work well.
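To make the shape of that idea a little more concrete, here is a toy sketch in which every register rule is a constant, so a backtrace can be computed without reading any memory. Nothing here uses LLDB's scripted process, ObjectFileJSON or UnwindPlan APIs; `ToyUnwindRow`, `ToyFunction`, `FindToyFunction` and `ToyBacktrace` are hypothetical names, and the real infrastructure would look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an unwind row whose rules are all constants:
// "the caller's value of <reg> is N" rather than "read it from the stack".
struct ToyUnwindRow {
  std::map<std::string, std::uint64_t> caller_regs;
};

// Hypothetical stand-in for a function in a fake binary.
struct ToyFunction {
  std::string name;
  std::uint64_t start; // first address of the function
  std::uint64_t end;   // one past the last address
  ToyUnwindRow row;
};

inline const ToyFunction *FindToyFunction(const std::vector<ToyFunction> &fns,
                                          std::uint64_t pc) {
  for (const ToyFunction &fn : fns)
    if (pc >= fn.start && pc < fn.end)
      return &fn;
  return nullptr;
}

// Walks from pc to older frames using only the constant "pc" rules.
// Stops when no function contains pc, no caller pc is given, or the
// depth limit is hit (which guards against cyclic test data).
inline std::vector<std::string>
ToyBacktrace(const std::vector<ToyFunction> &fns, std::uint64_t pc,
             std::size_t max_depth = 64) {
  std::vector<std::string> names;
  while (names.size() < max_depth) {
    const ToyFunction *fn = FindToyFunction(fns, pc);
    if (!fn)
      break;
    names.push_back(fn->name);
    auto it = fn->row.caller_regs.find("pc");
    if (it == fn->row.caller_regs.end())
      break;
    pc = it->second;
  }
  return names;
}
```

A test in this style would describe three fake functions, give the two younger ones a constant caller pc, and check that the names come back in order from youngest to oldest.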

…#98566)" This reverts commit fd42417. This patch has two problems. First, it is unnecessary: Pavel landed a fix a week or so before mine, in bbd54e0, which solves this problem. Second, the fix is incorrect: for a function above a trap handler, where all registers are available, this patch would have lldb fetch the return address register from frame 0. That function might be 10 frames up in the stack, so the frame 0 return address register is the wrong value for it. The change would have been correct a short bit later than this, but Pavel's fix is executed earlier in the function and none of this is needed.
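The second problem can be illustrated with a toy model. This is only a sketch under stated assumptions: `SketchFrame`, `LrFromLiveRegister` and `LrFromSavedContext` are hypothetical names, not LLDB code, and the model assumes a trap handler frame records the lr value of the function it interrupted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy frame; frames[0] is the youngest. NOT LLDB's real data structures.
struct SketchFrame {
  bool is_trap_handler = false;
  // Only meaningful for trap handler frames: the lr value captured in the
  // full register context saved for the interrupted (next older) frame.
  bool has_saved_lr = false;
  std::uint64_t saved_lr = 0;
};

// Models the behavior the revert message calls incorrect: any frame that
// "behaves like frame 0" is handed the live lr of the actual frame 0.
inline bool LrFromLiveRegister(const std::vector<SketchFrame> &frames,
                               std::size_t idx, std::uint64_t live_lr,
                               std::uint64_t &out) {
  if (idx >= frames.size())
    return false;
  if (idx != 0 && !frames[idx - 1].is_trap_handler)
    return false;
  out = live_lr;
  return true;
}

// Models the intended behavior: frame 0 uses the live lr, while a frame
// above a trap handler uses the lr from that handler's saved context.
inline bool LrFromSavedContext(const std::vector<SketchFrame> &frames,
                               std::size_t idx, std::uint64_t live_lr,
                               std::uint64_t &out) {
  if (idx >= frames.size())
    return false;
  if (idx == 0) {
    out = live_lr;
    return true;
  }
  const SketchFrame &younger = frames[idx - 1];
  if (!younger.is_trap_handler || !younger.has_saved_lr)
    return false;
  out = younger.saved_lr;
  return true;
}
```

When the trap handler is not frame 0, the two functions disagree for the interrupted frame: the first returns whatever lr happens to hold at the top of the stack, the second returns the value captured when the fault occurred.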

…e0-lr-on-leafless-frame-that-trapped Revert "[lldb] Allow fetching of RA register when above fault handler(llvm#98566)"

jasonmolenda requested a review from JDevlieghere as a code owner July 12, 2024 00:14

llvmbot added the lldb label Jul 12, 2024

JDevlieghere approved these changes Jul 12, 2024

jasonmolenda merged commit fd42417 into llvm:main Jul 12, 2024
8 checks passed

jasonmolenda deleted the allow-fetch-of-lr-register-above-fault-handler branch July 12, 2024 17:44

jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Jul 15, 2024

Merge pull request #8976 from jasonmolenda/cp/r127518945-frameless-fu…

188df9b

…nction-fault-backtrace-20230725 [lldb] Allow fetching of RA register when above fault handler (llvm#98566)

JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Jul 15, 2024

Merge pull request #8977 from jasonmolenda/cp/r127518945-frameless-fu…

febd42d

…nction-fault-backtrace-6.0 [lldb] Allow fetching of RA register when above fault handler (llvm#98566)
