
[lldb] Allow fetching of RA register when above fault handler #98566


jasonmolenda
Collaborator

In RegisterContextUnwind::SavedLocationForRegister we have special logic for retrieving the Return Address register when it has the caller's return address in it. An example would be the lr register on AArch64.

This register is never retrieved from a newer stack frame because it is necessarily overwritten by a normal ABI function call. We allow frame 0 to provide its lr value to get the caller's return address, if it has not been overwritten or saved to the stack yet.

When a function is interrupted asynchronously by a POSIX signal (sigtramp), or a fault handler more generally, the sigtramp/fault handler has the entire register context available. In this situation, if the fault handler is frame 0, the function that was async interrupted is frame 1 and frame 2's return address may still be stored in lr. We need to get the lr value for frame 1 from the fault handler in frame 0, to get the return address for frame 2.

Without this fix, a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly) hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full-register-context to find the caller of the frameless function that faulted.

It's an unsurprising fix; all of the work was in finding exactly where in RegisterContextUnwind we were only allowing RA register use for frame 0, when it should have been allowed for frame 0 or for a frame above a fault handler function.

rdar://127518945
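
To make the condition change concrete, here is a minimal sketch of the two eligibility rules described above. It is a toy model, not lldb code: `ToyFrame`, `is_frame_zero`, `behaves_like_zeroth_frame`, and `frames_allowed_ra_fallback` are hypothetical names chosen to echo the two predicates swapped in the one-line change, and the real methods may consider more state than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model; ToyFrame and these helpers are hypothetical, not lldb classes.
struct ToyFrame {
  bool is_trap_handler; // true for a sigtramp / fault handler frame
};

// Old condition: only the youngest frame may read the live RA register.
bool is_frame_zero(std::size_t idx) { return idx == 0; }

// New condition: frame 0, or the frame directly above a trap handler.
bool behaves_like_zeroth_frame(const std::vector<ToyFrame> &stack,
                               std::size_t idx) {
  assert(idx < stack.size());
  return idx == 0 || stack[idx - 1].is_trap_handler;
}

// Indices of frames allowed to fall back to the RA register when no unwind
// rule records where it was saved.
std::vector<std::size_t>
frames_allowed_ra_fallback(const std::vector<ToyFrame> &stack,
                           bool include_above_trap_handler) {
  std::vector<std::size_t> allowed;
  for (std::size_t i = 0; i < stack.size(); ++i) {
    bool ok = include_above_trap_handler ? behaves_like_zeroth_frame(stack, i)
                                         : is_frame_zero(i);
    if (ok)
      allowed.push_back(i);
  }
  return allowed;
}
```

With a three-frame stack whose frame 0 is the fault handler, the old rule admits only frame 0, while the new rule also admits frame 1, the interrupted function.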

@llvmbot
Member

llvmbot commented Jul 12, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/98566.diff

1 Files Affected:

  • (modified) lldb/source/Target/RegisterContextUnwind.cpp (+1-1)
diff --git a/lldb/source/Target/RegisterContextUnwind.cpp b/lldb/source/Target/RegisterContextUnwind.cpp
index 95e8abd763d53..bc8081f4e3b31 100644
--- a/lldb/source/Target/RegisterContextUnwind.cpp
+++ b/lldb/source/Target/RegisterContextUnwind.cpp
@@ -1401,7 +1401,7 @@ RegisterContextUnwind::SavedLocationForRegister(
       // it's still live in the actual register. Handle this specially.
 
       if (!have_unwindplan_regloc && return_address_reg.IsValid() &&
-          IsFrameZero()) {
+          BehavesLikeZerothFrame()) {
         if (return_address_reg.GetAsKind(eRegisterKindLLDB) !=
             LLDB_INVALID_REGNUM) {
           lldb_private::UnwindLLDB::RegisterLocation new_regloc;

@DavidSpickett
Collaborator

DavidSpickett commented Jul 12, 2024

> Without this fix, a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly) hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full-register-context to find the caller of the frameless function that faulted.

So the difference between being interrupted and making a function call is that the latter allows you to store the link register then make the call. A signal may come in at any time, so there may be no saved lr in the frame record at the time the interrupt happens.

And this fix means specifically that if you're inside the function that was interrupted, we will read its lr from the fault handler context?

Sounds good to me.

Testing this is in theory possible, tricky bit is guaranteeing a frameless function. There is the naked attribute but it's not portable https://godbolt.org/z/s9117Gr7a. Or you could write the function in an assembly file, or define and call it inside an inline assembly block, inside a normal C function. That function would branch to self waiting for SIGALRM for example.

Maybe that has its own problems, I haven't tried it. Maybe it wouldn't generate enough debug info for us to know that the assembly function was there?
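
As a rough illustration of the test shape suggested here, the sketch below arms a timer and spins until SIGALRM arrives. It assumes a POSIX system, and every name in it (`g_alarm_fired`, `on_alarm`, `spin_until_alarm`, `run_spin_test`) is made up for the example. It does not solve the hard part: a plain C++ loop carries no guarantee of being frameless, so a real test would still need the naked attribute or hand-written assembly for the spinning function.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/time.h>

// Written by the handler, read by the spin loop.
static volatile sig_atomic_t g_alarm_fired = 0;

static void on_alarm(int) { g_alarm_fired = 1; }

// Stand-in for the "branch to self" function. Nothing here forces the
// compiler to keep this function frameless.
static void spin_until_alarm() {
  while (!g_alarm_fired) {
  }
}

// Installs the handler, arms a one-shot real-time timer, then spins.
// Returns true once SIGALRM has been delivered.
bool run_spin_test(long delay_usec) {
  // A zero delay would disarm the timer and the loop would never exit.
  assert(delay_usec > 0);
  g_alarm_fired = 0;

  struct sigaction sa = {};
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGALRM, &sa, nullptr) != 0)
    return false;

  struct itimerval timer = {};
  timer.it_value.tv_sec = delay_usec / 1000000;
  timer.it_value.tv_usec = delay_usec % 1000000;
  if (setitimer(ITIMER_REAL, &timer, nullptr) != 0)
    return false;

  spin_until_alarm();
  return g_alarm_fired == 1;
}
```

A debugger-driven test could stop inside `on_alarm` and check that the backtrace reaches the caller of `spin_until_alarm`; whether enough debug info exists for that is the open question raised above.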

@jasonmolenda
Collaborator Author

> Testing this is in theory possible, tricky bit is guaranteeing a frameless function. There is the naked attribute but it's not portable https://godbolt.org/z/s9117Gr7a. Or you could write the function in an assembly file, or define and call it inside an inline assembly block, inside a normal C function. That function would branch to self waiting for SIGALRM for example.
>
> Maybe that has its own problems, I haven't tried it. Maybe it wouldn't generate enough debug info for us to know that the assembly function was there?

There are (I'd argue) three parts to the unwind system. First is converting the different unwind info formats (eh_frame, debug_frame, compact unwind, arm idx, assembly instruction scanning) into the intermediate representation of UnwindPlans. Second is the unwind engine itself, which encodes rules about which type of unwind plan to use for a given stack frame, which registers can be passed up the stack, and rules about behavior on the 0th frame or above a fault handler/sigtramp. And third is correctly fetching the register value for a row in an UnwindPlan (often, dereferencing memory offset from the Canonical Frame Address, which is most often set in terms of another register) -- these often end up being dwarf expressions.
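
The third part can be pictured with a small sketch: one row that defines the CFA in terms of a register, plus per-register rules evaluated against that CFA. The types below (`RuleKind`, `RegisterRule`, `ToyRow`, `recover_register`) are simplified, hypothetical stand-ins and are not lldb's UnwindPlan classes; real rows can also carry DWARF expressions, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, simplified stand-ins for an unwind row and its rules.
enum class RuleKind { AtCfaPlusOffset, SameValue, Undefined };

struct RegisterRule {
  RuleKind kind;
  std::int64_t offset; // only meaningful for AtCfaPlusOffset
};

struct ToyRow {
  int cfa_reg;                       // CFA = value(cfa_reg) + cfa_offset
  std::int64_t cfa_offset;
  std::map<int, RegisterRule> rules; // register number -> recovery rule
};

using Registers = std::map<int, std::uint64_t>;
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Recovers the caller's value of `reg`, or nullopt if it cannot be found.
std::optional<std::uint64_t> recover_register(const ToyRow &row, int reg,
                                              const Registers &regs,
                                              const Memory &mem) {
  auto rule_it = row.rules.find(reg);
  if (rule_it == row.rules.end())
    return std::nullopt;
  const RegisterRule &rule = rule_it->second;
  switch (rule.kind) {
  case RuleKind::Undefined:
    return std::nullopt;
  case RuleKind::SameValue: {
    auto it = regs.find(reg);
    return it == regs.end() ? std::nullopt
                            : std::optional<std::uint64_t>(it->second);
  }
  case RuleKind::AtCfaPlusOffset: {
    auto base = regs.find(row.cfa_reg);
    if (base == regs.end())
      return std::nullopt;
    // Unsigned wraparound makes negative offsets subtract as intended.
    std::uint64_t cfa =
        base->second + static_cast<std::uint64_t>(row.cfa_offset);
    auto slot = mem.find(cfa + static_cast<std::uint64_t>(rule.offset));
    return slot == mem.end() ? std::nullopt
                             : std::optional<std::uint64_t>(slot->second);
  }
  }
  assert(false && "unhandled RuleKind");
  return std::nullopt;
}
```

For an AArch64-style frame record, a row with the CFA at fp + 16 and lr stored at CFA - 8 reads the saved lr from memory at fp + 8.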

That middle bit, the unwind engine logic, is hard to test today without making hand-written assembly programs that set up specific unwind scenarios with metadata (.cfi directives) about what they've done. Source-level tests are at the mercy of compiler codegen and are not stable, or they require capturing a corefile and object binary when the necessary conditions are achieved.

But here's the idea I had the other day. With a scripted process, an ObjectFileJSON to create a fake binary with function start addresses, and a way to specify UnwindPlans for those functions, where all of the register rules would be "and the value of fp is " instead of "read stack memory to get the value of fp", I bet there's a way we could write unwind engine tests entirely in these terms. And honestly, the unwind engine method has a lot of very tricky corner cases, and because it's not directly tested, it's easy to make mistakes. I am genuinely not thrilled about the state of it. And without strong test infrastructure, it's going to be very intimidating to try to rewrite it if anyone wanted to do that some day.

This is only "shower thoughts" level detail, but it's the first time I can see a testing strategy that I actually think could work well.
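
The flavor of such a test can be sketched without any lldb machinery. In the toy below, a fake image lists functions by address range and each function's plan states the caller's pc as a constant, so the walk never reads memory. All names (`ToyFunction`, `ConstantPlan`, `ToyImage`, `toy_backtrace`) are hypothetical and only stand in for the scripted process, ObjectFileJSON, and UnwindPlan pieces mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A function known to the fake binary, covering [start, end).
struct ToyFunction {
  std::string name;
  std::uint64_t start;
  std::uint64_t end;
};

// "The value of pc in the caller is X": a constant, no memory read needed.
struct ConstantPlan {
  std::optional<std::uint64_t> caller_pc;
};

struct ToyImage {
  std::vector<ToyFunction> functions;
  std::map<std::string, ConstantPlan> plans; // keyed by function name
};

const ToyFunction *find_function(const ToyImage &image, std::uint64_t pc) {
  for (const ToyFunction &f : image.functions)
    if (pc >= f.start && pc < f.end)
      return &f;
  return nullptr;
}

// Walks from pc toward older frames until a function has no plan, the plan
// has no caller pc, or the pc is outside every known function.
std::vector<std::string> toy_backtrace(const ToyImage &image,
                                       std::uint64_t pc,
                                       std::size_t max_frames = 32) {
  assert(max_frames > 0);
  std::vector<std::string> names;
  while (names.size() < max_frames) {
    const ToyFunction *f = find_function(image, pc);
    if (!f)
      break;
    names.push_back(f->name);
    auto plan = image.plans.find(f->name);
    if (plan == image.plans.end() || !plan->second.caller_pc)
      break;
    pc = *plan->second.caller_pc;
  }
  return names;
}
```

A test written this way states the expected frame list directly, so it does not depend on compiler codegen or on a captured corefile.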

@jasonmolenda jasonmolenda merged commit fd42417 into llvm:main Jul 12, 2024
8 checks passed
@jasonmolenda jasonmolenda deleted the allow-fetch-of-lr-register-above-fault-handler branch July 12, 2024 17:44
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Jul 12, 2024
…8566)

(cherry picked from commit fd42417)
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
…8566)

jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Jul 15, 2024
…nction-fault-backtrace-20230725

[lldb] Allow fetching of RA register when above fault handler (llvm#98566)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Jul 15, 2024
…nction-fault-backtrace-6.0

[lldb] Allow fetching of RA register when above fault handler (llvm#98566)
jasonmolenda added a commit that referenced this pull request Nov 20, 2024
…#98566)"

This reverts commit fd42417.

This patch has two problems.  First, it is unnecessary: Pavel landed
a fix a week or so before mine which solves this problem in
bbd54e0.  Second, the fix is
incorrect; for a function above a trap handler, where all registers
are available, this patch would have lldb fetch the return address
register from frame 0.  This might be 10 frames up in the stack;
the frame 0 return address register is incorrect.  The change would
have been correct a short bit later than this, but Pavel's fix is
executed earlier in the function and none of this is needed.
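
The second problem can be illustrated with a toy model of where the value comes from. This is a sketch under assumptions, not the lldb code path: `SketchFrame`, `LrSource`, and `lr_for_frame` are hypothetical, and the only point is that reading frame 0's live register for a frame deep in the stack yields an unrelated value, while the trap handler's saved context holds the right one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical frame record for the sketch.
struct SketchFrame {
  bool is_trap_handler = false;
  std::optional<std::uint64_t> context_lr; // lr captured when the trap hit
};

enum class LrSource { Frame0LiveRegister, TrapHandlerContext };

// Value used as the return-address register for frame idx, which is assumed
// to be frame 0 or a frame directly above a trap handler.
std::optional<std::uint64_t> lr_for_frame(const std::vector<SketchFrame> &stack,
                                          std::size_t idx,
                                          std::uint64_t frame0_live_lr,
                                          LrSource source) {
  assert(idx < stack.size());
  // Models the defect described above: the live register is returned no
  // matter how far idx is from frame 0.
  if (idx == 0 || source == LrSource::Frame0LiveRegister)
    return frame0_live_lr;
  // Read from the register context saved by the trap handler one frame below.
  return stack[idx - 1].context_lr;
}
```

With a trap handler at frame 9 and the interrupted function at frame 10, the first source returns whatever lr frame 0 currently holds, and the second returns the lr saved when the trap was taken.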
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Nov 20, 2024
…llvm#98566)"

(cherry picked from commit d29a50f)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Jan 21, 2025
…llvm#98566)"

(cherry picked from commit d29a50f)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Jan 22, 2025
…e0-lr-on-leafless-frame-that-trapped

Revert "[lldb] Allow fetching of RA register when above fault handler(llvm#98566)"
4 participants