[lldb] Allow fetching of RA register when above fault handler #98566
In RegisterContextUnwind::SavedLocationForRegister we have special logic for retrieving the Return Address register when it has the caller's return address in it. An example would be the lr register on AArch64.
This register is never retrieved from a newer stack frame because it is necessarily overwritten by a normal ABI function call. We allow frame 0 to provide its lr value to get the caller's return address, if it has not been overwritten/saved to stack yet.
When a function is interrupted asynchronously by a POSIX signal (sigtramp), or a fault handler more generally, the sigtramp/fault handler has the entire register context available. In this situation, if the fault handler is frame 0, the function that was asynchronously interrupted is frame 1, and frame 2's return address may still be stored in lr. We need to get the lr value for frame 1 from the fault handler in frame 0, to get the return address for frame 2.
Without this fix, lldb cannot find the caller of a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly): the function hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full register context.
It's an unsurprising fix; all of the work was in finding exactly where in RegisterContextUnwind we were only allowing RA register use for frame 0, when it should have been frame 0 or above a fault handler function.
rdar://127518945
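The distinction the description draws, "frame 0" versus "frame 0 or above a fault handler", can be sketched with a toy model. This is only an illustration under stated assumptions: ModelFrame and the two free functions below are hypothetical stand-ins that borrow the names IsFrameZero and BehavesLikeZerothFrame from the diff; they are not LLDB's classes or signatures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stack frame; not an LLDB type.
struct ModelFrame {
  // True for a sigtramp/fault handler frame that holds the full register
  // context of the function it interrupted.
  bool is_trap_handler = false;
};

// frames[0] is the youngest frame, frames[1] its caller, and so on.
bool IsFrameZero(std::size_t idx) { return idx == 0; }

// Assumed meaning for this sketch: the frame is either frame 0 or sits
// directly above a trap handler, so no ordinary ABI call has clobbered
// its registers on the way down.
bool BehavesLikeZerothFrame(const std::vector<ModelFrame> &frames,
                            std::size_t idx) {
  if (idx == 0)
    return true;
  return frames[idx - 1].is_trap_handler;
}
```

With frames laid out as fault handler (0), faulting function (1), caller (2), only BehavesLikeZerothFrame accepts frame 1; both predicates reject frame 2.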
@llvm/pr-subscribers-lldb Author: Jason Molenda (jasonmolenda)
Full diff: https://github.com/llvm/llvm-project/pull/98566.diff 1 Files Affected:
diff --git a/lldb/source/Target/RegisterContextUnwind.cpp b/lldb/source/Target/RegisterContextUnwind.cpp
index 95e8abd763d53..bc8081f4e3b31 100644
--- a/lldb/source/Target/RegisterContextUnwind.cpp
+++ b/lldb/source/Target/RegisterContextUnwind.cpp
@@ -1401,7 +1401,7 @@ RegisterContextUnwind::SavedLocationForRegister(
     // it's still live in the actual register. Handle this specially.
     if (!have_unwindplan_regloc && return_address_reg.IsValid() &&
-        IsFrameZero()) {
+        BehavesLikeZerothFrame()) {
       if (return_address_reg.GetAsKind(eRegisterKindLLDB) !=
           LLDB_INVALID_REGNUM) {
         lldb_private::UnwindLLDB::RegisterLocation new_regloc;
So the difference between being interrupted and making a function call is that the latter allows you to store the link register and then make the call. A signal may come in at any time, so there may be no saved lr in the frame record at the time the interrupt happens. And this fix means specifically that if you're inside the function that was interrupted, we will read its lr from the fault handler context? Sounds good to me.
Testing this is possible in theory; the tricky bit is guaranteeing a frameless function. There is the naked attribute, but it's not portable: https://godbolt.org/z/s9117Gr7a. Or you could write the function in an assembly file, or define and call it inside an inline assembly block inside a normal C function. That function would branch to itself while waiting for SIGALRM, for example. Maybe that has its own problems; I haven't tried it. Maybe it wouldn't generate enough debug info for us to know that the assembly function was there?
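A rough sketch of the test inferior suggested above, written in plain C++ rather than assembly. Everything here is an assumption for illustration: the function names are made up, `__attribute__((noinline))` is a GCC/Clang extension, and nothing forces the compiler to keep spin_until_interrupted frameless, which is exactly the fragility discussed in this thread.

```cpp
#include <signal.h>
#include <sys/time.h>

// Written by the handler. volatile sig_atomic_t is the type that is safe to
// store to from a signal handler.
static volatile sig_atomic_t g_interrupted = 0;

extern "C" void on_alarm(int) { g_interrupted = 1; }

// Makes no calls, so the compiler has no reason to spill the return address.
// Whether it really ends up frameless depends on target and optimization level.
__attribute__((noinline)) unsigned long spin_until_interrupted() {
  unsigned long spins = 0;
  while (!g_interrupted)
    ++spins;
  return spins;
}

// Installs the SIGALRM handler, arms a one-shot timer of `usec` microseconds
// (must be below 1000000), and spins until the signal arrives.
bool arm_alarm_and_spin(long usec) {
  g_interrupted = 0;
  struct sigaction sa = {};
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGALRM, &sa, nullptr) != 0)
    return false;
  itimerval timer = {};
  timer.it_value.tv_usec = usec;
  if (setitimer(ITIMER_REAL, &timer, nullptr) != 0)
    return false;
  spin_until_interrupted();
  return g_interrupted == 1;
}
```

A debugger test could call arm_alarm_and_spin from main, set a breakpoint in on_alarm, and check that the backtrace continues past spin_until_interrupted to its caller.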
There are (I'd argue) three parts to the unwind system. First is converting the different unwind info formats (eh_frame, debug_frame, compact unwind, arm idx, assembly instruction scanning) into the intermediate representation of UnwindPlans. Second is the unwind engine itself, which encodes rules about which type of unwind plan to use for a given stack frame, which registers can be passed up the stack, and rules about behavior on the 0th frame or above a fault handler/sigtramp. And third is correctly fetching the register value for a row in an UnwindPlan (often, dereferencing memory at an offset from the Canonical Frame Address, which is most often set in terms of another register); these often end up being dwarf expressions.
That middle bit, the unwind engine logic, is hard to test today without making hand-written assembly programs that set up specific unwind scenarios with metadata (.cfi directives) about what they've done. Source level tests are at the mercy of compiler codegen and are not stable, or they require capturing a corefile and object binary when the necessary conditions are achieved.
But here's the idea I had the other day. With a scripted process, an ObjectFileJSON to create a fake binary with function start addresses, and a way to specify UnwindPlans for those functions, where all of the register rules would be "and the value of fp is " instead of "read stack memory to get the value of fp", I bet there's a way we could write unwind engine tests entirely in these terms.
And honestly, the unwind engine method has a lot of very tricky corner cases, and because it's not directly tested, it's easy to make mistakes. I am genuinely not thrilled about the state of it. And without strong test infrastructure, it's going to be very intimidating to try to rewrite if anyone wanted to do that some day. This is only "shower thoughts" level detail, but it's the first time I can see a testing strategy that I actually think could work well.
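To make the "register rules that state a value directly" idea a little more concrete, here is a toy model. It assumes nothing about LLDB's scripted process, ObjectFileJSON, or UnwindPlan APIs; FakeFunction, RegisterValues, and Backtrace are hypothetical names invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical rule set: each entry states the caller's register value
// outright, instead of "read memory at CFA+offset".
using RegisterValues = std::map<std::string, std::uint64_t>;

// A fake function: an address range plus the caller's registers to report
// when unwinding out of it.
struct FakeFunction {
  std::uint64_t start;
  std::uint64_t end; // one past the last address
  RegisterValues caller_regs;
};

// Walks the fake stack starting at `pc`, following each function's "pc" rule.
// Stops at an unknown address, a function with no "pc" rule, or a depth cap.
std::vector<std::uint64_t> Backtrace(const std::vector<FakeFunction> &fns,
                                     std::uint64_t pc) {
  std::vector<std::uint64_t> pcs;
  const std::size_t max_depth = 64;
  while (pcs.size() < max_depth) {
    pcs.push_back(pc);
    const FakeFunction *fn = nullptr;
    for (const FakeFunction &candidate : fns)
      if (pc >= candidate.start && pc < candidate.end)
        fn = &candidate;
    if (!fn)
      break;
    auto rule = fn->caller_regs.find("pc");
    if (rule == fn->caller_regs.end())
      break;
    pc = rule->second;
  }
  return pcs;
}
```

Because no memory is read, a test only has to state the fake functions and the expected list of pcs, which is the property that would make the engine's frame-selection rules testable in isolation.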
…8566) (cherry picked from commit fd42417)
…nction-fault-backtrace-20230725 [lldb] Allow fetching of RA register when above fault handler (llvm#98566)
…nction-fault-backtrace-6.0 [lldb] Allow fetching of RA register when above fault handler (llvm#98566)
…#98566)" This reverts commit fd42417. This patch has two problems. First, it is unnecessary: Pavel landed a fix a week or so before mine which solves this problem, in bbd54e0. Second, the fix is incorrect: for a function above a trap handler, where all registers are available, this patch would have lldb fetch the return address register from frame 0. The function might be 10 frames up the stack, so the frame 0 return address register is the wrong value for it. The change would have been correct a short bit later in the function, but Pavel's fix is executed earlier in the function and none of this is needed.
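The second problem can be illustrated with a small model of where the lr value is read from. This is a sketch of the reasoning in the revert message only; it does not reproduce the code in fd42417 or bbd54e0, and SavedContextFrame, ModelThread, and both functions are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame: a trap handler carries the saved lr of the function it
// interrupted; ordinary frames carry nothing.
struct SavedContextFrame {
  bool has_saved_context = false;
  std::uint64_t saved_lr = 0;
};

struct ModelThread {
  std::uint64_t live_lr = 0;             // lr as it is right now (frame 0's view)
  std::vector<SavedContextFrame> frames; // frames[0] is the youngest
};

// The behaviour the revert message objects to: any frame that "behaves like
// frame 0" gets the live register, however far up the stack it is.
std::uint64_t LrFromFrameZero(const ModelThread &thread) {
  return thread.live_lr;
}

// The behaviour the message says is needed: frame 0 uses the live register,
// and a frame directly above a trap handler uses that handler's saved copy.
// Returns false when neither source applies.
bool LrFromTrapHandlerBelow(const ModelThread &thread, std::size_t idx,
                            std::uint64_t &lr) {
  if (idx == 0) {
    lr = thread.live_lr;
    return true;
  }
  const SavedContextFrame &below = thread.frames[idx - 1];
  if (!below.has_saved_context)
    return false;
  lr = below.saved_lr;
  return true;
}
```

If the trap handler is frame 2 and the interrupted function is frame 3, the live lr belongs to whatever frame 0 is doing now, while the saved copy in frame 2 is the value frame 3 actually had when it was interrupted.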
…llvm#98566)" This reverts commit fd42417. (cherry picked from commit d29a50f)
…e0-lr-on-leafless-frame-that-trapped Revert "[lldb] Allow fetching of RA register when above fault handler(llvm#98566)"