godlygeek
Closes #820

When tracing starts in one thread, we use `PyThreadState_GetFrame` to
capture the current Python call stack in every other thread, so that we
can associate allocations made later in pre-existing threads with the
correct call stack.

Unfortunately, our approach doesn't play nicely with trace functions,
like those installed by Coverage. The stack we get by walking backwards
from `PyThreadState_GetFrame` includes call frames that a profile function
never sees (because `PyThreadState_GetFrame` returns frames that correspond
to calls to trace functions, but profile functions don't get called for
calls into trace functions). As a result, we wind up referring to
frames even after they've been popped off the call stack, and because
we're holding borrowed references, this results in a use-after-free,
rather than just incorrect stacks.
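
To make the mismatch concrete, here is a minimal, self-contained model of it. It is only a sketch: frames are plain strings rather than interpreter objects, and `ProfilerModel`, `seed`, `onReturn`, and `replayMismatch` are hypothetical names invented for this illustration, not Memray's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a frame is just a name, and a stack is a vector whose
// back() is the innermost frame.
using Stack = std::vector<std::string>;

struct ProfilerModel {
    Stack shadow;  // what the profiler believes the thread's stack to be

    // Seed the shadow stack from a snapshot of every frame on the thread,
    // including frames that belong to trace function calls.
    void seed(const Stack& snapshot) { shadow = snapshot; }

    // Profile events arrive only for frames the profile function can see.
    void onReturn() {
        assert(!shadow.empty());
        shadow.pop_back();
    }
};

// Replays the scenario and returns the shadow stack after both the trace
// function and `worker` have returned in the modelled interpreter.
Stack replayMismatch() {
    ProfilerModel profiler;
    // Snapshot taken while a trace function is running on top of `worker`.
    profiler.seed({"main", "worker", "trace_fn"});
    // 1. The trace function returns: no profile event is delivered for it.
    // 2. `worker` returns: the profile function sees exactly one return.
    profiler.onReturn();
    // The real stack is now just {"main"}, but the model popped `trace_fn`
    // and still holds `worker`, a frame that no longer exists.
    return profiler.shadow;
}
```

In this model the stale entry is only a string. With borrowed references to real frame objects, the equivalent stale entry would be a dangling pointer.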

We could hold owned references instead of borrowed references. That
would fix the use-after-free, though we'd still report incorrect
stacks. Unfortunately, it's not easy to do, as there are times when we
need to drop those references while the GIL isn't held, and at those
times we're not able to decrement the reference count.

We can't easily trim frames off the stack returned by
`PyThreadState_GetFrame` to make it match the stack that profile
functions see, either. That would require
us to be able to recognize frames that correspond to calls to trace
functions. That's easy if the thread never changes its trace function,
but it's impossible in the general case where the trace function
installs a different trace function.

The approach taken by this commit is that, instead of referencing the
frame objects captured when tracking started, we copy
information out of them and reference that information when allocations
occur. Later, when our profile function runs in that thread, we switch
to our normal shadow stack approach, using `PyEval_GetFrame` to populate
it from a place where we know there can be no trace function calls on
the stack. This sidesteps the use-after-free problem, but it still means
that we'll misreport stacks until the first time our profile function
gets called in any thread that already existed when the tracking session
started. Notably, because the profile function doesn't get called when
changing from one line to another within a Python function, we'll report
incorrect line numbers for allocations made in background threads until
the first Python function call or return on that thread.
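
The copy-then-switch idea can be sketched as below. This is an illustration under assumptions, not the code in this PR: `LiveFrame` stands in for an interpreter frame object, and `FrameSnapshot`, `ThreadStack`, `captureInitial`, and `rebuildFromProfileCallback` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an interpreter frame that may be freed at any time.
struct LiveFrame {
    std::string function;
    std::string filename;
    int lineno;
};

// Plain-data copy of what reporting needs; it keeps no pointer to the frame.
struct FrameSnapshot {
    std::string function;
    std::string filename;
    int lineno;
};

class ThreadStack {
  public:
    // When tracking starts: copy data out of each pre-existing frame.
    void captureInitial(const std::vector<const LiveFrame*>& frames) {
        d_frames.clear();
        for (const LiveFrame* frame : frames) {
            assert(frame != nullptr);
            d_frames.push_back({frame->function, frame->filename, frame->lineno});
        }
        d_fromSnapshot = true;
    }

    // On the first profile callback in the thread: discard the snapshot and
    // rebuild from the stack as it is seen at that point.
    void rebuildFromProfileCallback(std::vector<FrameSnapshot> current) {
        d_frames = std::move(current);
        d_fromSnapshot = false;
    }

    bool usingSnapshot() const { return d_fromSnapshot; }
    const std::vector<FrameSnapshot>& frames() const { return d_frames; }

  private:
    std::vector<FrameSnapshot> d_frames;
    bool d_fromSnapshot = false;
};

// Shows that the copied data stays valid after the original frame is gone.
std::vector<FrameSnapshot> snapshotSurvivesFrameDestruction() {
    ThreadStack stack;
    {
        LiveFrame worker{"worker", "app.py", 10};
        stack.captureInitial({&worker});
    }  // `worker` is destroyed here; the snapshot is unaffected.
    return stack.frames();
}
```

The sketch also shows the limitation described above: the copied `lineno` stays at whatever it was when tracking started until the rebuild replaces the snapshot.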

Rather than having callers of the Python stack tracker work directly at
the level of pushing and popping individual frames, allow them to
express higher level goals that the stack tracker will achieve on their
behalf.

This currently just reorganizes some code, but lays the groundwork for
future changes that rely on the stack tracker knowing why a particular
frame is being pushed or popped.
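
As a rough illustration of that shape, the sketch below keeps push, pop, and clear private and exposes only goal-level entry points. Every name in it (`StackTrackerSketch`, `onCall`, `onReturn`, `onGreenletSwitch`) and the behavior of each method are assumptions made for the example; they are not the actual `PythonStackTracker` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class StackTrackerSketch {
  public:
    // Goal: "a Python function was entered".
    void onCall(std::string function) { push(std::move(function)); }

    // Goal: "the current Python function returned".
    void onReturn() { pop(); }

    // Goal: "a different greenlet is now running": drop the old stack and
    // install the new one, outermost frame first.
    void onGreenletSwitch(std::vector<std::string> newStack) {
        clear();
        for (std::string& function : newStack) {
            push(std::move(function));
        }
    }

    std::size_t depth() const { return d_stack.size(); }

    const std::string& top() const {
        assert(!d_stack.empty());
        return d_stack.back();
    }

  private:
    // Frame-level primitives are hidden, so the tracker always knows which
    // goal caused a given push or pop.
    void push(std::string function) { d_stack.push_back(std::move(function)); }
    void pop() {
        if (!d_stack.empty()) {
            d_stack.pop_back();
        }
    }
    void clear() { d_stack.clear(); }

    std::vector<std::string> d_stack;
};
```

A caller written against this interface never pushes or pops a frame itself. It reports what happened, and the tracker decides how the shadow stack should change.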

Signed-off-by: Matt Wozniski <[email protected]>
Previously, `PythonStackTracker::handleGreenletSwitch` cleared the shadow
stack itself, rather than reusing the `PythonStackTracker::clear`
method. The approach used by `handleGreenletSwitch` should be a little
bit faster, so adopt it into `clear`. Also, make `clear` private
rather than public, since it's currently only used inside the
class.

Signed-off-by: Matt Wozniski <[email protected]>
@godlygeek godlygeek marked this pull request as draft September 4, 2025 17:33
codecov-commenter commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 98.55072% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.17%. Comparing base (26b7f3b) to head (1e2ec15).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/memray/_memray/tracking_api.cpp | 98.55% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #823      +/-   ##
==========================================
+ Coverage   92.01%   92.17%   +0.15%     
==========================================
  Files          95       95              
  Lines       11940    11952      +12     
  Branches      413      413              
==========================================
+ Hits        10987    11017      +30     
+ Misses        953      935      -18     

| Flag | Coverage Δ | |
|---|---|---|
| cpp | 92.17% <98.55%> (+0.15%) | ⬆️ |
| python_and_cython | 92.17% <98.55%> (+0.15%) | ⬆️ |

@godlygeek godlygeek force-pushed the fix_GetFrame_stack_mismatch branch 2 times, most recently from 17f418e to 8ae2869 Compare September 5, 2025 05:00
@godlygeek godlygeek force-pushed the fix_GetFrame_stack_mismatch branch from 8ae2869 to 1e2ec15 Compare September 5, 2025 05:06
@godlygeek godlygeek marked this pull request as ready for review September 5, 2025 06:22
Successfully merging this pull request may close these issues.

Segfault on macOS and Linux, threading or asyncio related