Reduce stacktrace capture overhead #1606


Closed · living180 wants to merge 3 commits

Conversation

@living180 (Contributor)

The overhead of the SQL panel when ENABLE_STACKTRACES is enabled (the default) can be significant when there are a large number of queries. After profiling, I found that nearly all of the overhead was due to filesystem access in debug_toolbar.utils.getframeinfo() (called via debug_toolbar.utils.get_stack()) and the call to os.path.realpath() in debug_toolbar.utils.tidy_stacktrace(). Applying LRU caching in both of those places reduced the overhead to almost negligible levels.

I had to implement a custom LRU cache rather than using Python's functools.lru_cache, as using frame objects directly as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.
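For illustration, here is a minimal sketch of such a decorator; the names (lru_cache_by_key, key_func) are placeholders, not the PR's actual identifiers:

```python
from collections import OrderedDict


def lru_cache_by_key(key_func, maxsize=128):
    """LRU cache whose keys come from key_func(arg), not arg itself."""

    def decorator(func):
        cache = OrderedDict()

        def wrapper(arg):
            key = key_func(arg)
            try:
                result = cache[key]
                cache.move_to_end(key)  # mark as most recently used
            except KeyError:
                result = cache[key] = func(arg)
                if len(cache) > maxsize:
                    cache.popitem(last=False)  # evict least recently used
            return result

        return wrapper

    return decorator


# Hypothetical usage: key getframeinfo() by filename and line number,
# neither of which keeps the frame (or its locals) alive.
@lru_cache_by_key(lambda frame: (frame.f_code.co_filename, frame.f_lineno))
def getframeinfo(frame):
    ...  # expensive filesystem-touching work elided
```

Because only the derived key is stored, the frame object can be garbage-collected as usual once the stack unwinds.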

The three commits:

1. get_stack() is never called with a context value other than the default, so remove the context parameter. Also remove the context argument from our local copy of getframeinfo(), but keep it as a variable within the function to reduce the deviation from the original function in the Python stdlib.

2. getframeinfo() is quite expensive as it hits the filesystem multiple times. Wrap it with a custom least-recently-used cache to reduce this cost. It is not possible to use functools.lru_cache directly, as using frame objects as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.

3. Prior to this commit, tidy_stacktrace() called os.path.realpath() for each frame in the stack. This call is expensive because it has to hit the filesystem. Reduce this cost by moving the call to realpath() into omit_path() and applying an LRUCache() decorator to omit_path().
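A hedged sketch of why this third change is safe: omit_path() takes a plain path string, which (unlike a frame) is hashable and harmless to hold in a cache, so even stock functools.lru_cache would work. HIDDEN_PATHS below is a hypothetical stand-in for the toolbar's configured exclusion list:

```python
import functools
import os.path

# Hypothetical stand-in for the toolbar's configured exclusion list.
HIDDEN_PATHS = ["/usr/lib/python3", "/path/to/site-packages"]


@functools.lru_cache(maxsize=None)
def omit_path(path):
    # realpath() hits the filesystem to resolve symlinks; because the
    # cache key is a plain path string, each distinct path is resolved
    # only once per process.
    real = os.path.realpath(path)
    return any(real.startswith(hidden) for hidden in HIDDEN_PATHS)
```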
@matthiask (Member) left a comment

Thanks! The code looks good to me, although it's a little bit sad that you had to reimplement the cache.

I think this should work well with the autoreloader since (IIRC) it restarts the whole development server; if it doesn't (now or in the future) I worry a bit about the correctness of using an LRU cache and not some other type of cache which is emptied upon request_finished.

@living180 (Contributor, Author)

Outside of the usage of importlib.reload() (which comes with a bunch of caveats anyway), I'm not worried about caching for the lifetime of the process, since modules are otherwise only imported once.

Clearing the cache upon request_finished has the downside that concurrent requests in multiple threads could then affect each other's caching since the cache is global.

Is there anything else I'm not thinking of?

@living180 (Contributor, Author)

Actually, I guess a per-thread cache where a thread's cache gets discarded in response to request_finished could work.
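For illustration, that could look roughly like the following; the names and signal wiring here are assumptions, not the actual branch's code:

```python
import threading

from django.core.signals import request_finished
from django.dispatch import receiver

_local = threading.local()


def get_request_cache():
    # Lazily create one cache dict per thread.
    if not hasattr(_local, "cache"):
        _local.cache = {}
    return _local.cache


@receiver(request_finished)
def _clear_request_cache(**kwargs):
    # request_finished is sent from the thread that handled the request,
    # so only that thread's cache is discarded; concurrent requests in
    # other threads are unaffected.
    _local.cache = {}
```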

@matthiask (Member)

> Actually, I guess a per-thread cache where a thread's cache gets discarded in response to request_finished could work.

I'm not sure it's necessary; maybe it's fine as-is. But if it does turn out to be necessary, it will lead to hard-to-debug problems, and that's what I'm worrying about.

@living180 (Contributor, Author)

So I did go ahead and implement the per-thread cache version: main...living180:stacktrace_overhead_v2. In some ways I like that version better because it doesn't have any race conditions, and each request starts out with a clean slate and is thus more deterministic, being unaffected by previous requests. But I don't have a huge preference either way.

@tim-schilling (Member)

I like the per-thread cache approach more.

One idea I want to toss out there is reviewing all uses of inspect and determining whether we still need them. For example, we should be able to get the file name from frame.f_code.co_filename. Then we may be able to get the lines of source code using linecache (linecache.getlines(filename)).

I couldn't find a replacement for realpath, but a path doesn't have the same sensitivities as a frame does when being cached.
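A minimal sketch of that inspect-free idea; frame_source is a hypothetical helper, not existing toolbar code:

```python
import linecache


def frame_source(frame, context=7):
    # The filename lives on the frame's code object; no inspect needed.
    filename = frame.f_code.co_filename
    lineno = frame.f_lineno
    # linecache keeps its own in-process cache of file contents.
    lines = linecache.getlines(filename)
    start = max(lineno - context // 2 - 1, 0)
    return filename, lineno, lines[start:start + context]
```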

@living180 (Contributor, Author)

One question about the per-thread cache approach before I move to it: I know that the Debug Toolbar explicitly does not support Django Channels or async. However, if there is any plan/desire to support either of those eventually, any per-thread caching will not work without changes. Should I take that into consideration at all?

@tim-schilling I also plan to look at your suggestion to see if it is possible to move away from inspect altogether, which might actually mean we wouldn't need any per-thread cache at all.

@tim-schilling (Member)

> However, if there is any plan/desire to support either of those eventually, any per-thread caching will not work without changes. Should I take that into consideration at all?

I think the solution would be to use a ContextVar rather than the threading module.
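A sketch of that ContextVar variant, with hypothetical names; this assumes each request runs in its own copied Context, as ASGI servers arrange:

```python
from contextvars import ContextVar

_request_cache: ContextVar[dict] = ContextVar("debug_toolbar_cache")


def get_request_cache():
    # Works for both threads and asyncio tasks: each execution context
    # sees its own dict, so sync and async handlers are isolated.
    try:
        return _request_cache.get()
    except LookupError:
        cache = {}
        _request_cache.set(cache)
        return cache
```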

@tim-schilling (Member)

Regarding async compatibility: Matthias and I have a conversation coming up in a few weeks in which I think that will come up. Additionally, there's an outside contributor who wants to work on the async compat over the summer, but it's nothing concrete at this point.

@living180 (Contributor, Author)

So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.
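For reference, the stdlib mechanism that preserves PEP 302 support in inspect is the loader's get_source() hook, and linecache can use the same hook when handed the frame's globals. A hedged sketch (frame_source_lines is a hypothetical helper):

```python
import linecache


def frame_source_lines(frame):
    # If the file isn't on disk (e.g. code imported through a PEP 302
    # loader such as a zipimporter), linecache can fall back to asking
    # the module's __loader__ for the source via the frame's globals.
    filename = frame.f_code.co_filename
    return linecache.getlines(filename, frame.f_globals)
```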

Let me know your thoughts.

living180 marked this pull request as draft on May 18, 2022, 11:51.
@matthiask (Member)

> So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.

I'm not sure I'm informed enough to even have a competent opinion on this at all. How common would PEP 302 loaders even be? IIRC the import machinery has received big updates so is this even relevant to the versions we're supporting? (If this sounds like a really dumb question it more than likely is one.)

@living180 (Contributor, Author)

> So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.

> I'm not sure I'm informed enough to even have a competent opinion on this at all. How common would PEP 302 loaders even be? IIRC the import machinery has received big updates so is this even relevant to the versions we're supporting? (If this sounds like a really dumb question it more than likely is one.)

I honestly don't know the answer to that. I wouldn't think it would be very common, but based on some old Debug Toolbar issues, it seems that some people have used it in unusual ways (e.g. #279). Anyway, I am actually prototyping a different approach which will hopefully supersede this PR and make this issue moot. Hopefully I'll be able to get that up this week and then close this PR in favor of the new code.

@matthiask (Member)

Thanks! Btw, I think that a solution which works for 90% of users is fine. Showing the source is a goodie, not a hard requirement for debugging, at least in my experience. If it makes things easier (that is: if it allows deleting a lot of code) I'd even agree to remove some features for users with unusual setups, especially because we're volunteering here.

living180 mentioned this pull request on May 29, 2022.
@matthiask (Member)

Let's close this pull request in favor of #1631

matthiask closed this on May 29, 2022.
living180 deleted the stacktrace_overhead branch on May 29, 2022, 15:23.