Reduce stacktrace capture overhead #1606


Closed · living180 wants to merge 3 commits

Conversation

@living180 (Contributor)

The overhead of the SQL panel when ENABLE_STACKTRACES is enabled (the default) can be significant when there are a large number of queries. After profiling, I found that nearly all of the overhead was due to filesystem access in debug_toolbar.utils.getframeinfo() (called via debug_toolbar.utils.get_stack()) and the call to os.path.realpath() in debug_toolbar.utils.tidy_stacktrace(). Applying LRU caching in both of those places reduced the overhead to almost negligible levels.

I had to implement a custom LRU cache rather than using Python's functools.lru_cache, as using frame objects directly as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.
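For illustration, here is a minimal sketch of such a decorator; the names (lru_cache_by_key, key_func) are placeholders, not the PR's actual identifiers:

```python
from collections import OrderedDict


def lru_cache_by_key(key_func, maxsize=128):
    """LRU cache whose keys come from key_func(arg), not arg itself."""

    def decorator(func):
        cache = OrderedDict()

        def wrapper(arg):
            key = key_func(arg)
            try:
                result = cache[key]
                cache.move_to_end(key)  # mark as most recently used
            except KeyError:
                result = cache[key] = func(arg)
                if len(cache) > maxsize:
                    cache.popitem(last=False)  # evict least recently used
            return result

        return wrapper

    return decorator


# Hypothetical usage: key getframeinfo() by filename and line number,
# neither of which keeps the frame (or its locals) alive.
@lru_cache_by_key(lambda frame: (frame.f_code.co_filename, frame.f_lineno))
def getframeinfo(frame):
    ...  # expensive filesystem-touching work elided
```

Because only the derived key is stored, the frame object can be garbage-collected as usual once the stack unwinds.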

The three commits:

1. get_stack() is never called with a context value other than the default, so remove the context parameter. Also remove the context argument from our local copy of getframeinfo(), but keep it as a variable within the function to reduce the deviation from the original function in the Python stdlib.

2. getframeinfo() is quite expensive as it hits the filesystem multiple times. Wrap it with a custom least-recently-used cache to reduce this cost. It is not possible to use functools.lru_cache directly, as using frame objects as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.

3. Prior to this commit, tidy_stacktrace() called os.path.realpath() for each frame in the stack. This call is expensive because it has to hit the filesystem. Reduce this cost by moving the call to realpath() into omit_path() and applying an LRUCache() decorator to omit_path().
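A hedged sketch of why this third change is safe: omit_path() takes a plain path string, which (unlike a frame) is hashable and harmless to hold in a cache, so even stock functools.lru_cache would work. HIDDEN_PATHS below is a hypothetical stand-in for the toolbar's configured exclusion list:

```python
import functools
import os.path

# Hypothetical stand-in for the toolbar's configured exclusion list.
HIDDEN_PATHS = ["/usr/lib/python3", "/path/to/site-packages"]


@functools.lru_cache(maxsize=None)
def omit_path(path):
    # realpath() hits the filesystem to resolve symlinks; because the
    # cache key is a plain path string, each distinct path is resolved
    # only once per process.
    real = os.path.realpath(path)
    return any(real.startswith(hidden) for hidden in HIDDEN_PATHS)
```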
@matthiask (Member) left a comment

Thanks! The code looks good to me, although it's a little bit sad that you had to reimplement the cache.

I think this should work well with the autoreloader since (IIRC) it restarts the whole development server; if it doesn't (now or in the future) I worry a bit about the correctness of using an LRU cache and not some other type of cache which is emptied upon request_finished.

@living180 (Contributor, Author)

Outside of the usage of importlib.reload() (which comes with a bunch of caveats anyway), I'm not worried about caching for the lifetime of the process, since modules are otherwise only imported once.

Clearing the cache upon request_finished has the downside that concurrent requests in multiple threads could then affect each other's caching since the cache is global.

Is there anything else I'm not thinking of?

@living180 (Contributor, Author)

Actually, I guess a per-thread cache where a thread's cache gets discarded in response to request_finished could work.
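For illustration, that could look roughly like the following; the names and signal wiring here are assumptions, not the actual branch's code:

```python
import threading

from django.core.signals import request_finished
from django.dispatch import receiver

_local = threading.local()


def get_request_cache():
    # Lazily create one cache dict per thread.
    if not hasattr(_local, "cache"):
        _local.cache = {}
    return _local.cache


@receiver(request_finished)
def _clear_request_cache(**kwargs):
    # request_finished is sent from the thread that handled the request,
    # so only that thread's cache is discarded; concurrent requests in
    # other threads are unaffected.
    _local.cache = {}
```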

@matthiask (Member)

> Actually, I guess a per-thread cache where a thread's cache gets discarded in response to request_finished could work.

I'm not sure it's necessary; maybe it's fine as-is. But if it does turn out to be necessary, it will lead to hard-to-debug problems, and that's what I'm worrying about.

@living180 (Contributor, Author)

So I did go ahead and implement the per-thread cache version: main...living180:stacktrace_overhead_v2. In some ways I like that version better because it doesn't have any race conditions, and each request starts out with a clean slate and is thus more deterministic, being unaffected by previous requests. But I don't have a huge preference either way.

@tim-schilling (Member)

I like the per-thread cache approach more.

One idea I want to toss out there is reviewing all uses of inspect and determining whether we still need them. For example, we should be able to get the file name from frame.f_code.co_filename. Then we may be able to get the lines of source code using linecache (linecache.getlines(filename)).

I couldn't find a replacement for realpath, but a path doesn't have the same sensitivities as a frame does when being cached.
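A minimal sketch of that inspect-free idea; frame_source is a hypothetical helper, not existing toolbar code:

```python
import linecache


def frame_source(frame, context=7):
    # The filename lives on the frame's code object; no inspect needed.
    filename = frame.f_code.co_filename
    lineno = frame.f_lineno
    # linecache keeps its own in-process cache of file contents.
    lines = linecache.getlines(filename)
    start = max(lineno - context // 2 - 1, 0)
    return filename, lineno, lines[start:start + context]
```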

@living180 (Contributor, Author)

One question about the per-thread cache approach before I move to it: I know that the Debug Toolbar explicitly does not support Django Channels or async. However, if there is any plan/desire to support either of those eventually, any per-thread caching will not work without changes. Should I take that into consideration at all?

@tim-schilling I also plan to look at your suggestion to see if it is possible to move away from inspect altogether, which might actually mean we wouldn't need any per-thread cache at all.

@tim-schilling (Member)

> However, if there is any plan/desire to support either of those eventually, any per-thread caching will not work without changes. Should I take that into consideration at all?

I think the solution would be to use a ContextVar rather than the threading module.
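A sketch of that ContextVar variant, with hypothetical names; this assumes each request runs in its own copied Context, as ASGI servers arrange:

```python
from contextvars import ContextVar

_request_cache: ContextVar[dict] = ContextVar("debug_toolbar_cache")


def get_request_cache():
    # Works for both threads and asyncio tasks: each execution context
    # sees its own dict, so sync and async handlers are isolated.
    try:
        return _request_cache.get()
    except LookupError:
        cache = {}
        _request_cache.set(cache)
        return cache
```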

@tim-schilling (Member)

Regarding async compatibility: Matthias and I have a conversation coming up in a few weeks in which I think that will come up. Additionally, there's an outside contributor who wants to work on the async compat over the summer, but it's nothing concrete at this point.

@living180 (Contributor, Author)

So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.
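For reference, the stdlib mechanism that preserves PEP 302 support in inspect is the loader's get_source() hook, and linecache can use the same hook when handed the frame's globals. A hedged sketch (frame_source_lines is a hypothetical helper):

```python
import linecache


def frame_source_lines(frame):
    # If the file isn't on disk (e.g. code imported through a PEP 302
    # loader such as a zipimporter), linecache can fall back to asking
    # the module's __loader__ for the source via the frame's globals.
    filename = frame.f_code.co_filename
    return linecache.getlines(filename, frame.f_globals)
```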

Let me know your thoughts.

living180 marked this pull request as draft on May 18, 2022, 11:51.
@matthiask (Member)

> So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.

I'm not sure I'm informed enough to even have a competent opinion on this at all. How common would PEP 302 loaders even be? IIRC the import machinery has received big updates so is this even relevant to the versions we're supporting? (If this sounds like a really dumb question it more than likely is one.)

@living180 (Contributor, Author)

> So, I have a branch that re-implements get_stack() without using Python's inspect module: main...living180:stacktrace_overhead_v3. However, it does not support getting the source for code from modules loaded using a PEP 302 loader. If that is important (and I suspect that it likely is), I don't see a good way to incorporate that support without going back to just using the inspect module with some caching around it.

> I'm not sure I'm informed enough to even have a competent opinion on this at all. How common would PEP 302 loaders even be? IIRC the import machinery has received big updates so is this even relevant to the versions we're supporting? (If this sounds like a really dumb question it more than likely is one.)

I honestly don't know the answer to that. I wouldn't think it would be very common, but based on some old Debug Toolbar issues, it seems that some people have used it in unusual ways (e.g. #279). Anyway, I am actually prototyping a different approach which will hopefully supersede this PR and make this issue moot. Hopefully I'll be able to get that up this week and then close this PR in favor of the new code.

@matthiask (Member)

Thanks! Btw, I think that a solution which works for 90% of users is fine. Showing the source is a goodie, not a hard requirement for debugging, at least in my experience. If it makes things easier (that is: if it allows deleting a lot of code) I'd even agree to remove some features for users with unusual setups, especially because we're volunteering here.

living180 mentioned this pull request on May 29, 2022.
@matthiask (Member)

Let's close this pull request in favor of #1631

matthiask closed this on May 29, 2022.
living180 deleted the stacktrace_overhead branch on May 29, 2022, 15:23.