Reduce stacktrace capture overhead #1606

Conversation
get_stack() is never called with a context value other than the default. Also remove the context argument from our local copy of getframeinfo(); however, keep it as a variable within the function to reduce the deviation from the original function in the Python stdlib.
getframeinfo() is quite expensive as it hits the filesystem multiple times. Wrap with a custom least-recently-used cache to reduce this cost. It is not possible to use functools.lru_cache directly as using frame objects as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.
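A minimal sketch of what such a key-function-based LRU cache could look like (illustrative only; the class name, parameters, and details here are assumptions, not the exact code from the commit):

```python
import functools
from collections import OrderedDict


class LRUCache:
    """A small least-recently-used cache keyed by a user-supplied function.

    ``key_func`` turns the wrapped function's arguments into a hashable key,
    so heavyweight objects such as frame objects are never stored in the
    cache themselves.
    """

    def __init__(self, key_func, maxsize=128):
        self.key_func = key_func
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = self.key_func(*args, **kwargs)
            try:
                # Cache hit: mark the entry as most recently used.
                result = self._cache[key]
                self._cache.move_to_end(key)
            except KeyError:
                # Cache miss: compute, store, and evict the oldest entry
                # if the cache has grown past maxsize.
                result = func(*args, **kwargs)
                self._cache[key] = result
                if len(self._cache) > self.maxsize:
                    self._cache.popitem(last=False)
            return result

        return wrapper
```

With something along these lines, getframeinfo() can be cached on a key derived from the frame (for example the frame's code object and line number) rather than on the frame itself, so only that small key is kept alive.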
Prior to this commit, tidy_stacktrace() called os.path.realpath() for each frame in the stack. This call is expensive because it has to hit the filesystem. Reduce this cost by moving the call to realpath() into omit_path() and applying an LRUCache() decorator to omit_path().
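For illustration, the cached omit_path() might look roughly like this (a sketch reusing the LRUCache decorator above; HIDDEN_PATHS is a hypothetical stand-in for whatever the project actually derives from its settings):

```python
import os.path

# Hypothetical list of path fragments to hide; the real project derives
# this from its settings rather than hard-coding it.
HIDDEN_PATHS = ["/site-packages/django/", "/site-packages/socketserver/"]


@LRUCache(key_func=lambda path: path)  # reuses the LRUCache sketch above
def omit_path(path):
    # os.path.realpath() is the expensive filesystem call; caching on the
    # raw ``path`` string means it runs only once per distinct path.
    real_path = os.path.realpath(path)
    return any(hidden in real_path for hidden in HIDDEN_PATHS)
```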
Thanks! The code looks good to me, although it's a little bit sad that you had to reimplement the cache.
I think this should work well with the autoreloader since (IIRC) it restarts the whole development server; if it doesn't (now or in the future), I worry a bit about the correctness of using an LRU cache and not some other type of cache which is emptied upon request_finished.
Outside of the usage of the autoreloader… Clearing the cache upon request_finished… Is there anything else I'm not thinking of?
Actually, I guess a per-thread cache, where a thread's cache gets discarded in response to request_finished, would also be an option.
I'm not sure it's necessary; maybe it's fine as-is. It would lead to hard-to-debug problems if it turned out to be necessary after all, and that's what I'm worrying about.
So I did go ahead and implement the per-thread cache version: main...living180:stacktrace_overhead_v2. In some ways I like that version better because it doesn't have any race conditions, and each request starts out with a clean slate and is thus more deterministic, unaffected by previous requests. But I don't have a huge preference either way.
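As a rough illustration of the per-thread idea (the linked branch may differ in its details), the cache could live in a threading.local and be emptied when Django's request_finished signal fires; the helper names below are assumptions for the sketch:

```python
import threading

from django.core.signals import request_finished
from django.dispatch import receiver

_local = threading.local()


def get_frame_cache():
    # Each thread lazily gets its own dict, so concurrent requests served
    # by different threads never share (or race on) cached frame info.
    if not hasattr(_local, "frame_cache"):
        _local.frame_cache = {}
    return _local.frame_cache


@receiver(request_finished)
def _clear_frame_cache(sender, **kwargs):
    # Drop the current thread's cache so every request starts with a
    # clean slate, unaffected by anything cached during earlier requests.
    _local.frame_cache = {}
```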
I like the per-thread cache approach more. One idea I want to toss out there is reviewing all uses of realpath… I couldn't find a replacement for realpath, but …
One question about the per-thread cache approach before I move to that. I know that the Debug Toolbar explicitly does not support Django Channels or async. However, if there is any plan/desire to support either of those eventually, any per-thread caching will not work without changes. Should I take that into consideration at all? @tim-schilling I also plan to look at your suggestion to see if it is possible to move away from realpath.
I think the solution would be to use a …
Regarding async compatibility: Matthias and I have a conversation coming up in a few weeks in which I think that will come up. Additionally, there's an outside contributor who wants to work on the async compat over the summer, but it's nothing concrete at this point.
So, I have a branch that re-implements … Let me know your thoughts.
I'm not sure I'm informed enough to even have a competent opinion on this at all. How common would PEP 302 loaders even be? IIRC the import machinery has received big updates, so is this even relevant to the versions we're supporting? (If this sounds like a really dumb question, it more than likely is one.)
I honestly don't know the answer to that. I wouldn't think it would be very common, but based on some old Debug Toolbar issues, it seems that some people have used it in unusual ways (e.g. #279). Anyway, I am actually prototyping a different approach which will hopefully supersede this PR and make this issue moot. Hopefully I'll be able to get that up this week and then close this PR in favor of the new code.
Thanks! Btw, I think that a solution which works for 90% of users is fine. Showing the source is a goodie, not a hard requirement for debugging, at least in my experience. If it makes things easier (that is: if it allows deleting a lot of code) I'd even agree to remove some features for users with unusual setups. Especially because we're volunteering here.
Let's close this pull request in favor of #1631.
The overhead of the SQL panel when ENABLE_STACKTRACES is enabled (the default) can be significant when there are a large number of queries. After profiling, I found that nearly all the overhead was due to filesystem access in debug_toolbar.utils.getframeinfo() (called via debug_toolbar.utils.get_stack()), and to the call to os.path.realpath() in debug_toolbar.utils.tidy_stacktrace(). Applying LRU caching in both of those places reduced the overhead to a nearly negligible level.

I had to implement a custom LRU cache rather than using Python's functools.lru_cache, as using frame objects directly as cache keys would cause them (and their associated locals) to stay in memory indefinitely. The custom LRU cache takes a function which transforms the frame argument into a suitable cache key.
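To make the memory concern concrete, here is an illustrative comparison (not project code; the function names are hypothetical): with functools.lru_cache the frame object itself becomes part of the cache key and is therefore kept alive, whereas a small key derived from the frame avoids that.

```python
import functools
import inspect


@functools.lru_cache(maxsize=128)
def frame_info_cached_on_frame(frame):
    # The frame object itself is part of the cache key, so the frame and
    # every local variable it references stay alive until the entry is
    # evicted; this is exactly the memory problem described above.
    return inspect.getframeinfo(frame)


def frame_key(frame):
    # A small, hashable key identifying the source location without holding
    # a reference to the frame; suitable for the custom LRU cache instead.
    return (frame.f_code.co_filename, frame.f_lineno)
```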