GH-115060: Speed up pathlib.Path.glob() by removing redundant regex matching #115061
… regex matching When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern. Also skip compiling a pattern when expanding a `*` wildcard segment.
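As a rough illustration of the idea described above (not the code from this PR), the sketch below compiles a pattern from only the segments that follow a `**`, then matches each candidate path starting at the position just past the directory where the `**` was expanded. The helper names `translate_segment`, `compile_tail` and `filter_recursive` are hypothetical, and the translation handles only `*` and `?`.

```python
import re


def translate_segment(segment):
    """Translate one glob segment (without '**') into a regex fragment
    that never crosses a '/' separator. Only '*' and '?' are handled."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_tail(tail_parts):
    """Compile a pattern from the segments after '**' only.

    The leading group lets '**' consume zero or more directories; the
    segments before '**' are never part of the pattern."""
    body = "/".join(translate_segment(part) for part in tail_parts)
    return re.compile(r"(?:[^/]+/)*" + body + r"\Z")


def filter_recursive(anchor, candidates, tail_parts):
    """Keep candidates under `anchor` whose remainder matches the tail.

    Matching starts at the offset just past 'anchor/', so the prefix that
    was already matched by earlier segments is not re-checked."""
    match = compile_tail(tail_parts).match
    start = len(anchor) + 1
    return [path for path in candidates if match(path, start)]


candidates = ["src/a.py", "src/pkg/b.py", "src/pkg/c.txt", "src/pkg/sub/d.py"]
print(filter_recursive("src", candidates, ["*.py"]))
```

Passing a start offset to the compiled pattern's `match()` is what avoids re-matching the part of the path that earlier pattern segments have already accepted.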
Notable improvements:
$ ./python -m timeit -s "from pathlib import Path" "list(Path.cwd().glob('*', follow_symlinks=False))"
2000 loops, best of 5: 180 usec per loop # before
2000 loops, best of 5: 159 usec per loop # after
# --> 1.13x faster
$ ./python -m timeit -s "from pathlib import Path" "list(Path.cwd().glob('**/*.py', follow_symlinks=False))"
5 loops, best of 5: 54 msec per loop # before
5 loops, best of 5: 40.9 msec per loop # after
# --> 1.32x faster
Everything else is about the same.
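For anyone wanting to repeat this kind of measurement from a script rather than the command line, here is a minimal sketch using the standard `timeit` module. The helpers `best_time` and `speedup` are hypothetical names, and the sketch leaves out the `follow_symlinks=False` argument used above because that keyword is not accepted by every Python version.

```python
import timeit
from pathlib import Path


def best_time(root, pattern, number=20, repeat=5):
    """Return the best per-loop time, in seconds, for list(root.glob(pattern))."""
    timings = timeit.repeat(
        lambda: list(root.glob(pattern)), number=number, repeat=repeat
    )
    return min(timings) / number


def speedup(before, after):
    """Ratio of the old time to the new time (greater than 1 means faster)."""
    return before / after


if __name__ == "__main__":
    elapsed = best_time(Path.cwd(), "*")
    print(f"best of 5: {elapsed * 1e6:.0f} usec per loop")
    # The ratios quoted above follow from the reported timings:
    print(f"{speedup(180, 159):.2f}x")   # 180 usec -> 159 usec
    print(f"{speedup(54, 40.9):.2f}x")   # 54 msec -> 40.9 msec
```

Comparing two builds still means running the script once under each interpreter; the sketch only standardizes how the per-loop time is taken.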
For whatever reason, every time I try to review this, I struggle to figure out what the change is doing :D Since it doesn't require changing any test cases, and I know the test cases are pretty thorough for this area, I don't think there's any reason to not sign off. Maybe trigger a buildbot run with the tag to make sure it doesn't behave strangely on any of those setups - they can occasionally be a bit unusual and find some edge cases.
Thanks Steve.
The algorithm might be worthy of a blog post at this point! The main change is that we now filter partial paths through a regex corresponding to a partial pattern in … The secondary change (which includes the addition of …
Okay, today it made sense :) Guess I'm more awake right now. Reading the changes from the bottom up might have helped as well. Personally, I don't think you can have too many comments in an algorithm like this, particularly when it's recursive and split between a couple of functions. I'll suggest a few comments that would've helped me, but I don't think there are any code changes needed.
Just comments that may help make it more understandable. No changes required.
… regex matching (python#115061) When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.
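The two points in this commit message can be sketched roughly as follows; this is an illustration under stated assumptions, not the committed code. `select_children` is a hypothetical helper, `fnmatch.translate()` stands in for whatever pattern translation pathlib actually uses, and matching on `entry.name` is an assumption about which `os.DirEntry` attribute is tested.

```python
import fnmatch
import os
import re
from pathlib import Path


def select_children(directory, segment):
    """Yield Path objects for the children of `directory` matching `segment`.

    A bare '*' accepts every name, so no pattern is compiled for it.
    Filtering runs on the os.DirEntry, and a Path is only created for
    entries that pass the filter."""
    if segment == "*":
        match = None
    else:
        match = re.compile(fnmatch.translate(segment)).match
    with os.scandir(directory) as entries:
        for entry in entries:
            if match is None or match(entry.name):
                yield Path(entry.path)


if __name__ == "__main__":
    for path in select_children(os.getcwd(), "*.py"):
        print(path)
```

Filtering before constructing the path object means entries that are rejected never pay the cost of instantiation.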