Complementary re patterns such as [\s\S] or [\w\W] are much slower than . with DOTALL #111259
Comments
It is neither a bug in the [...] This can be qualified as a bug in the [...] You can use the [...]
Is there a document of slow and fast patterns for the Python regex engine? I just stumbled across this pattern here by chance. If it does work as expected then I think it cannot qualify as a bug in [...] Feel free to set different labels that match the issue more closely.
I see that [...]
Regular expression pattern `(?s:.)` is much faster than `[\s\S]`.
…1303) Regular expression pattern `(?s:.)` is much faster than `[\s\S]`.
Patterns like "[\s\S]" or "\s|\S" which match any character are now compiled to the same effective code as a dot with the DOTALL modifier ("(?s:.)").
…y character (pythonGH-120745) (cherry picked from commit a2f6f7d) Co-authored-by: Serhiy Storchaka
…ny character (GH-120745) (GH-120814) (cherry picked from commit a2f6f7d) Co-authored-by: Serhiy Storchaka
…ny character (GH-120745) (GH-120813) (cherry picked from commit a2f6f7d) Co-authored-by: Serhiy Storchaka
…H-120742) Patterns like "[\s\S]" or "\s|\S" which match any character are now compiled to the same effective code as a dot with the DOTALL modifier ("(?s:.)").
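The commit messages above describe `[\s\S]`, `\s|\S` and `(?s:.)` as matching any character. As a minimal sketch of that behavioural equivalence (the sample string and the helper name `match_all_chars` are made up for illustration and are not taken from the issue or the patch), one could check that all three spellings consume the same characters:

```python
import re

# Three spellings of "match any single character, including a newline".
PATTERNS = {
    "character class": r"[\s\S]",
    "alternation": r"\s|\S",
    "dot with DOTALL": r"(?s:.)",
}

# Hypothetical sample text mixing whitespace, newlines and non-ASCII characters.
sample = "line one\nline two\t\u00e9\u4e2d\n"


def match_all_chars(pattern, text):
    """Return every single-character match of `pattern` in `text`."""
    return re.findall(pattern, text)


results = {name: match_all_chars(p, sample) for name, p in PATTERNS.items()}

# A plain dot without DOTALL skips the newlines, for comparison.
plain_dot = re.findall(r".", sample)

for name, chars in results.items():
    print(f"{name}: matched {len(chars)} of {len(sample)} characters")
print(f"plain dot: matched {len(plain_dot)} of {len(sample)} characters")
```

This only compares what the patterns match, not how they are compiled; whether they produce the same internal code depends on the Python version in use.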
Bug report
Bug description:
Runtimes are 0.44 s vs 0.0016 s on my system, for `[\s\S]` and for `.` with DOTALL respectively. Instead of being simplified, the `[\s\S]` class is stepped through one member after another: `\s` does not match, so `\S` is checked next (the order `[\S\s]` is twice as fast for the string here).

This is not solely an issue for larger matches. A 40 char string is processed half as fast when using `[\s\S]`. Even 10 chars take about 25% longer to process.

I'm not completely sure whether this qualifies as a bug or an issue with documentation. Other languages don't have the DOTALL option and always rely on the first option, `[\s\S]`. Plenty of posts on SO and elsewhere will thus advocate using `[\s\S]` as an all-matching regex pattern. Unsuspecting Python programmers such as @barneygale may expect `[\s\S]` to be identical to using a dot with DOTALL, as seen below.
@serhiy-storchaka
cpython/Lib/pathlib.py
Lines 126 to 133 in 9bb202a
CPython versions tested on:
3.11, 3.13
Operating systems tested on:
Linux, Windows
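A rough way to reproduce this kind of comparison is sketched below. It is not the original snippet from the report: the input string, its length and the helper name `time_fullmatch` are assumptions chosen for illustration, and the absolute numbers will differ by machine and Python version. On versions that include the linked fix, the gap may largely disappear.

```python
import re
import timeit

# Hypothetical input: a long run of non-whitespace characters.
# The string used in the original report is not known.
text = "a" * 200_000

# The two orderings of the complementary class, plus the dot with DOTALL.
CANDIDATES = (r"[\s\S]", r"[\S\s]", r"(?s:.)")


def time_fullmatch(pattern, subject, repeats=3):
    """Best-of-`repeats` wall time for one fullmatch of `pattern` on `subject`."""
    compiled = re.compile(pattern)
    return min(timeit.repeat(lambda: compiled.fullmatch(subject), number=1, repeat=repeats))


# Each candidate is repeated with `*` so that it has to consume the whole string.
timings = {p: time_fullmatch(p + "*", text) for p in CANDIDATES}

for pattern, seconds in timings.items():
    print(f"{pattern + '*':10} {seconds:.5f} s")
```

Taking the minimum of several runs reduces noise from other processes; compiling the pattern up front keeps compilation cost out of the measurement.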