-
Notifications
You must be signed in to change notification settings - Fork 241
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Consider the following regular expression .*abcd.
And a large string that doesn't match it.
Despite having no match, pcre2grep returns instantly:
> wc -c large_string.txt
98305 large_string.txt
> time pcre2grep -i '.*abcd' large_string.txt
pcre2grep -i '.*abcd' large_string.txt 0.00s user 0.01s system 45% cpu 0.023 total
Now let's wrap the regex into a positive lookahead (?=.*abcd).
With the same file it takes more than 3 seconds to figure out that there is no match:
> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt 3.06s user 0.01s system 99% cpu 3.084 total
And the time complexity increases non-linearly.
If I increase the size of input in 2, time grows 4 times, indicating quadratic match complexity:
> wc -c large_string.txt
196609 large_string.txt
> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt 12.86s user 0.03s system 99% cpu 12.906 total
It looks like wrapping regex into a positive lookahead prevents an optimization of first looking for a fixed abcd substring before matching.
pcre2grep version:
> pcre2grep --version
pcre2grep version 10.47 2025-10-21
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request