Skip to content

Adding positive lookahead prevents some optimizations #851

@Fefer-Ivan

Description

@Fefer-Ivan

Consider the following regular expression .*abcd.
And a large string that doesn't match it.
Despite having no match, pcre2grep returns instantly:

> wc -c large_string.txt
   98305 large_string.txt
> time pcre2grep -i '.*abcd' large_string.txt
pcre2grep -i '.*abcd' large_string.txt  0.00s user 0.01s system 45% cpu 0.023 total

Now let's wrap the regex into a positive lookahead (?=.*abcd).
With the same file it takes more than 3 seconds to figure out that there is no match:

> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt  3.06s user 0.01s system 99% cpu 3.084 total

And the time complexity increases non-linearly.
If I increase the size of input in 2, time grows 4 times, indicating quadratic match complexity:

> wc -c large_string.txt
  196609 large_string.txt
> time pcre2grep -i '(?=.*abcd)' large_string.txt
pcre2grep -i '(?=.*abcd)' large_string.txt  12.86s user 0.03s system 99% cpu 12.906 total

It looks like wrapping regex into a positive lookahead prevents an optimization of first looking for a fixed abcd substring before matching.

pcre2grep version:

> pcre2grep --version
pcre2grep version 10.47 2025-10-21

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions