Perl-compatible unsetting of captures during repeats #590

@NWilson

In maint/README:

Perl and PCRE2 sometimes differ in the settings of capturing subpatterns
inside repeats. One example of the difference is the matching of
/(main(O)?)+/ against mainOmain, where PCRE2 leaves $2 set. In Perl, it's
unset. Changing this in PCRE2 will be very hard because I think it needs much
more state to be remembered.

In pcre2compat:

  1. There are some differences that are concerned with the settings of captured
    strings when part of a pattern is repeated. For example, matching "aba" against
    the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
    "b".

This seems to be the most significant difference listed in pcre2compat. It's also the only definite bug listed in maint/README (the rest seem to be fairly minor feature requests).

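Unlike the technicalities of (*THEN) inside recursive patterns, or other trivia, this has a major impact on "simple" regexes that use only standard syntax. For example, in "abacb" =~ /^(a(b)?)+c\2$/ (a variant of the pcre2compat example above), whether \2 can match depends on whether $2 is still set after the final repeat.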
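To make the difference concrete, here is a minimal sketch of the two capture policies as a hand-written backtracking matcher in Python. It is a toy model only: it is not how PCRE2 or Perl is implemented, `match_model`, `backref_tail` and `reset_inner` are hypothetical names, and it hard-codes just the two patterns discussed above. It also assumes that a backreference to an unset group fails to match.

```python
from typing import Optional, Tuple

Captures = Tuple[Optional[str], Optional[str]]  # ($1, $2); None means unset


def match_model(subject: str, *, backref_tail: bool, reset_inner: bool) -> Optional[Captures]:
    """Toy matcher for ^(a(b)?)+$ or, if backref_tail is set, ^(a(b)?)+c\\2$.

    reset_inner=False models the PCRE2 behaviour described above ($2 survives
    into the next iteration); reset_inner=True models the Perl behaviour.
    """

    def tail(pos: int, g1: str, g2: Optional[str]) -> Optional[Captures]:
        if backref_tail:
            # 'c' then \2. Assumption: a backreference to an unset group fails.
            if g2 is None or not subject.startswith("c" + g2, pos):
                return None
            pos += 1 + len(g2)
        return (g1, g2) if pos == len(subject) else None

    def iteration(pos: int, g2: Optional[str]) -> Optional[Captures]:
        if not subject.startswith("a", pos):
            return None
        if reset_inner:
            g2 = None  # a new pass through the outer group unsets $2
        choices = []
        if subject.startswith("b", pos + 1):
            choices.append((pos + 2, "b"))  # greedy: (b)? consumes "b" first
        choices.append((pos + 1, g2))       # (b)? matches empty, $2 untouched
        for end, new_g2 in choices:
            # Greedy '+': try another iteration before trying the tail.
            result = iteration(end, new_g2)
            if result is None:
                result = tail(end, subject[pos:end], new_g2)
            if result is not None:
                return result
        return None

    return iteration(0, None)


print(match_model("aba", backref_tail=False, reset_inner=False))   # ('a', 'b')
print(match_model("aba", backref_tail=False, reset_inner=True))    # ('a', None)
print(match_model("abacb", backref_tail=True, reset_inner=False))  # ('a', 'b')
print(match_model("abacb", backref_tail=True, reset_inner=True))   # None
```

In this model the only change between the two policies is the single `g2 = None` on entry to each iteration, yet for "abacb" it flips the overall result from a match to no match. The state that has to be saved and restored when backtracking across iterations is presumably where the real cost lies.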

Labels: affects JIT (A change that will require matching JIT interpreter changes), enhancement (New feature or request)