Support the /n pattern modifier #15

Closed
Roy-Orbison opened this issue Sep 3, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@Roy-Orbison

From pcre2test and the docs, I don't see a way to name the whole string match, i.e. begin the pattern with a ?<name> variant, which would be useful. I realise it's possible to wrap the entire expression in a named subpattern, but that creates an unnecessary extra match group.

It seems like it would be a backward-compatible change as beginning a pattern with ? currently produces an error like the following, so there can't be any valid patterns that would begin with those sequences.

Failed: error 109 at offset 0: quantifier does not follow a repeatable item

@PhilipHazel added the "enhancement" (New feature or request) label on Sep 4, 2021
@PhilipHazel
Collaborator

Two comments: (1) This isn't something Perl does. (2) I may be missing something, but I can't see much use for this other than as a synonym for (?R) to recurse the whole pattern, which seems a minuscule benefit.

@Roy-Orbison
Author

Maybe Perl should, too.

I think it'd be primarily useful in global matching. Take this simplified example of matching some template tags:

Without being able to name group 0, the expression

/(?<tag>\{([^\s}]+)(?:\s*(?<params>[^}\s][^}]*))?\})/g

matched against

content {tag1} content {tag2 param param} content

yields

 0: {tag1}
 1 (tag): {tag1}
 2 (name): tag1
 0: {tag2 param param}
 1 (tag): {tag2 param param}
 2 (name): tag2
 3 (params): param param

However, if the expression could be

/?<tag>\{([^\s}]+)(?:\s*(?<params>[^}\s][^}]*))?\}/g

one would get

 0 (tag): {tag1}
 1 (name): tag1
 0 (tag): {tag2 param param}
 1 (name): tag2
 2 (params): param param

This would make result sets smaller, and access of their members more consistent because every numbered part could also be a named part. As it is now, 0 behaves like an exception.
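The extra group can be seen in any engine with named groups. Here is a minimal sketch using Python's `re` module rather than PCRE, so named groups are spelled `(?P<name>...)`. Naming the inner group `name` is an assumption made for illustration (the group names in the output above were added by hand), and `named_results` is a hypothetical helper, not part of any library.

```python
import re

# Whole expression wrapped in a named group, as in the first pattern above.
WRAPPED = re.compile(
    r"(?P<tag>\{(?P<name>[^\s}]+)(?:\s*(?P<params>[^}\s][^}]*))?\})"
)
# Same expression without the wrapper; the whole match is only reachable as group 0.
UNWRAPPED = re.compile(
    r"\{(?P<name>[^\s}]+)(?:\s*(?P<params>[^}\s][^}]*))?\}"
)

subject = "content {tag1} content {tag2 param param} content"


def named_results(pattern, text):
    """Return one dict per match: group 0 plus every named group that took part."""
    results = []
    for m in pattern.finditer(text):
        row = {0: m.group(0)}
        row.update({k: v for k, v in m.groupdict().items() if v is not None})
        results.append(row)
    return results


wrapped = named_results(WRAPPED, subject)
unwrapped = named_results(UNWRAPPED, subject)

for row in wrapped:
    print(row)
```

In the wrapped form, the `tag` entry always duplicates entry `0`, and the pattern carries one more capture group than the unwrapped form.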

@PhilipHazel
Collaborator

That is true, though you seem to be suggesting the use of pcre2test in some kind of regular production, which is not what it is intended for. Anyway, as this is not in Perl, I'm afraid it is unlikely to get done unless Perl takes it up.

@Roy-Orbison
Author

Don't worry, I'm only using that to test with, interactively, so my result wasn't tainted by any other software employing PCRE as a library. I manually edited the group names into its output.

I'll write an RFC for Perl.

@Roy-Orbison
Author

Roy-Orbison commented Sep 14, 2021

The feedback from the Perl Porters is that it cannot work because it would break some kinds of patterns created by interpolation. Their explanation was this:

There is a strong expectation that:

$str=~/PAT/;

and

my $qr= qr/PAT/;
$str=~/(?:$qr)/

are always expected to match the same.

Basically PAT and (?:PAT) have to match the same thing.
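As a rough illustration of that invariant, the sketch below uses Python's `re` module, which is a different engine from Perl's, so it only shows the shape of the problem. `wrap_non_capturing` is a hypothetical helper that imitates interpolation by string concatenation, and the `?<tag>` prefix is the syntax proposed in this issue, not something any engine accepts.

```python
import re


def wrap_non_capturing(pat: str) -> str:
    """Embed a pattern inside (?:...), as interpolation into a larger pattern would."""
    return "(?:" + pat + ")"


pat = r"\{[^\s}]+\}"
subject = "content {tag1} content"

# An ordinary pattern matches the same span with or without the wrapper.
plain = re.search(pat, subject)
wrapped = re.search(wrap_non_capturing(pat), subject)
same_span = plain.span() == wrapped.span()

# With the proposed prefix, wrapping yields "(?:?<tag>...", where the "?"
# directly follows "(?:" and has nothing to quantify.
try:
    re.compile(wrap_non_capturing(r"?<tag>\{[^\s}]+\}"))
    prefixed_compiles = True
except re.error:
    prefixed_compiles = False

print(same_span, prefixed_compiles)
```

A prefix that is only meaningful at the very start of a pattern stops being at the start once the pattern is embedded, which is why it conflicts with the expectation quoted above.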


I looked further into the Perl docs and it seems there's an /n pattern modifier that PCRE could implement. It stops plain parentheses from capturing, so $1, $2, etc. are no longer filled in, while named groups still capture. That could be used to achieve the same thing. From what I read, it has two benefits:

  1. Patterns become more readable when you don't care about matching subgroups, e.g.:
    /(?:foo|bar)baz(?:qux)?/
    becomes
    /(foo|bar)baz(qux)?/n
  2. You get only semantic, named matches in result sets. Reusing the first pattern example from my comment above, the result would be:
    0:
      tag: {tag1}
      name: tag1
    1:
      tag: {tag2 param param}
      name: tag2
      params: param param
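The first benefit can be checked by hand. Python's `re` module has no counterpart to /n, so this sketch simply compares the two spellings from item 1: the pattern with plain parentheses, and the explicitly non-capturing pattern that /n would make it equivalent to. The subject strings and the `spans` helper are made up for illustration.

```python
import re

capturing = re.compile(r"(foo|bar)baz(qux)?")
# What the pattern above would mean under /n: same matches, no numbered captures.
non_capturing = re.compile(r"(?:foo|bar)baz(?:qux)?")

subjects = ["foobaz", "barbazqux", "bazqux", "xfoobazqux"]


def spans(pattern):
    """Return the span of the first match in each subject, or None if there is none."""
    out = []
    for s in subjects:
        m = pattern.search(s)
        out.append(m.span() if m else None)
    return out


print(spans(capturing) == spans(non_capturing))
print(capturing.groups, non_capturing.groups)
```

Both patterns match the same text; only the number of capture groups in the compiled pattern differs.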
    

This should mean better compatibility with Perl, and looks backward-compatible in PCRE because that flag causes a compilation error, for me.

@Roy-Orbison changed the title from "Support naming the 0th (whole) match" to "Support the /n pattern modifier" on Sep 14, 2021
@PhilipHazel
Collaborator

PCRE2 has the PCRE2_NO_AUTO_CAPTURE option to do this - and PCRE1 before it has the equivalent; I think they've been there almost from the start. I hadn't noticed that Perl has added the /n option. I've made a note to add this synonym to pcre2test. But in the meantime you can use this with patterns like /whatever/no_auto_capture in pcre2test...HANG ON ... pcre2test already has /n. Did you not spot this?

@zherczeg
Collaborator

While these are useful requests, the pcre library focuses on matching a pattern. This is why I made another library, which can be used to rewrite patterns and has, for example, a smart capture removal feature:

https://github.com/zherczeg/repan/blob/master/tests/opt/test_uncapture_expected.txt

It keeps only the necessary captures and rewrites references. You could also add a feature to create a group for the entire match and automatically rewrite references in a pattern.

@Roy-Orbison
Author

PCRE2 has the PCRE2_NO_AUTO_CAPTURE option to do this - and PCRE1 before it has the equivalent; I think they've been there almost from the start. I hadn't noticed that Perl has added the /n option. I've made a note to add this synonym to pcre2test. But in the meantime you can use this with patterns like /whatever/no_auto_capture in pcre2test...HANG ON ... pcre2test already has /n. Did you not spot this?

I guess the version I was testing with is much older than I realised: it supports /no_auto_capture but not /n. Sorry for the hassle.

@PhilipHazel
Collaborator

I think this issue is now thrashed out, so I am going to close it.
