Negative lookaround assertions sometimes leak capture groups #89702
When a capture group sits inside a negative lookaround assertion, the string it captured can sometimes survive the failure of the assertion and appear in the returned Match object. Here it is illustrated with a lookbehind and a lookahead:
>>> re.search(r"(?<!(a)c)de", "abde").group(1)
'a'
>>> re.search(r"(?!(a)c)ab", "ab").group(1)
'a'
Even though the search for the expression '(a)c' fails when trying to match 'c', the string 'a' is still reported as having been successfully matched by capture group 1. The expected behavior would be for capture group 1 to have no match. For the following reasons, I believe this behavior is not intentional and is the result of Python not cleaning up after the asserted subexpression fails (e.g. by running the asserted subexpression in a new stack frame).
The leak does not occur when the asserted subexpression is an alternation:
>>> re.search(r"(?<!(a)c|(a)d)de", "abde").group(1) is None
True
>>> re.search(r"(?!(a)c|(a)d)ab", "ab").group(1) is None
True
MRI (Ruby):
irb(main):001:0> /(?<!(a)c)de/.match("abde")[1]
JShell (Java):
jshell> Matcher m = java.util.regex.Pattern.compile("(?<!(a)c)de").matcher("abde")
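The four Python patterns from this report can be collected into one small script, which makes it easier to compare interpreters side by side. This is only a sketch: `captured_groups` and `CASES` are hypothetical names introduced here, not part of the `re` module, and the printed tuples depend on the Python version in use.

```python
import re

# Hypothetical helper (not part of the re module): run re.search and
# return the groups of the match, or None when there is no match.
def captured_groups(pattern, text):
    match = re.search(pattern, text)
    return None if match is None else match.groups()

# The four patterns from the report: two single-branch assertions that
# may leak group 1, and two alternations that were reported not to leak.
CASES = [
    (r"(?<!(a)c)de", "abde"),
    (r"(?!(a)c)ab", "ab"),
    (r"(?<!(a)c|(a)d)de", "abde"),
    (r"(?!(a)c|(a)d)ab", "ab"),
]

for pattern, text in CASES:
    print(pattern, "->", captured_groups(pattern, text))
```

On an affected interpreter the first two lines should show 'a' in group 1; on a fixed one every group should be None.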
It's definitely a bug. For the pattern to match, the negative lookaround must succeed, which means that its subexpression must not match, so none of the groups in that subexpression should have captured anything.
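That reasoning can be checked directly by running the asserted subexpression on its own. The snippet below is a minimal sketch using only the patterns from the report; the variable names are introduced here for illustration.

```python
import re

# The subexpression from the lookahead example, anchored at position 0
# of "ab" with re.match: it must fail for the negative lookahead to pass.
sub_match = re.match(r"(a)c", "ab")
print(sub_match)  # None: "(a)c" cannot match "ab"

# The full pattern matches precisely because the subexpression failed.
full_match = re.search(r"(?!(a)c)ab", "ab")
print(full_match.group(0))  # 'ab'
```

Since `(a)c` never matches as a whole, any value left in group 1 of the full match can only be a remnant of the failed attempt.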
This bug was fixed in 356997c.
Python 3.10.4
>>> re.search(r"(?<!(a)c)de", "abde").groups()
('a',)
>>> re.search(r"(?!(a)c)ab", "ab").groups()
('a',)
Python 3.11 a7+
>>> re.search(r"(?<!(a)c)de", "abde").groups()
(None,)
>>> re.search(r"(?!(a)c)ab", "ab").groups()
(None,)
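To see which of the two behaviors a given interpreter has, a probe along these lines may help. It is a sketch, not an official check: `leaks_lookaround_groups` is a hypothetical name, and the function reports what it observes rather than assuming a version boundary.

```python
import re
import sys

def leaks_lookaround_groups():
    """Return True if this interpreter shows the leak described above."""
    probes = [(r"(?<!(a)c)de", "abde"), (r"(?!(a)c)ab", "ab")]
    # Both probes match on every version; only group 1 differs.
    return any(re.search(p, s).group(1) is not None for p, s in probes)

print(sys.version.split()[0], "leaks:", leaks_lookaround_groups())
```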