Skip to content
This repository was archived by the owner on May 14, 2020. It is now read-only.

Regex: Only escape characters when necessary #1130

Closed
michaelgranzow-avi opened this issue Jun 26, 2018 · 5 comments
Closed

Regex: Only escape characters when necessary #1130

michaelgranzow-avi opened this issue Jun 26, 2018 · 5 comments

Comments

@michaelgranzow-avi
Copy link

The following rules

942432 942430 942431 942421 942420

seem to escape all characters inside the character class as a matter of course. Most of the backslashes are not needed and are indeed prone to cause problems. Inside character classes, only the caret ^, the hyphen -, the closing bracket ] and the backslash itself are metacharacters. We should heed the advice in http://www.regexguru.com/2008/04/escape-characters-only-when-necessary/, so the regex should be (double quote escaped because it's inside a string, ^ not escaped because it's not at the start of the range):

"((?:[~!@#$%^&*()\-+={}[\]|:;\"'´’‘`<>][^~!@#$%^&*()\-+={}[\]\|:;\"'´’‘`<>]*?){N})"

with N = 2, 3, 6, 8 or 12, depending on the rule.

@csanders-git
Copy link
Contributor

csanders-git commented Jul 6, 2018

The general advice is reasonable, personal feeling although in many cases I think it leads to readability almost being harder as you have to know the content to ensure what the meaning is /personal feeling. However from a modsec perspective sometimes it plays by slightly different rules then normal regex engines. I think however in the situation you prescribed would be fine advice. If you would like to do minimal work, i'd recommend adding a PR to our contributing guidelines (https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/v3.2/dev/CONTRIBUTING.md). If you're feeling more helpful :-P. We'd love PR's that actually address this in the rules.

Let me know if there is more feedback by reopening the issue.

@csanders-git
Copy link
Contributor

Thanks for the feedback!!!

@michaelgranzow-avi
Copy link
Author

Feedback was given in-person; for the record: When evaluating the RE2 regex library for use with ModSecurity/CRS, it was noticed that 16 regexes used in CRS are rejected by RE2, five of them for the trivial reason of excessive escaping. So to pave the way for replacing PCRE (which has well-known poor worst-case behavior) with RE2, it would be good to clean these rules up.

@dune73
Copy link
Contributor

dune73 commented Jul 11, 2018

Are the said 16 rules the only thing that stops us from adopting RE2? And should we?

@michaelgranzow-avi
Copy link
Author

In terms of RX compilation, yes. Full evaluation would require extensive testing, of course.

RE2 should be preferred because of its predictable worst-case behavior, the original paper is here:

https://swtch.com/~rsc/regexp/regexp1.html

These are the issues:

1 instance of \Z at end of text, or before newline at end of text (NOT SUPPORTED)
5 instances of ` This seems to be a GNU extension; CRS escapes the backtick unnecessarily
3 instances of (?< after text (not) matching re (NOT SUPPORTED)
5 instances of (?! before text not matching re (NOT SUPPORTED)
1 instance of (?= before text matching re (NOT SUPPORTED)
1 instance of ++ one or more x, possessive (NOT SUPPORTED)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants