-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Support for RE2 regular expression engine #1996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @WGH-, Thinking in that possibility, in ModSecurity v3 we have this as plug able interface. So that we could support provide for different regular expression engines. It just demands a single file change, here: Ideally the regular expression back-end could be selected with a compilation flag, selected during the configuration. Hyperscan can be also taken into consideration. |
It seems nobody is sitting on an unpublished patch (or at least doesn't admit to), so I'll try to clean up and submit mine this week. |
Great to hear that you have a patch already. Let us known if you need further assistance. |
Hi, @WGH-, As commented on your patch review, I've made a branch with a couple of suggestions on a refactoring, to make our code a little bit more elegant. Also, I've made some suggestions on how to tackle the architecture to support multiples regex engines. Instead of write about I guess it was faster just to had, it implemented. The code is available here: v3/dev/re2. As of now, I think it is Ok to have the selection on compilation time. Further, we may have a run-time configuration specifically for that. This is the output of my configure:
Notice the [selected] which appears by default on the pcre, in case the user uses This branch is just for reference. If you have something else in mind, please share. If you want to discuss something, my Skype is felipezimmerlebr. Hangouts: [email protected]. I am interested in doing a blog post with the performance comparison in between the three options: re2, pcre and HyperScan, it will be amazing if you want to do it together. |
The verify_cc implementation is not using the regex encapsulation: That will be in our way. |
I had slightly different idea. Since RE2 doesn't support certain constructs, it might make sense to do a run-time fallback instead. That is, if RE2 support is built, every regular expression is first attempted to be compiled using RE2, and if that fails, it's compiled with libpcre instead. This way RE2 can be enabled even if some rules are not supported by RE2, and at the same time, rules that are supported by RE2 enjoy better worst-case speed. |
Is that any specific ModSecurity-wise regular expression that concerns you regarding functionality? As far as I understand the syntax may be a little different in some corner cases, but both can accomplish the same mission. That said, we may not need a fallback. In case the rules are not syntax-wise supported and the user wants to use re2, he can always change the rule. Unless some functionality is indeed missing. The thing that concerns me the most about this run time thing is the usability. I don't want to the user to read a manual to understand how regular expression works in modsec. It should be simple and natural. If fallback is the way that we want to go, we have to have a compilation option to let the user choose the priority list: (rp2, hyperscan, pcre); ...; (hyperscan, re2, pcre). We may support different backends in the future (not only re2 and pcre) and we want the user to freely choose the back end that he opts to, fallback or not. Another important thing to consider is that the dependency needs be optional. If re2 is not present on the platform, we have to relay on pcre and vice versa. As ModSec internally uses regex, at least one backend should be supported. Most importantly, if it is runtime, it has to be clear to the user (somehow: debug, error log, warning while loading the rules) which was the engine that was picked to the evaluation, allowing the user to debug and investigate further issues in the regular expression. Otherwise, it will be very difficult to debug the regular expressions. The way I see is that the runtime implementation is an amelioration of the compile-time option, at least given the architecture presented. Nothing has te bo structural changed to support it, just new code has to be added. What do you think? |
As I said in my first message, some functionality is indeed missing. There's extensive reference of RE2 syntax here - https://github.com/google/re2/wiki/Syntax - and it also includes features found in other regular expression engines, but not supported by RE2. Some of them are purely syntantic ones (like comments, numbered and named (at the same time) groups, etc.), but some features cannot be replaced in general case. The pull request I mentioned - SpiderLabs/owasp-modsecurity-crs#1255 - marks the regexes not supported by RE2. For example, And how, for example, would you rewrite this: Althought there're no regular expressions with backreferences in the CRS, they might exist in custom rule sets, and would be much more difficult to rewrite, which is also impossible in general case. My main point is forcing the user to rewrite all his rules to be RE2-compatible is a bad decision, given difficulty of such feat. Very few people would be able to flip the RE2 switch. Retaining libpcre fallback, on the other hand, seems like the best of two worlds to me.
A good thing about RE2 is that if regular expression was compiled successfully, then it will work like it did in libpcre.
It makes sense. Since libpcre supports most features, putting it first would effectively disable all the others, though. |
RE2 (https://github.com/google/re2) is a regular expression engine written in C++ and developed by Google.
Unlike libpcre, RE2 runtime is always linear to the size of the input. It comes at some cost: the engine inherently doesn't support certain features, like backreferences and lookaround assertions.
Still, even the CRS has some regular expressions in its rules that run very slowly (minutes) on certain inputs, but are supported by RE2 and run much faster (seconds) on the same inputs.
The idea of adding RE2 support to ModSecurity has been floating around for a while, and there's quite a few mentions of people working on things related to it:
I have been unable to find neither issue nor public pull request/fork/branch of ModSecurity with RE2 support.
I have a more-or-less working proof-of-concept patch for ModSecurity with optional RE2 support (with fallback to libpcre if the regexp fails to compile on RE2).
Is someone else working on this? Because if yes, it's probably not worth it to duplicate efforts. If not, I think I can clean up my patch a bit and submit it here as a PR.
The text was updated successfully, but these errors were encountered: