Can not get data in group when using regular expression. #2336

Closed
anhkhoa14592 opened this issue Jun 11, 2020 · 2 comments · Fixed by #2348
@anhkhoa14592

Describe the bug
I tried to extract the value of PHPSESSID with the following regular expression (taken from Web Application Defender's Cookbook: Battling Hackers and Protecting Users):
(?i:(j?sessionid|(php)?sessid|(asp|jserv|jw)?session[-_]?(id)?|cf(id|token)|sid)=([^\s]+)\;\s?)

But I cannot get the value from group 6 (TX:6). I tried the same pattern in other text editors and it works fine there, so I don't know why it does not work here. Maybe I am missing something?

Logs and dumps

SecRule

SecRule RESPONSE_HEADERS:/Set-Cookie2?/ "(?i:(j?sessionid|(php)?sessid|(asp|jserv|jw)?session[-_]?(id)?|cf(id|token)|sid)\=([^\s]+)\;\s?)" "chain,phase:3,id:'981062',t:none,pass,log,capture,setsid:%{tx.6},setvar:session.sessionid=%{tx.6},setvar:session.valid=1,msg:'%{session.sessionid}, tx.0:%{tx.0},tx.6:%{tx.6}'"
SecRule REMOTE_ADDR "^(\d{1,3}\.\d{1,3}\.\d{1,3}\.)"  "chain,capture,setvar:session.ip_block=%{tx.1}"
SecRule REQUEST_HEADERS:User-Agent ".*" "t:none,t:sha1,t:hexEncode,setvar:session.ua=%{matched_var}"

Output of:
Response
HTTP/1.1 200
Server: nginx/1.18.0
Date: Thu, 11 Jun 2020 11:43:06 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
X-Powered-By: PHP/7.4.6
Set-Cookie: PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT

AuditLog

ModSecurity: Warning. Matched "Operator Rx' with parameter .*' against variable REQUEST_HEADERS:User-Agent' (Value: curl/7.58.0') [file "/webserver/modsec/modsec.demo.com.conf"] [line "156"] [id "981062"] [rev ""] [msg ", tx.0:PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; ,tx.6:"] [data ""] [severity "0"] [ver ""] [maturity "0"] [accuracy "0"] [hostname "127.0.0.1"] [uri "/cookies.php"] [unique_id "159187578641.650199"] [ref "o0,44o0,9o0,3v84,50o0,8o0,8v0,9o40,0o0,40v60,11t:sha1,t:hexEncode"]
ModSecurity: Warning. [file "/webserver/modsec/modsec.conf"] [line "15"] [id "980145"] [rev ""] [msg "'Incoming Anomaly Score: 0'"] [data ""] [severity "0"] [ver ""] [maturity "0"] [accuracy "0"] [tag "modsec.demo.com"] [hostname "127.0.0.1"] [uri "/cookies.php"] [unique_id "159187578641.650199"] [ref ""]

Expected Behavior

Based on the results in other text editors, the audit log should contain the value of PHPSESSID, as below:

[msg ", tx.0:PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; ,tx.6:ea101040fa9365d3ad6e921d9e1e04da"]
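
For comparison, the expected capture can be reproduced outside ModSecurity. The following is a minimal sketch using Python's `re` module rather than the regex engine ModSecurity uses, so it only illustrates what group 6 should contain; the variable names are illustrative.

```python
import re

# Pattern from the rule above, unchanged apart from Python string quoting.
PATTERN = re.compile(
    r"(?i:(j?sessionid|(php)?sessid|(asp|jserv|jw)?session[-_]?(id)?"
    r"|cf(id|token)|sid)\=([^\s]+)\;\s?)"
)

# Value of the Set-Cookie header from the response above.
header_value = "PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; path=/"

match = PATTERN.search(header_value)

# Index 0 is the whole match; indexes 1..6 are the capture groups.
groups = [match.group(i) for i in range(PATTERN.groups + 1)]
for index, value in enumerate(groups):
    print(f"tx.{index}: {value!r}")
```

Groups 3, 4 and 5 do not participate in this match, while group 6 holds the session ID.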

martinhsv self-assigned this Jun 12, 2020
@martinhsv
Contributor

Hi @anhkhoa14592 ,

Thank you for the report.

I do see a bug in the handling of matching groups that result in no content. E.g. the matching group '(id)?' where the 'match' occurs because there are 0 occurrences rather than 1.
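
The effect can be illustrated with a small sketch. This is not the ModSecurity source: `truncate_at_first_empty` is a hypothetical helper that assumes the capture list is cut off at the first group with no content (unmatched or empty), which is consistent with the audit log above, where tx.0 is populated but tx.6 is not.

```python
import re

# Pattern from the reported rule.
PATTERN = re.compile(
    r"(?i:(j?sessionid|(php)?sessid|(asp|jserv|jw)?session[-_]?(id)?"
    r"|cf(id|token)|sid)\=([^\s]+)\;\s?)"
)


def truncate_at_first_empty(match):
    """Hypothetical model of the bug: stop at the first group without content."""
    collected = []
    for index in range(match.re.groups + 1):
        value = match.group(index)
        if not value:  # None (group did not participate) or "" (empty match)
            break
        collected.append(value)
    return collected


match = PATTERN.search("PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; path=/")
captured = truncate_at_first_empty(match)
print(captured)
```

Under this assumption only tx.0, tx.1 and tx.2 would be set, because `(asp|jserv|jw)?` is the first group without content.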

If you are looking for an immediate workaround, in your case you could consider turning the groups that you do not care about into non-capturing groups, i.e. add '?:' at the beginning of each of '(asp|jserv|jw)', '(id)', and '(id|token)'. This should then enable you to read the content of the capture group '([^\s]+)' using tx.3.
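
As a rough check of the group numbering in this workaround, here is a sketch in Python's `re` module (not ModSecurity itself; the name `WORKAROUND` is illustrative) with the three groups made non-capturing:

```python
import re

# Same pattern, with (asp|jserv|jw), (id) and (id|token) turned into (?:...).
WORKAROUND = re.compile(
    r"(?i:(j?sessionid|(php)?sessid|(?:asp|jserv|jw)?session[-_]?(?:id)?"
    r"|cf(?:id|token)|sid)\=([^\s]+)\;\s?)"
)

match = WORKAROUND.search("PHPSESSID=ea101040fa9365d3ad6e921d9e1e04da; path=/")

# Only three capture groups remain, so the session ID is now group 3.
session_id = match.group(3)
print(WORKAROUND.groups, session_id)
```

Note that `(php)?` is still a capturing group here, so a cookie name without the `php` prefix would presumably still leave group 2 without content.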

I will work on a code fix for this shortly, though.

@anhkhoa14592
Author

Hi @martinhsv,

Thank you for your support. I hope to receive the fix for this bug soon :).

WGH- added a commit to WGH-/ModSecurity that referenced this issue Sep 5, 2020
Previously, searchAll would stop search when it encountered an empty
matching group in any position. This means that, for example,
regular expression "(a)(b?)(c)" would match string "ac", but the
resulting group list would be ["ac", "a"].

After this change, the resulting list for the aforementioned regular
expression becomes ["ac", "a", "", "c"] like it should've been.

Additionally, this also changes behaviour for multiple matches. For
example, when "aaa00bbb" is matched by "[a-z]*", previously only "aaa"
would be returned. Now the matching list is ["aaa", "", "", "bbb", ""].

The old behaviour was confusing and almost certainly a bug. The new
behaviour is the same as in Python's re.findall.

For reference, though, Go does it somewhat differently: empty matches
at the end of non-empty matches are ignored, so in Go the above example
yields ["aaa", "", "bbb"] instead.

This is the root cause of issue owasp-modsecurity#2336 which has been already fixed by
replacing searchAll call there with a new function.
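
The Python behaviour that the commit message uses as its reference point can be checked directly. This sketch only exercises Python's `re` module and says nothing about how `searchAll` itself is implemented:

```python
import re

# Groups for "(a)(b?)(c)" against "ac": the optional group matches the
# empty string, and the groups after it are still reported.
m = re.fullmatch(r"(a)(b?)(c)", "ac")
group_list = [m.group(0), *m.groups()]
print(group_list)

# All matches of "[a-z]*" in "aaa00bbb", including the empty ones.
all_matches = re.findall(r"[a-z]*", "aaa00bbb")
print(all_matches)
```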