Skip to content

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526

Open
etiennemunnich wants to merge 3 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag
Open

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526
etiennemunnich wants to merge 3 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag

Conversation

@etiennemunnich
Copy link
Copy Markdown

Summary

Add regex pattern matching in the variable-key position of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag.

This enables exclusion patterns like:

ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

Fixes #3505

Problem

JSON body processing generates argument names with unpredictable numeric indices (json.0.JobDescription, json.1.JobDescription, ...). Without regex key support, operators must either:

  1. Write one exclusion per possible array index (impractical)
  2. Exclude the entire ARGS collection from the rule (too broad, loses protection)
  3. Use path-based exclusion (loses all CRS protection for the endpoint)

This is a common pain point for anyone running CRS with JSON/GraphQL APIs.

Approach

Following your guidance in the issue discussion, the regex is compiled once at config load — never recompiled per request. This directly addresses the concern about the v2 PR #3121 where regex compilation ran on every exclusion check.

How it works

  1. Detection: In init(), the /pattern/ delimiter is detected in the target string (e.g. ARGS:/^json\.\d+\.Field$/)
  2. Compilation: The pattern is compiled via Utils::Regex (PCRE2 by default, PCRE1 with --with-pcre) at config load time
  3. Storage: Compiled regex is stored as shared_ptr<Utils::Regex> — shared across all requests, zero per-request allocation
  4. Matching: During rule evaluation, searchAll() runs against the short variable-key string (typically 10-40 chars)
  5. Backward compat: Literal targets (ARGS:password) continue to work unchanged via the existing == / case-insensitive comparison

Shared design for ById and ByTag

Both actions use a common RuleRemoveTargetSpec struct with matchesFullName() and matchesKeyWithCollection() methods, avoiding code duplication.

Lexer change

The scanner character class REMOVE_RULE_TARGET_VALUE (previously REMOVE_RULE_TARGET_BY_ID_VALUE, used only by ById) is now shared by both ById and ByTag. It includes regex metacharacters (^ $ + ( ) | ? \) but not comma — so chained ctl: actions still split correctly on ,.

Context: ModSecurity v2 and Coraza

Engine ById ByTag ByMsg Regex keys
ModSec v2 (v2.9.12)
ModSec v3 (upstream)
Coraza (post PR #1561)
This PR ✅ (ById + ByTag)

This PR adds regex key support to the two actions that v3 already has (ById, ByTag). The missing ctl:ruleRemoveTargetByMsg is a separate, larger discussion and is intentionally excluded.

Files Changed (11 files)

File Change
headers/modsecurity/rule_remove_target_entry.h New. RuleRemoveTargetSpec, ByIdEntry, ByTagEntry structs
headers/modsecurity/transaction.h ById + ByTag list types updated to use new entry structs
src/actions/ctl/rule_remove_target_by_id.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_id.h Add shared_ptr<Regex> member
src/actions/ctl/rule_remove_target_by_tag.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_tag.h Add shared_ptr<Regex> member
src/parser/seclang-scanner.ll REMOVE_RULE_TARGET_VALUE shared by ById + ByTag
src/rule_with_operator.cc Both match paths use target.matchesFullName() / matchesKeyWithCollection()
test/test-cases/regression/issue-3505.json 5 tests: ById regex, ByTag regex, ByTag literal compat
test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json 2 CRS-style tests with @detectSQLi + JSON body
test/test-suite.in Register new test files

Test Results

7 new tests, all passing:

Test Description
ById regex — JSON array keys ARGS:/^json\.\d+\.JobDescription$/ excludes dynamic JSON args
ById regex — suffix match ARGS:/mixpanel$/ excludes args by suffix pattern
ByTag regex — JSON array keys Same as above, matching rules by tag
ByTag regex — suffix match Same as above, matching rules by tag
ByTag literal — backward compat ARGS:password — proves literal targets still work unchanged
CRS-style ById + @detectSQLi Realistic JSON POST with SQL injection, excluded by regex key
CRS-style ByTag + @detectSQLi Same scenario, excluded by tag + regex key

Full regression suite: 5005 total, 4987 pass, 18 skip, 0 fail, 0 error.

Performance

Benchmark: 25,000 iterations × 5 trials, JSON POST with 20 ARGS keys, 3 detection rules (@detectSQLi, @detectXSS, @rx), 2 regex exclusions (one ById, one ByTag).

Scenario Median Per-request
Baseline (no exclusions) 1,209 ms 0.048 ms
With regex exclusions 1,326 ms 0.053 ms
Overhead +117 ms +0.005 ms/req (+9.7%)

Scaling with more ARGS keys:

Keys/request Baseline With regex Overhead
20 1,209 ms 1,326 ms +9.7%
50 2,620 ms 2,873 ms +9.7%
100 5,199 ms 5,543 ms +6.6%

The overhead scales linearly with ARGS count — no exponential blowup. At 100 keys (an extreme JSON body), the per-request cost is +0.014 ms. The cost is the searchAll() call on short variable-name strings against precompiled PCRE2 patterns.

Key design decisions keeping performance in check:

  • Regex compiled once at config load, stored as shared_ptr
  • searchAll() runs on short strings (variable names, typically 10-40 chars)
  • Literal targets use existing == comparison — no regression for non-regex users

Made with Cursor

- 2 test cases: JSON array keys, mixpanel suffix (SecRuleUpdateTargetById parity)
- Expected to fail until regex support is implemented
- OODA baseline: Test 1 parse error, Test 2 HTTP 403 (exclusion not applied)

Made-with: Cursor
- Compile regex at config load, not per-request
- RuleRemoveTargetByIdEntry struct: literal or shared_ptr<Regex>
- Test 2 (ARGS:/mixpanel$/) passes; Test 1 blocked by parser owasp-modsecurity#2927

Made-with: Cursor
…veTargetByTag

Add regex pattern matching in the variable-key position of
ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling
exclusions like:

  ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
  ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

JSON body processing generates argument names with dynamic array
indices (json.0.Field, json.1.Field, ...). Without regex keys,
operators cannot scope exclusions to specific keys without listing
every possible index or disabling rules entirely.

Design:
- Regex detected by /pattern/ delimiter in COLLECTION:/pattern/
- Compiled once at config load via Utils::Regex (PCRE2/PCRE1)
- Stored as shared_ptr - zero per-request compilation
- Literal targets continue to work unchanged (no breaking change)
- Shared RuleRemoveTargetSpec struct used by both ById and ByTag
- Lexer REMOVE_RULE_TARGET_VALUE class shared by both actions

Aligns ModSecurity v3 with Coraza (corazawaf/coraza#1561).

Fixes owasp-modsecurity#3505
@etiennemunnich etiennemunnich force-pushed the feature/ctl-regex-ruleRemoveTarget-byid-bytag branch from 370f93b to 637ad9c Compare March 28, 2026 22:44
@etiennemunnich
Copy link
Copy Markdown
Author

Known limitation: {m,n} quantifiers in regex keys

After reviewing the discussion in PR #3121 between @airween, @marcstern, and @dune73, I want to flag one inherent limitation.

The comma inside {m,n} quantifiers breaks parsing. For example:

# This will NOT work — comma inside {2,5} terminates the token:
ctl:ruleRemoveTargetById=1;ARGS:/^field\d{2,5}$/

# Parser sees: ARGS:/^field\d{2   ← token ends at comma

This is the same constraint discussed at length in PR #3121 — the comma is the SecLang action separator, and it cannot be included in the lexer character class without breaking ctl:a=...,ctl:b=... chaining.

Workarounds are straightforward:

Instead of Use
\d{2,5} \d\d+ or \d+
[a-z]{3} [a-z][a-z][a-z] or [a-z]+
.{1,10} .+

The character class does include { and } (updated in latest push), so fixed-count quantifiers like \d{3} work fine — only the comma-containing {m,n} form is affected. This matches the same trade-off the v2 PR makes.

I believe this is acceptable for the target use case (JSON key patterns like ^json\.\d+\.FieldName$ and cookie name patterns like ^sess_[a-f0-9]+$), which don't need {m,n}.

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Wildcard/pattern support in ctl:ruleRemoveTargetById (v3)

1 participant