refactor(rules): deduplicate hardcoded-secret regex (closes #274)#277
refactor(rules): deduplicate hardcoded-secret regex (closes #274)#277peaktwilight merged 8 commits intomainfrom
Conversation
- Add Cargo seeds: k256, secp256k1, libsecp256k1, ed448-goldilocks (tier 1), openssl (tier 2) - Add pip seeds: pyjwt, authlib, python-jose, jwcrypto (0.8), m2crypto (0.6) - Fix fabric: drop misleading RSA algorithm (wraps paramiko, no crypto itself)
…loses #274) Extract shared HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN constants into common.rs; replace inline copies in all 10 language rule files. Adds test asserting the C# pattern is a superset of the base pattern. Also normalises PHP to the full keyword set (was missing auth, credential, private_key).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR refactors hardcoded-secret detection by centralizing pattern definitions into shared constants in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 42 seconds. Comment |
Greptile SummaryThis PR extracts the shared
Confidence Score: 5/5Safe to merge; all findings are P2 style/documentation concerns with no correctness impact. The core regex-deduplication refactor is mechanically correct — nine of ten language files are no-op replacements, and the PHP expansion is intentional and documented. The only behavioral change (C# gaining src/rules/csharp.rs (undocumented Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[common.rs] -->|HARDCODED_SECRET_PATTERN| B(go.rs)
A -->|HARDCODED_SECRET_PATTERN| C(java.rs)
A -->|HARDCODED_SECRET_PATTERN| D(javascript.rs)
A -->|HARDCODED_SECRET_PATTERN| E(kotlin.rs)
A -->|HARDCODED_SECRET_PATTERN| F(php.rs)
A -->|HARDCODED_SECRET_PATTERN| G(python.rs)
A -->|HARDCODED_SECRET_PATTERN| H(ruby.rs)
A -->|HARDCODED_SECRET_PATTERN| I(rust_lang.rs)
A -->|HARDCODED_SECRET_PATTERN| J(swift.rs)
A -->|CSHARP_HARDCODED_SECRET_PATTERN| K(csharp.rs)
subgraph BASE [HARDCODED_SECRET_PATTERN]
L[password · secret · api_?key · token · auth · credential · private_?key]
end
subgraph CSHARP [CSHARP_HARDCODED_SECRET_PATTERN superset]
M[...base keywords... + connection_?string · connectionstring redundant]
end
A --- BASE
A --- CSHARP
Reviews (1): Last reviewed commit: "refactor(rules): deduplicate hardcoded-s..." | Re-trigger Greptile |
|
|
||
| /// Extended variant for C# that adds `connection_?string` / | ||
| /// `connectionstring` to the base keyword set. | ||
| pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)"; |
There was a problem hiding this comment.
Redundant
connectionstring alternative
connection_?string (with _? making the underscore optional) already matches connectionstring, so the explicit connectionstring alternative at the end is redundant. This was present in the original csharp.rs inline pattern too, but now that it's being lifted into a shared constant it's worth cleaning up.
| pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)"; | |
| pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string)"; |
| r"(?i)(password|secret|api_?key|apikey|token|credential|private_?key|connection_?string|connectionstring)", | ||
| ) | ||
| .unwrap(); | ||
| let secret_pattern = Regex::new(CSHARP_HARDCODED_SECRET_PATTERN).unwrap(); |
There was a problem hiding this comment.
C# rule silently gains
auth keyword
The old inline C# pattern was password|secret|api_?key|apikey|token|credential|private_?key|connection_?string|connectionstring — auth was intentionally absent. CSHARP_HARDCODED_SECRET_PATTERN inherits auth from the shared base, so this rule will now fire on C# variables like authToken, authKey, authHeader, etc. that it never flagged before. The PR description only mentions PHP gaining keywords; this C# expansion is undocumented. If it's intentional (makes sense given the normalisation goal), a brief note in the description or a changelog entry would help reviewers and users understand the wider diff in findings.
|
|
||
| const MANIFEST_PQ_CWE: &str = "CWE-327"; | ||
| const MANIFEST_PQ_DESC: &str = "Dependency uses quantum-vulnerable cryptographic algorithm"; |
There was a problem hiding this comment.
Unrelated changes bundled into a regex-dedup PR
The PR title and description are scoped to deduplicating HARDCODED_SECRET_PATTERN, but manifest.rs carries several orthogonal changes: new k256/secp256k1/ed448-goldilocks/pyjwt/etc. seed entries, a BFS deduplication fix (Vec → HashMap), a fabric algorithm reclassification, a new CARGO_PQ_DESC constant, and SARIF depName emission in sarif.rs. Bundling these makes the diff harder to bisect if a regression appears in the manifest or SARIF output. Consider splitting into a separate PR (or at minimum updating the summary to describe all changed behaviour).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/rules/common.rs (1)
58-63: Consider composing both regex constants from a single shared keyword fragment.The base alternation is duplicated in two constants, so future edits can still drift. Building both from one keyword source would make the superset relationship structural rather than manual.
♻️ Suggested refactor
+pub const HARDCODED_SECRET_KEYWORDS: &str = + r"password|secret|api_?key|token|auth|credential|private_?key"; + -pub const HARDCODED_SECRET_PATTERN: &str = - r"(?i)(password|secret|api_?key|token|auth|credential|private_?key)"; +pub const HARDCODED_SECRET_PATTERN: &str = + concat!(r"(?i)(", HARDCODED_SECRET_KEYWORDS, r")"); -pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)"; +pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = + concat!(r"(?i)(", HARDCODED_SECRET_KEYWORDS, r"|connection_?string|connectionstring)");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rules/common.rs` around lines 58 - 63, The two regex constants HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN duplicate the same alternation; extract the common alternation into a single shared &str (e.g., HARDCODED_SECRET_KEYWORDS) and then build the two constants by composing that fragment (e.g., format! or concat! to wrap with (?i)( ... ) and append the C#-specific `connection_?string|connectionstring` in CSHARP_HARDCODED_SECRET_PATTERN) so the base keyword set is defined once and the C# variant extends it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/rules/manifest.rs`:
- Around line 329-332: The selection of best seed from reached_seeds is
non-deterministic when confidences tie; change the call that computes best
(currently using reached_seeds.values().max_by(...).unwrap()) to iterate with
reached_seeds.iter() and pick the max by a two-field comparison: primary key
v.confidence (use total_cmp) and secondary key the seed name (the HashMap key)
so ties break deterministically by name; update the variable selection that
assigns to best to use the (&key, &value) iterator result and then take the
value part for downstream use.
---
Nitpick comments:
In `@src/rules/common.rs`:
- Around line 58-63: The two regex constants HARDCODED_SECRET_PATTERN and
CSHARP_HARDCODED_SECRET_PATTERN duplicate the same alternation; extract the
common alternation into a single shared &str (e.g., HARDCODED_SECRET_KEYWORDS)
and then build the two constants by composing that fragment (e.g., format! or
concat! to wrap with (?i)( ... ) and append the C#-specific
`connection_?string|connectionstring` in CSHARP_HARDCODED_SECRET_PATTERN) so the
base keyword set is defined once and the C# variant extends it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d2e3ed62-56c1-468a-acf4-8e02aae6bff1
📒 Files selected for processing (14)
src/report/sarif.rssrc/rules/common.rssrc/rules/csharp.rssrc/rules/go.rssrc/rules/java.rssrc/rules/javascript.rssrc/rules/kotlin.rssrc/rules/manifest.rssrc/rules/php.rssrc/rules/python.rssrc/rules/ruby.rssrc/rules/rust_lang.rssrc/rules/swift.rswww/src/data/rules.ts
- Fix import formatting in javascript.rs and python.rs - Make reached_seeds best-pick deterministic by breaking confidence ties on seed name
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/rules/python.rs (1)
79-81: Please add regression coverage for shared-pattern precision in this rule.Now that this rule depends on
HARDCODED_SECRET_PATTERN, add/extendpy/no-hardcoded-secrettests for both new positives (e.g.,auth,credential,private_key) and obvious benign negatives (e.g.,author) to catch FP drift early.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rules/python.rs` around lines 79 - 81, Add regression tests for the shared HARDCODED_SECRET_PATTERN precision changes by extending the py/no-hardcoded-secret test suite: add positive test cases containing identifiers/strings like "auth", "credential", "private_key" that should be flagged by the rule and add negative test cases such as "author" (and other benign substrings) that must not be flagged; ensure the tests reference the rule py/no-hardcoded-secret so they exercise the Regex created from HARDCODED_SECRET_PATTERN in src/rules/python.rs and fail if the pattern becomes overly permissive or regresses.src/rules/manifest.rs (1)
142-171: Add a collision guard for normalized package keys.Because
pip_mapis built viacollect(), any future normalized-name collision would silently overwrite an entry. Consider a small invariant test to fail fast if curated seeds ever collide after PEP 503 normalization.Proposed invariant test
#[cfg(test)] mod tests { use super::*; + use std::collections::HashSet; @@ + #[test] + fn pip_seed_names_normalize_uniquely() { + let mut seen = HashSet::new(); + for entry in PIP_PACKAGES { + let key = entry.name.to_lowercase().replace(['_', '.'], "-"); + assert!(seen.insert(key), "duplicate normalized PIP seed name found"); + } + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rules/manifest.rs` around lines 142 - 171, The curated SeedEntry list can silently overwrite entries when building pip_map because names are PEP-503 normalized and collected without collision checks; modify the code that builds pip_map (where SeedEntry items are iterated and inserted into pip_map) to detect if a normalized key already exists and fail fast (panic or return Err) with a clear message including both conflicting original names, and add a unit test that normalizes all SeedEntry.name values and asserts there are no duplicates (i.e., length of set == length of seeds) to catch future collisions; reference the SeedEntry struct and the pip_map construction site so reviewers can find and update the insertion logic and add the invariant test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/rules/manifest.rs`:
- Around line 142-171: The curated SeedEntry list can silently overwrite entries
when building pip_map because names are PEP-503 normalized and collected without
collision checks; modify the code that builds pip_map (where SeedEntry items are
iterated and inserted into pip_map) to detect if a normalized key already exists
and fail fast (panic or return Err) with a clear message including both
conflicting original names, and add a unit test that normalizes all
SeedEntry.name values and asserts there are no duplicates (i.e., length of set
== length of seeds) to catch future collisions; reference the SeedEntry struct
and the pip_map construction site so reviewers can find and update the insertion
logic and add the invariant test.
In `@src/rules/python.rs`:
- Around line 79-81: Add regression tests for the shared
HARDCODED_SECRET_PATTERN precision changes by extending the
py/no-hardcoded-secret test suite: add positive test cases containing
identifiers/strings like "auth", "credential", "private_key" that should be
flagged by the rule and add negative test cases such as "author" (and other
benign substrings) that must not be flagged; ensure the tests reference the rule
py/no-hardcoded-secret so they exercise the Regex created from
HARDCODED_SECRET_PATTERN in src/rules/python.rs and fail if the pattern becomes
overly permissive or regresses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0718461b-1a3b-40ea-a980-3d01e75b6842
📒 Files selected for processing (3)
src/rules/javascript.rssrc/rules/manifest.rssrc/rules/python.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/rules/javascript.rs
|
Approved — clean dedup of HARDCODED_SECRET_PATTERN across 5 language rule files, with the C# extension correctly tested as a superset. PHP normalisation is a nice catch. Merging. |
Summary
HARDCODED_SECRET_PATTERNandCSHARP_HARDCODED_SECRET_PATTERNintocommon.rsauth,credential,private_key)Test plan
cargo test— all 564 tests passSummary by CodeRabbit
New Features
connection_stringandconnectionstringpatterns.Bug Fixes
Refactor