refactor(rules): deduplicate hardcoded-secret regex (closes #274)#277

Merged
peaktwilight merged 8 commits into main from 274-dedup-secret-regex
Apr 30, 2026

Conversation

@Darkroom4364
Collaborator

@Darkroom4364 Darkroom4364 commented Apr 26, 2026

Summary

  • Extract HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN into common.rs
  • Replace inline regex copies across all 10 language rule files
  • Normalises PHP to the full keyword set (was missing auth, credential, private_key)
  • Adds test asserting the C# pattern is a superset of the base
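
The superset relationship mentioned in the last bullet can be sketched with the standard library alone. The two constants below are copied from the review thread further down; the `alternatives` and `missing_from` helpers are hypothetical names introduced for illustration, and the check is purely structural (it compares alternation entries as strings rather than compiling the regexes), so it is not necessarily how the test in this PR is written.

```rust
/// Base pattern, as quoted in the review thread for src/rules/common.rs.
pub const HARDCODED_SECRET_PATTERN: &str =
    r"(?i)(password|secret|api_?key|token|auth|credential|private_?key)";

/// C# variant, as quoted in the review thread.
pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)";

/// Hypothetical helper: splits a flat `(?i)(a|b|c)` pattern into its
/// alternatives. It only handles this exact shape; it is not a regex parser.
fn alternatives(pattern: &str) -> Vec<&str> {
    pattern
        .trim_start_matches("(?i)(")
        .trim_end_matches(')')
        .split('|')
        .collect()
}

/// Hypothetical helper: returns the alternatives of `base` that do not
/// appear verbatim in `extended`. An empty result means `extended` is a
/// structural superset of `base`.
fn missing_from<'a>(base: &'a str, extended: &str) -> Vec<&'a str> {
    let extended_alts = alternatives(extended);
    alternatives(base)
        .into_iter()
        .filter(|kw| !extended_alts.iter().any(|e| e == kw))
        .collect()
}
```

A test built on this would assert that `missing_from(HARDCODED_SECRET_PATTERN, CSHARP_HARDCODED_SECRET_PATTERN)` is empty, which fails as soon as a keyword is added to the base pattern but not to the C# one.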

Test plan

  • cargo test — all 564 tests pass
  • Verify no new FPs on a sample repo scan

Summary by CodeRabbit

  • New Features

    • C# hardcoded secret detection now includes connection_string and connectionstring patterns.
  • Bug Fixes

    • Improved seed crate selection in manifest analysis by prioritizing entries with higher confidence scores when multiple occurrences are discovered.
  • Refactor

    • Consolidated hardcoded secret detection patterns across supported languages for consistency.

- Add Cargo seeds: k256, secp256k1, libsecp256k1, ed448-goldilocks (tier 1), openssl (tier 2)
- Add pip seeds: pyjwt, authlib, python-jose, jwcrypto (0.8), m2crypto (0.6)
- Fix fabric: drop misleading RSA algorithm (wraps paramiko, no crypto itself)
…loses #274)

Extract shared HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN
constants into common.rs; replace inline copies in all 10 language rule files.
Adds test asserting the C# pattern is a superset of the base pattern.
Also normalises PHP to the full keyword set (was missing auth, credential,
private_key).
@coderabbitai

coderabbitai Bot commented Apr 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 274d5db8-5a3f-47fb-ab33-2a65c83a15ca

📥 Commits

Reviewing files that changed from the base of the PR and between 85957e0 and 96b9aee.

📒 Files selected for processing (1)
  • src/rules/manifest.rs

📝 Walkthrough

Walkthrough

This PR refactors hardcoded-secret detection by centralizing pattern definitions into shared constants in common.rs, creating a base pattern and a C#-specific extension. Language-specific rule files (Go, Java, JavaScript, Kotlin, PHP, Python, Ruby, Rust, Swift, C#) now import and use these constants instead of inline regex patterns. Additionally, manifest.rs improves seed entry selection by comparing confidence scores with deterministic tie-breaking.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Shared Constants Definition (src/rules/common.rs) | Exports two new constants: HARDCODED_SECRET_PATTERN for base regex and CSHARP_HARDCODED_SECRET_PATTERN for C#-specific extension including connection_?string. Includes tests asserting C# pattern is a superset of base pattern. |
| Language Rule Consolidation (src/rules/go.rs, src/rules/java.rs, src/rules/kotlin.rs, src/rules/php.rs, src/rules/python.rs, src/rules/ruby.rs, src/rules/swift.rs) | Updated rules to import and use HARDCODED_SECRET_PATTERN from common module, replacing inline hardcoded regex strings. Matching and detection logic unchanged. |
| C# Rule Extension (src/rules/csharp.rs) | Updated to use CSHARP_HARDCODED_SECRET_PATTERN constant from common module instead of inline pattern, enabling language-specific keyword matching. |
| Manifest Seed Selection Logic (src/rules/manifest.rs) | Enhanced CargoLockPqCrypto::check to compare confidence scores when multiple BFS-discovered seed crates with the same name are found, selecting highest-confidence entry with deterministic name-based tie-breaking. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • peaktwilight

Poem

🐰 Constants hop and patterns dance,
No more duplication's prance,
C# secrets gain their place,
While seeds pick winners by grace! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'refactor(rules): deduplicate hardcoded-secret regex (closes #274)' accurately summarizes the main change: extracting and consolidating duplicate hardcoded-secret regex patterns across language rules into shared constants. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@greptile-apps

greptile-apps Bot commented Apr 26, 2026

Greptile Summary

This PR extracts the shared HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN constants into common.rs and replaces ten inline regex copies across the language rule files, eliminating duplication. It also bundles unrelated manifest.rs changes (new PQ-seed entries, a BFS deduplication fix, fabric reclassification) and SARIF depName emission that fall outside the stated scope.

  • The C# rule silently gains the auth keyword (present in the base pattern, absent in the old inline C# pattern); this widens C# detection and should be documented alongside the PHP keyword additions.
  • CSHARP_HARDCODED_SECRET_PATTERN contains a redundant connectionstring alternative — connection_?string already matches it via the optional _.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/documentation concerns with no correctness impact.

The core regex-deduplication refactor is mechanically correct — nine of ten language files are no-op replacements, and the PHP expansion is intentional and documented. The only behavioral change (C# gaining auth) is validated by the new superset test and is desirable, just undocumented. The redundant connectionstring alternative and bundled manifest changes are cleanup/scoping concerns that don't affect correctness or security. All findings are P2.

src/rules/csharp.rs (undocumented auth addition), src/rules/common.rs (redundant connectionstring), src/rules/manifest.rs (out-of-scope changes)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/rules/common.rs | Introduces HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN constants; CSHARP variant contains a redundant connectionstring alternative already covered by connection_?string |
| src/rules/csharp.rs | Replaces inline regex with CSHARP_HARDCODED_SECRET_PATTERN; silently adds auth keyword to C# detection (absent in old pattern) |
| src/rules/php.rs | Intentional expansion of PHP pattern from 4 keywords to the full base set (adds auth, credential, private_key); documented in PR description |
| src/rules/manifest.rs | Adds new PQ-vulnerable seed entries, fixes BFS deduplication via HashMap, reclassifies fabric, and introduces CARGO_PQ_DESC — all unrelated to the PR's stated scope |
| src/report/sarif.rs | Adds dep_name to SARIF properties output when present on a finding; clean and correct |
| src/rules/go.rs | Mechanical replacement of inline regex with HARDCODED_SECRET_PATTERN; pattern is identical so no behavioral change |
| src/rules/java.rs | Old pattern had explicit apiKey alternative; api_?key with (?i) already covers it, so no behavioral change |
| src/rules/kotlin.rs | Same as Java — explicit apiKey was redundant under case-insensitive api_?key; clean replacement |
| www/src/data/rules.ts | Updates cargo-pq-vulnerable-dep description to match the new CARGO_PQ_DESC note about dev-dependencies |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[common.rs] -->|HARDCODED_SECRET_PATTERN| B(go.rs)
    A -->|HARDCODED_SECRET_PATTERN| C(java.rs)
    A -->|HARDCODED_SECRET_PATTERN| D(javascript.rs)
    A -->|HARDCODED_SECRET_PATTERN| E(kotlin.rs)
    A -->|HARDCODED_SECRET_PATTERN| F(php.rs)
    A -->|HARDCODED_SECRET_PATTERN| G(python.rs)
    A -->|HARDCODED_SECRET_PATTERN| H(ruby.rs)
    A -->|HARDCODED_SECRET_PATTERN| I(rust_lang.rs)
    A -->|HARDCODED_SECRET_PATTERN| J(swift.rs)
    A -->|CSHARP_HARDCODED_SECRET_PATTERN| K(csharp.rs)

    subgraph BASE [HARDCODED_SECRET_PATTERN]
        L[password · secret · api_?key · token · auth · credential · private_?key]
    end

    subgraph CSHARP [CSHARP_HARDCODED_SECRET_PATTERN superset]
        M[...base keywords... + connection_?string · connectionstring redundant]
    end

    A --- BASE
    A --- CSHARP

Reviews (1): Last reviewed commit: "refactor(rules): deduplicate hardcoded-s..." | Re-trigger Greptile

Comment thread src/rules/common.rs

/// Extended variant for C# that adds `connection_?string` /
/// `connectionstring` to the base keyword set.
pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)";

P2 Redundant connectionstring alternative

connection_?string (with _? making the underscore optional) already matches connectionstring, so the explicit connectionstring alternative at the end is redundant. This was present in the original csharp.rs inline pattern too, but now that it's being lifted into a shared constant it's worth cleaning up.

Suggested change
-pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)";
+pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string)";
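
The redundancy described above can be checked mechanically. The sketch below is an illustration under stated assumptions: `spellings` and `redundant_alternatives` are hypothetical helpers, not part of the codebase, and they only understand the `_?` construct used in these patterns. Each alternative is expanded into the concrete spellings it accepts, and any alternative that another one already produces is reported.

```rust
/// Hypothetical helper: expands every optional underscore (`_?`) in a
/// keyword into concrete spellings, e.g. `api_?key` -> ["apikey", "api_key"].
fn spellings(keyword: &str) -> Vec<String> {
    let mut parts = keyword.split("_?");
    // `split` always yields at least one item, even for an empty string.
    let mut out = vec![parts.next().unwrap_or("").to_string()];
    for part in parts {
        let mut next = Vec::with_capacity(out.len() * 2);
        for prefix in &out {
            next.push(format!("{}{}", prefix, part)); // underscore omitted
            next.push(format!("{}_{}", prefix, part)); // underscore present
        }
        out = next;
    }
    out
}

/// Hypothetical helper: returns the alternatives whose literal text is
/// already one of the spellings produced by a different alternative.
fn redundant_alternatives<'a>(alts: &[&'a str]) -> Vec<&'a str> {
    alts.iter()
        .copied()
        .enumerate()
        .filter(|&(i, alt)| {
            alts.iter().enumerate().any(|(j, other)| {
                j != i && spellings(other).iter().any(|s| s.as_str() == alt)
            })
        })
        .map(|(_, alt)| alt)
        .collect()
}
```

Run against the nine C# alternatives, this reports only `connectionstring`; run against an alternation that also lists `apikey` next to `api_?key`, it reports `apikey` as well.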

Comment thread src/rules/csharp.rs
-    r"(?i)(password|secret|api_?key|apikey|token|credential|private_?key|connection_?string|connectionstring)",
-)
-.unwrap();
+let secret_pattern = Regex::new(CSHARP_HARDCODED_SECRET_PATTERN).unwrap();

P2 C# rule silently gains auth keyword

The old inline C# pattern was password|secret|api_?key|apikey|token|credential|private_?key|connection_?string|connectionstring; auth was intentionally absent. CSHARP_HARDCODED_SECRET_PATTERN inherits auth from the shared base, so this rule will now fire on C# variables like authToken, authKey, authHeader, etc. that it never flagged before. The PR description only mentions PHP gaining keywords; this C# expansion is undocumented. If it's intentional (makes sense given the normalisation goal), a brief note in the description or a changelog entry would help reviewers and users understand the wider diff in findings.
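
A keyword-level diff makes this kind of silent change visible. The sketch below compares the old inline C# pattern with the new constant, both copied from this thread; `OLD_CSHARP_INLINE_PATTERN`, `alternatives`, and `keyword_diff` are hypothetical names, and the comparison is on the alternation text only, not on what each regex actually matches.

```rust
/// Old inline pattern from src/rules/csharp.rs, as shown in the diff above.
const OLD_CSHARP_INLINE_PATTERN: &str = r"(?i)(password|secret|api_?key|apikey|token|credential|private_?key|connection_?string|connectionstring)";

/// New shared constant, as quoted in the review thread.
const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)";

/// Hypothetical helper: splits a flat `(?i)(a|b|c)` pattern into its
/// alternatives. It only handles this exact shape.
fn alternatives(pattern: &str) -> Vec<&str> {
    pattern
        .trim_start_matches("(?i)(")
        .trim_end_matches(')')
        .split('|')
        .collect()
}

/// Hypothetical helper: returns `(added, removed)` alternatives when
/// moving from `old` to `new`, preserving pattern order.
fn keyword_diff<'a>(old: &'a str, new: &'a str) -> (Vec<&'a str>, Vec<&'a str>) {
    let old_alts = alternatives(old);
    let new_alts = alternatives(new);
    let added = new_alts
        .iter()
        .copied()
        .filter(|kw| !old_alts.contains(kw))
        .collect();
    let removed = old_alts
        .iter()
        .copied()
        .filter(|kw| !new_alts.contains(kw))
        .collect();
    (added, removed)
}
```

For these two patterns the diff is `auth` added and `apikey` removed. The removal is textual only, since `api_?key` still accepts that spelling, whereas the addition widens what the rule can match.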

Comment thread src/rules/manifest.rs
Comment on lines 15 to 17

const MANIFEST_PQ_CWE: &str = "CWE-327";
const MANIFEST_PQ_DESC: &str = "Dependency uses quantum-vulnerable cryptographic algorithm";

P2 Unrelated changes bundled into a regex-dedup PR

The PR title and description are scoped to deduplicating HARDCODED_SECRET_PATTERN, but manifest.rs carries several orthogonal changes: new k256/secp256k1/ed448-goldilocks/pyjwt/etc. seed entries, a BFS deduplication fix (Vec → HashMap), a fabric algorithm reclassification, a new CARGO_PQ_DESC constant, and SARIF depName emission in sarif.rs. Bundling these makes the diff harder to bisect if a regression appears in the manifest or SARIF output. Consider splitting into a separate PR (or at minimum updating the summary to describe all changed behaviour).

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/rules/common.rs (1)

58-63: Consider composing both regex constants from a single shared keyword fragment.

The base alternation is duplicated in two constants, so future edits can still drift. Building both from one keyword source would make the superset relationship structural rather than manual.

♻️ Suggested refactor
+pub const HARDCODED_SECRET_KEYWORDS: &str =
+    r"password|secret|api_?key|token|auth|credential|private_?key";
+
-pub const HARDCODED_SECRET_PATTERN: &str =
-    r"(?i)(password|secret|api_?key|token|auth|credential|private_?key)";
+pub const HARDCODED_SECRET_PATTERN: &str =
+    concat!(r"(?i)(", HARDCODED_SECRET_KEYWORDS, r")");

-pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = r"(?i)(password|secret|api_?key|token|auth|credential|private_?key|connection_?string|connectionstring)";
+pub const CSHARP_HARDCODED_SECRET_PATTERN: &str =
+    concat!(r"(?i)(", HARDCODED_SECRET_KEYWORDS, r"|connection_?string|connectionstring)");
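
One caveat when applying the suggestion above: Rust's `concat!` accepts only literals, so passing a `const` such as `HARDCODED_SECRET_KEYWORDS` to it does not compile, and `format!` produces a `String` at runtime and so cannot initialize a `const`. A minimal sketch of one workaround, assuming a hypothetical `hardcoded_secret_keywords!` macro that expands to the shared literal:

```rust
/// Hypothetical macro holding the shared keyword alternation. A macro is
/// used instead of a `const` because `concat!` only accepts literals, and
/// a macro invocation that expands to a literal is allowed as an argument.
macro_rules! hardcoded_secret_keywords {
    () => {
        r"password|secret|api_?key|token|auth|credential|private_?key"
    };
}

/// Base pattern, composed from the shared keyword fragment.
pub const HARDCODED_SECRET_PATTERN: &str =
    concat!(r"(?i)(", hardcoded_secret_keywords!(), r")");

/// C# variant: the same fragment plus the C#-specific alternatives.
pub const CSHARP_HARDCODED_SECRET_PATTERN: &str = concat!(
    r"(?i)(",
    hardcoded_secret_keywords!(),
    r"|connection_?string|connectionstring)"
);
```

Both constants expand to the same strings quoted elsewhere in this thread, so the rule files that consume them would not need to change.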
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/common.rs` around lines 58 - 63, The two regex constants
HARDCODED_SECRET_PATTERN and CSHARP_HARDCODED_SECRET_PATTERN duplicate the same
alternation; extract the common alternation into a single shared &str (e.g.,
HARDCODED_SECRET_KEYWORDS) and then build the two constants by composing that
fragment (e.g., format! or concat! to wrap with (?i)( ... ) and append the
C#-specific `connection_?string|connectionstring` in
CSHARP_HARDCODED_SECRET_PATTERN) so the base keyword set is defined once and the
C# variant extends it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rules/manifest.rs`:
- Around line 329-332: The selection of best seed from reached_seeds is
non-deterministic when confidences tie; change the call that computes best
(currently using reached_seeds.values().max_by(...).unwrap()) to iterate with
reached_seeds.iter() and pick the max by a two-field comparison: primary key
v.confidence (use total_cmp) and secondary key the seed name (the HashMap key)
so ties break deterministically by name; update the variable selection that
assigns to best to use the (&key, &value) iterator result and then take the
value part for downstream use.
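
A minimal sketch of the two-field comparison described above. The `ReachedSeed` struct and `pick_best` function are hypothetical stand-ins for whatever manifest.rs actually stores in `reached_seeds`, and the tie-break direction (lexicographically smallest name wins) is an assumption; the comment only requires that ties break deterministically by name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the value type stored in `reached_seeds`.
#[derive(Debug, Clone, PartialEq)]
struct ReachedSeed {
    confidence: f64,
}

/// Picks the entry with the highest confidence. Ties are broken on the
/// seed name so the result does not depend on HashMap iteration order.
/// Assumption: on a tie, the lexicographically smallest name wins.
fn pick_best(reached: &HashMap<String, ReachedSeed>) -> Option<(&String, &ReachedSeed)> {
    reached.iter().max_by(|(a_name, a), (b_name, b)| {
        a.confidence
            // `total_cmp` gives a total order over f64, including NaN.
            .total_cmp(&b.confidence)
            // Reversed name comparison so the smaller name ranks higher.
            .then_with(|| b_name.cmp(a_name))
    })
}
```

Because map keys are unique, no two entries compare as equal, so the result is the same on every run regardless of hashing order.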


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d2e3ed62-56c1-468a-acf4-8e02aae6bff1

📥 Commits

Reviewing files that changed from the base of the PR and between 4ba172a and f2ed293.

📒 Files selected for processing (14)
  • src/report/sarif.rs
  • src/rules/common.rs
  • src/rules/csharp.rs
  • src/rules/go.rs
  • src/rules/java.rs
  • src/rules/javascript.rs
  • src/rules/kotlin.rs
  • src/rules/manifest.rs
  • src/rules/php.rs
  • src/rules/python.rs
  • src/rules/ruby.rs
  • src/rules/rust_lang.rs
  • src/rules/swift.rs
  • www/src/data/rules.ts

Comment thread src/rules/manifest.rs Outdated
- Fix import formatting in javascript.rs and python.rs
- Make reached_seeds best-pick deterministic by breaking
  confidence ties on seed name

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
src/rules/python.rs (1)

79-81: Please add regression coverage for shared-pattern precision in this rule.

Now that this rule depends on HARDCODED_SECRET_PATTERN, add/extend py/no-hardcoded-secret tests for both new positives (e.g., auth, credential, private_key) and obvious benign negatives (e.g., author) to catch FP drift early.
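
One thing worth checking before writing the negative cases: the shared pattern, as quoted in this thread, has no word boundaries, so an unanchored search would find `auth` inside `author`. The sketch below approximates that behaviour with plain substring matching instead of the regex itself; `BASE_SPELLINGS` and `contains_secret_keyword` are hypothetical names, and whether the real rule reports such an identifier depends on the rest of its logic, which this thread does not show.

```rust
/// Spellings accepted by the base pattern, with each `_?` expanded by hand.
const BASE_SPELLINGS: [&str; 9] = [
    "password",
    "secret",
    "apikey",
    "api_key",
    "token",
    "auth",
    "credential",
    "privatekey",
    "private_key",
];

/// Approximates an unanchored, case-insensitive match of the base
/// pattern: true when any keyword spelling occurs anywhere in `text`.
/// Lowercasing stands in for the `(?i)` flag, which is adequate for the
/// ASCII keywords used here.
fn contains_secret_keyword(text: &str) -> bool {
    let lower = text.to_lowercase();
    BASE_SPELLINGS.iter().any(|kw| lower.contains(*kw))
}
```

Under this approximation `author` matches just as `auth_token` does, so a regression test that expects `author` to pass cleanly would need the rule to apply some additional filter beyond the keyword pattern.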

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/python.rs` around lines 79 - 81, Add regression tests for the
shared HARDCODED_SECRET_PATTERN precision changes by extending the
py/no-hardcoded-secret test suite: add positive test cases containing
identifiers/strings like "auth", "credential", "private_key" that should be
flagged by the rule and add negative test cases such as "author" (and other
benign substrings) that must not be flagged; ensure the tests reference the rule
py/no-hardcoded-secret so they exercise the Regex created from
HARDCODED_SECRET_PATTERN in src/rules/python.rs and fail if the pattern becomes
overly permissive or regresses.
src/rules/manifest.rs (1)

142-171: Add a collision guard for normalized package keys.

Because pip_map is built via collect(), any future normalized-name collision would silently overwrite an entry. Consider a small invariant test to fail fast if curated seeds ever collide after PEP 503 normalization.

Proposed invariant test
 #[cfg(test)]
 mod tests {
     use super::*;
+    use std::collections::HashSet;
@@
+    #[test]
+    fn pip_seed_names_normalize_uniquely() {
+        let mut seen = HashSet::new();
+        for entry in PIP_PACKAGES {
+            let key = entry.name.to_lowercase().replace(['_', '.'], "-");
+            assert!(seen.insert(key), "duplicate normalized PIP seed name found");
+        }
+    }
 }
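
The proposed test maps each `_` or `.` to `-` one character at a time, whereas PEP 503 normalization collapses each run of `-`, `_`, and `.` into a single `-` before lowercasing. A sketch of the full normalization together with a fail-fast index builder follows; `normalize_pep503` and `build_seed_index` are hypothetical names, and the index maps to plain name strings rather than the real SeedEntry type, whose fields this thread does not show.

```rust
use std::collections::HashMap;

/// PEP 503 name normalization: lowercase the name and collapse every
/// run of `-`, `_`, or `.` into a single `-`.
fn normalize_pep503(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut in_separator_run = false;
    for ch in name.chars() {
        if matches!(ch, '-' | '_' | '.') {
            // Emit one '-' for the whole run of separators.
            if !in_separator_run {
                out.push('-');
            }
            in_separator_run = true;
        } else {
            out.extend(ch.to_lowercase());
            in_separator_run = false;
        }
    }
    out
}

/// Builds a normalized-name -> original-name index, failing on the first
/// collision with a message that names both conflicting entries.
fn build_seed_index<'a>(names: &[&'a str]) -> Result<HashMap<String, &'a str>, String> {
    let mut index = HashMap::new();
    for &name in names {
        let key = normalize_pep503(name);
        if let Some(previous) = index.insert(key.clone(), name) {
            return Err(format!(
                "seed names '{}' and '{}' both normalize to '{}'",
                previous, name, key
            ));
        }
    }
    Ok(index)
}
```

Returning an error that names both originals, rather than silently overwriting as `collect()` into a map would, is what lets a unit test or a startup check point at the exact pair of seeds that collide.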
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/manifest.rs` around lines 142 - 171, The curated SeedEntry list can
silently overwrite entries when building pip_map because names are PEP-503
normalized and collected without collision checks; modify the code that builds
pip_map (where SeedEntry items are iterated and inserted into pip_map) to detect
if a normalized key already exists and fail fast (panic or return Err) with a
clear message including both conflicting original names, and add a unit test
that normalizes all SeedEntry.name values and asserts there are no duplicates
(i.e., length of set == length of seeds) to catch future collisions; reference
the SeedEntry struct and the pip_map construction site so reviewers can find and
update the insertion logic and add the invariant test.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0718461b-1a3b-40ea-a980-3d01e75b6842

📥 Commits

Reviewing files that changed from the base of the PR and between f2ed293 and 85957e0.

📒 Files selected for processing (3)
  • src/rules/javascript.rs
  • src/rules/manifest.rs
  • src/rules/python.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/rules/javascript.rs

@peaktwilight
Collaborator

Approved — clean dedup of HARDCODED_SECRET_PATTERN across 10 language rule files, with the C# extension correctly tested as a superset. PHP normalisation is a nice catch. Merging.

@peaktwilight peaktwilight merged commit 76d1c57 into main Apr 30, 2026
17 checks passed
@peaktwilight peaktwilight deleted the 274-dedup-secret-regex branch April 30, 2026 21:32

2 participants