Skip to content

refactor(taint): deduplicate shared engine mechanics#287

Open
Darkroom4364 wants to merge 2 commits intomainfrom
276-deduplicate-taint-engine
Open

refactor(taint): deduplicate shared engine mechanics#287
Darkroom4364 wants to merge 2 commits intomainfrom
276-deduplicate-taint-engine

Conversation

@Darkroom4364
Copy link
Copy Markdown
Collaborator

@Darkroom4364 Darkroom4364 commented May 3, 2026

Summary

  • Move sanitizer grouping, merged source/sink construction, sink-to-rule attribution, and finding construction into shared taint_engine helpers
  • Reuse shared call/member sink matching across JS/Python/Go taint engines
  • Keep language-specific traversal, assignment/destructuring, expression taint, and cross-file callee resolution local to each engine

Closes #276

Test plan

  • cargo test python_taint
  • cargo test javascript_taint
  • cargo test go_taint
  • cargo test taint
  • cargo test --lib
  • cargo test
  • cargo fmt --check
  • cargo clippy --all-targets --all-features -- -D warnings

Summary by CodeRabbit

  • Refactor
    • Consolidated taint analysis across Go, JavaScript, and Python to use a shared batched attribution pipeline and standardized finding construction, reducing duplication and improving consistency.
  • Bug Fixes
    • Improved sink-to-rule attribution and cross-file finding emission, yielding more accurate and correctly attributed results.
  • Tests
    • Added unit tests covering batched grouping and sink-rule attribution behavior.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 140c5940-43fa-4679-aa01-476f07920061

📥 Commits

Reviewing files that changed from the base of the PR and between d67e271 and 2a1fc0e.

📒 Files selected for processing (4)
  • src/rules/go_taint.rs
  • src/rules/javascript_taint.rs
  • src/rules/python_taint.rs
  • src/rules/taint_engine.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/rules/python_taint.rs
  • src/rules/go_taint.rs

📝 Walkthrough

Walkthrough

Refactors taint analysis by centralizing batched rule grouping, sink matching, and finding construction in src/rules/taint_engine.rs, then updates Go, JavaScript, and Python taint modules to use those helpers and to emit attributed findings using the new batched metadata.

Changes

Taint Engine Deduplication

Layer / File(s) Summary
Data Shape / Types
src/rules/taint_engine.rs
Adds BatchedTaintGroup and MatchedSink structs and extends AnalysisContext usage to carry sink_to_rules: Option<&HashMap<String, Vec<String>>>.
Batching Logic
src/rules/taint_engine.rs
Adds build_batched_taint_groups to partition and merge BatchedRules by sanitizer equivalence and to build sink_to_rules and allowed_rule_ids.
Sink Matching Helpers
src/rules/taint_engine.rs
Adds match_call_sink, match_member_assign_sink, and shared matching/attribution helpers to resolve sinks and compute attribution keys and owning rule IDs.
Finding Construction & Fan-out
src/rules/taint_engine.rs
Adds taint_finding_for_node, cross_file_taint_finding, attribution_hint_for_sink, and push_attributed_findings to standardize finding creation and fan-out to multiple rule IDs.
Language Wiring — Batched Analysis
src/rules/go_taint.rs, src/rules/javascript_taint.rs, src/rules/python_taint.rs
analyze_tree_batched rewritten to iterate build_batched_taint_groups(rules); per-group intra-file analysis uses group.spec and results are attributed via push_attributed_findings (replacing previous manual grouping and post-loop remapping).
Language Wiring — Sink Matching & Findings
src/rules/go_taint.rs, src/rules/javascript_taint.rs, src/rules/python_taint.rs
Call- and member-assignment handling updated to use match_call_sink / match_member_assign_sink, and findings are created via taint_finding_for_node / cross_file_taint_finding with attribution_hint_for_sink instead of manual TaintFinding assembly.
Tests
src/rules/taint_engine.rs (unit tests)
Adds unit tests validating batch grouping behavior and sink→rule attribution fan-out.

Sequence Diagram(s)

sequenceDiagram
    participant Rules as Rules Input
    participant Batcher as taint_engine::build_batched_taint_groups
    participant Analyzer as Language Analyzer (Go/JS/Py)
    participant Matcher as taint_engine::match_*_sink
    participant Findings as taint_engine::push_attributed_findings / output

    Note over Rules,Batcher: rules grouped by sanitizer fingerprints
    Rules->>Batcher: provide BatchedRule list
    Batcher-->>Analyzer: emit BatchedTaintGroup(s) (spec, sink_to_rules)
    Analyzer->>Matcher: detect sink matches during traversal
    Matcher-->>Analyzer: MatchedSink (description, attribution_key, rule_ids)
    Analyzer->>Findings: create TaintFinding(s) via taint_finding_for_node / cross_file_taint_finding
    Findings->>Findings: fan-out findings by sink_to_rules (push_attributed_findings)
    Findings-->>Rules: emit attributed findings per rule id
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • peaktwilight

Poem

🐰 I hopped through rules in search of one true stream,
Merged the paths, split the hops—now builders dream.
Sinks find their owners, findings neatly spun,
Three tongues share helpers, and the work's half-done.
A little rabbit cheers: refactor—well run!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main refactoring objective of centralizing shared taint engine mechanics across language-specific modules.
Linked Issues check ✅ Passed The PR successfully extracts shared traversal, propagation, and sink matching into generic helpers in taint_engine.rs while keeping language-specific logic in each module, directly addressing #276 objectives.
Out of Scope Changes check ✅ Passed All changes are focused on refactoring taint engine mechanics and introducing shared batching infrastructure; no out-of-scope modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 276-deduplicate-taint-engine

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rules/taint_engine.rs`:
- Around line 188-205: The code currently deduplicates and attributes sinks
using human-readable sink.description(), causing lost/misattributed findings
when descriptions collide; change dedup/attribution to use a matcher-stable key
(e.g. matcher_fingerprint(...) for each sink matcher) instead of description:
replace seen_sink_descs and sink_to_rule mapping to track matcher fingerprints
and map each fingerprint to a Vec<String> of rule IDs (not a single String),
update merged_sources/merged_sinks logic to dedupe by fingerprint, and modify
match_call_sink / match_member_assign_sink to return all owning rule_ids for
matched sinks so push_attributed_findings can fan out and emit one finding per
rule_id.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d468f4fe-f7c7-4c09-9459-2a716beaf71a

📥 Commits

Reviewing files that changed from the base of the PR and between 981de86 and d67e271.

📒 Files selected for processing (4)
  • src/rules/go_taint.rs
  • src/rules/javascript_taint.rs
  • src/rules/python_taint.rs
  • src/rules/taint_engine.rs

Comment thread src/rules/taint_engine.rs Outdated
@Darkroom4364 Darkroom4364 requested a review from peaktwilight May 3, 2026 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor(taint): deduplicate taint engine across JS/PY/Go

1 participant