feat(engine): add C language support (tree-sitter-c parser) #300

Description

@peaktwilight

Why

foxguard currently parses 10 source languages (JS/TS, Python, Go, Ruby, Java, PHP, Rust, C#, Swift, Kotlin) plus 5 config formats. C is not among them. This blocks every kernel/system-software security pattern, including the Dirty Frag class rules drafted on feat/dirty-frag-rules (commit 70813a9).

Concrete trigger: the Dirty Frag advisory (2026-05-07) is structurally analyzable — `splice → MSG_SPLICE_PAGES → in-place crypto on shared frag`. We want foxguard's existing Semgrep-compat regex engine and structural rule pipeline to apply to `.c` / `.h` files so the kernel rules already drafted can actually run.

What

Surgical addition. Five edits, one per file:

  1. `Cargo.toml` — add `tree-sitter-c = "0.23"` (matches the version pin used by all other tree-sitter crates).
  2. `src/lib.rs` — add `Language::C` variant + Display impl (`"c"`).
  3. `src/engine/parser.rs` — register `Language::C => tree_sitter_c::LANGUAGE.into()` in `parse_file`.
  4. `src/engine/scanner.rs` — map `"c" | "h"` extensions to `Language::C` in `detect_language`. Add to comment-prefix list (`&["//", "/*"]`).
  5. `src/rules/semgrep_compat.rs` — add `"c"` to the `languages:` mapping (around line 808-817).
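A minimal, std-only sketch of what steps 2 and 4 might look like. The enum here is a trimmed stand-in with three variants, and the function signatures (`detect_language`, `comment_prefixes`) are assumptions for illustration; foxguard's real definitions may differ. The parser registration from step 3 (`tree_sitter_c::LANGUAGE.into()`) is left out so the sketch compiles without external crates.

```rust
use std::fmt;
use std::path::Path;

/// Hypothetical, trimmed-down stand-in for foxguard's `Language` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust,
    Python,
    C, // new variant proposed by this issue
}

impl fmt::Display for Language {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Language::Rust => "rust",
            Language::Python => "python",
            Language::C => "c",
        };
        f.write_str(name)
    }
}

/// Map a file extension to a language; `None` means the file is not scanned.
/// The signature is an assumption, not foxguard's actual API.
pub fn detect_language(path: &Path) -> Option<Language> {
    match path.extension()?.to_str()? {
        "rs" => Some(Language::Rust),
        "py" => Some(Language::Python),
        "c" | "h" => Some(Language::C), // both sources and headers
        _ => None,
    }
}

/// Comment prefixes per language; the C entry follows the list in step 4.
pub fn comment_prefixes(lang: Language) -> &'static [&'static str] {
    match lang {
        Language::Rust | Language::C => &["//", "/*"],
        Language::Python => &["#"],
    }
}

fn main() {
    let lang = detect_language(Path::new("net/ipv4/esp4.c"));
    println!("{:?}", lang.map(|l| l.to_string())); // prints Some("c")
}
```

Mapping `.h` to `Language::C` is what the issue proposes. If C++ support lands later, header detection would likely need revisiting, since `.h` files are shared between the two languages.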

Acceptance

  • `cargo build` clean; `cargo test` clean (no regressions across the 10 existing languages)
  • A `.c` file with a Semgrep-compat YAML rule (`languages: [c]`, `pattern-regex`) loads and matches
  • Round-trip test: load one of the Dirty Frag rules from `rules/kernel/dirty-frag-class/` against `net/ipv4/esp4.c` (calibration site) and observe a positive match
  • Round-trip test: the same rule against the post-patch source yields zero matches (the negative regex suppresses the finding)

Scope (intentionally narrow)

  • Do not ship built-in C rules in this PR. That belongs to the dependent issue.
  • Do not add C++ support (`.cpp`/`.cc`/`.hpp`) yet — separate tree-sitter crate, separate review surface.
  • Do not add taint-flow for C — only structural / regex rules need to work in this MVP.

Effort

~2–3 hours. Mostly mechanical. The main risk is the rules registry: if anything assumes the language set is closed, a full `cargo test` run should surface it.
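To illustrate the "closed language set" risk, here is a small sketch contrasting two ways existing code might match on the enum. Both functions and the `rules/...` paths are hypothetical and are not taken from foxguard. An exhaustive `match` fails to compile until the new variant gets an arm, while a wildcard arm keeps compiling and silently excludes C, which only a test would catch.

```rust
/// Hypothetical, trimmed-down language enum with the new `C` variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Python,
    Go,
    C,
}

/// Exhaustive match: without the `Language::C` arm this would be a
/// compile error, so `cargo build` points at the site directly.
fn rule_dir(lang: Language) -> &'static str {
    match lang {
        Language::Python => "rules/python",
        Language::Go => "rules/go",
        Language::C => "rules/c",
    }
}

/// Wildcard arm: compiles unchanged after `C` is added and quietly
/// treats it as unsupported. Only a test reveals this.
fn supports_structural_rules(lang: Language) -> bool {
    match lang {
        Language::Python | Language::Go => true,
        _ => false,
    }
}

fn main() {
    // prints "rules/c false"
    println!(
        "{} {}",
        rule_dir(Language::C),
        supports_structural_rules(Language::C)
    );
}
```

Searching for `_ =>` arms in matches over `Language` is a cheap complement to the test run, since those are the sites the compiler will not flag.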

Related
