
Publish false-positive methodology and per-rule precision numbers #9

@peaktwilight

Problem

Today the only published validation is the Semgrep parity suite (`tests/semgrep_parity.rs`): "foxguard finds what Semgrep finds on this fixed corpus." That is a correctness check against another tool, not a measurement of precision on real code. There is no public false-positive (FP) rate, no labeled corpus, and no per-rule numbers.

Users evaluating the scanner need to know: for rule X, out of N findings on real code, how many are true positives?

Proposed approach

  1. Labeled corpus. Assemble a small set of real OSS repos, pinned by SHA. For each rule that fires, label every finding TP / FP / unsure with a one-line justification. Store the labels as JSON alongside the corpus (a sketch of one possible record shape and the per-rule tally follows this list).
  2. Methodology doc. Write `docs/false-positive-methodology.md` explaining corpus selection, labeling criteria, and how to reproduce.
  3. Per-rule precision table. Generate a table of `rule_id | findings | TP | FP | precision` from the labeled data. Publish in the docs site and link from README.
  4. Regression harness. Re-run labeling (or at least per-rule finding counts) in CI so precision doesn't silently regress when rules change; a snapshot-test sketch follows below.
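
As a starting point, here is a minimal Rust sketch of what a label record and the per-rule tally could look like, assuming `serde`/`serde_json` as dependencies and a flat `Vec` of records at `benchmarks/precision/labels.json`. Every field name, the file location, and the choice to exclude `unsure` labels from the precision denominator are assumptions for discussion, not a decided schema:

```rust
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;

/// One labeled finding. All field names are hypothetical placeholders;
/// the real schema would be fixed in docs/false-positive-methodology.md.
#[derive(Debug, Serialize, Deserialize)]
struct LabeledFinding {
    rule_id: String,
    repo: String,          // e.g. "github.com/org/repo"
    commit: String,        // pinned SHA the finding was labeled against
    path: String,
    line: u32,
    verdict: Verdict,
    justification: String, // the one-line reason for the verdict
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum Verdict {
    TruePositive,
    FalsePositive,
    Unsure,
}

#[derive(Debug, Default)]
struct RuleStats {
    tp: u32,
    fp: u32,
    unsure: u32,
}

impl RuleStats {
    /// Precision over decided findings only; `unsure` is excluded from the
    /// denominator. That exclusion is an assumption, not settled methodology.
    fn precision(&self) -> Option<f64> {
        let decided = self.tp + self.fp;
        (decided > 0).then(|| f64::from(self.tp) / f64::from(decided))
    }
}

/// Aggregate labeled findings into per-rule tallies.
fn tally(findings: &[LabeledFinding]) -> BTreeMap<String, RuleStats> {
    let mut stats: BTreeMap<String, RuleStats> = BTreeMap::new();
    for f in findings {
        let s = stats.entry(f.rule_id.clone()).or_default();
        match f.verdict {
            Verdict::TruePositive => s.tp += 1,
            Verdict::FalsePositive => s.fp += 1,
            Verdict::Unsure => s.unsure += 1,
        }
    }
    stats
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical path, matching the proposed benchmarks/precision/ layout.
    let raw = std::fs::read_to_string("benchmarks/precision/labels.json")?;
    let findings: Vec<LabeledFinding> = serde_json::from_str(&raw)?;
    println!("rule_id | findings | TP | FP | precision");
    for (rule, s) in tally(&findings) {
        let total = s.tp + s.fp + s.unsure;
        let p = s
            .precision()
            .map_or_else(|| "n/a".to_string(), |p| format!("{p:.2}"));
        println!("{rule} | {total} | {} | {} | {p}", s.tp, s.fp);
    }
    Ok(())
}
```

Whether `unsure` counts toward the denominator is worth settling in the methodology doc; counting it as FP is the stricter option.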

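For the CI guard, one low-cost shape is a snapshot test over per-rule finding counts rather than full re-labeling on every run: any drift fails CI and forces a fresh labeling pass. `scan_corpus` below is a hypothetical hook into the scanner (no such foxguard API is assumed to exist), and `benchmarks/precision/counts.json` is a hypothetical committed snapshot:

```rust
// tests/precision_regression.rs (hypothetical file): fail CI whenever the
// per-rule finding counts on the pinned corpus drift from the committed
// snapshot, forcing a fresh labeling pass.
use std::collections::BTreeMap;

/// Hypothetical hook: run all rules over the pinned corpus checkout and
/// return rule_id -> finding count. Not an existing foxguard API.
fn scan_corpus() -> BTreeMap<String, u32> {
    unimplemented!("wire up to the real scanner")
}

#[test]
fn finding_counts_match_labeled_snapshot() {
    let raw = std::fs::read_to_string("benchmarks/precision/counts.json")
        .expect("committed per-rule finding-count snapshot");
    let expected: BTreeMap<String, u32> =
        serde_json::from_str(&raw).expect("valid snapshot JSON");
    assert_eq!(
        scan_corpus(),
        expected,
        "per-rule finding counts drifted; re-label the corpus and refresh \
         benchmarks/precision/counts.json"
    );
}
```
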
Non-goals

  • Recall measurement (needs ground-truth vuln datasets; separate effort).
  • Benchmarking against Semgrep/CodeQL on precision — we can't publish their rules' numbers.

Acceptance

  • First version of the labeled corpus committed under `benchmarks/precision/`.
  • Methodology doc merged.
  • Per-rule precision table published for at least the top 20 most-triggered rules.
