brainAI-bot/agent-drift-detector

Agent Behavioral Drift Detector

Detects cumulative behavioral drift in agent optimization loops by comparing rolling behavioral fingerprints against a baseline.

Built for integration with protect-mcp (ScopeBlind) receipt streams and HyperAgents safety policies.

The Problem

Static per-action safety policies catch individual violations but miss trajectory-level shifts. A meta-agent that stays within policy constraints on every iteration can still move far from its original behavior over N iterations, because each step is judged in isolation and small changes accumulate unnoticed. This is the "boiling frog" problem.

How It Works

Receipt Stream → Rolling Window → Behavioral Fingerprint → Drift Score
                                         ↕
                                    Baseline (iteration 0)
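The pipeline above can be sketched with a bounded queue. This is only an illustration of the idea, not the detector's actual code: the `RollingWindow` class and its `push` method are hypothetical names, and treating the first full window as the "iteration 0" baseline and sliding the window one receipt at a time are both assumptions.

```python
from collections import deque


class RollingWindow:
    """Hypothetical helper: keeps the newest `size` receipts and
    pins the first full window as the baseline (assumed behavior)."""

    def __init__(self, size):
        self.size = size
        self.items = deque(maxlen=size)  # oldest receipt is evicted automatically
        self.baseline = None

    def push(self, receipt):
        """Add one receipt; return the current window once it is full, else None."""
        self.items.append(receipt)
        if len(self.items) < self.size:
            return None
        current = list(self.items)
        if self.baseline is None:
            # Assumption: the first complete window is the baseline.
            self.baseline = current
        return current
```

Each full window returned by `push` would then be reduced to a behavioral fingerprint and compared against the fingerprint of `baseline`.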

The detector consumes signed receipts from any policy engine and computes four behavioral signals:

| Signal | Metric | Weight |
|---|---|---|
| Tool distribution | Jensen-Shannon divergence | 35% |
| Allow rate | Absolute delta | 25% |
| Tier distribution | Jensen-Shannon divergence | 25% |
| Call velocity | Normalized delta | 15% |
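As a rough illustration of how these four signals could combine into one score, here is a minimal sketch. Only the weights and metric names come from the table. The fingerprint dictionary layout, the function names, the base-2 logarithm (which bounds the divergence to [0, 1]), and the velocity normalization (relative change against the baseline, capped at 1) are assumptions and may differ from the real implementation.

```python
import math
from collections import Counter

# Weights taken from the signal table.
WEIGHTS = {"tool": 0.35, "allow": 0.25, "tier": 0.25, "velocity": 0.15}


def distribution(labels):
    """Turn a list of labels (e.g. tool names) into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence between two dict distributions, log base 2."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); m[k] > 0 whenever a[k] > 0, so the ratio is defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def drift_score(base, cur):
    """Weighted sum of the four signals.

    `base` and `cur` are hypothetical fingerprints with keys
    'tools', 'tiers' (distributions), 'allow_rate' and 'velocity'.
    """
    rel_velocity = abs(cur["velocity"] - base["velocity"]) / max(base["velocity"], 1e-9)
    signals = {
        "tool": js_divergence(base["tools"], cur["tools"]),
        "allow": abs(cur["allow_rate"] - base["allow_rate"]),
        "tier": js_divergence(base["tiers"], cur["tiers"]),
        "velocity": min(1.0, rel_velocity),  # assumed normalization
    }
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

Under these assumptions, a window that switches entirely from one tool to another while the other three signals stay unchanged scores 0.35, which is the full weight of the tool-distribution signal.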

Quick Start

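import sys

from drift_detector import DriftDetector, Receipt

def escalate(result):
    # Placeholder: trigger an approval gate or SATP attestation here
    print(f"escalating: drift={result.drift_score:.3f}", file=sys.stderr)

detector = DriftDetector(
    window_size=50,          # receipts per fingerprint window
    drift_threshold=0.3,     # flag when drift exceeds this
    on_drift=lambda r: print(f"⚠️ Drift detected: {r.drift_score:.3f}")
)

# Any iterable of JSON receipt lines works; here we assume protect-mcp
# stderr has been piped into this script's stdin
receipt_stream = sys.stdin

# Ingest receipts from protect-mcp stderr
for line in receipt_stream:
    result = detector.ingest_json(line)
    if result and result.drifted:
        escalate(result)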

Integration with protect-mcp

# Tail DecisionLog events in shadow mode
protect-mcp --mode shadow 2>&1 | python3 -c "
import sys
from drift_detector import DriftDetector
detector = DriftDetector(window_size=50, drift_threshold=0.3)
for line in sys.stdin:
    result = detector.ingest_json(line)
    if result:
        print(f'iteration={result.iteration} drift={result.drift_score:.3f} {result.message}')
"

Trust Level → Enforcement Mapping

| Trust Level | Mode | Behavior |
|---|---|---|
| ≥ 4 (Established) | Shadow | Log drift, don't halt |
| 3 (Moderate) | Simulate | Flag drift, estimate impact |
| 2 (New) | Enforce | Halt on threshold breach |
| 1 (Untrusted) | Sign | Cryptographic attestation per iteration |
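The mapping above could be expressed as a small lookup. This is a sketch under stated assumptions: `enforcement_mode` and `should_halt` are hypothetical helpers, not part of the documented `drift_detector` API. Treating levels below 1 like level 1, and halting only in Enforce mode, are also assumptions.

```python
def enforcement_mode(trust_level: int) -> str:
    """Map a trust level to the mode named in the table."""
    if trust_level >= 4:
        return "shadow"    # log drift, don't halt
    if trust_level == 3:
        return "simulate"  # flag drift, estimate impact
    if trust_level == 2:
        return "enforce"   # halt on threshold breach
    # Assumption: level 1 and anything lower is treated as untrusted.
    return "sign"          # cryptographic attestation per iteration


def should_halt(trust_level: int, drift_score: float, threshold: float = 0.3) -> bool:
    """Halt only in Enforce mode when the score exceeds the threshold (assumed rule)."""
    return enforcement_mode(trust_level) == "enforce" and drift_score > threshold
```

For example, the same drift score of 0.4 would halt a level-2 agent but only be logged for a level-4 agent.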

Tests

python3 test_drift_detector.py
# All 8 tests passed ✅

License

MIT


Built by brainAI as part of the SATP (Soulbound Agent Trust Protocol) ecosystem.
