[FEATURE][SECURITY]: MCP server source code scanner - Semgrep/Bandit integration #2217

@crivetimihai

🔌 Plugin: MCP Server Source Code Scanner - Semgrep/Bandit Integration

Goal

Implement a gateway plugin that performs static analysis on MCP server source code using Semgrep, Bandit, or other SAST tools to detect security vulnerabilities, code quality issues, and dangerous patterns before servers are added to the gateway.

Why Now?

  1. Code-Level Vulnerabilities: Container scanning misses application-level issues such as SQL injection, command injection, and insecure deserialization
  2. MCP-Specific Risks: MCP servers execute tools on behalf of AI agents, so a code vulnerability can have amplified impact
  3. Shift-Left Security: Catching issues in code before deployment is cheaper than detecting them at runtime
  4. GitHub Integration: Many MCP servers are deployed from GitHub repos, so scanning the source at registration fits the existing workflow
  5. Existing Code Safety Plugin: The code_safety_linter plugin detects patterns in outputs, but there is no pre-deployment source analysis

📖 User Stories

US-1: Security Engineer - Scan Source Code for Vulnerabilities

As a Security Engineer
I want MCP server source code scanned for security issues
So that vulnerabilities are caught before deployment

Acceptance Criteria:

Given an MCP server from a GitHub repository:
  source:
    type: github
    repo: org/mcp-server
    branch: main
When the source scan runs:
Then the scanner should:
  - Clone the repository
  - Detect primary language
  - Run appropriate scanners (Semgrep, Bandit)
  - Return findings with:
    - Rule ID and severity
    - File path and line numbers
    - Code snippet
    - Remediation guidance
  - Block if critical findings exist

US-2: Developer - View Scan Findings with Remediation

As a Developer
I want actionable scan findings with code context
So that I can quickly fix security issues

Acceptance Criteria:

Given a scan has completed with findings:
When I view the assessment report:
Then I see for each finding:
  - Severity badge (CRITICAL/HIGH/MEDIUM/LOW)
  - Rule description
  - File path with clickable line number
  - Code snippet with highlighted issue
  - Remediation suggestion
  - Link to rule documentation
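
The finding fields listed in US-1 and US-2 could be modeled roughly as follows. This is a minimal sketch, not the plugin's actual data model: the `Severity` enum, the `Finding` class, its field names, and `should_block` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Severity levels from US-2, ordered so they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    """One scanner finding (hypothetical shape; fields taken from US-1/US-2)."""
    rule_id: str
    severity: Severity
    file_path: str
    line: int
    snippet: str
    remediation: str
    doc_url: Optional[str] = None  # link to rule documentation, if known

    def summary(self) -> str:
        """One-line summary: severity badge, rule ID, and file:line location."""
        return f"[{self.severity.name}] {self.rule_id} {self.file_path}:{self.line}"


def should_block(findings: List[Finding]) -> bool:
    """US-1: block registration if any finding is CRITICAL."""
    return any(f.severity >= Severity.CRITICAL for f in findings)
```

Using an ordered enum keeps the blocking rule a one-line comparison, and the same ordering could later serve a configurable threshold.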

🏗 Architecture

Supported Scanners

| Scanner | Languages | Output Format |
|---|---|---|
| Semgrep | Python, JavaScript, Go, Java, etc. | SARIF, JSON |
| Bandit | Python | JSON |
| ESLint (security) | JavaScript/TypeScript | JSON |
| CodeQL | Multiple | SARIF |
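
One way language detection could drive scanner selection is sketched below. It is an illustration under stated assumptions: the extension map, the scanner keys, and the function names are hypothetical, detection is by file extension only, and CodeQL's "Multiple" is treated as "any language" for simplicity.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Assumed extension-to-language map; a real implementation would likely cover more.
EXTENSION_LANGUAGES: Dict[str, str] = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".go": "go",
    ".java": "java",
}

# Scanner applicability, following the Supported Scanners table.
# "*" means "run regardless of detected language" (an assumption for CodeQL).
SCANNER_LANGUAGES: Dict[str, Set[str]] = {
    "semgrep": {"*"},
    "bandit": {"python"},
    "eslint": {"javascript", "typescript"},
    "codeql": {"*"},
}


def detect_languages(paths: Iterable[str]) -> Counter:
    """Count source files per language, based on file extension only."""
    counts: Counter = Counter()
    for p in paths:
        lang = EXTENSION_LANGUAGES.get(Path(p).suffix.lower())
        if lang:
            counts[lang] += 1
    return counts


def select_scanners(languages: Iterable[str], enabled: Iterable[str]) -> List[str]:
    """Return the enabled scanners that apply to the detected languages."""
    langs = set(languages)
    selected = []
    for name in enabled:
        supported = SCANNER_LANGUAGES.get(name, set())
        if "*" in supported or supported & langs:
            selected.append(name)
    return selected
```

`detect_languages(...).most_common(1)` gives the primary language, and passing the counter to `select_scanners` keeps Bandit off repositories that contain no Python.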

Plugin Flow

sequenceDiagram
    participant Gateway as Gateway
    participant Plugin as SourceScannerPlugin
    participant Git as Git
    participant Semgrep as Semgrep
    participant Bandit as Bandit

    Gateway->>Plugin: server_pre_register(github_repo)
    Plugin->>Git: Clone repository
    Plugin->>Plugin: Detect languages
    
    par Python detected
        Plugin->>Bandit: bandit -r . -f json
        Bandit-->>Plugin: Python findings
    and All languages
        Plugin->>Semgrep: semgrep --config p/security-audit --sarif
        Semgrep-->>Plugin: SARIF findings
    end
    
    Plugin->>Plugin: Merge and deduplicate
    Plugin->>Plugin: Check severity threshold
    Plugin->>Git: Cleanup temp directory
    Plugin-->>Gateway: Findings or block
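
The "Merge and deduplicate" step in the flow could look roughly like this. It is a sketch, not the plugin's implementation: it assumes Bandit's JSON report and a SARIF 2.1.0 document as inputs, and the normalized dict shape, the severity ranking across scanners, and the choice to deduplicate on (path, line) while keeping the most severe finding are all assumptions made for illustration.

```python
import json
import os
from typing import Any, Dict, List, Tuple

# Assumed mapping of Bandit (LOW/MEDIUM/HIGH) and SARIF (note/warning/error)
# levels onto one comparable scale.
_RANK = {"NOTE": 0, "INFO": 0, "LOW": 1, "WARNING": 2, "MEDIUM": 2, "ERROR": 3, "HIGH": 3}


def parse_bandit(raw: str) -> List[Dict[str, Any]]:
    """Normalize a Bandit JSON report (as produced by `bandit -f json`)."""
    report = json.loads(raw)
    return [
        {
            "scanner": "bandit",
            "rule_id": r["test_id"],
            "severity": r["issue_severity"].upper(),
            "path": r["filename"],
            "line": r["line_number"],
            "message": r["issue_text"],
        }
        for r in report.get("results", [])
    ]


def parse_sarif(raw: str, scanner: str = "semgrep") -> List[Dict[str, Any]]:
    """Normalize the results of a SARIF 2.1.0 document."""
    findings = []
    for run in json.loads(raw).get("runs", []):
        for res in run.get("results", []):
            locations = res.get("locations") or []
            if not locations:
                continue  # skip results that carry no source location
            phys = locations[0]["physicalLocation"]
            findings.append({
                "scanner": scanner,
                "rule_id": res["ruleId"],
                "severity": res.get("level", "warning").upper(),
                "path": phys["artifactLocation"]["uri"],
                "line": phys["region"]["startLine"],
                "message": res["message"]["text"],
            })
    return findings


def _key(finding: Dict[str, Any]) -> Tuple[str, int]:
    # normpath makes "./app.py" and "app.py" compare equal.
    return (os.path.normpath(finding["path"]), finding["line"])


def merge_findings(*groups: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge scanner outputs, keeping the most severe finding per (path, line)."""
    best: Dict[Tuple[str, int], Dict[str, Any]] = {}
    for group in groups:
        for f in group:
            current = best.get(_key(f))
            if current is None or _RANK.get(f["severity"], 0) > _RANK.get(current["severity"], 0):
                best[_key(f)] = f
    return sorted(best.values(), key=_key)
```

Deduplicating on location rather than rule ID is a deliberate simplification: two scanners rarely share rule IDs, so a location key is what lets overlapping Bandit and Semgrep reports collapse into one entry.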

📋 Implementation Tasks

  • Create plugins/source_scanner/ directory structure
  • Implement SourceScannerPlugin class
  • Add Semgrep CLI wrapper with SARIF parsing
  • Add Bandit CLI wrapper for Python
  • Implement language detection logic
  • Add Git clone with authentication support
  • Add branch/tag/commit checkout
  • Implement finding deduplication
  • Add severity filtering and thresholds
  • Implement temp directory cleanup
  • Create MCP-specific Semgrep rules (optional)
  • Add scan result caching by commit SHA
  • Create Admin UI for findings display
  • Write unit tests
  • Write integration tests with vulnerable repos
  • Create README.md
  • Pass make verify checks
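
The clone, checkout, and temp-directory cleanup tasks above might be sketched as follows. The helper names (`build_clone_command`, `scan_workspace`, `clone_repo`) are hypothetical, authentication is left out, and the shallow clone is an assumption. Note that `git clone --branch` accepts branch and tag names but not a commit SHA, so checking out a specific commit would need a separate step.

```python
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from typing import Iterator, List, Optional


def build_clone_command(repo_url: str, dest: str, ref: Optional[str] = None) -> List[str]:
    """Build a shallow `git clone` argv; `ref` may be a branch or tag name."""
    cmd = ["git", "clone", "--depth", "1"]
    if ref:
        cmd += ["--branch", ref]
    # "--" ends option parsing, so a URL starting with "-" is not read as a flag.
    cmd += ["--", repo_url, dest]
    return cmd


@contextmanager
def scan_workspace(prefix: str = "mcp-scan-") -> Iterator[str]:
    """Yield a temporary directory and remove it afterwards, even on error."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def clone_repo(repo_url: str, dest: str, ref: Optional[str] = None, timeout: int = 120) -> None:
    """Run the clone; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(
        build_clone_command(repo_url, dest, ref),
        check=True,
        capture_output=True,
        timeout=timeout,
    )
```

Passing an argument list instead of a shell string avoids shell injection through repository names, which matters for a plugin whose own purpose is catching that class of bug.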

⚙️ Configuration Example

plugins:
  - name: "SourceScannerPlugin"
    kind: "plugins.source_scanner.source_scanner.SourceScannerPlugin"
    hooks:
      - server_pre_register
      - catalog_pre_deploy
    mode: "enforce"
    priority: 15
    
    config:
      # Scanner selection
      scanners:
        semgrep:
          enabled: true
          rulesets:
            - "p/security-audit"
            - "p/owasp-top-ten"
            - "p/python"
            - "p/javascript"
          extra_args: []
        bandit:
          enabled: true
          severity: "medium"
          confidence: "medium"
      
      # Severity settings
      severity_threshold: "WARNING"  # Semgrep-style levels: ERROR | WARNING | INFO
      fail_on_critical: true
      
      # Repository settings
      clone_timeout_seconds: 120
      scan_timeout_seconds: 600
      max_repo_size_mb: 500
      
      # Git authentication
      github_token_env: "GITHUB_TOKEN"
      
      # Caching
      cache_by_commit: true
      cache_ttl_hours: 168  # 1 week
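
A sketch of how the plugin might load and apply these settings follows. The `ScannerSettings` class and its methods are hypothetical, the defaults simply mirror the example above, and the threshold semantics (findings at or above the threshold count) are an assumption rather than something the configuration defines.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Assumed ordering of the three threshold values named in the config comment.
SEVERITY_ORDER = {"INFO": 0, "WARNING": 1, "ERROR": 2}


@dataclass(frozen=True)
class ScannerSettings:
    """Typed view of the top-level `config` keys (scanner sub-config omitted)."""
    severity_threshold: str = "WARNING"
    fail_on_critical: bool = True
    clone_timeout_seconds: int = 120
    scan_timeout_seconds: int = 600
    max_repo_size_mb: int = 500
    cache_by_commit: bool = True
    cache_ttl_hours: int = 168

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ScannerSettings":
        """Build settings from a config dict, ignoring keys not modeled here."""
        names = {f.name for f in fields(cls)}
        settings = cls(**{k: v for k, v in config.items() if k in names})
        if settings.severity_threshold not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity_threshold: {settings.severity_threshold!r}")
        return settings

    def meets_threshold(self, severity: str) -> bool:
        """True if `severity` is at or above the configured threshold."""
        return SEVERITY_ORDER[severity] >= SEVERITY_ORDER[self.severity_threshold]

    def cache_is_fresh(self, scanned_at: datetime, now: datetime) -> bool:
        """True if a cached result for a commit is still within the TTL."""
        if not self.cache_by_commit:
            return False
        return now - scanned_at < timedelta(hours=self.cache_ttl_hours)
```

Validating `severity_threshold` at load time surfaces a misspelled value when the plugin starts instead of during the first scan.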

✅ Success Criteria

  • Semgrep integration with security rulesets
  • Bandit integration for Python projects
  • Language detection selects appropriate scanners
  • Git clone with branch/tag support
  • SARIF output parsing
  • Findings stored in assessment database
  • Admin UI shows findings with code context
  • 80%+ test coverage
  • Documentation complete

🔗 Related Issues

Metadata

Assignees

No one assigned

    Labels

    • MUST: P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe
    • enhancement: New feature or request
    • plugins
    • python: Python / backend development (FastAPI)
    • security: Improves security
    • sweng-group-12: SwEng Group 12 - AI-Powered Security Scanner MCP Server for Pre-Deployment Validation
    • tcd: SwEng Projects
