
thomasmaerz/mcpipechekr

MCPipeChekr — The MCP Testing & Fix Harness

Harness in Action

Built with: opencode, oh-my-openagent, mcp-tef, MCP


🚀 The Problem

MCP server tool calls are unreliable across models and runs. LLMs often pick sub-optimal tools, miss parameters, or invent workarounds (like iterating date ranges) that inflate token costs. These regressions are invisible until they hit production, and there is no structured way to measure efficiency or automate the fix loop.

MCPipeChekr solves this by providing a multi-phase evaluation harness that benchmarks tool-call behavior against a ground-truth baseline and drives an agentic coding loop to fix detected bugs automatically.


🛠 Quick Start (MVP)

  1. Clone the repository:
    git clone https://github.com/thomasmaerz/mcpipechekr.git
    cd mcpipechekr
  2. Install dependencies: Make sure opencode, oh-my-openagent, and mcp-tef are installed and available on your PATH.
  3. Configure your MCP server: Edit config.yaml to point to your emailindex (or other MCP) server.
  4. Define your tasks: Add test prompts to tasks.yaml.
  5. Generate a baseline:
    ./harness.sh --regenerate-baseline
  6. Run the harness loop:
    ./harness.sh
  7. Review & Approve: Check the Phase 2 findings in your CLI, then type approve to let the agent fix the code.
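
Before running step 5, it can help to confirm that the tools from step 2 are actually reachable. The following is a minimal sketch, not part of the harness: it assumes each dependency exposes an executable with the same name as listed above, which may not hold for your installation.

```python
import shutil

# Names copied from the Quick Start. Whether each one ships an executable
# with exactly this name is an assumption; adjust to match your setup.
REQUIRED_TOOLS = ["opencode", "oh-my-openagent", "mcp-tef"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.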

🏗 Architecture & Docs

For detailed documentation on the phase pipeline, data schemas, and configuration, visit our GitHub Wiki.

  • Phase 0: Ground Truth Generation
  • Phase 0.5: Tool Description Linting (mcp-tef)
  • Phase 1: Blind Execution (Trace Capture)
  • Phase 2: Trace Evaluation (Efficiency & Correctness)
  • Phase 3: Agentic Fix Loop (Git Commit)
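
The phase order above, together with the approval step from the Quick Start, can be pictured as a simple gated loop. This is an illustrative sketch only: `run_pipeline`, `run_phase`, and `approved` are hypothetical names, and the real control flow lives in harness.sh and may differ (for example, Phase 0 may only run when the baseline is regenerated).

```python
# Phase identifiers and names as listed in this README.
PHASES = [
    ("0", "Ground Truth Generation"),
    ("0.5", "Tool Description Linting"),
    ("1", "Blind Execution"),
    ("2", "Trace Evaluation"),
    ("3", "Agentic Fix Loop"),
]


def run_pipeline(run_phase, approved):
    """Run phases in order, stopping before Phase 3 unless approved() is true.

    run_phase: hypothetical callback taking (phase_id, phase_name).
    approved:  hypothetical callback returning True once the user approves
               the Phase 2 findings.
    Returns the list of phase ids that actually ran.
    """
    completed = []
    for phase_id, name in PHASES:
        if phase_id == "3" and not approved():
            break  # the fix loop is gated on explicit approval
        run_phase(phase_id, name)
        completed.append(phase_id)
    return completed
```

Passing the phase runner and the approval check as callbacks keeps the ordering logic separate from whatever each phase actually does.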

📊 KPIs

  • Efficiency Ratio: Target ≤ 1.3 (observed tool calls divided by optimal tool calls)
  • Correctness: Target ≥ 95% match against baseline
  • Token Cost Gap: Monotonically decreasing across fix loops
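
As a rough illustration of how these three targets could be checked, here is a small sketch. The function names and input shapes are assumptions made for this example, not the harness's actual schema, and "monotonically decreasing" is read here as strictly decreasing.

```python
from typing import Sequence


def efficiency_ratio(observed_calls: int, optimal_calls: int) -> float:
    """Observed tool calls divided by the optimal (baseline) call count."""
    if optimal_calls <= 0:
        raise ValueError("optimal_calls must be positive")
    return observed_calls / optimal_calls


def correctness(matches: Sequence[bool]) -> float:
    """Fraction of tasks whose result matched the baseline."""
    if not matches:
        raise ValueError("matches must not be empty")
    return sum(matches) / len(matches)


def is_monotonically_decreasing(gaps: Sequence[float]) -> bool:
    """True if each fix loop's token cost gap is smaller than the previous one.

    Assumes a strict decrease; change `<` to `<=` to allow plateaus.
    """
    return all(later < earlier for earlier, later in zip(gaps, gaps[1:]))


def meets_targets(observed_calls, optimal_calls, matches, gaps) -> bool:
    """Check all three KPI targets listed above at once."""
    return (
        efficiency_ratio(observed_calls, optimal_calls) <= 1.3
        and correctness(matches) >= 0.95
        and is_monotonically_decreasing(gaps)
    )
```

For example, a run that used 5 calls where 4 were optimal has a ratio of 1.25, which is within the 1.3 target.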
