Built with:
opencode • oh-my-openagent • mcp-tef • MCP
MCP server tool calls are unreliable across models and runs. LLMs often pick sub-optimal tools, miss parameters, or invent workarounds (like iterating date ranges) that inflate token costs. These regressions are invisible until they hit production, and there is no structured way to measure efficiency or automate the fix loop.
MCPipeChekr solves this by providing a multi-phase evaluation harness that benchmarks tool-call behavior against a ground-truth baseline and drives an agentic coding loop to fix detected bugs automatically.
- Clone the repository:
  `git clone https://github.com/thomasmaerz/mcpipechekr.git`
  `cd mcpipechekr`
- Install dependencies:
  Ensure `opencode`, `oh-my-openagent`, and `mcp-tef` are installed and in your PATH.
- Configure your MCP server:
  Edit `config.yaml` to point to your `emailindex` (or other MCP) server.
- Define your tasks:
  Add test prompts to `tasks.yaml`.
- Generate a baseline:
  `./harness.sh --regenerate-baseline`
- Run the harness loop:
  `./harness.sh`
- Review & Approve:
  Check the Phase 2 findings in your CLI, then type `approve` to let the agent fix the code.
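Before running the harness, it can help to confirm that the required tools resolve on your PATH. The following is a minimal sketch, not part of MCPipeChekr itself. It assumes that each dependency exposes an executable with the same name as the tool listed above, which may not hold for every installation. The helper name `missing_tools` is hypothetical.

```python
import shutil

# Assumption: each dependency installs an executable named exactly like the tool.
REQUIRED_TOOLS = ["opencode", "oh-my-openagent", "mcp-tef"]


def missing_tools(names=REQUIRED_TOOLS, which=shutil.which):
    """Return the names that cannot be resolved on PATH.

    `which` is injectable so the check can be exercised without
    the real tools being installed.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

If the script reports a missing name, install that tool or adjust your PATH before generating a baseline.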
For detailed documentation on the phase pipeline, data schemas, and configuration, visit our GitHub Wiki.
- Phase 0: Ground Truth Generation
- Phase 0.5: Tool Description Linting (`mcp-tef`)
- Phase 1: Blind Execution (Trace Capture)
- Phase 2: Trace Evaluation (Efficiency & Correctness)
- Phase 3: Agentic Fix Loop (Git Commit)
- Efficiency Ratio: Target ≤ 1.3 (observed tool calls divided by optimal tool calls)
- Correctness: Target ≥ 95% match against baseline
- Token Cost Gap: Monotonically decreasing across fix loops
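The three targets above can be expressed as simple checks. The sketch below is illustrative only and does not reflect how MCPipeChekr computes its metrics internally. The function names and input shapes are hypothetical, the efficiency ratio is assumed to be observed calls divided by optimal calls, and "monotonically decreasing" is read here as "never increasing from one fix loop to the next".

```python
EFFICIENCY_TARGET = 1.3    # from the targets above: ratio must be <= 1.3
CORRECTNESS_TARGET = 0.95  # from the targets above: match rate must be >= 95%


def efficiency_ratio(observed_calls: int, optimal_calls: int) -> float:
    """Observed tool calls divided by the optimal (baseline) call count."""
    if optimal_calls <= 0:
        raise ValueError("optimal_calls must be positive")
    return observed_calls / optimal_calls


def correctness(matching_tasks: int, total_tasks: int) -> float:
    """Fraction of tasks whose result matches the baseline."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return matching_tasks / total_tasks


def is_monotonically_decreasing(token_cost_gaps: list[float]) -> bool:
    """True if the gap never increases between consecutive fix loops.

    Assumption: equal consecutive values are accepted.
    """
    return all(
        later <= earlier
        for earlier, later in zip(token_cost_gaps, token_cost_gaps[1:])
    )


def meets_targets(observed_calls, optimal_calls, matching_tasks,
                  total_tasks, token_cost_gaps) -> bool:
    """Combine the three checks into one pass/fail result."""
    return (
        efficiency_ratio(observed_calls, optimal_calls) <= EFFICIENCY_TARGET
        and correctness(matching_tasks, total_tasks) >= CORRECTNESS_TARGET
        and is_monotonically_decreasing(token_cost_gaps)
    )
```

For example, a run with 6 observed calls against 4 optimal calls has an efficiency ratio of 1.5 and would miss the 1.3 target, even if every task matched the baseline.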