Universal MCP Server for AI Agent Roles, Skills & Cognitive Implants
A semantic router that dynamically loads specialized agent personas, domain skills, and cognitive reasoning implants based on user queries. Works with any MCP-compatible client (Claude Code, Cursor, Windsurf, and others).
```bash
git clone <repository-url>
cd Agents

# Run the initialization script
./scripts/init_repo.sh
```

The script will:

- ✅ Create a Python virtual environment (`.venv/`)
- ✅ Install all dependencies
- ✅ Create a `.env` configuration file
- ✅ Validate the MCP server configuration
```bash
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env with your API keys
```

Create a `.env` file with:

```bash
LANGFUSE_PUBLIC_KEY=pk-lf-...  # Optional: observability
LANGFUSE_SECRET_KEY=sk-lf-...  # Optional: observability
LANGFUSE_HOST=https://cloud.langfuse.com
ANTHROPIC_API_KEY=sk-ant-...   # Optional: for document OCR
AGENTS_DEBUG=0                 # Set to 1 for JSON debug logging in logs/
```

Note: Embeddings are handled locally by `fastembed` (ONNX Runtime). The model is selected during setup; no external API key is required for core routing.
The server exposes MCP tools that any compatible client can call:
| Tool | Purpose |
|---|---|
| `route_and_load(query)` | Semantic routing: finds the best agent and enriches its prompt with relevant skills & implants |
| `get_agent_context(agent_name, query)` | Direct agent loading when the target is already known |
| `load_implants(query\|task_type)` | Load cognitive reasoning strategies by semantic query or preset bundle |
| `list_agents()` | Enumerate all available agents with metadata |
| `log_interaction(agent_name, query, response_content, intent?, action?, outcome?, files?, tags?)` | End-of-turn logger: appends to `history.md` (deduped by content hash) and, if configured, sends a Langfuse generation trace |
| `clear_session_cache()` | Reset the session cache |
| `describe_repo(force_refresh=False)` | One-shot repo bootstrap: writes a structured summary into the managed Repository Memory section of `CLAUDE.md` |
| `read_history(limit?, since?, query?)` | Recent entries or lazy semantic recall over the action log |
`route_and_load(query)` performs single-hop routing via a semantic cache:

- Meta Detection: greetings and short queries auto-route to `universal_agent`
- Cache Hit: returns an enriched prompt (`SUCCESS`) or a sampled response (`SUCCESS_SAMPLED`)
- Cache Miss: returns `ROUTE_REQUIRED` with agent candidates for client-side selection
- Tier-Based Enrichment: lite (no extras) / standard (2 skills + 2 implants) / deep (4+ skills + 3 implants)
- Multi-Turn: `context_hash` enables delta optimization on follow-up queries
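The decision flow above can be sketched as follows. This is a hypothetical illustration, not the server's actual implementation; the function name, candidate list, and similarity threshold are all assumptions:

```python
# Illustrative sketch of the cache-first routing decision; names and the
# 0.82 threshold are assumptions, not values from the real router.py.

def route(query: str, cache: dict[str, str], similarity: float,
          threshold: float = 0.82) -> dict:
    """Return a routing decision for a user query."""
    # Meta detection: greetings / very short queries skip routing entirely
    if len(query.split()) <= 2:
        return {"status": "SUCCESS", "agent": "universal_agent"}
    # Cache hit: a semantically similar prior query already resolved an agent
    if query in cache and similarity >= threshold:
        return {"status": "SUCCESS", "agent": cache[query]}
    # Cache miss: hand candidate agents back to the client for selection
    return {"status": "ROUTE_REQUIRED",
            "candidates": ["software_engineer", "universal_agent"]}
```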
```
Agents/
├── agents/                    # Agent personas (system prompts, 38 agents)
│   ├── software_engineer/
│   │   └── system_prompt.mdc
│   ├── common/                # Shared agent resources
│   ├── capabilities/          # Capability compositions (registry.yaml)
│   └── schemas/               # Validation schemas
├── skills/                    # Reusable knowledge chunks (RAG)
│   └── skill-*.mdc
├── implants/                  # Cognitive reasoning strategies (RAG)
│   └── implant-*.mdc
├── src/
│   ├── server.py              # MCP Server entrypoint (FastMCP)
│   ├── engine/
│   │   ├── router.py          # Semantic routing (cache-first)
│   │   ├── skills.py          # Skill retrieval (vector search)
│   │   ├── implants.py        # Implant retrieval (vector search)
│   │   ├── config.py          # Centralized configuration
│   │   ├── embedder.py        # FastEmbed wrapper (ONNX Runtime)
│   │   ├── vector_store.py    # NumPy-based vector store
│   │   ├── enrichment.py      # Tier-based context enrichment
│   │   ├── capabilities.py    # Capability registry resolution
│   │   ├── context.py         # Context retrieval (history formatting)
│   │   └── language.py       # Language detection
│   └── utils/
│       ├── prompt_loader.py
│       ├── debug_logger.py    # Optional JSON debug logging
│       └── langfuse_compat.py # Optional Langfuse layer
├── data/                      # Vector store cache (auto-initialized)
├── mcp.json                   # MCP server configuration
├── pyproject.toml             # Python project metadata
└── requirements.txt
```
| Component | Description |
|---|---|
| Agents | Specialized personas with unique system prompts |
| Skills | Domain-specific knowledge chunks (retrieved via RAG) |
| Implants | Cognitive patterns & reasoning strategies |
| Router | Semantic matching + caching for fast agent selection |
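The Router's semantic matching boils down to nearest-neighbour search over embedding vectors. Below is a stdlib-only sketch of a cosine-similarity lookup; the real `vector_store.py` is NumPy-based, and the store contents here are made up for illustration:

```python
import math

# Toy cosine-similarity store illustrating the Router's semantic matching.
# The actual NumpyVectorStore uses vectorized NumPy operations instead.

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 for zero-length vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query_vec: list[float],
               store: dict[str, list[float]]) -> tuple[str, float]:
    """Return the (name, score) of the stored vector closest to the query."""
    return max(((name, cosine(query_vec, vec)) for name, vec in store.items()),
               key=lambda pair: pair[1])
```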
```json
{
  "mcpServers": {
    "Agents-Core": {
      "command": ".venv/bin/python",
      "args": ["src/server.py"]
    }
  }
}
```

```bash
source .venv/bin/activate
python src/server.py
# Server communicates via stdin/stdout using the MCP protocol
```

- Create a directory: `agents/<agent_name>/`
- Create `system_prompt.mdc` with frontmatter:
```markdown
---
identity:
  name: "my_agent"
  display_name: "My Agent"
  role: "Expert in X"
  tone: "Professional, Clear"
routing:
  domain_keywords: ["keyword1", "keyword2"]
  trigger_command: "/my_command"
---

# My Agent System Prompt

## Identity
You are an expert in X...
```

The agent will be auto-discovered by the MCP server on the next startup.
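Auto-discovery can be pictured as a scan for `system_prompt.mdc` files plus a frontmatter split. The sketch below is an assumption about how such a loader could work, not the actual code in `src/utils/prompt_loader.py`:

```python
from pathlib import Path

# Hypothetical agent auto-discovery: find agents/*/system_prompt.mdc and
# split off the '---'-delimited YAML frontmatter.

def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body) from a '---'-delimited .mdc file."""
    if text.startswith("---"):
        _, fm, body = text.split("---", 2)
        return fm.strip(), body.strip()
    return "", text

def discover_agents(root: Path) -> list[str]:
    """List agent directory names that ship a system_prompt.mdc."""
    return sorted(p.parent.name for p in root.glob("*/system_prompt.mdc"))
```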
Instead of listing skills per agent, you can declare high-level capabilities in the frontmatter:

```yaml
capabilities: [development, dev-security]
```

The enrichment pipeline resolves capabilities to skill bundles via `agents/capabilities/registry.yaml`. Available capabilities: critical-analysis, content-structure, development, dense-summary, trust-weighted-research, bio-health, tech-documentation, dev-security, consultative-intake, creative-writing, psychology, 3d-printing, data-investigation, epistemic-analysis, code-review, decision-making, product-thinking, temporal-research, performance-engineering, prompt-design, prompt-security, roblox-development, dev-tools, blender-scripting, health-optimization, consumer-research, visualization, child-psychology.
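Conceptually, resolution flattens each declared capability into its skill bundle and deduplicates the result. The registry contents below are invented for illustration; only the mechanism mirrors what the enrichment pipeline does with `registry.yaml`:

```python
# Hypothetical capability-to-skill resolution. The bundle contents are
# illustrative, not the real entries in agents/capabilities/registry.yaml.

REGISTRY = {
    "development": ["skill-clean-code", "skill-testing"],
    "dev-security": ["skill-threat-modeling", "skill-secure-deps"],
}

def resolve_capabilities(capabilities: list[str]) -> list[str]:
    """Flatten declared capabilities into an ordered, deduplicated skill list."""
    resolved: list[str] = []
    for cap in capabilities:
        for skill in REGISTRY.get(cap, []):  # unknown capabilities resolve to nothing
            if skill not in resolved:
                resolved.append(skill)
    return resolved
```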
The server ships with a per-repo memory subsystem so each new Claude session does not have to re-explore the codebase from scratch:
- `describe_repo`: generates a compressed, LLM-consumable repo overview via MCP sampling and writes it into the managed Repository Memory section of `CLAUDE.md`. Idempotent: re-runs are no-ops unless the repo manifest changes or `force_refresh=True`.
- `log_interaction`: end-of-turn logger. Appends `intent / action / outcome` entries (with optional files and tags) to `history.md` at the repo root; entries are deduplicated by content hash and rotated to `history/YYYY-MM.md` when the file exceeds 512 KB. Also sends a Langfuse generation trace if keys are configured.
- `read_history`: returns recent entries by recency or a `since` filter, or runs a lazy semantic search backed by the same `NumpyVectorStore` used for routing.
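The content-hash deduplication used by the logger can be sketched like this. The hashing scheme and in-memory entry format are assumptions for illustration, not the server's actual on-disk layout:

```python
import hashlib

# Sketch of content-hash deduplication for an end-of-turn action log.
# Hash truncation to 16 hex chars is an arbitrary choice for readability.

def entry_hash(intent: str, action: str, outcome: str) -> str:
    """Stable digest of an entry's content, used as its dedup key."""
    payload = "\n".join([intent, action, outcome]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def append_unique(log: list[dict], intent: str, action: str, outcome: str) -> bool:
    """Append an entry unless an identical one was already logged."""
    h = entry_hash(intent, action, outcome)
    if any(e["hash"] == h for e in log):
        return False  # duplicate content: skip
    log.append({"hash": h, "intent": intent,
                "action": action, "outcome": outcome})
    return True
```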
The full design and step-by-step rationale live in `docs/memory-subsystem-spec.md`.
⚠️ Privacy warning: `history.md` captures raw prompts and responses. If you paste secrets (API keys, tokens, credentials) into Claude, they will land in this file. It is gitignored by default to keep them out of git history; if you want the action log visible in PRs, remove `history.md` / `history/` from `.gitignore` and review entries before pushing.
The framework integrates with Langfuse for tracing:
- All tool calls are automatically traced
- Routing decisions are logged
- Cache hits/misses are tracked
Configure Langfuse credentials in `.env`, or leave them blank for local-only operation.
```bash
source .venv/bin/activate
python src/server.py
```

Enable detailed per-call JSON logging:

```bash
AGENTS_DEBUG=1 python src/server.py
```

Logs are written to `logs/{YYYY-MM-DD}/{HH-MM-SS.fff}_{tool}_{direction}.json`. There is zero overhead when debug logging is disabled.
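The log path layout documented above can be reproduced with a small helper. This is an illustrative reconstruction of the naming scheme, not the code from `debug_logger.py`:

```python
from datetime import datetime

# Builds paths of the form logs/{YYYY-MM-DD}/{HH-MM-SS.fff}_{tool}_{direction}.json,
# matching the layout documented for AGENTS_DEBUG=1.

def debug_log_path(tool: str, direction: str, now: datetime) -> str:
    day = now.strftime("%Y-%m-%d")
    # .fff = milliseconds, zero-padded to three digits
    stamp = now.strftime("%H-%M-%S") + f".{now.microsecond // 1000:03d}"
    return f"logs/{day}/{stamp}_{tool}_{direction}.json"
```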
MIT