feat: production-agentic infrastructure remediation by wshobson · Pull Request #112 · wshobson/maverick-mcp

wshobson · 2026-03-13T01:06:43Z

Summary

Comprehensive production-agentic remediation across all 9 domains, upgrading the MaverickMCP codebase to production-grade agentic infrastructure.

Cost Controls: CostTrackingLLM wrapper with per-request ($1.00) and daily ($50.00) budget caps, CostAccumulator with thread-safe tracking, externalized model profiles to YAML
Persistent Memory: SQLite write-through persistence for MemoryStore with persistent connections, SqliteSaver-backed LangGraph checkpointer replacing in-memory MemorySaver across all agents
Observability: DecisionLogger with async-safe fire-and-forget DB writes, DecisionLog SQLAlchemy model with indexes, SecretsFilter for API key masking in logs
Deployment: Docker HEALTHCHECK, /health, /health/ready, /health/live endpoints, graceful shutdown with SIGTERM handling, resource limits in docker-compose
Multi-Agent Coordination: SharedAgentContext for cross-agent finding sharing, prior findings injection into agent prompts, real result aggregation with conflict detection
Tool Integration: Per-category rate limiting with sliding window counters, asyncio.timeout() enforcement, error classification (ErrorCategory StrEnum), get_tool_registry_status introspection tool
Security: SecretStr for all API keys in settings, input sanitization (sanitize_ticker, sanitize_text_input, sanitize_portfolio_name), warn_on_public_binding validator
Backtesting Resilience: Replanning nodes with bounded retries, conditional edges for failure recovery, fixed TypedDict dict-style access bug
Vector Store: ChromaDB integration with temporal decay scoring for research caching, deduplication via cosine similarity
Bug Fix: Fixed pre-existing optimize_strategy crash — vbt.utils.params.create_param_combs() expects tuple operation trees, replaced with itertools.product to generate parameter dicts
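
For readers unfamiliar with the Cost Controls item, the two budget caps could be enforced roughly as below. This is a minimal sketch and not the PR's CostAccumulator: the names BudgetTracker, BudgetExceeded, charge, and spent_on are hypothetical, and only the $1.00 and $50.00 limits come from the summary above.

```python
import threading
from collections import defaultdict
from datetime import date
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a charge would exceed a cap (hypothetical name)."""


class BudgetTracker:
    """Hypothetical accumulator with a per-request cap and a daily cap."""

    def __init__(self, per_request_cap: float = 1.00, daily_cap: float = 50.00):
        self.per_request_cap = per_request_cap
        self.daily_cap = daily_cap
        self._lock = threading.Lock()  # guards both totals across threads
        self._by_request = defaultdict(float)
        self._by_day = defaultdict(float)

    def charge(self, request_id: str, cost_usd: float, today: Optional[date] = None) -> None:
        """Record a cost, refusing it if either cap would be exceeded."""
        today = today or date.today()
        with self._lock:
            if self._by_request[request_id] + cost_usd > self.per_request_cap:
                raise BudgetExceeded(f"per-request cap reached for {request_id}")
            if self._by_day[today] + cost_usd > self.daily_cap:
                raise BudgetExceeded("daily cap reached")
            self._by_request[request_id] += cost_usd
            self._by_day[today] += cost_usd

    def spent_on(self, day: date) -> float:
        """Total recorded spend for the given day."""
        with self._lock:
            return self._by_day[day]


tracker = BudgetTracker()
day = date(2026, 3, 13)
tracker.charge("req-1", 0.60, today=day)
try:
    tracker.charge("req-1", 0.60, today=day)  # would total 1.20 for this request
    second_charge_rejected = False
except BudgetExceeded:
    second_charge_rejected = True
tracker.charge("req-2", 0.60, today=day)  # a different request has its own budget
```

Keying the per-request total on a fresh request_id per invocation matters here: a shared id would let unrelated calls drain one budget.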

28 files changed, 5277 insertions, 454 deletions

Test plan

681 unit tests passing, 0 failures (59s)
Ruff lint: all checks passed
67 MCP tools integration-tested against live server (all passing)
Server startup clean with all 90 tools registered
Health/readiness/liveness endpoints verified
Cost tracking initialized and budget enforcement active
Circuit breakers initialized for all 8 external APIs
optimize_strategy backtest tool verified working after fix

Upgrades agentic infrastructure across all 9 production domains:
**Cost Controls (C1)**: CostTrackingLLM wrapper enforces per-request ($1) and daily ($50) budget caps on every LLM call via OpenRouterProvider.
**Model Profiles (S2)**: Extracted MODEL_PROFILES to YAML config file for updates without code changes.
**Persistent Memory (C2)**: SQLite-backed checkpointer replaces in-memory MemorySaver across all 6 agents. Write-through memory stores with persistent SQLite connection (not per-write).
**Deployment Hardening (I1)**: /health, /health/ready, /health/live endpoints. Dockerfile HEALTHCHECK. docker-compose restart policies, resource limits, health-conditioned depends_on, graceful shutdown.
**Decision Audit Trail (I2)**: DecisionLog SQLAlchemy model with logging on all 10 agent-invoking tools. Error classification and fire-and-forget writes.
**Backtesting Replanning (S1)**: 3 bounded retry loops in the LangGraph workflow (market regime, strategy selection, validation). Fixed attribute-style TypedDict access bug.
**Research Vector Store (S3)**: ChromaDB-backed semantic cache with temporal decay scoring and deduplication. Optional dependency with graceful fallback.
**Multi-Agent Coordination**: SharedAgentContext for cross-agent findings, context handoff in supervisor invoke paths, result aggregation with conflict detection, session cleanup.
**Tool Integration**: Per-category rate limiter (8 categories), enforced asyncio.timeout() per tool, ErrorCategory + DecisionStatus enums, tool registry status endpoint.
**Security**: SecretStr for all API keys, SecretsFilter log masking, public binding validation, input sanitization module.
681 tests passing, 0 lint errors.
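
The per-category rate limiting mentioned under Tool Integration can be illustrated with a small sliding-window limiter. This is a sketch under assumptions, not the PR's implementation: it keeps a log of timestamps per category (a sliding-window log rather than bucketed counters), and the class name, category names, and limits are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_calls per category within the trailing window."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # category -> timestamps of accepted calls

    def allow(self, category: str, now: Optional[float] = None) -> bool:
        """Return True and record the call if the category is under its limit."""
        now = time.monotonic() if now is None else now
        calls = self._calls[category]
        # Drop timestamps that have slid out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=10.0)
decisions = [
    limiter.allow("market_data", now=0.0),
    limiter.allow("market_data", now=1.0),
    limiter.allow("market_data", now=2.0),   # third call inside the window
    limiter.allow("research", now=2.0),      # separate category, separate budget
    limiter.allow("market_data", now=10.0),  # the call at t=0 has expired
]
```

The sketch is not synchronized, so it suits a single event loop; a multi-threaded caller would need a lock around allow().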

…s.product create_param_combs expects a tuple-based operation tree, not a dict. Use itertools.product to generate parameter combination dicts directly, which is what _generate_signals() actually consumes.
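
The replacement approach described in this commit message can be sketched with the standard library alone. The function name param_grid and the parameter names below are hypothetical; the PR's actual code and the shape _generate_signals() expects are not shown on this page.

```python
from itertools import product


def param_grid(param_ranges: dict) -> list:
    """Expand {"name": [values, ...]} into one dict per parameter combination."""
    names = list(param_ranges)
    value_lists = [param_ranges[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical parameter ranges for a moving-average strategy.
grid = param_grid({"fast": [5, 10], "slow": [20, 50]})
```

Each element of grid is a plain dict, so it can be passed to a signal generator without any vectorbt-specific structure.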

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8b3d732eb

maverick_mcp/config/settings.py

maverick_mcp/providers/openrouter_provider.py

maverick_mcp/utils/decision_logger.py

claude · 2026-03-13T01:15:53Z

test

claude · 2026-03-13T01:15:58Z

Code Review

Bugs (must fix)

[maverick_mcp/api/routers/data.py:246-262] settings.external_data.api_key now returns SecretStr after the settings refactor in this PR, but line 262 passes it directly as an HTTP header value. SecretStr serializes to masked text when coerced to str, so every call to the external sentiment API sends a redacted key and receives 401 Unauthorized. Fix: use settings.external_data.get_api_key() on line 246 instead.
[maverick_mcp/api/server.py] Double database disposal on shutdown. close_async_db_connections() and engine.dispose() are called in both the FastAPI on_event shutdown handler and the cleanup_database closure registered via shutdown_handler.register_cleanup. Both fire on shutdown; the second engine.dispose() on an already-disposed SQLAlchemy engine raises InvalidRequestError. Remove one of the two registrations.
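
The masked-credential failure in the first bug can be reproduced without the project's settings. The sketch below uses a small stand-in class, MaskedSecret, that imitates how pydantic's SecretStr masks its value under str() and exposes it through get_secret_value(); the header name and key are made up, and get_api_key() in the PR presumably wraps the same unwrapping step.

```python
class MaskedSecret:
    """Stand-in imitating pydantic.SecretStr masking (not the real class)."""

    def __init__(self, value: str):
        self._value = value

    def get_secret_value(self) -> str:
        """Return the raw secret."""
        return self._value

    def __str__(self) -> str:
        # Like SecretStr, hide the value whenever the object is coerced to str.
        return "**********" if self._value else ""

    def __repr__(self) -> str:
        return f"MaskedSecret('{self}')"


api_key = MaskedSecret("sk-example-123")  # made-up key

# Buggy: coercing the wrapper to str sends the mask, not the key.
bad_headers = {"X-API-KEY": str(api_key)}

# Fixed: unwrap explicitly at the point of use.
good_headers = {"X-API-KEY": api_key.get_secret_value()}
```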

Warnings (should fix)

[maverick_mcp/providers/openrouter_provider.py - CostTrackingLLM] CostTrackingLLM uses __getattr__ delegation but does not subclass BaseChatModel. LangGraph helpers such as create_react_agent call isinstance(model, BaseChatModel) to enable streaming and tool-call paths -- this returns False for the wrapper, causing AttributeError or silent wrong-path execution at runtime. Also, bind_tools wraps the resulting RunnableBinding in a new CostTrackingLLM, so _llm on the re-wrapped object is no longer a ChatOpenAI. Fix: subclass ChatOpenAI or add a __class__ shim.
[maverick_mcp/api/server.py] on_event is deprecated in favor of lifespan handlers, which FastAPI has supported since 0.93. Prefer a lifespan context manager. The current code produces a DeprecationWarning and may silently stop executing on a future FastAPI upgrade.
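
A lifespan handler of the kind suggested in the second warning might look like the sketch below. It uses only the standard library so it runs on its own; close_db and cleanup_log are hypothetical stand-ins, and in the real server the function would be passed as FastAPI(lifespan=lifespan).

```python
import asyncio
from contextlib import asynccontextmanager

cleanup_log = []


async def close_db() -> None:
    """Hypothetical cleanup; stands in for disposing the database engine."""
    cleanup_log.append("db closed")


@asynccontextmanager
async def lifespan(app):
    # Startup work (opening pools, warming caches) would go here.
    try:
        yield
    finally:
        # Shutdown work runs once, when the context exits.
        await close_db()


# With FastAPI this would be wired up as: app = FastAPI(lifespan=lifespan)
async def _demo() -> None:
    async with lifespan(app=None):
        cleanup_log.append("serving")


asyncio.run(_demo())
```

Keeping all shutdown work in one place also avoids the double-disposal problem noted under Bugs, since there is a single exit path.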

Notes

asyncio.Lock() in CostAccumulator.__init__ is fine in Python 3.10+ (no longer bound to a running loop at construction time).
DatabaseSettings.password migration looks correct -- the url property calls get_password(), so that path is clean.
Retry nodes in backtesting_workflow.py mutate state in-place before delegating to the sub-analyzer; a unit test covering the retry counter increment would confirm the counter survives through analyze_market_regime.

…st tracking
- Use get_api_key()/get_exa_api_key() instead of raw SecretStr in data.py and research.py to prevent masked credentials in API calls
- Remove duplicate DB/Redis cleanup from on_event("shutdown") handler; shutdown_handler callbacks already handle disposal
- Generate per-invocation request_id in CostTrackingLLM instead of per-instance, preventing budget accumulation across unrelated calls
- Add __class__ property shim so isinstance(wrapper, BaseChatModel) returns True for LangGraph compatibility
- Only set _table_ensured=True on success in DecisionLogger, allowing retry after transient DB failures
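
The __class__ property shim named in this commit message relies on isinstance() falling back to an object's __class__ attribute when its real type does not match. A minimal sketch follows, with stand-in classes in place of BaseChatModel and ChatOpenAI; every name in it is hypothetical.

```python
class BaseModelStandIn:
    """Stand-in for a framework base class such as BaseChatModel."""


class RealModel(BaseModelStandIn):
    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class Wrapper:
    """Delegating wrapper that reports the wrapped object's class."""

    def __init__(self, inner):
        self._inner = inner

    @property
    def __class__(self):
        # isinstance() consults obj.__class__ when type(obj) is not a match.
        return type(self._inner)

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        return getattr(self._inner, name)


wrapped = Wrapper(RealModel())
passes_isinstance = isinstance(wrapped, BaseModelStandIn)
real_type_is_wrapper = type(wrapped) is Wrapper
reply = wrapped.invoke("hi")
```

The shim only affects isinstance() and __class__ lookups; code that checks type(obj) directly still sees the wrapper, which is one reason subclassing was offered as the alternative fix.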

claude · 2026-03-13T01:39:32Z

Code Review

Bugs (must fix)

[maverick_mcp/memory/stores.py:119] _persist_entry calls json.dumps(entry["value"]) with no error handling, and set() does not catch the resulting TypeError. Before this PR, MemoryStore.set() only stored in-memory and never serialized. Any caller passing datetime objects, numpy scalars, Decimal values, or Pydantic models (common in financial code) will now get an unhandled TypeError where none occurred before. Fix: wrap _persist_entry body in try/except and log the failure.
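
One way to apply the suggested fix is to make serialization failures non-fatal for the in-memory write. The sketch below is an assumption about shape, not the PR's _persist_entry: the function name is hypothetical, and default=str is an optional extra that stringifies datetime and Decimal values instead of rejecting them.

```python
import json
import logging
from datetime import datetime
from decimal import Decimal

logger = logging.getLogger(__name__)


def serialize_for_persistence(value):
    """Return a JSON string, or None if the value cannot be serialized."""
    try:
        # default=str converts otherwise unsupported objects via str().
        return json.dumps(value, default=str)
    except (TypeError, ValueError):
        # Log and skip the disk write; the in-memory entry is unaffected.
        logger.warning("skipping persistence: value is not JSON-serializable", exc_info=True)
        return None


ok = serialize_for_persistence({"ts": datetime(2026, 3, 13), "px": Decimal("1.5")})

circular = []
circular.append(circular)  # json.dumps raises ValueError on circular references
failed = serialize_for_persistence(circular)
```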

Warnings (should fix)

[maverick_mcp/data/vector_store.py:224] query_limit = min(top_k * 3, 100) -- ChromaDB raises ValueError: n_results X > n_elements Y when the requested count exceeds the collection size. The outer except Exception: return [] swallows this silently, so the vector cache always returns empty until the collection holds at least top_k * 3 documents (15 for top_k = 5). Fix: add self._collection.count() or 1 as an additional upper bound.
[maverick_mcp/agents/supervisor.py:~967] In _invoke_technical_agent, the final else branch (agent has neither analyze_stock nor ainvoke) returns the raw query string in a fake success dict. Downstream synthesis treats this as real analysis output with no indication the agent was never invoked. Should return an error dict matching the other failure paths.
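
The upper bound proposed for the vector store query can be written as a pure function, which makes it easy to unit test. The function name and the hard_cap parameter are hypothetical; collection_count is assumed to come from the collection's count() call mentioned in the review.

```python
def clamp_query_limit(top_k: int, collection_count: int, hard_cap: int = 100) -> int:
    """Over-fetch 3x for re-ranking, but never request more than the collection holds."""
    # "or 1" keeps the limit positive when the collection is empty.
    return min(top_k * 3, hard_cap, collection_count or 1)


limits = [
    clamp_query_limit(5, 0),     # empty collection
    clamp_query_limit(5, 4),     # fewer documents than top_k * 3
    clamp_query_limit(5, 1000),  # plenty of documents
    clamp_query_limit(50, 1000), # hard cap applies
]
```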

Notes

get_persistent_checkpointer() and get_vector_store() singletons use no creation lock. Safe under asyncio but will race during multi-threaded startup. Low risk given FastMCP single-process model.
install_secrets_filter() runs at import time; uvicorn handlers added afterward will not carry the filter. Root-logger addFilter still covers all records in practice.
on_event("shutdown") is deprecated since FastAPI 0.93 in favour of lifespan. Non-breaking now but worth migrating.
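
If the singleton race in the first note ever becomes a concern, a creation lock is a small change. The sketch below uses double-checked locking with a hypothetical Checkpointer class and accessor; it is not the PR's get_persistent_checkpointer().

```python
import threading

_instance = None
_instance_lock = threading.Lock()


class Checkpointer:
    """Hypothetical stand-in for a resource that is expensive to create."""


def get_checkpointer() -> Checkpointer:
    """Return the shared instance, creating it at most once across threads."""
    global _instance
    if _instance is None:  # fast path: no lock once the instance exists
        with _instance_lock:
            if _instance is None:  # re-check: another thread may have won the race
                _instance = Checkpointer()
    return _instance


results = []
threads = [
    threading.Thread(target=lambda: results.append(get_checkpointer()))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```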

wshobson added 2 commits March 12, 2026 20:47

fix: replace broken vbt.utils.params.create_param_combs with itertool…

e8b3d73

…s.product create_param_combs expects a tuple-based operation tree, not a dict. Use itertools.product to generate parameter combination dicts directly, which is what _generate_signals() actually consumes.

chatgpt-codex-connector bot reviewed Mar 13, 2026


wshobson merged commit bfb7a7a into main Mar 13, 2026
2 checks passed

wshobson deleted the production-agentic-remediation branch March 13, 2026 01:44

wshobson merged 3 commits into main from production-agentic-remediation
