## Overview

This is a two-cycle refactor of hermes-agent's provider infrastructure.

**Cycle 1 (this issue): Transport layer** — Extract format conversion and response normalization from `run_agent.py` into `agent/transports/`. Each transport owns `convert_messages`, `convert_tools`, `build_kwargs`, and `normalize_response`. Client lifecycle, streaming, credentials, and prompt caching stay on `AIAgent`.

**Cycle 2 (future): Provider modules** — Consolidate per-provider quirks (currently scattered across 5+ files) into single-file provider definitions under `providers/`. Each provider module declares its auth, endpoints, client headers, temperature behavior, max_tokens defaults, message preprocessing, and extra_body construction in one place. Transports become generic — they read from the provider object instead of checking boolean flags. See Cycle 2 below.

**Principle:** Every PR wires its code to real production paths in the same PR. No dormant abstractions.
## Shared Types (`agent/transports/types.py`)

```python
@dataclass
class ToolCall:
    id: str | None                     # Protocol's canonical ID (call_XXXX, toolu_XXXX, etc.)
    name: str
    arguments: str                     # JSON string
    provider_data: dict | None = None  # Per-tool-call protocol metadata


@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str                 # "stop", "tool_calls", "length", "content_filter"
    reasoning: str | None = None       # Cross-provider (Anthropic, Codex, DeepSeek, Gemini)
    usage: Usage | None = None
    provider_data: dict | None = None  # Response-level protocol state
```
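To make the contract concrete, here is a hedged sketch of how a transport's normalize path might map a raw chat-completions-style response onto these types. The `Usage` fields and the `normalize_chat_choice` helper are assumptions for illustration; the real definitions live in `agent/transports/types.py` and the transports.

```python
from __future__ import annotations
from dataclasses import dataclass

# Assumed shape of Usage (the real definition lives in agent/transports/types.py).
@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

@dataclass
class ToolCall:
    id: str | None
    name: str
    arguments: str  # JSON string
    provider_data: dict | None = None

@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str
    reasoning: str | None = None
    usage: Usage | None = None
    provider_data: dict | None = None

def normalize_chat_choice(raw: dict) -> NormalizedResponse:
    """Illustrative mapping from a chat-completions-style dict to NormalizedResponse."""
    choice = raw["choices"][0]
    msg = choice["message"]
    calls = [
        ToolCall(id=c["id"], name=c["function"]["name"], arguments=c["function"]["arguments"])
        for c in msg.get("tool_calls") or []
    ] or None
    return NormalizedResponse(
        content=msg.get("content"),
        tool_calls=calls,
        finish_reason=choice["finish_reason"],
        usage=Usage(raw["usage"]["prompt_tokens"], raw["usage"]["completion_tokens"]),
    )
```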
## Cycle 1: PR Tracker

| PR | Status | What it does | Lines |
|---|---|---|---|
| PR 1 #12975 | ✅ Merged | Extract 10 Codex Responses API functions into `agent/codex_responses_adapter.py` | -565 from run_agent.py |
| PR 2 #13347 | ✅ Merged | Add `agent/transports/types.py` (NormalizedResponse, ToolCall, Usage) + migrate Anthropic normalize path | +554 |
| PR 3 #13366 | ✅ Merged | Add ProviderTransport ABC + AnthropicTransport, wire all Anthropic paths (9 sites) | +539/-45 |
| PR 4 #13430 | ✅ Merged | Add ResponsesApiTransport, wire all Codex paths, remove 7 dead wrappers | +590/-169 |
| PR 5 #13447 | ✅ Merged | Add ChatCompletionsTransport, wire all default paths (210-line kwargs block extracted) | +640/-227 |
| PR 6 #13467 | ✅ Merged | Add BedrockTransport, wire all Bedrock paths | +383/-13 |
| PR 7 #13862 | 🔄 Open | Unify dispatch + runtime (combines original PR 7+8). Consolidate 4 transport helpers → 1 `_get_transport()`, collapse normalize shims, wire ALL `response.choices[0]` through transports, remove v2 scaffolding, clean dead imports, transport cache lifecycle | +145/-444 |
| PR 8 | — | Folded into PR 7 | — |
| PR 9 | 📋 Planned | Documentation — architecture guide, transport authoring guide | Dep: 7 |
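The single-dispatch shape PR 7 describes can be sketched as follows. The transport class names come from the tracker above; the registry dict, constructor signatures, and `_transport` cache attribute are assumptions, not the real implementation.

```python
# Stubbed transports; the real classes live under agent/transports/.
class ProviderTransport: ...
class AnthropicTransport(ProviderTransport): ...
class ResponsesApiTransport(ProviderTransport): ...
class ChatCompletionsTransport(ProviderTransport): ...
class BedrockTransport(ProviderTransport): ...

# One registry keyed by api_mode replaces the four per-mode helpers.
_TRANSPORTS = {
    "anthropic_messages": AnthropicTransport,
    "codex_responses": ResponsesApiTransport,
    "chat_completions": ChatCompletionsTransport,
    "bedrock_converse": BedrockTransport,
}

class AIAgent:
    def __init__(self, api_mode: str):
        self.api_mode = api_mode
        self._transport = None  # cached instance (transport cache lifecycle, per PR 7)

    def _get_transport(self) -> ProviderTransport:
        # Single dispatch point: construct once, reuse thereafter.
        if self._transport is None:
            self._transport = _TRANSPORTS[self.api_mode]()
        return self._transport
```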
## Dependency Graph

```
PR1 ──→ PR4
PR2 ──→ PR3 ──→ PR4
            ──→ PR5
            ──→ PR6
PR4+5+6 ──→ PR7 (includes PR8) ──→ PR9
```
## What the Transport Owns vs What Stays on AIAgent

| Transport owns | AIAgent keeps |
|---|---|
| `convert_messages()` — OpenAI msgs → provider format | Client construction (`build_anthropic_client`, etc.) |
| `convert_tools()` — OpenAI tools → provider format | Client rebuild/teardown on interrupt |
| `build_kwargs()` — assemble full API call kwargs | Credential refresh/rotation |
| `normalize_response()` → NormalizedResponse | Streaming (`_call_anthropic`, `_run_codex_stream`) |
| `validate_response()` — structural check | Prompt caching policy |
| `extract_cache_stats()` — provider-specific cache tokens | Retry/interrupt threading |
| `map_finish_reason()` — provider stop reason → OpenAI | Fallback provider routing |
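The transport side of this split can be sketched as an ABC. The method names come from the table above; the signatures, defaults, and the example finish-reason mapping are assumptions, not the real `ProviderTransport` from PR 3.

```python
from abc import ABC, abstractmethod

class ProviderTransport(ABC):
    """Sketch of the transport ABC, using the method names in the table above."""

    @abstractmethod
    def convert_messages(self, messages):
        """OpenAI-format messages -> provider format."""

    @abstractmethod
    def convert_tools(self, tools):
        """OpenAI-format tools -> provider format."""

    @abstractmethod
    def build_kwargs(self, messages, tools, **opts):
        """Assemble the full API-call kwargs dict."""

    @abstractmethod
    def normalize_response(self, raw):
        """Provider response -> NormalizedResponse."""

    def validate_response(self, raw) -> bool:
        # Structural check; a permissive default for the sketch.
        return raw is not None

    def extract_cache_stats(self, raw) -> dict:
        # Provider-specific cache token extraction; default: none.
        return {}

    def map_finish_reason(self, reason: str) -> str:
        # Map a provider stop reason onto OpenAI's finish_reason vocabulary
        # (mapping entries here are illustrative).
        return {"end_turn": "stop", "tool_use": "tool_calls"}.get(reason, reason)
```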
## Transport Coverage

| api_mode | Transport | build_kwargs | normalize | validate | cache_stats | finish_reason |
|---|---|---|---|---|---|---|
| `anthropic_messages` | AnthropicTransport | ✅ | ✅ | ✅ | ✅ | ✅ |
| `codex_responses` | ResponsesApiTransport | ✅ | ✅ | ✅ | — | ✅ |
| `chat_completions` | ChatCompletionsTransport | ✅ | ✅ | ✅ | ✅ | — |
| `bedrock_converse` | BedrockTransport | ✅ | ✅ | ✅ | — | ✅ |
## Abort Points

Each PR delivers standalone value. Safe stopping points:

- After PR 3 — one transport proven end-to-end, types established
- After PR 6 — all 4 transports wired, transport layer complete
- After PR 7 — dispatch unified, scaffolding removed, zero `response.choices[0]` in non-streaming code, full Cycle 1 done
- After PR 9 — documented, ready for Cycle 2
## Known Gaps (from codebase stress test)

1. **`reasoning_content` vs `reasoning`** — two distinct fields downstream; the transport merges them into `reasoning`. The thinking-prefill check reads `reasoning_content` separately.
2. **Prompt caching runs between convert and build_kwargs** — `apply_anthropic_cache_control` mutates messages after conversion, so the transport can't produce final API-ready messages alone.
3. **ChatCompletionsTransport has 13 provider conditionals** — flags passed as explicit params. Works, but the param list is long. This is the primary motivation for Cycle 2.
4. **`flush_memories` and `iteration_limit_summary` have their own normalize dispatch** — wired through transports now but still on separate code paths.
5. **Bedrock normalizes at dispatch site** — `normalize_converse_response()` is called directly at L5191 to produce the OpenAI-compatible SimpleNamespace that `flush_memories`' `hasattr(response, "choices")` guard checks. To remove: refactor the guard to `self.api_mode in (...)`.
6. **`_ephemeral_max_output_tokens` is consumed by both the Anthropic and chat_completions branches** — shared agent state that both transports need.
7. **Adapter v1 functions return legacy shapes** — `normalize_anthropic_response()` returns `(SimpleNamespace, str)`; `normalize_converse_response()` returns an OpenAI-compat SimpleNamespace. The transport wraps them in a 2-layer chain: `transport.normalize_response()` → `v1()` → NR mapping. Collapsing to 1 layer requires migrating `auxiliary_client.py`. Cycle 2 resolves this.
8. **`auxiliary_client.py` bypasses the transport entirely** — it calls `build_anthropic_kwargs()` and `normalize_anthropic_response()` directly for compression/vision/flush. Cycle 2 resolves this: the aux client gets a transport instance or a provider module interface.
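The Bedrock gap's proposed fix — replacing the structural guard with an explicit mode check — can be illustrated in isolation. Only the two guard shapes come from the gap description; the helper names and the mode tuple membership are assumptions.

```python
from types import SimpleNamespace

# Assumed membership set: which api_modes already yield OpenAI-shaped responses.
OPENAI_SHAPED_MODES = ("chat_completions",)

def guard_structural(response) -> bool:
    # Current guard: structural. This is what forces Bedrock to pre-build an
    # OpenAI-compatible SimpleNamespace at the dispatch site just to pass it.
    return hasattr(response, "choices")

def guard_by_mode(api_mode: str) -> bool:
    # Proposed guard: explicit. Decided by api_mode alone, so the dispatch-site
    # normalize_converse_response() call can be removed.
    return api_mode in OPENAI_SHAPED_MODES
```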
## Cycle 2: Provider Modules (Next)

**Problem Cycle 1 leaves behind:** Provider quirks are still scattered across `auth.py`, `runtime_provider.py`, `models.py`, `auxiliary_client.py`, `run_agent.py`, and the transports themselves. Adding a new provider requires touching 5+ files. The ChatCompletionsTransport takes 20+ boolean params because each provider's quirks are passed as flags.

**Solution:** Consolidate per-provider quirks into single-file provider modules under `providers/`. Each module declares everything about that provider in one place:

```python
# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"

    # Auth (currently in hermes_cli/auth.py)
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"

    # Client quirks (currently in run_agent.py __init__)
    default_headers = {"User-Agent": "hermes-agent/1.0"}

    # Request quirks (currently in auxiliary_client.py)
    fixed_temperature = 0.6
    default_max_tokens = None
```

```python
# providers/nvidia.py
class NvidiaProvider:
    name = "nvidia"
    api_mode = "chat_completions"
    env_vars = ["NVIDIA_API_KEY"]
    base_url = "https://integrate.api.nvidia.com/v1"
    default_max_tokens = 16384  # GLM-4.7 thinking exhaust fix
```

```python
# providers/qwen.py
class QwenPortalProvider:
    name = "qwen-portal"
    api_mode = "chat_completions"
    env_vars = ["QWEN_API_KEY"]
    base_url = "https://portal.qwen.ai/api/v1"
    default_max_tokens = 65536

    def prepare_messages(self, messages):
        """Normalize content to list-of-dicts, inject cache_control."""
        ...

    def extra_body(self, session_id):
        return {
            "metadata": {"sessionId": session_id},
            "vl_high_resolution_images": True,
        }
```
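A registry over these modules is what lets `hermes_cli` resolve a provider by name or alias in one place. The sketch below is hypothetical: the registry shape and `resolve_provider` helper are assumptions, and the provider classes are re-declared as minimal stubs for self-containment.

```python
# Minimal stubs standing in for the provider modules above.
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"

class NvidiaProvider:
    name = "nvidia"
    aliases = []
    api_mode = "chat_completions"

# Hypothetical registry; real code would likely discover modules under providers/.
PROVIDERS = [KimiProvider(), NvidiaProvider()]

def resolve_provider(name: str):
    """Match on canonical name or alias, so every lookup goes through one place."""
    for p in PROVIDERS:
        if name == p.name or name in p.aliases:
            return p
    raise KeyError(f"unknown provider: {name}")
```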
### What changes

- Transport's `build_kwargs` receives a provider object instead of 20 flags
- `hermes_cli/auth.py` reads `ProviderConfig` from provider modules
- `hermes_cli/runtime_provider.py` resolves api_mode from the provider registry
- `hermes_cli/models.py` reads model lists from provider modules
- `auxiliary_client.py` reads temperature/aux config from provider modules

### What this enables

- Adding a new OpenAI-compatible provider = one file (`providers/newprovider.py`)
- Each provider's behavior is testable in isolation
- No more "search 5 files to understand how Kimi works"

### Transport cleanup (from Cycle 1 gaps 5, 7, 8)

- Collapse adapter v1 normalize functions to return `NormalizedResponse` directly (eliminates the 2-layer `transport → v1 → NR mapping` chain)
- Migrate `auxiliary_client.py` to use transports instead of calling adapter functions directly
- Remove the Bedrock dispatch-site `normalize_converse_response()` — refactor the `flush_memories` guard from `hasattr(response, "choices")` to `self.api_mode in (...)`
- Remove the `_nr_to_assistant_message()` shim — downstream code reads `NormalizedResponse` directly
### Current quirk distribution (what Cycle 2 consolidates)

| Quirk | Provider | Currently in | Moves to |
|---|---|---|---|
| Fixed temperature 0.6 | Kimi | `auxiliary_client.py` | `providers/kimi.py` |
| User-Agent header | Kimi | `run_agent.py` client init | `providers/kimi.py` |
| Default max_tokens 16384 | NVIDIA | ChatCompletionsTransport | `providers/nvidia.py` |
| Default max_tokens 65536 | Qwen | ChatCompletionsTransport | `providers/qwen.py` |
| Message normalization | Qwen | `run_agent.py` + transport | `providers/qwen.py` |
| `vl_high_resolution_images` | Qwen | ChatCompletionsTransport | `providers/qwen.py` |
| Developer role swap | GPT-5/Codex | ChatCompletionsTransport | `providers/openai_codex.py` |
| `think=false` suppression | Ollama/custom | ChatCompletionsTransport | `providers/custom.py` |
| `num_ctx` override | Ollama | ChatCompletionsTransport | `providers/custom.py` |
| Provider preferences | OpenRouter | ChatCompletionsTransport | `providers/openrouter.py` |
| Product attribution tags | Nous | ChatCompletionsTransport | `providers/nous.py` |
| Reasoning extra_body | OR/Nous/GitHub | ChatCompletionsTransport | each provider module |
| xAI conv headers | xAI/Grok | ResponsesApiTransport | `providers/xai.py` |
| Thinking signatures | Anthropic | AnthropicTransport → adapter | `providers/anthropic.py` |
| Guardrail config | Bedrock | BedrockTransport | `providers/bedrock.py` |
| OAuth identity transform | Anthropic | adapter | `providers/anthropic.py` |
| Encrypted reasoning | Codex/xAI | ResponsesApiTransport | each provider module |