tracking: provider transport refactor (agent/transports/) #13473

@kshitijk4poor

Description

Overview

This is a two-cycle refactor of hermes-agent's provider infrastructure.

Cycle 1 (this issue): Transport layer — Extract format conversion and response normalization from run_agent.py into agent/transports/. Each transport owns convert_messages, convert_tools, build_kwargs, normalize_response. Client lifecycle, streaming, credentials, and prompt caching stay on AIAgent.

Cycle 2 (future): Provider modules — Consolidate per-provider quirks (currently scattered across 5+ files) into single-file provider definitions under providers/. Each provider module declares its auth, endpoints, client headers, temperature behavior, max_tokens defaults, message preprocessing, and extra_body construction in one place. Transports become generic — they read from the provider object instead of checking boolean flags. See Cycle 2 Design below.

Principle: Every PR wires its code to real production paths in the same PR. No dormant abstractions.


Shared Types (agent/transports/types.py)

```python
@dataclass
class ToolCall:
    id: str | None          # Protocol's canonical ID (call_XXXX, toolu_XXXX, etc.)
    name: str
    arguments: str          # JSON string
    provider_data: dict | None = None   # Per-tool-call protocol metadata

@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str                  # "stop", "tool_calls", "length", "content_filter"
    reasoning: str | None = None        # Cross-provider (Anthropic, Codex, DeepSeek, Gemini)
    usage: Usage | None = None
    provider_data: dict | None = None   # Response-level protocol state
```

Cycle 1: PR Tracker

| PR | Status | What it does | Lines |
|---|---|---|---|
| PR 1 #12975 | ✅ Merged | Extract 10 Codex Responses API functions into agent/codex_responses_adapter.py | -565 from run_agent.py |
| PR 2 #13347 | ✅ Merged | Add agent/transports/types.py (NormalizedResponse, ToolCall, Usage) + migrate Anthropic normalize path | +554 |
| PR 3 #13366 | ✅ Merged | Add ProviderTransport ABC + AnthropicTransport, wire all Anthropic paths (9 sites) | +539/-45 |
| PR 4 #13430 | ✅ Merged | Add ResponsesApiTransport, wire all Codex paths, remove 7 dead wrappers | +590/-169 |
| PR 5 #13447 | ✅ Merged | Add ChatCompletionsTransport, wire all default paths (210-line kwargs block extracted) | +640/-227 |
| PR 6 #13467 | ✅ Merged | Add BedrockTransport, wire all Bedrock paths | +383/-13 |
| PR 7 #13862 | 🔄 Open | Unify dispatch + runtime (combines original PR 7+8): consolidate 4 transport helpers → 1 _get_transport(), collapse normalize shims, wire ALL response.choices[0] reads through transports, remove v2 scaffolding, clean dead imports, transport cache lifecycle | +145/-444 |
| PR 8 | — | Folded into PR 7 | — |
| PR 9 | 📋 Planned | Documentation — architecture guide, transport authoring guide (dep: PR 7) | — |
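PR 7's "4 helpers → 1 _get_transport() with a cache" consolidation can be sketched as below. This is a hedged illustration, not the actual PR 7 code: only the `api_mode` values come from this issue; the class names, cache attribute, and stub transport are assumptions.

```python
# Sketch: one dispatch point keyed on api_mode, replacing four per-mode
# helpers; instances are cached so kwargs-building and normalization
# share one transport object per mode.

class _StubTransport:
    """Stand-in for AnthropicTransport / ResponsesApiTransport / etc."""
    def __init__(self, api_mode: str):
        self.api_mode = api_mode

_TRANSPORT_CLASSES = {
    "anthropic_messages": _StubTransport,
    "codex_responses": _StubTransport,
    "chat_completions": _StubTransport,
    "bedrock_converse": _StubTransport,
}

class AIAgent:
    def __init__(self, api_mode: str):
        self.api_mode = api_mode
        self._transport_cache: dict[str, _StubTransport] = {}

    def _get_transport(self) -> _StubTransport:
        # Single lookup replaces per-provider helper functions; the cache
        # gives the transport a lifecycle tied to the agent instance.
        if self.api_mode not in self._transport_cache:
            cls = _TRANSPORT_CLASSES[self.api_mode]
            self._transport_cache[self.api_mode] = cls(self.api_mode)
        return self._transport_cache[self.api_mode]
```

With this shape, every non-streaming call site asks the agent for its transport instead of choosing a helper by provider.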

Dependency Graph

```
PR1 ──→ PR4
PR2 ──→ PR3 ──→ PR4
              ──→ PR5
              ──→ PR6
                    PR4+5+6 ──→ PR7 (includes PR8) ──→ PR9
```

What the Transport Owns vs What Stays on AIAgent

| Transport owns | AIAgent keeps |
|---|---|
| convert_messages() — OpenAI msgs → provider format | Client construction (build_anthropic_client, etc.) |
| convert_tools() — OpenAI tools → provider format | Client rebuild/teardown on interrupt |
| build_kwargs() — assemble full API call kwargs | Credential refresh/rotation |
| normalize_response() → NormalizedResponse | Streaming (_call_anthropic, _run_codex_stream) |
| validate_response() — structural check | Prompt caching policy |
| extract_cache_stats() — provider-specific cache tokens | Retry/interrupt threading |
| map_finish_reason() — provider stop reason → OpenAI | Fallback provider routing |

Transport Coverage

| api_mode | Transport | build_kwargs | normalize | validate | cache_stats | finish_reason |
|---|---|---|---|---|---|---|
| anthropic_messages | AnthropicTransport | ✅ | ✅ | ✅ | ✅ | ✅ |
| codex_responses | ResponsesApiTransport | ✅ | ✅ | ✅ | ✅ | ✅ |
| chat_completions | ChatCompletionsTransport | ✅ | ✅ | ✅ | ✅ | ✅ |
| bedrock_converse | BedrockTransport | ✅ | ✅ | ✅ | ✅ | ✅ |

Abort Points

Each PR delivers standalone value. Safe stopping points:

  • After PR 3 — one transport proven end-to-end, types established
  • After PR 6 — all 4 transports wired, transport layer complete
  • After PR 7 — dispatch unified, scaffolding removed, zero response.choices[0] in non-streaming code, full Cycle 1 done
  • After PR 9 — documented, ready for Cycle 2

Known Gaps (from codebase stress test)

  1. reasoning_content vs reasoning — two distinct fields downstream; the transport merges them into reasoning, but the thinking-prefill check still reads reasoning_content separately.
  2. Prompt caching runs between convert and build_kwargs — apply_anthropic_cache_control mutates messages after conversion, so the transport can't produce final API-ready messages on its own.
  3. ChatCompletionsTransport has 13 provider conditionals — flags passed as explicit params. It works, but the param list is long. This is the primary motivation for Cycle 2.
  4. flush_memories and iteration_limit_summary have their own normalize dispatch — wired through transports now, but still separate code paths.
  5. Bedrock normalizes at the dispatch site — normalize_converse_response() is called directly at L5191 to produce the OpenAI-compatible SimpleNamespace that flush_memories' hasattr(response, "choices") guard checks. To remove: refactor the guard to self.api_mode in (...).
  6. _ephemeral_max_output_tokens is consumed by both the Anthropic and chat_completions branches — shared agent state that both transports need.
  7. Adapter v1 functions return legacy shapes — normalize_anthropic_response() returns (SimpleNamespace, str); normalize_converse_response() returns an OpenAI-compat SimpleNamespace. The transport wraps them in a 2-layer chain: transport.normalize_response() → v1 function → NormalizedResponse mapping. Collapsing to 1 layer requires migrating auxiliary_client.py. Cycle 2 resolves this.
  8. auxiliary_client.py bypasses the transport entirely — it calls build_anthropic_kwargs() and normalize_anthropic_response() directly for compression/vision/flush. Cycle 2 resolves this: the aux client gets a transport instance or the provider module interface.
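Gap 5's proposed fix can be sketched as a before/after of the guard. This is illustrative only: the issue leaves the mode tuple as `(...)`, so the modes used below are an assumption for the example, as is the helper naming.

```python
from types import SimpleNamespace

def is_openai_shaped_today(response) -> bool:
    # Current guard in flush_memories: duck-types the response, which forces
    # the Bedrock dispatch site to pre-build a SimpleNamespace with .choices.
    return hasattr(response, "choices")

def is_openai_shaped_after(api_mode: str) -> bool:
    # Proposed guard: self.api_mode in (...) — an explicit mode check,
    # so the Bedrock shim (and its dispatch-site normalize) can go away.
    # Which modes belong in the tuple is left open in the issue.
    return api_mode in ("chat_completions", "codex_responses")

# The shim the current guard depends on:
shim = SimpleNamespace(choices=[SimpleNamespace(message=None)])
```

The explicit check trades a structural test for configuration the agent already knows, which is what lets gap 5's SimpleNamespace construction be deleted.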

Cycle 2: Provider Modules (Next)

Problem Cycle 1 leaves behind: Provider quirks are still scattered across auth.py, runtime_provider.py, models.py, auxiliary_client.py, run_agent.py, and the transports themselves. Adding a new provider requires touching 5+ files. The ChatCompletionsTransport takes 20+ boolean params because each provider's quirks are passed as flags.

Solution: Consolidate per-provider quirks into single-file provider modules under providers/. Each module declares everything about that provider in one place:

```python
# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"

    # Auth (currently in hermes_cli/auth.py)
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"

    # Client quirks (currently in run_agent.py __init__)
    default_headers = {"User-Agent": "hermes-agent/1.0"}

    # Request quirks (currently in auxiliary_client.py)
    fixed_temperature = 0.6
    default_max_tokens = None
```

```python
# providers/nvidia.py
class NvidiaProvider:
    name = "nvidia"
    api_mode = "chat_completions"
    env_vars = ["NVIDIA_API_KEY"]
    base_url = "https://integrate.api.nvidia.com/v1"
    default_max_tokens = 16384  # GLM-4.7 thinking exhaust fix
```

```python
# providers/qwen.py
class QwenPortalProvider:
    name = "qwen-portal"
    api_mode = "chat_completions"
    env_vars = ["QWEN_API_KEY"]
    base_url = "https://portal.qwen.ai/api/v1"
    default_max_tokens = 65536

    def prepare_messages(self, messages):
        """Normalize content to list-of-dicts, inject cache_control."""
        ...

    def extra_body(self, session_id):
        return {
            "metadata": {"sessionId": session_id},
            "vl_high_resolution_images": True,
        }
```

What changes:

  • Transport's build_kwargs receives a provider object instead of 20 flags
  • hermes_cli/auth.py reads ProviderConfig from provider modules
  • hermes_cli/runtime_provider.py resolves api_mode from provider registry
  • hermes_cli/models.py reads model lists from provider modules
  • auxiliary_client.py reads temperature/aux config from provider modules

What this enables:

  • Adding a new OpenAI-compatible provider = one file (providers/newprovider.py)
  • Each provider's behavior is testable in isolation
  • No more "search 5 files to understand how Kimi works"

Transport cleanup (from Cycle 1 gaps 5, 7, 8):

  • Collapse adapter v1 normalize functions to return NormalizedResponse directly (eliminates the 2-layer transport → v1 → NR mapping chain)
  • Migrate auxiliary_client.py to use transports instead of calling adapter functions directly
  • Remove bedrock dispatch-site normalize_converse_response() — refactor flush_memories guard from hasattr(response, "choices") to self.api_mode in (...)
  • Remove _nr_to_assistant_message() shim — downstream code reads NormalizedResponse directly

Current quirk distribution (what Cycle 2 consolidates)

| Quirk | Provider | Currently in | Moves to |
|---|---|---|---|
| Fixed temperature 0.6 | Kimi | auxiliary_client.py | providers/kimi.py |
| User-Agent header | Kimi | run_agent.py client init | providers/kimi.py |
| Default max_tokens 16384 | NVIDIA | ChatCompletionsTransport | providers/nvidia.py |
| Default max_tokens 65536 | Qwen | ChatCompletionsTransport | providers/qwen.py |
| Message normalization | Qwen | run_agent.py + transport | providers/qwen.py |
| vl_high_resolution_images | Qwen | ChatCompletionsTransport | providers/qwen.py |
| Developer role swap | GPT-5/Codex | ChatCompletionsTransport | providers/openai_codex.py |
| think=false suppression | Ollama/custom | ChatCompletionsTransport | providers/custom.py |
| num_ctx override | Ollama | ChatCompletionsTransport | providers/custom.py |
| Provider preferences | OpenRouter | ChatCompletionsTransport | providers/openrouter.py |
| Product attribution tags | Nous | ChatCompletionsTransport | providers/nous.py |
| Reasoning extra_body | OR/Nous/GitHub | ChatCompletionsTransport | each provider module |
| xAI conv headers | xAI/Grok | ResponsesApiTransport | providers/xai.py |
| Thinking signatures | Anthropic | AnthropicTransport → adapter | providers/anthropic.py |
| Guardrail config | Bedrock | BedrockTransport | providers/bedrock.py |
| OAuth identity transform | Anthropic | adapter | providers/anthropic.py |
| Encrypted reasoning | Codex/xAI | ResponsesApiTransport | each provider module |
