## Overview

This is a two-cycle refactor of hermes-agent's provider infrastructure.

**Cycle 1 (this issue): Transport layer** — Extract format conversion and response normalization from `run_agent.py` into `agent/transports/`. Each transport owns `convert_messages`, `convert_tools`, `build_kwargs`, and `normalize_response`. Client lifecycle, streaming, credentials, and prompt caching stay on `AIAgent`.

**Cycle 2 (future): Provider modules** — Consolidate per-provider quirks (currently scattered across 5+ files) into single-file provider definitions under `providers/`. Each provider module declares its auth, endpoints, client headers, temperature behavior, max_tokens defaults, message preprocessing, and extra_body construction in one place. Transports become generic — they read from the provider object instead of checking boolean flags. See Cycle 2 below.

**Principle:** Every PR wires its code to real production paths in the same PR. No dormant abstractions.
## Shared Types (`agent/transports/types.py`)

```python
@dataclass
class ToolCall:
    id: str | None                     # Protocol's canonical ID (call_XXXX, toolu_XXXX, etc.)
    name: str
    arguments: str                     # JSON string
    provider_data: dict | None = None  # Per-tool-call protocol metadata


@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str                 # "stop", "tool_calls", "length", "content_filter"
    reasoning: str | None = None       # Cross-provider (Anthropic, Codex, DeepSeek, Gemini)
    usage: Usage | None = None
    provider_data: dict | None = None  # Response-level protocol state
```
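To make the contract concrete, here is a hedged sketch of how a transport's normalize path might map a raw chat-completions-style response onto these types. The `Usage` fields and the `normalize_chat_choice` helper are assumptions for illustration; the real definitions live in `agent/transports/types.py` and the transports.

```python
from __future__ import annotations
from dataclasses import dataclass

# Assumed shape of Usage (the real definition lives in agent/transports/types.py).
@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

@dataclass
class ToolCall:
    id: str | None
    name: str
    arguments: str  # JSON string
    provider_data: dict | None = None

@dataclass
class NormalizedResponse:
    content: str | None
    tool_calls: list[ToolCall] | None
    finish_reason: str
    reasoning: str | None = None
    usage: Usage | None = None
    provider_data: dict | None = None

def normalize_chat_choice(raw: dict) -> NormalizedResponse:
    """Illustrative mapping from a chat-completions-style dict to NormalizedResponse."""
    choice = raw["choices"][0]
    msg = choice["message"]
    calls = [
        ToolCall(id=c["id"], name=c["function"]["name"], arguments=c["function"]["arguments"])
        for c in msg.get("tool_calls") or []
    ] or None
    return NormalizedResponse(
        content=msg.get("content"),
        tool_calls=calls,
        finish_reason=choice["finish_reason"],
        usage=Usage(raw["usage"]["prompt_tokens"], raw["usage"]["completion_tokens"]),
    )
```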
## Cycle 1: PR Tracker

| PR | Status | What it does | Lines |
|---|---|---|---|
| PR 1 #12975 | ✅ Merged | Extract 10 Codex Responses API functions into `agent/codex_responses_adapter.py` | -565 from run_agent.py |
| PR 2 #13347 | ✅ Merged | Add `agent/transports/types.py` (NormalizedResponse, ToolCall, Usage) + migrate Anthropic normalize path | +554 |
| PR 3 #13366 | ✅ Merged | Add ProviderTransport ABC + AnthropicTransport, wire all Anthropic paths (9 sites) | +539/-45 |
| PR 4 #13430 | ✅ Merged | Add ResponsesApiTransport, wire all Codex paths, remove 7 dead wrappers | +590/-169 |
| PR 5 #13447 | ✅ Merged | Add ChatCompletionsTransport, wire all default paths (210-line kwargs block extracted) | +640/-227 |
| PR 6 #13467 | ✅ Merged | Add BedrockTransport, wire all Bedrock paths | +383/-13 |
| PR 7 #13862 | 🔄 Open | Unify dispatch + runtime (combines original PR 7+8). Consolidate 4 transport helpers → 1 `_get_transport()`, collapse normalize shims, wire ALL `response.choices[0]` through transports, remove v2 scaffolding, clean dead imports, transport cache lifecycle | +145/-444 |
| PR 8 | — | Folded into PR 7 | — |
| PR 9 | 📋 Planned | Documentation — architecture guide, transport authoring guide | Dep: 7 |
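The single-dispatch shape PR 7 describes can be sketched as follows. The transport class names come from the tracker above; the registry dict, constructor signatures, and `_transport` cache attribute are assumptions, not the real implementation.

```python
# Stubbed transports; the real classes live under agent/transports/.
class ProviderTransport: ...
class AnthropicTransport(ProviderTransport): ...
class ResponsesApiTransport(ProviderTransport): ...
class ChatCompletionsTransport(ProviderTransport): ...
class BedrockTransport(ProviderTransport): ...

# One registry keyed by api_mode replaces the four per-mode helpers.
_TRANSPORTS = {
    "anthropic_messages": AnthropicTransport,
    "codex_responses": ResponsesApiTransport,
    "chat_completions": ChatCompletionsTransport,
    "bedrock_converse": BedrockTransport,
}

class AIAgent:
    def __init__(self, api_mode: str):
        self.api_mode = api_mode
        self._transport = None  # cached instance (transport cache lifecycle, per PR 7)

    def _get_transport(self) -> ProviderTransport:
        # Single dispatch point: construct once, reuse thereafter.
        if self._transport is None:
            self._transport = _TRANSPORTS[self.api_mode]()
        return self._transport
```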
## Dependency Graph

```
PR1 ──→ PR4
PR2 ──→ PR3 ──→ PR4
            ──→ PR5
            ──→ PR6
PR4+5+6 ──→ PR7 (includes PR8) ──→ PR9
```
## What the Transport Owns vs What Stays on AIAgent

| Transport owns | AIAgent keeps |
|---|---|
| `convert_messages()` — OpenAI msgs → provider format | Client construction (`build_anthropic_client`, etc.) |
| `convert_tools()` — OpenAI tools → provider format | Client rebuild/teardown on interrupt |
| `build_kwargs()` — assemble full API call kwargs | Credential refresh/rotation |
| `normalize_response()` → NormalizedResponse | Streaming (`_call_anthropic`, `_run_codex_stream`) |
| `validate_response()` — structural check | Prompt caching policy |
| `extract_cache_stats()` — provider-specific cache tokens | Retry/interrupt threading |
| `map_finish_reason()` — provider stop reason → OpenAI | Fallback provider routing |
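The transport side of this split can be sketched as an ABC. The method names come from the table above; the signatures, defaults, and the example finish-reason mapping are assumptions, not the real `ProviderTransport` from PR 3.

```python
from abc import ABC, abstractmethod

class ProviderTransport(ABC):
    """Sketch of the transport ABC, using the method names in the table above."""

    @abstractmethod
    def convert_messages(self, messages):
        """OpenAI-format messages -> provider format."""

    @abstractmethod
    def convert_tools(self, tools):
        """OpenAI-format tools -> provider format."""

    @abstractmethod
    def build_kwargs(self, messages, tools, **opts):
        """Assemble the full API-call kwargs dict."""

    @abstractmethod
    def normalize_response(self, raw):
        """Provider response -> NormalizedResponse."""

    def validate_response(self, raw) -> bool:
        # Structural check; a permissive default for the sketch.
        return raw is not None

    def extract_cache_stats(self, raw) -> dict:
        # Provider-specific cache token extraction; default: none.
        return {}

    def map_finish_reason(self, reason: str) -> str:
        # Map a provider stop reason onto OpenAI's finish_reason vocabulary
        # (mapping entries here are illustrative).
        return {"end_turn": "stop", "tool_use": "tool_calls"}.get(reason, reason)
```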
## Transport Coverage

| api_mode | Transport | build_kwargs | normalize | validate | cache_stats | finish_reason |
|---|---|---|---|---|---|---|
| `anthropic_messages` | AnthropicTransport | ✅ | ✅ | ✅ | ✅ | ✅ |
| `codex_responses` | ResponsesApiTransport | ✅ | ✅ | ✅ | — | ✅ |
| `chat_completions` | ChatCompletionsTransport | ✅ | ✅ | ✅ | ✅ | — |
| `bedrock_converse` | BedrockTransport | ✅ | ✅ | ✅ | — | ✅ |
## Abort Points

Each PR delivers standalone value. Safe stopping points:

- After PR 3 — one transport proven end-to-end, types established
- After PR 6 — all 4 transports wired, transport layer complete
- After PR 7 — dispatch unified, scaffolding removed, zero `response.choices[0]` in non-streaming code, full Cycle 1 done
- After PR 9 — documented, ready for Cycle 2
## Known Gaps (from codebase stress test)

1. **`reasoning_content` vs `reasoning`** — two distinct fields downstream; the transport merges them into `reasoning`. The thinking-prefill check reads `reasoning_content` separately.
2. **Prompt caching runs between convert and build_kwargs** — `apply_anthropic_cache_control` mutates messages after conversion, so the transport can't produce final API-ready messages alone.
3. **ChatCompletionsTransport has 13 provider conditionals** — flags passed as explicit params. Works, but the param list is long. This is the primary motivation for Cycle 2.
4. **`flush_memories` and `iteration_limit_summary` have their own normalize dispatch** — wired through transports now but still on separate code paths.
5. **Bedrock normalizes at dispatch site** — `normalize_converse_response()` is called directly at L5191 to produce the OpenAI-compatible SimpleNamespace that `flush_memories`' `hasattr(response, "choices")` guard checks. To remove: refactor the guard to `self.api_mode in (...)`.
6. **`_ephemeral_max_output_tokens` is consumed by both the Anthropic and chat_completions branches** — shared agent state that both transports need.
7. **Adapter v1 functions return legacy shapes** — `normalize_anthropic_response()` returns `(SimpleNamespace, str)`; `normalize_converse_response()` returns an OpenAI-compat SimpleNamespace. The transport wraps them in a 2-layer chain: `transport.normalize_response()` → `v1()` → NR mapping. Collapsing to 1 layer requires migrating `auxiliary_client.py`. Cycle 2 resolves this.
8. **`auxiliary_client.py` bypasses the transport entirely** — it calls `build_anthropic_kwargs()` and `normalize_anthropic_response()` directly for compression/vision/flush. Cycle 2 resolves this: the aux client gets a transport instance or a provider module interface.
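The Bedrock gap's proposed fix — replacing the structural guard with an explicit mode check — can be illustrated in isolation. Only the two guard shapes come from the gap description; the helper names and the mode tuple membership are assumptions.

```python
from types import SimpleNamespace

# Assumed membership set: which api_modes already yield OpenAI-shaped responses.
OPENAI_SHAPED_MODES = ("chat_completions",)

def guard_structural(response) -> bool:
    # Current guard: structural. This is what forces Bedrock to pre-build an
    # OpenAI-compatible SimpleNamespace at the dispatch site just to pass it.
    return hasattr(response, "choices")

def guard_by_mode(api_mode: str) -> bool:
    # Proposed guard: explicit. Decided by api_mode alone, so the dispatch-site
    # normalize_converse_response() call can be removed.
    return api_mode in OPENAI_SHAPED_MODES
```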
## Cycle 2: Provider Modules (Next)

**Problem Cycle 1 leaves behind:** Provider quirks are still scattered across `auth.py`, `runtime_provider.py`, `models.py`, `auxiliary_client.py`, `run_agent.py`, and the transports themselves. Adding a new provider requires touching 5+ files. The ChatCompletionsTransport takes 20+ boolean params because each provider's quirks are passed as flags.

**Solution:** Consolidate per-provider quirks into single-file provider modules under `providers/`. Each module declares everything about that provider in one place:

```python
# providers/kimi.py
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"

    # Auth (currently in hermes_cli/auth.py)
    env_vars = ["KIMI_API_KEY", "MOONSHOT_API_KEY"]
    base_url = "https://api.kimi.com/v1"

    # Client quirks (currently in run_agent.py __init__)
    default_headers = {"User-Agent": "hermes-agent/1.0"}

    # Request quirks (currently in auxiliary_client.py)
    fixed_temperature = 0.6
    default_max_tokens = None
```

```python
# providers/nvidia.py
class NvidiaProvider:
    name = "nvidia"
    api_mode = "chat_completions"
    env_vars = ["NVIDIA_API_KEY"]
    base_url = "https://integrate.api.nvidia.com/v1"
    default_max_tokens = 16384  # GLM-4.7 thinking exhaust fix
```

```python
# providers/qwen.py
class QwenPortalProvider:
    name = "qwen-portal"
    api_mode = "chat_completions"
    env_vars = ["QWEN_API_KEY"]
    base_url = "https://portal.qwen.ai/api/v1"
    default_max_tokens = 65536

    def prepare_messages(self, messages):
        """Normalize content to list-of-dicts, inject cache_control."""
        ...

    def extra_body(self, session_id):
        return {
            "metadata": {"sessionId": session_id},
            "vl_high_resolution_images": True,
        }
```
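A registry over these modules is what lets `hermes_cli` resolve a provider by name or alias in one place. The sketch below is hypothetical: the registry shape and `resolve_provider` helper are assumptions, and the provider classes are re-declared as minimal stubs for self-containment.

```python
# Minimal stubs standing in for the provider modules above.
class KimiProvider:
    name = "kimi-coding"
    aliases = ["kimi", "moonshot"]
    api_mode = "chat_completions"

class NvidiaProvider:
    name = "nvidia"
    aliases = []
    api_mode = "chat_completions"

# Hypothetical registry; real code would likely discover modules under providers/.
PROVIDERS = [KimiProvider(), NvidiaProvider()]

def resolve_provider(name: str):
    """Match on canonical name or alias, so every lookup goes through one place."""
    for p in PROVIDERS:
        if name == p.name or name in p.aliases:
            return p
    raise KeyError(f"unknown provider: {name}")
```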
### What changes

- Transport's `build_kwargs` receives a provider object instead of 20 flags
- `hermes_cli/auth.py` reads `ProviderConfig` from provider modules
- `hermes_cli/runtime_provider.py` resolves api_mode from the provider registry
- `hermes_cli/models.py` reads model lists from provider modules
- `auxiliary_client.py` reads temperature/aux config from provider modules

### What this enables

- Adding a new OpenAI-compatible provider = one file (`providers/newprovider.py`)
- Each provider's behavior is testable in isolation
- No more "search 5 files to understand how Kimi works"

### Transport cleanup (from Cycle 1 gaps 5, 7, 8)

- Collapse adapter v1 normalize functions to return `NormalizedResponse` directly (eliminates the 2-layer `transport → v1 → NR mapping` chain)
- Migrate `auxiliary_client.py` to use transports instead of calling adapter functions directly
- Remove the Bedrock dispatch-site `normalize_converse_response()` — refactor the `flush_memories` guard from `hasattr(response, "choices")` to `self.api_mode in (...)`
- Remove the `_nr_to_assistant_message()` shim — downstream code reads `NormalizedResponse` directly
### Current quirk distribution (what Cycle 2 consolidates)

| Quirk | Provider | Currently in | Moves to |
|---|---|---|---|
| Fixed temperature 0.6 | Kimi | `auxiliary_client.py` | `providers/kimi.py` |
| User-Agent header | Kimi | `run_agent.py` client init | `providers/kimi.py` |
| Default max_tokens 16384 | NVIDIA | ChatCompletionsTransport | `providers/nvidia.py` |
| Default max_tokens 65536 | Qwen | ChatCompletionsTransport | `providers/qwen.py` |
| Message normalization | Qwen | `run_agent.py` + transport | `providers/qwen.py` |
| `vl_high_resolution_images` | Qwen | ChatCompletionsTransport | `providers/qwen.py` |
| Developer role swap | GPT-5/Codex | ChatCompletionsTransport | `providers/openai_codex.py` |
| `think=false` suppression | Ollama/custom | ChatCompletionsTransport | `providers/custom.py` |
| `num_ctx` override | Ollama | ChatCompletionsTransport | `providers/custom.py` |
| Provider preferences | OpenRouter | ChatCompletionsTransport | `providers/openrouter.py` |
| Product attribution tags | Nous | ChatCompletionsTransport | `providers/nous.py` |
| Reasoning extra_body | OR/Nous/GitHub | ChatCompletionsTransport | each provider module |
| xAI conv headers | xAI/Grok | ResponsesApiTransport | `providers/xai.py` |
| Thinking signatures | Anthropic | AnthropicTransport → adapter | `providers/anthropic.py` |
| Guardrail config | Bedrock | BedrockTransport | `providers/bedrock.py` |
| OAuth identity transform | Anthropic | adapter | `providers/anthropic.py` |
| Encrypted reasoning | Codex/xAI | ResponsesApiTransport | each provider module |