Bug Description
When using hermes --tui with a provider in chat_completions API mode (e.g., opencode-go, opencode-zen, kilo, deepseek, openrouter), the exit summary always shows reasoning 0 even when the model produces substantial reasoning tokens. For example, a 53-message session with deepseek-v4-pro shows:
Tokens: 890226 (in 35517, out 6581, cache 848128, reasoning 0)
The cache tokens (848,128) clearly indicate heavy context reuse across a long session — models like deepseek-v4-pro that support extended thinking should have produced non-zero reasoning tokens.
Steps to Reproduce
- Run hermes --tui with a reasoning-capable model via a chat_completions-mode provider (e.g., deepseek-v4-pro via opencode-go, or claude-sonnet-4 via openrouter)
- Send several prompts that trigger reasoning/thinking
- Press Ctrl-C to exit
- Observe reasoning 0 in the exit summary token breakdown
Expected Behavior
The exit summary should show non-zero reasoning_tokens when the model produces reasoning content, matching the actual API usage.
Actual Behavior
reasoning 0 is always displayed for chat_completions mode providers, regardless of actual reasoning token usage.
Root Cause Analysis
In agent/usage_pricing.py:normalize_usage() (lines 575–578):
reasoning_tokens = 0
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
    reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))
This code checks output_tokens_details.reasoning_tokens — the field name used by the Codex Responses API (codex_responses mode).
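For reference, the corresponding access path in that mode:

# Codex Responses API response usage object:
response.usage.output_tokens_details.reasoning_tokens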
However, for OpenAI Chat Completions (chat_completions mode — used by opencode-go, opencode-zen, kilo, deepseek, openrouter, and most other providers), reasoning tokens are stored in completion_tokens_details.reasoning_tokens:
# OpenAI Chat Completions response usage object:
response.usage.completion_tokens_details.reasoning_tokens
The function never checks completion_tokens_details, so reasoning tokens are always 0 for chat_completions mode.
Note: The mode-specific branches (lines 539–573) correctly handle input/output/cache tokens for each API mode, but the reasoning extraction (lines 575–578) is a single code path that only checks the Codex-format field.
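The failure is easy to reproduce in isolation. A minimal sketch, using a SimpleNamespace to stand in for a Chat Completions usage object (the token count is illustrative, and _to_int is omitted for brevity):

from types import SimpleNamespace

# Chat Completions shape: reasoning tokens live under completion_tokens_details
usage = SimpleNamespace(
    completion_tokens_details=SimpleNamespace(reasoning_tokens=412),
)

# Current extraction path, as in normalize_usage(): only output_tokens_details is checked
reasoning_tokens = 0
output_details = getattr(usage, "output_tokens_details", None)
if output_details:
    reasoning_tokens = getattr(output_details, "reasoning_tokens", 0)

print(reasoning_tokens)  # prints 0; the 412 reasoning tokens are never read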
Proposed Fix
Add a fallback to check completion_tokens_details when output_tokens_details is absent or reports zero:
reasoning_tokens = 0

# Codex Responses format
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
    reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))

# OpenAI Chat Completions format
if reasoning_tokens == 0:
    completion_details = getattr(response_usage, "completion_tokens_details", None)
    if completion_details:
        reasoning_tokens = _to_int(getattr(completion_details, "reasoning_tokens", 0))
Alternatively, make the extraction mode-aware using the mode parameter already available in the function.
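A minimal sketch of that mode-aware variant. The mapping assumes the mode strings used in this report (codex_responses, chat_completions, anthropic_messages); the actual parameter name and values in normalize_usage() may differ, and the helper name is hypothetical:

# Hypothetical mode-aware extraction; field names per API mode as described above.
_REASONING_DETAILS_FIELD = {
    "codex_responses": "output_tokens_details",
    "chat_completions": "completion_tokens_details",
    # anthropic_messages: no separate reasoning-token breakdown
}

def _extract_reasoning_tokens(response_usage, mode):
    details_field = _REASONING_DETAILS_FIELD.get(mode)
    if details_field is None:
        return 0
    details = getattr(response_usage, details_field, None)
    if details is None:
        return 0
    return _to_int(getattr(details, "reasoning_tokens", 0))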
Affected Files
agent/usage_pricing.py — normalize_usage() (lines 575–578)
Impact
- All chat_completions-mode providers: opencode-go, opencode-zen, kilo, deepseek, openrouter, huggingface, nvidia, xiaomi, gmi, arcee, lmstudio, ollama-cloud, custom providers, and any others with transport: openai_chat
- Unaffected: codex_responses-mode providers (OpenAI Codex, xAI, etc.) — these correctly use output_tokens_details
- Unaffected: anthropic_messages-mode providers — the Anthropic API doesn't break out reasoning tokens separately
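To guard against regressions, the fallback logic can be exercised against both usage shapes. A self-contained sketch with SimpleNamespace stand-ins and a stubbed _to_int; the helper name and token counts are illustrative:

from types import SimpleNamespace

def _to_int(value):
    # Stub of the real helper in agent/usage_pricing.py
    return int(value or 0)

def extract_reasoning(usage):
    # Mirrors the proposed fix: Codex field first, then Chat Completions fallback
    reasoning = 0
    output_details = getattr(usage, "output_tokens_details", None)
    if output_details:
        reasoning = _to_int(getattr(output_details, "reasoning_tokens", 0))
    if reasoning == 0:
        completion_details = getattr(usage, "completion_tokens_details", None)
        if completion_details:
            reasoning = _to_int(getattr(completion_details, "reasoning_tokens", 0))
    return reasoning

# Codex Responses shape
codex = SimpleNamespace(output_tokens_details=SimpleNamespace(reasoning_tokens=128))
assert extract_reasoning(codex) == 128

# OpenAI Chat Completions shape (the previously broken case)
chat = SimpleNamespace(completion_tokens_details=SimpleNamespace(reasoning_tokens=256))
assert extract_reasoning(chat) == 256

# Anthropic Messages shape: neither details field, so reasoning stays 0
anthropic = SimpleNamespace()
assert extract_reasoning(anthropic) == 0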