
[Bug]: reasoning_tokens always 0 for chat_completions mode — normalize_usage only extracts from output_tokens_details, missing completion_tokens_details #18466

@Hubedge

Description


Bug Description

When using hermes --tui with a provider in chat_completions API mode (e.g., opencode-go, opencode-zen, kilo, deepseek, openrouter), the exit summary always shows reasoning 0 even when the model produces substantial reasoning tokens. For example, a 53-message session with deepseek-v4-pro shows:

Tokens:  890226 (in 35517, out 6581, cache 848128, reasoning 0)

The cache tokens (848,128) clearly indicate heavy context reuse across a long session — models like deepseek-v4-pro that support extended thinking should have produced non-zero reasoning tokens.

Steps to Reproduce

  1. Run hermes --tui with a reasoning-capable model via a chat_completions-mode provider (e.g., deepseek-v4-pro via opencode-go, or claude-sonnet-4 via openrouter)
  2. Send several prompts that trigger reasoning/thinking
  3. Press Ctrl-C to exit
  4. Observe: reasoning 0 in the exit summary token breakdown (see the raw-API check sketched below)
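
To rule out hermes entirely, the raw field can be checked straight from the API with the openai client (base_url, api_key, and the prompt are placeholders for whichever chat_completions-mode provider is in use):

from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Briefly: why is the sky blue?"}],
)

# Reasoning-capable models report reasoning tokens here; getattr guards
# against providers that omit the details object entirely.
details = getattr(resp.usage, "completion_tokens_details", None)
print("reasoning_tokens:", getattr(details, "reasoning_tokens", None))

If this prints a non-zero value while the hermes exit summary shows reasoning 0, the tokens are being lost inside normalize_usage().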

Expected Behavior

The exit summary should show non-zero reasoning_tokens when the model produces reasoning content, matching the actual API usage.

Actual Behavior

reasoning 0 is always displayed for chat_completions mode providers, regardless of actual reasoning token usage.

Root Cause Analysis

In agent/usage_pricing.py:normalize_usage() (lines 575–578):

reasoning_tokens = 0
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
    reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))

This code checks output_tokens_details.reasoning_tokens — the field name used by the Codex Responses API (codex_responses mode).

However, for OpenAI Chat Completions (chat_completions mode — used by opencode-go, opencode-zen, kilo, deepseek, openrouter, and most other providers), reasoning tokens are stored in completion_tokens_details.reasoning_tokens:

# OpenAI Chat Completions response usage object:
response.usage.completion_tokens_details.reasoning_tokens

The function never checks completion_tokens_details, so reasoning tokens are always 0 for chat_completions mode.
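
For reference, a raw Chat Completions usage payload from a reasoning model looks roughly like this (illustrative values; reasoning_tokens is counted inside completion_tokens, and some providers omit prompt_tokens_details):

# Illustrative Chat Completions usage payload (values made up):
usage = {
    "prompt_tokens": 2412,
    "completion_tokens": 310,  # includes the 192 reasoning tokens
    "total_tokens": 2722,
    "prompt_tokens_details": {"cached_tokens": 2048},
    "completion_tokens_details": {"reasoning_tokens": 192},
}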

Note: The mode-specific branches (lines 539–573) correctly handle input/output/cache tokens for each API mode, but the reasoning extraction (lines 575–578) is a single code path that only checks the Codex-format field.

Proposed Fix

Add a fallback that checks completion_tokens_details when output_tokens_details is absent or reports zero reasoning tokens:

reasoning_tokens = 0
# Codex Responses format
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
    reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))
# OpenAI Chat Completions format
if reasoning_tokens == 0:
    completion_details = getattr(response_usage, "completion_tokens_details", None)
    if completion_details:
        reasoning_tokens = _to_int(getattr(completion_details, "reasoning_tokens", 0))

Alternatively, make the extraction mode-aware using the mode parameter already available in the function, as sketched below.
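
A minimal sketch of the mode-aware variant (_extract_reasoning_tokens is a hypothetical helper name; _to_int is the module's existing helper, and the mode strings follow the mode names used above):

def _extract_reasoning_tokens(response_usage, mode):
    # Each API mode stores reasoning tokens under a different details object.
    if mode == "codex_responses":
        details = getattr(response_usage, "output_tokens_details", None)
    elif mode == "chat_completions":
        details = getattr(response_usage, "completion_tokens_details", None)
    else:
        # anthropic_messages and unknown modes have no separate reasoning field.
        details = None
    if details is None:
        return 0
    return _to_int(getattr(details, "reasoning_tokens", 0))

This avoids silently reading a Codex-format field on a Chat Completions response (or vice versa) and makes the anthropic_messages case explicit.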

Affected Files

  • agent/usage_pricing.py: normalize_usage() (lines 575–578)

Impact

  • All chat_completions-mode providers: opencode-go, opencode-zen, kilo, deepseek, openrouter, huggingface, nvidia, xiaomi, gmi, arcee, lmstudio, ollama-cloud, custom providers, and any others with transport: openai_chat
  • Unaffected: codex_responses-mode providers (OpenAI Codex, xAI, etc.) — these correctly use output_tokens_details
  • Unaffected: anthropic_messages-mode providers — Anthropic API doesn't break out reasoning tokens separately

Labels

  • P2: Medium — degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • type/bug: Something isn't working
