
fix(insights): show cache tokens in /insights token breakdown #18632

Open
liuhao1024 wants to merge 1 commit into NousResearch:main from liuhao1024:fix/insights-token-cache-breakdown


@liuhao1024
Contributor

Summary

The /insights report's total_tokens includes cache_read_tokens + cache_write_tokens, but the (in: / out: ) breakdown only showed input and output tokens, so the two displayed numbers did not add up to the total. For Anthropic users with prompt caching enabled, cache tokens can dominate the total (e.g., 5.2M total but only 2.6K input / 88K output), making the in/out numbers appear swapped or wrong.

Root Cause

In agent/insights.py, the _compute_overview() method correctly sums all four token categories into total_tokens:

total_tokens = total_input + total_output + total_cache_read + total_cache_write

But both display methods (format_terminal and format_gateway) only showed input and output:

Tokens: 5,239,805 (in: 2,642 / out: 88,463)   # 5.15M cache tokens hidden!
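Redoing the arithmetic makes the gap explicit. The sketch below is illustrative only: the variable names mirror the expression above, and because this PR does not state the read/write split, only the combined cache figure is derived.

```python
# Figures taken from the example report line above.
total_tokens = 5_239_805
total_input = 2_642
total_output = 88_463

# total_tokens = input + output + cache_read + cache_write, so the combined
# cache count is whatever input and output do not account for.
total_cache = total_tokens - total_input - total_output
cache_share = total_cache / total_tokens

print(f"cache tokens: {total_cache:,}")       # cache tokens: 5,148,700
print(f"share of total: {cache_share:.1%}")   # share of total: 98.3%
```

In this example roughly 98% of the reported total was invisible in the breakdown.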

Fix

  • Terminal format: Adds a Cache tokens line (with read/write breakdown) that appears only when cache tokens are non-zero
  • Gateway format: Appends / cache: N to the existing (in: / out: ) breakdown when cache tokens exist
  • Both formats stay clean (no cache mention) when all cache token counts are zero
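The terminal-side behavior described above might look roughly like the following. This is a minimal sketch, not the code in agent/insights.py: the function name terminal_token_lines, the label text, and the column layout are all hypothetical, and the read/write split in the usage example is invented for illustration.

```python
def terminal_token_lines(total_input, total_output, cache_read, cache_write):
    """Build the token lines of a terminal-style report (hypothetical layout)."""
    total = total_input + total_output + cache_read + cache_write
    lines = [
        f"Total tokens:  {total:,}",
        f"Input tokens:  {total_input:,}",
        f"Output tokens: {total_output:,}",
    ]
    cache_total = cache_read + cache_write
    # Only mention cache when there is something to report, so users
    # without prompt caching see the same output as before.
    if cache_total > 0:
        lines.append(
            f"Cache tokens:  {cache_total:,} "
            f"(read: {cache_read:,} / write: {cache_write:,})"
        )
    return lines


# Illustrative split of the 5,148,700 cache tokens (assumed, not from the PR).
for line in terminal_token_lines(2_642, 88_463, 5_000_000, 148_700):
    print(line)
```

Gating on the combined cache count rather than on each field separately keeps the zero-cache output free of an empty cache line.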

After fix

Tokens: 5,239,805 (in: 2,642 / out: 88,463 / cache: 5,148,700)
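A gateway-style formatter that produces the line above could be sketched as follows. The function name gateway_token_line and its signature are assumptions made for this example; the actual format_gateway method may be structured differently. The whole cache amount is passed as reads here only because the PR does not give the split.

```python
def gateway_token_line(total_input, total_output, cache_read=0, cache_write=0):
    """Format a one-line token summary (hypothetical helper)."""
    total = total_input + total_output + cache_read + cache_write
    breakdown = f"in: {total_input:,} / out: {total_output:,}"
    cache_total = cache_read + cache_write
    # Append the cache figure only when it is non-zero.
    if cache_total > 0:
        breakdown += f" / cache: {cache_total:,}"
    return f"Tokens: {total:,} ({breakdown})"


print(gateway_token_line(2_642, 88_463, cache_read=5_148_700))
# Tokens: 5,239,805 (in: 2,642 / out: 88,463 / cache: 5,148,700)

print(gateway_token_line(2_642, 88_463))
# Tokens: 91,105 (in: 2,642 / out: 88,463)
```

With the cache figure included, the parenthesized parts sum to the reported total.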

Test Plan

  • 5 new regression tests for cache token display behavior
  • All 61 insights tests pass (pytest tests/agent/test_insights.py)
  • Existing zero-cache tests still pass (no noise when cache = 0)

Closes #18615

The /insights report total_tokens includes cache_read_tokens +
cache_write_tokens, but the (in: / out: ) breakdown only showed
input and output tokens. For Anthropic users with prompt caching,
the cache tokens dominate the total, making the in/out numbers
appear wrong or swapped.

- Terminal format: add a 'Cache tokens' line when cache > 0
- Gateway format: append '/ cache: N' to the token breakdown
- Both formats hide cache when all values are 0 (no noise)
- Add 5 regression tests for cache display behavior

Closes NousResearch#18615
alt-glitch added the type/bug (Something isn't working), P3 (Low — cosmetic, nice to have), and comp/cli (CLI entry point, hermes_cli/, setup wizard) labels on May 2, 2026
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request on May 3, 2026
Community PRs applied:
- NousResearch#18596: Enable secret redaction by default (SECURITY)
- NousResearch#18650: Sanitize malformed tool messages + auto-recover on API 400
- NousResearch#18607: Emergency compression before max_iterations exhaustion
- NousResearch#18603: Compression fallback to main model on 413 rate limit
- NousResearch#18638: Pass threshold_percent on model switch
- NousResearch#18663: Strip extra_content from tool_calls for strict APIs
- NousResearch#18618: Forward explicit_api_key to OpenRouter
- NousResearch#18632: Show cache tokens in /insights breakdown
- NousResearch#18614: Add idempotency guard for patch duplicate loops
- NousResearch#18600: Raise ValueError when HERMES_HOME unset in profile mode
- NousResearch#18616: Allow ZWJ emoji in context files
- NousResearch#18582: Reload .env on /restart
- NousResearch#18547: Stabilize system prompt prefix for KV cache reuse
- NousResearch#18692: Strip FTS5 operators from session search truncation terms

Fix: Add order_by_last_active=True to list_sessions_rich call
(pre-existing commit 142b4bf code sync)

Development

Successfully merging this pull request may close these issues.

Insights report: input/output token counts appear swapped or undercounted

2 participants