
fix(gateway): persist token counts to session store for /status display#17158

Closed
JezzaHehn wants to merge 1 commit into
NousResearch:main from
JezzaHehn:fix/gateway-status-token-count

Conversation

@JezzaHehn
Contributor

What

Fixed the /status command showing 0 tokens despite agent activity in the session.

Why

Two bugs:

  1. Primary: gateway/run.py checked agent_result.get("total_tokens"), which returns None because the agent only reports input_tokens/output_tokens, never total_tokens. The if total > 0 guard therefore never passed, so counts were never persisted.
  2. Secondary: the SessionEntry class lacked a reasoning_tokens field, causing an AttributeError when update_token_counts() tried to increment it.
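The primary bug can be sketched like this (key names follow the PR text; the dict literal and guard are illustrative, not the actual gateway code):

```python
# What the agent actually returns: component counts, no "total_tokens" key.
agent_result = {"input_tokens": 1200, "output_tokens": 350}

# Buggy read: the key is absent, so the guard below never passes and
# token counts are never persisted to the session store.
total = agent_result.get("total_tokens") or 0
print(total > 0)  # False

# Fix: compute the total from the components the agent does return.
total = agent_result.get("input_tokens", 0) + agent_result.get("output_tokens", 0)
print(total)  # 1550
```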

How to test

  1. Start a Hermes gateway session (Discord/Telegram/etc)
  2. Have a conversation (make 1+ agent calls)
  3. Run /status
  4. Verify token counts are non-zero

Files changed

  • gateway/run.py — calculate total from components, call update_token_counts()
  • gateway/session.py — add reasoning_tokens field to SessionEntry
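A minimal sketch of the two changes together, assuming a dataclass-style SessionEntry; the field and method names follow the PR text, everything else is illustrative:

```python
from dataclasses import dataclass

@dataclass
class SessionEntry:
    input_tokens: int = 0
    output_tokens: int = 0
    # Newly added field: update_token_counts() previously raised
    # AttributeError when incrementing it.
    reasoning_tokens: int = 0
    total_tokens: int = 0

    def update_token_counts(self, counts: dict) -> None:
        # Increment each component; missing keys default to 0.
        self.input_tokens += counts.get("input_tokens", 0)
        self.output_tokens += counts.get("output_tokens", 0)
        self.reasoning_tokens += counts.get("reasoning_tokens", 0)
        self.total_tokens = (
            self.input_tokens + self.output_tokens + self.reasoning_tokens
        )

entry = SessionEntry()
entry.update_token_counts({"input_tokens": 1200, "output_tokens": 350})
print(entry.total_tokens)  # 1550
```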

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery labels Apr 28, 2026
@alt-glitch
Collaborator

Related to #5960 (same symptom: /status shows 0 tokens) and competing with #9750 (also fixes token tracking). Check for overlap.

@JezzaHehn
Contributor Author

Small note: the arithmetic around line 990 of the updated session.py is the most likely place for an error if the total-token math turns out to be wrong. And big thanks to the development team 💚

@JezzaHehn
Contributor Author

Looks like #9750 changes more files and includes a testing monkeypatch. I'm not sure the extra changes are necessary, so I'll defer to the judgement of those with more oversight 👍

teknium1 added a commit that referenced this pull request May 1, 2026
/status was reading session_entry.total_tokens from the in-memory
SessionStore (gateway/session.py), which the agent never writes to —
so the token count was always 0.

The agent already persists token deltas to the SQLite SessionDB
(run_agent.py:11497) for every platform with a session_id. Route
/status through that single source of truth instead of duplicating
token writes into a second store.

Fix:
- gateway/run.py: _handle_status_command now calls
  self._session_db.get_session(session_id) and sums the five token
  component columns (input/output/cache_read/cache_write/reasoning).
  Falls back to 0 when no SessionDB is configured or no row exists.
- Two new regression tests covering the populated-row and
  missing-row paths.
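The routing described above can be sketched as follows; the column and method names are assumptions based on this commit message, and FakeSessionDB stands in for the real SQLite-backed SessionDB:

```python
TOKEN_COLUMNS = ("input", "output", "cache_read", "cache_write", "reasoning")

def status_total(session_db, session_id):
    """Sum the five token component columns from the SessionDB row.
    Falls back to 0 when no SessionDB is configured or no row exists."""
    if session_db is None:
        return 0
    row = session_db.get_session(session_id)
    if row is None:
        return 0
    return sum(row.get(f"{col}_tokens", 0) for col in TOKEN_COLUMNS)

class FakeSessionDB:
    # Stand-in for the SQLite SessionDB, keyed by session_id.
    def __init__(self, rows):
        self._rows = rows
    def get_session(self, session_id):
        return self._rows.get(session_id)

db = FakeSessionDB({"s1": {"input_tokens": 100, "output_tokens": 40,
                           "cache_read_tokens": 10, "cache_write_tokens": 5,
                           "reasoning_tokens": 20}})
print(status_total(db, "s1"))    # 175 — populated-row path
print(status_total(db, "s2"))    # 0 — missing-row path
print(status_total(None, "s1"))  # 0 — no SessionDB configured
```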

Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
@teknium1
Contributor

teknium1 commented May 1, 2026

Salvaged via #18206 — merged as commit 7abc9ce on main.

Your bug identification was spot-on: /status really was reading from a store nothing writes to. The salvage takes a different fix path — instead of adding a parallel write into the in-memory SessionStore, we route /status through the SQLite SessionDB that the agent already writes to (run_agent.py:11497). Single source of truth, no new fields or methods needed. Your commit was preserved as the authored commit on the salvage PR via rebase-merge, so your name lands in git log.

Thanks for the report and the fix!

teknium1 pushed a commit that referenced this pull request May 1, 2026
…before

_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.

In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).

Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.

Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.

Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.

Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce via #17158).

Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…17158)

donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…before

nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
…17158)

nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
…before
