fix(sessions): preserve terminal status after agent run completes#1913

Open
BingqingLyu wants to merge 5 commits into main from
fork-pr-60290-fix-session-status-clobbered-after-run

@BingqingLyu BingqingLyu commented Apr 28, 2026

Problem

After a run completes, a session can remain persisted with status: "running" even though endedAt and runtimeMs are already set. This wedges the session: new inbound messages are not accepted and stop/abort commands do not work until the store row is manually edited.

Root cause

persistGatewaySessionLifecycleEvent is called with void (fire-and-forget) in the agent event listener. It enqueues a write to set the terminal status on disk. Immediately after, updateSessionStoreAfterAgentRun enqueues its own write.

The lock queue runs them in order:

  1. Lifecycle end handler: reads disk, writes { status: "done", endedAt: X, runtimeMs: Y }
  2. updateSessionStoreAfterAgentRun: reads disk (which now has "done"), but builds its patch from the in-memory sessionStore. That store was loaded during session initialisation, before the run, and still carries status: "running". mergeSessionEntry(diskState, next) then spreads next.status = "running" over the disk's "done" while leaving endedAt/runtimeMs intact, because those fields were never in next.

Final persisted state: { status: "running", endedAt: X, runtimeMs: Y } — the exact stuck pattern described in the issue.
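
The ordering above can be reproduced in isolation. This is a minimal sketch, assuming mergeSessionEntry behaves like a shallow spread in which the patch wins on conflicts; the SessionEntry shape and the totalTokens field are simplified stand-ins, not the real types.

```typescript
// Hypothetical, simplified shapes; the real SessionEntry carries more fields.
type SessionStatus = "running" | "done" | "failed" | "killed";

interface SessionEntry {
  status?: SessionStatus;
  endedAt?: number;
  runtimeMs?: number;
  totalTokens?: number;
}

// Assumption: a shallow spread where keys in the patch override the disk state.
function mergeSessionEntry(disk: SessionEntry, patch: SessionEntry): SessionEntry {
  return { ...disk, ...patch };
}

// Write 1: the lifecycle end handler persists the terminal state.
const afterLifecycleEnd = mergeSessionEntry(
  { status: "running" },
  { status: "done", endedAt: 1700000000000, runtimeMs: 1234 },
);

// Write 2: the post-run update patches from the stale in-memory entry,
// which was loaded before the run and never saw the terminal status.
const staleInMemory: SessionEntry = { status: "running", totalTokens: 42 };
const finalState = mergeSessionEntry(afterLifecycleEnd, staleInMemory);

// status is back to "running" while endedAt and runtimeMs survive.
console.log(finalState);
```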

Fix

Omit status from the patch passed to mergeSessionEntry inside updateSessionStoreAfterAgentRun. This function is responsible for token/cost/model fields only; lifecycle status is exclusively owned by persistGatewaySessionLifecycleEvent.
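
A minimal sketch of the idea, not the actual diff: the helper name buildPostRunPatch, the simplified SessionEntry shape, and the spread-based mergeSessionEntry are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; the real SessionEntry carries more fields.
type SessionStatus = "running" | "done" | "failed" | "killed";

interface SessionEntry {
  status?: SessionStatus;
  endedAt?: number;
  runtimeMs?: number;
  totalTokens?: number;
}

// Assumption: a shallow spread where keys in the patch override the disk state.
function mergeSessionEntry(disk: SessionEntry, patch: SessionEntry): SessionEntry {
  return { ...disk, ...patch };
}

// Hypothetical helper: copy the in-memory entry and drop the lifecycle-owned
// status key, so the merge cannot touch whatever status is already on disk.
function buildPostRunPatch(inMemory: SessionEntry): SessionEntry {
  const patch: SessionEntry = { ...inMemory };
  delete patch.status;
  return patch;
}

// Disk already holds the terminal state written by the lifecycle handler.
const diskState: SessionEntry = { status: "done", endedAt: 1700000000000, runtimeMs: 1234 };
// The in-memory entry is stale but carries fresh token counts.
const staleEntry: SessionEntry = { status: "running", totalTokens: 42 };

const persisted = mergeSessionEntry(diskState, buildPostRunPatch(staleEntry));
console.log(persisted);
```

Removing the key entirely, rather than setting it to undefined, matters under a spread-style merge: an explicit undefined value would still overwrite the status on disk.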

Test

Added a regression test in src/commands/agent/session-store.test.ts:

  • Disk pre-seeded with { status: "done", endedAt: X, runtimeMs: 1234 } (as if lifecycle end already ran)
  • In-memory store has { status: "running" } (stale)
  • After updateSessionStoreAfterAgentRun, asserts persisted.status === "done" and endedAt/runtimeMs preserved

Closes openclaw#60250

🤖 Generated with Claude Code

jwchmodx and others added 5 commits April 3, 2026 15:45
…ostReplyRootId

Direct messages in Mattermost were creating threads even with
replyToMode=off because resolveMattermostReplyRootId would fall back
to payload.replyToId regardless of chat kind. When a downstream payload
carried a replyToId (e.g. from block-streaming delivery), this
bypassed the earlier DM threading guard and set root_id on the outbound
post, making DM replies appear as threads instead of in the channel body.

Fix: pass kind through all three delivery call sites and hard-return
undefined inside resolveMattermostReplyRootId for kind="direct",
mirroring the resolveMattermostEffectiveReplyToId guard that already
existed for session-key resolution.
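
The guard described here might look roughly like the following. It is a sketch under assumptions: the parameter object, the threadRootId field, and the non-direct chat kinds are hypothetical; only kind="direct" and the replyToId fallback come from the commit message.

```typescript
// Hypothetical chat kinds; only "direct" is named in the commit message.
type ChatKind = "direct" | "group" | "channel";

// Hypothetical parameter shape for illustration.
interface ReplyRootParams {
  kind: ChatKind;
  threadRootId?: string;
  replyToId?: string;
}

function resolveMattermostReplyRootId(params: ReplyRootParams): string | undefined {
  // Direct messages never thread, regardless of what the payload carries.
  if (params.kind === "direct") {
    return undefined;
  }
  // Elsewhere, prefer an explicit thread root and fall back to replyToId.
  return params.threadRootId ?? params.replyToId;
}

const dmRoot = resolveMattermostReplyRootId({ kind: "direct", replyToId: "post123" });
const channelRoot = resolveMattermostReplyRootId({ kind: "channel", replyToId: "post123" });
console.log(dmRoot, channelRoot);
```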

Fixes openclaw#59981
…hreshold

The generic_repeat detector only checked warningThreshold and always
returned level "warning", making criticalThreshold effectively a no-op
for the most common runaway loop pattern (same tool + same args).

Add a critical-threshold check before the warning check, consistent
with how known_poll_no_progress and ping_pong already behave.
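
As an illustration of the check order, here is a sketch with a hypothetical checkGenericRepeat function. The verdict shape and the use of >= comparisons are assumptions; only the detector name and the two threshold names come from the commit message.

```typescript
type LoopLevel = "warning" | "critical";

interface LoopThresholds {
  warningThreshold: number;
  criticalThreshold: number;
}

// Hypothetical verdict shape for illustration.
interface LoopVerdict {
  detector: "generic_repeat";
  level: LoopLevel;
  count: number;
}

function checkGenericRepeat(repeatCount: number, thresholds: LoopThresholds): LoopVerdict | undefined {
  // The critical check runs first so it is not shadowed by the warning check.
  if (repeatCount >= thresholds.criticalThreshold) {
    return { detector: "generic_repeat", level: "critical", count: repeatCount };
  }
  if (repeatCount >= thresholds.warningThreshold) {
    return { detector: "generic_repeat", level: "warning", count: repeatCount };
  }
  return undefined;
}

const thresholds: LoopThresholds = { warningThreshold: 10, criticalThreshold: 20 };
const verdictAt12 = checkGenericRepeat(12, thresholds);
const verdictAt25 = checkGenericRepeat(25, thresholds);
console.log(verdictAt12?.level, verdictAt25?.level);
```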

Fixes openclaw#60111

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…aining-only fields

MiniMax's usage_percent / usagePercent fields report the *remaining* quota
as a percentage, not the consumed quota. When count fields (prompt_limit /
prompt_remain) are also present, fromCounts already computed the correct
usedPercent and the inverted value was silently ignored. But when only
usage_percent is returned (no count fields), the code treated it as a
used-percent and passed it through unchanged, causing the menu bar to show
"2% left" instead of "98% left".

Move usage_percent and usagePercent from PERCENT_KEYS to a new
REMAINING_PERCENT_KEYS array. deriveUsedPercent now inverts remaining-percent
values to obtain usedPercent, matching the behaviour already validated by the
existing "prefers count-based usage when percent looks inverted" test. Count-
based fromCounts still takes priority over both key groups.
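
The priority order can be sketched as follows. The contents of PERCENT_KEYS, the helper firstNumber, the clamping, and the relative order of the two percent key groups are assumptions; the count fields, REMAINING_PERCENT_KEYS, and the inversion come from the commit message.

```typescript
// Assumption: the real PERCENT_KEYS list differs; these names are placeholders.
const PERCENT_KEYS = ["used_percent", "usedPercent"] as const;
// Keys that report the remaining quota and therefore need inverting.
const REMAINING_PERCENT_KEYS = ["usage_percent", "usagePercent"] as const;

type UsagePayload = Record<string, unknown>;

function clampPercent(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Hypothetical helper: first finite numeric value among the given keys.
function firstNumber(payload: UsagePayload, keys: readonly string[]): number | undefined {
  for (const key of keys) {
    const value = payload[key];
    if (typeof value === "number" && Number.isFinite(value)) {
      return value;
    }
  }
  return undefined;
}

// Used percent from prompt_limit / prompt_remain, when both are present.
function fromCounts(payload: UsagePayload): number | undefined {
  const limit = payload["prompt_limit"];
  const remain = payload["prompt_remain"];
  if (typeof limit !== "number" || typeof remain !== "number" || limit <= 0) {
    return undefined;
  }
  return clampPercent(((limit - remain) / limit) * 100);
}

function deriveUsedPercent(payload: UsagePayload): number | undefined {
  // Counts take priority over both percent key groups.
  const counted = fromCounts(payload);
  if (counted !== undefined) return counted;

  const used = firstNumber(payload, PERCENT_KEYS);
  if (used !== undefined) return clampPercent(used);

  // Remaining-percent values are inverted to obtain the used percent.
  const remaining = firstNumber(payload, REMAINING_PERCENT_KEYS);
  if (remaining !== undefined) return clampPercent(100 - remaining);

  return undefined;
}

// A payload reporting 98% remaining yields 2% used.
console.log(deriveUsedPercent({ usage_percent: 98 }));
```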

Fixes openclaw#60193

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MiMo reasoning models (mimo-v2-pro, mimo-v2-omni) output their full
response to reasoning_content with an empty content field. This causes
OpenClaw to emit no visible text since the pi-ai stream emits
reasoning_content deltas as thinking blocks, not as main content.

Setting enable_thinking: false in the request payload directs the model
to write its reply to the standard content field instead.
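
A sketch of how such an override could be applied to an outgoing request. The payload shape and the function name applyMimoThinkingOverride are hypothetical; the model ids and the enable_thinking flag are taken from the commit message.

```typescript
// Hypothetical request shape for illustration.
interface ChatRequestPayload {
  model: string;
  messages: Array<{ role: string; content: string }>;
  enable_thinking?: boolean;
}

// Model ids named in the commit message.
const MIMO_REASONING_MODELS = new Set<string>(["mimo-v2-pro", "mimo-v2-omni"]);

// Returns a copy with enable_thinking disabled for MiMo reasoning models,
// and the payload unchanged for every other model.
function applyMimoThinkingOverride(payload: ChatRequestPayload): ChatRequestPayload {
  if (!MIMO_REASONING_MODELS.has(payload.model)) {
    return payload;
  }
  return { ...payload, enable_thinking: false };
}

const mimoRequest = applyMimoThinkingOverride({
  model: "mimo-v2-pro",
  messages: [{ role: "user", content: "hello" }],
});
console.log(mimoRequest.enable_thinking);
```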

Closes openclaw#60261

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

persistGatewaySessionLifecycleEvent writes the terminal status to disk
as a fire-and-forget write. updateSessionStoreAfterAgentRun runs after
it, but reads the in-memory sessionStore (loaded before the run) which
still carries status: "running". The stale status was being spread into
the merge patch, clobbering the "done"/"failed"/"killed" status on disk.

Result: sessions could remain stuck as "running" even though endedAt and
runtimeMs were correctly set, blocking new inbound messages and
stop/abort commands until the store row was edited manually.

Fix: omit the status field from the patch in updateSessionStoreAfterAgentRun
so that the lifecycle-managed terminal status on disk is always preserved.

Closes openclaw#60250

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

BUG: Completed run can remain persisted as running, blocking new input and stop
