fix: surface silent task failures instead of acking them as completed#2167
Open
kpscheffel wants to merge 3 commits into
Open
fix: surface silent task failures instead of acking them as completed#2167kpscheffel wants to merge 3 commits into
kpscheffel wants to merge 3 commits into
Conversation
…asks completed
Scheduled tasks that failed (rate limit, max-turns, network drop, etc.)
were being acked as `completed` in processing_ack with zero messages_out,
leaving operators thinking silent days were healthy.
Three causes, all fixed:
1. The Claude provider yielded `{ type: 'result', text }` for every SDK
result subtype — `success`, `error_max_turns`, `error_during_execution`
were treated identically. The poll-loop's idempotent `markCompleted`
then ran on every terminal event regardless of outcome.
2. After `processQuery` returned (whether the SDK stream ended cleanly or
because it errored), `runPollLoop` always called `markCompleted`. So
even an `error` ProviderEvent followed by a silent stream close ended
up flagged as a clean turn.
3. `markFailed` was per-id while `markCompleted` was array-based, which
discouraged batched failure handling.
This commit:
* Adds `subtype` to the `result` ProviderEvent (types.ts).
- undefined → synthetic mid-turn (e.g. compact_boundary), do NOT ack.
- 'error_*' → terminal failure.
- other → terminal success.
* Plumbs the SDK subtype through the Claude provider (claude.ts).
* Tracks `sawTerminalResult` and `lastErrorMessage` per turn in
processQuery (poll-loop.ts). On error subtype: markFailed +
write a ⚠️ Task did not complete DM to the originating channel.
On stream close without terminal result: same.
* Updates the catch path in runPollLoop to markFailed (was always
markCompleted) and continue to the next turn.
* Makes `markFailed` array-aware to mirror `markCompleted`.
* Updates MockProvider to yield `subtype: 'success'` so existing
tests represent real terminal results, not synthetic mid-turn ones.
* Adds an optional `stopSignal: AbortSignal` to PollLoopConfig so
tests can deterministically halt the loop after assertions.
New tests in poll-loop-failure.test.ts cover:
- error_during_execution subtype → markFailed + DM
- stream close without terminal result → markFailed + DM
- in-stream error event captured in the failure DM
- empty success result (legitimate "no chat reply needed") still acks
- synthetic mid-turn result followed by terminal success acks correctly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lapse to 'completed' The host sync was collapsing both 'completed' and 'failed' processing_ack statuses into messages_in.status='completed', erasing the runner's signal that a turn errored out. Combined with the agent-runner's old unconditional markCompleted, this is what made silent task failures look healthy in the central DB. Now propagates each status as itself, with a guard so an already- completed row isn't downgraded to 'failed' on a stale ack. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sweep step that scans for kind='task' rows acked completed/failed in the last 15 min with zero messages_out in the same window. On detection: logs WARN and DMs the session's originating messaging group via the channel adapter, so an operator sees the failure even if the runner-side fix missed an SDK event shape. Per-process Set<string> dedup (capped at 1000, reaped to currently- relevant ids each sweep) prevents re-DMing every minute. Pairs with the agent-runner's terminal-result handling — that's the primary fix; this scan catches anything that bypasses it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Type of Change
Description
What. Stops the agent-runner from acking scheduled tasks as
completedwhen the SDK call actually failed silently — and adds a host-side scan that catches anything that bypasses the runner-side fix.Why. When the Claude SDK terminates with an error subtype (
error_max_turns,error_during_execution, …) or the stream closes without a terminalresultevent, the v2 poll-loop unconditionally calledmarkCompletedand produced no user-visible signal. Combined with the host'ssyncProcessingAckscollapsing bothcompletedandfailedprocessing_ackrows intomessages_in.status='completed', scheduled tasks could fail every day and look healthy — operators had no way to tell. Reproduced in real data: three consecutive morning-briefing runs marked complete with zeromessages_out.How it works.
container/agent-runner/src/providers/types.ts—resultProviderEventcarries the SDK'ssubtype. Three buckets:undefined= synthetic mid-turn (e.g.compact_boundary, dispatch text, do not ack);error_*= terminal failure; otherwise = terminal success.container/agent-runner/src/providers/claude.ts— yield site reads(message as { subtype?: string }).subtypeand passes it through.container/agent-runner/src/poll-loop.ts—processQuerytrackssawTerminalResultandlastErrorMessage. Error subtype →markFailed(initialBatchIds)and write a⚠️ Task did not complete: <subtype>chat to the originating channel. After the stream ends without a terminal result → same. The outer catch path now callsmarkFailedinstead of alwaysmarkCompleted.markFailedbecomes array-aware to mirrormarkCompleted.MockProviderupdated to yieldsubtype: 'success'. NewstopSignal: AbortSignalonPollLoopConfigso tests can halt the loop deterministically (production callers omit it).src/db/session-db.ts—syncProcessingAckspropagatesfailedasfailedinstead of collapsing tocompleted, with a guard so an already-completed row isn't downgraded by a stale ack.src/host-sweep.ts— newdetectAndNotifySilentTasksstep. Scanskind='task'rows acked completed/failed in the last 15 min with zeromessages_outin the same window, DMs the session's originating messaging group viagetDeliveryAdapter().deliver(...), dedups per-process. Caps the dedupSetat 1000 entries.How it was tested.
container/agent-runner/src/poll-loop-failure.test.tscovering: error subtype path, stream-closed-without-terminal path, in-stream error captured in DM, empty success result legitimately ack'd, synthetic mid-turn followed by terminal success.src/db/session-db.test.tscoveringsyncProcessingAckspropagatingfailedasfailed, propagatingcompletedascompleted, and refusing to downgrade an already-completed row.task-1777349525294-capq6s,task-1777435932858-rqa22y,task-1777522278899-8mqkfc) each ackedcompletedwith 0messages_out. With this patch deployed, future failures willmarkFailedand DM⚠️ Task did not complete: …to the channel; the host-side scan will catch any path the runner-side missed.