
fix(gateway): preserve pending update prompts across restarts #18477

Closed
simbam99 wants to merge 1 commit into NousResearch:main from simbam99:fix/update-prompt-restart-persistence

simbam99 (Contributor) commented May 1, 2026

Summary

This fixes a gateway /update state-loss bug when Hermes asks the user an interactive update question and the gateway restarts before the user replies.

Previously, _watch_update_progress() deleted .update_prompt.json immediately after forwarding the prompt to the user. If the gateway restarted while the update subprocess was still waiting on .update_response, the new gateway process had no durable prompt state left to recover from. The user's later reply would fall through as a normal message, and the update subprocess would remain blocked until timeout.
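To make the blocking behavior concrete, here is a minimal sketch of what the subprocess side of this handshake might look like. The function name `wait_for_response`, the polling interval, and the timeout handling are assumptions for illustration only; the actual update subprocess may wait differently. Only the `.update_response` file name comes from this PR.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def wait_for_response(response_path: Path, timeout: float,
                      poll_interval: float = 0.05) -> Optional[str]:
    """Hypothetical helper: block until the response file appears or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response_path.exists():
            answer = response_path.read_text().strip()
            response_path.unlink()  # consume the answer so it is not reused
            return answer
        time.sleep(poll_interval)
    return None  # nobody answered before the timeout


def demo() -> tuple:
    """Show both outcomes: a timeout, then a delivered answer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".update_response"
        timed_out = wait_for_response(path, timeout=0.2)  # no gateway wrote a reply
        path.write_text("y\n")                            # gateway routes the reply
        answered = wait_for_response(path, timeout=0.2)
        return timed_out, answered
```

If the restarted gateway never writes `.update_response`, a loop like this has no way to finish early, which matches the "blocked until timeout" symptom described above.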

What changed

  • Keep .update_prompt.json on disk after forwarding the prompt.
  • Only remove the prompt marker once the user actually replies and the gateway writes .update_response.
  • Apply the same cleanup to the recognized-slash-command cancel path, where the gateway writes a blank response to unblock the update subprocess.

This keeps restart recovery intact while still preventing duplicate prompt spam within a single watcher process via _update_prompt_pending.
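The marker lifecycle described above can be sketched roughly as follows. This is not the gateway's code: `PromptWatcher`, its methods, and the `"prompt"` key in the JSON marker are hypothetical stand-ins, and only the file names and the idea of an in-process pending flag come from the PR.

```python
import json
import tempfile
from pathlib import Path


class PromptWatcher:
    """Hypothetical stand-in for one gateway process watching an update."""

    def __init__(self, state_dir: Path):
        self.prompt_path = state_dir / ".update_prompt.json"
        self.response_path = state_dir / ".update_response"
        self.update_prompt_pending = False  # in-process duplicate guard
        self.sent = []                      # prompts forwarded by this process

    def poll(self) -> None:
        """Forward a pending prompt once per process; keep the marker on disk."""
        if self.update_prompt_pending or not self.prompt_path.exists():
            return
        marker = json.loads(self.prompt_path.read_text())
        self.sent.append(marker.get("prompt", ""))  # "prompt" key is assumed
        self.update_prompt_pending = True
        # Deliberately no unlink() here: the marker must survive a restart.

    def handle_reply(self, text: str) -> bool:
        """Route a reply to the update subprocess, then clear the marker."""
        if not self.prompt_path.exists():
            return False  # nothing pending: treat as a normal message
        self.response_path.write_text(text)
        self.prompt_path.unlink()
        self.update_prompt_pending = False
        return True


def simulate_restart() -> dict:
    """Forward a prompt, 'restart' by creating a new watcher, then reply."""
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp)
        (state / ".update_prompt.json").write_text(
            json.dumps({"prompt": "Restore stashed changes? [y/n]"}))
        first = PromptWatcher(state)
        first.poll()
        first.poll()                      # second poll is suppressed in-process
        restarted = PromptWatcher(state)  # new process, empty in-memory state
        restarted.poll()                  # marker still on disk, so re-forward
        routed = restarted.handle_reply("y")
        return {
            "first_sends": len(first.sent),
            "restarted_sends": len(restarted.sent),
            "routed": routed,
            "response": (state / ".update_response").read_text(),
            "marker_left": (state / ".update_prompt.json").exists(),
        }
```

In this sketch the on-disk marker is the only state shared across processes, while the boolean flag handles deduplication within one process, which mirrors the split of responsibilities the PR describes.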

Why this is safe

  • Duplicate sends are still suppressed in-process by _update_prompt_pending.
  • Prompt cleanup still happens once the gateway has a definitive response to hand back to the update subprocess.
  • The change is scoped to the /update gateway IPC flow only.

Tests

Added/updated coverage in tests/gateway/test_update_streaming.py for:

  • prompt forwarding still happening only once within a single watcher
  • prompt recovery after watcher/gateway restart while the prompt is still pending
  • prompt marker cleanup after a normal text response
  • prompt marker cleanup after recognized slash-command cancellation
  • prompt marker cleanup after unrecognized slash-command responses
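The three cleanup cases in this list can be illustrated with a small sketch. The helper `answer_pending_prompt` and the `KNOWN_COMMANDS` set are hypothetical, and passing an unrecognized slash command through verbatim is an assumption: the PR only states that the marker is cleaned up in that case and that recognized commands write a blank response.

```python
import tempfile
from pathlib import Path
from typing import Optional

KNOWN_COMMANDS = {"/stop", "/new", "/help"}  # hypothetical command set


def answer_pending_prompt(state_dir: Path, text: str) -> Optional[str]:
    """Write a response for a pending prompt and remove the marker.

    Returns the text written to .update_response, or None if no prompt
    was pending.
    """
    prompt_path = state_dir / ".update_prompt.json"
    response_path = state_dir / ".update_response"
    if not prompt_path.exists():
        return None
    command = text.split()[0] if text.strip() else ""
    # Recognized slash command: blank response unblocks the subprocess (cancel).
    # Anything else, including an unrecognized "/foo", is assumed to pass through.
    written = "" if command in KNOWN_COMMANDS else text
    response_path.write_text(written)
    prompt_path.unlink()  # cleanup happens only once a response exists
    return written


def run_cases() -> dict:
    """Exercise the normal, recognized-command, and unrecognized-command paths."""
    results = {}
    for reply in ("y", "/stop", "/bogus"):
        with tempfile.TemporaryDirectory() as tmp:
            state = Path(tmp)
            (state / ".update_prompt.json").write_text("{}")
            written = answer_pending_prompt(state, reply)
            marker_left = (state / ".update_prompt.json").exists()
            results[reply] = (written, marker_left)
    return results
```

In all three cases the marker is gone afterwards, which is the property the added tests are described as checking.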

Repro before fix

  1. Start /update from a gateway platform.
  2. Let the update subprocess emit an interactive prompt.
  3. Gateway forwards the prompt, then restarts before the user replies.
  4. User replies y or n.

Before this fix, the prompt marker was already gone, so the restarted gateway could not recover the pending prompt and the reply would not reach the update subprocess.

Result after fix

The restarted gateway can see the still-pending .update_prompt.json, re-forward the prompt, and route the user's eventual response back to the waiting update subprocess.

alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) on May 1, 2026

teknium1 (Contributor) commented May 5, 2026

Merged via #20160: cherry-picked your commit onto current main with authorship preserved in git log. Thanks for the fix! Catching that the prompt marker got deleted before the user could reply was a good find, and adding the restart-recovery test alongside the original duplicate-send test made the intent crystal clear.
