Skip to content

fix(agent): move archived summary into system prompt for KV cache stability#3711

Merged
Re-bin merged 1 commit into
mainfrom
fix/summary-persistence
May 10, 2026
Merged

fix(agent): move archived summary into system prompt for KV cache stability#3711
Re-bin merged 1 commit into
mainfrom
fix/summary-persistence

Conversation

@chengyongru
Copy link
Copy Markdown
Collaborator

@chengyongru chengyongru commented May 9, 2026

Problem

Archived conversation summaries were injected into the runtime context (user message block) via _build_runtime_context. This has two drawbacks:

  1. KV cache waste — the summary changes between turns, so the system prompt prefix cannot reuse cached KV from the previous turn.
  2. Consecutive same-role messages — prepending a summary string before the user content can create adjacent user-role messages in edge cases.

Additionally, _last_summary was pop()'ed from session metadata on consumption, meaning it could not survive a process restart.

Approach

  1. Move summary into system promptbuild_system_prompt() accepts an optional session_summary and appends it as [Archived Context Summary]. build_messages() forwards session_summary to build_system_prompt() so both the normal path and the ask_user path receive the summary in the system prompt. Since the system prompt is stable across turns (only changes when a new archive is produced), this improves KV cache reuse.

  2. Freeze summary format at archive time_format_summary now uses a static last_active timestamp instead of computing a dynamic "Inactive for X minutes" on every turn, preserving KV cache stability.

  3. _last_summary persists in metadata — read with get() instead of pop(), ensuring it survives process restarts. The summary is re-injected every turn via the stable system prompt.

  4. estimate_session_prompt_tokens reads summary from metadata — so the token budget estimation accounts for the archived summary accurately.

  5. Remove session_summary parameter from _build_runtime_context, maybe_consolidate_by_tokens, and estimate_session_prompt_tokens — these no longer need to handle the summary directly; it flows through build_messagesbuild_system_prompt instead.

  6. /new clears _last_summarySession.clear() now pops _last_summary from metadata so a fresh session starts clean.

Files changed

  • nanobot/agent/autocompact.py — freeze _format_summary output; stop popping _last_summary
  • nanobot/agent/context.py — add session_summary to build_messages(); move summary from runtime context to build_system_prompt()
  • nanobot/agent/loop.py — pass session_summary to build_messages in all call sites
  • nanobot/agent/memory.pyestimate_session_prompt_tokens reads summary from metadata
  • nanobot/session/manager.pyclear() removes _last_summary
  • tests/agent/test_auto_compact.py — updated assertions + new lifecycle tests
  • tests/agent/test_loop_consolidation_tokens.py — updated assertions
  • tests/agent/test_unified_session.py — updated assertions

@chengyongru chengyongru force-pushed the fix/summary-persistence branch 6 times, most recently from 07562a7 to e6827ca Compare May 9, 2026 08:10
@chengyongru chengyongru added the bug Something isn't working label May 9, 2026
@chengyongru chengyongru force-pushed the fix/summary-persistence branch 2 times, most recently from 1e5ffcd to 7f744de Compare May 9, 2026 10:44
@chengyongru chengyongru changed the title fix(agent): persist _last_summary across restarts with used sentinel fix(agent): move archived summary into system prompt for KV cache stability May 9, 2026
…bility

- Append [Archived Context Summary] to system prompt instead of injecting
  it into the user message runtime context, improving KV cache reuse across
  turns and avoiding consecutive same-role messages.
- _last_summary persists in metadata (no pop) for restart survival;
  summary is re-injected every turn via the stable system prompt.
- Remove dynamic "Inactive for X minutes" from _format_summary — use
  static last_active timestamp instead to preserve KV cache stability.
- Pass session_summary through build_messages() so both normal and
  ask_user paths receive the archived summary in the system prompt.
- estimate_session_prompt_tokens now reads _last_summary from metadata
  to include the summary in token budget estimation.
- Remove obsolete session_summary parameter from
  maybe_consolidate_by_tokens and estimate_session_prompt_tokens
  call sites in loop.py (summary flows through build_messages instead).
- Ensure /new (session.clear()) clears _last_summary from metadata.
@chengyongru chengyongru force-pushed the fix/summary-persistence branch from 7f744de to e03d7b8 Compare May 10, 2026 15:43
@chengyongru
Copy link
Copy Markdown
Collaborator Author

Manual tested with langfuse

Copy link
Copy Markdown
Collaborator

@Re-bin Re-bin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chengyongru!

@Re-bin Re-bin merged commit a6e993d into main May 10, 2026
2 checks passed
@Re-bin Re-bin deleted the fix/summary-persistence branch May 10, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working valid

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants