
fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation #3412

Merged

Re-bin merged 3 commits into main from fix/consolidation-raw-archive on Apr 23, 2026

Conversation

@chengyongru (Collaborator) commented on Apr 23, 2026

Summary

Four changes to prevent history.jsonl from poisoning the system prompt:

  • Remove _cap_consolidation_boundary: the 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), because it would search backward for a user message that didn't exist within the cap range, return None, and abort the entire consolidation loop
  • Truncate archive() input: use tiktoken to precisely cap the formatted text to the model's input token budget before sending it to the consolidation LLM (a rough sketch follows this list)
  • Truncate raw_archive() output: cap history.jsonl entries at 16K chars as a safety net
  • Cap recent history in build_system_prompt(): truncate the "Recent History" section to 32K chars so accumulated entries can't bloat the system prompt
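
As a rough illustration of the archive() truncation bullet, here is a minimal sketch of tiktoken-based token-budget truncation. The helper name, encoding choice, and the fallback cap are assumptions, not the exact implementation in nanobot/agent/memory.py:

```python
import tiktoken

_FALLBACK_MAX_CHARS = 16_000  # assumed fallback cap; the real value isn't stated here

def truncate_to_token_budget(text: str, budget: int) -> str:
    """Hypothetical sketch: trim text to at most `budget` tokens."""
    if budget <= 0:
        # The review below notes a char-based fallback on non-positive
        # budgets; the exact cap used there is an assumption.
        return text[:_FALLBACK_MAX_CHARS]
    encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding choice
    tokens = encoding.encode(text)
    if len(tokens) <= budget:
        return text
    return encoding.decode(tokens[:budget])
```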

Root Cause

Cascading failure chain: consolidation stuck (60-msg cap) → auto-compact mass-archive → consolidation LLM also fails → raw_archive() writes a ~1MB untruncated dump to history.jsonl → build_system_prompt() injects it into every system prompt → all LLM calls permanently exceed the 200K context window (error 1261)

Test plan

  • All 2146 tests pass, 0 failures
  • New TestRawArchiveTruncation (3 tests): verifies raw_archive truncates large content, preserves small content, and respects a custom max_chars (a stand-in sketch follows this list)
  • New TestArchiveTruncation (2 tests): verifies archive truncates formatted text to token budget
  • Updated existing cap-related tests to verify behavior without _cap_consolidation_boundary
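
A hedged stand-in for what those TestRawArchiveTruncation cases might look like; the real tests target nanobot's raw_archive, so the local _cap_entry helper and the constant's exact name are assumptions:

```python
_RAW_ARCHIVE_MAX_CHARS = 16_000  # per-entry cap from this PR; constant name assumed

def _cap_entry(text: str, max_chars: int = _RAW_ARCHIVE_MAX_CHARS) -> str:
    # Stand-in for the raw_archive truncation behavior under test.
    return text if len(text) <= max_chars else text[:max_chars]

def test_truncates_large_content():
    assert len(_cap_entry("x" * 1_000_000)) == _RAW_ARCHIVE_MAX_CHARS

def test_preserves_small_content():
    assert _cap_entry("short entry") == "short entry"

def test_respects_custom_max_chars():
    assert len(_cap_entry("y" * 100, max_chars=10)) == 10
```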

…nsolidation

Root cause: when consolidation LLM fails, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.

Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.

Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
  chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
  input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars

Truncate the "Recent History" section injected by build_system_prompt()
to 32K chars. Without this, many accumulated history.jsonl entries could
still bloat the system prompt even with per-entry truncation in place.
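
A minimal sketch of the section-level cap this commit describes, assuming a helper that renders the "Recent History" block; the function name and the keep-the-tail policy are assumptions:

```python
_MAX_HISTORY_CHARS = 32_000  # section-level cap from this commit

def render_recent_history(entries: list[str]) -> str:
    """Hypothetical sketch: join history entries, then hard-cap the section."""
    section = "\n".join(entries)
    if len(section) > _MAX_HISTORY_CHARS:
        # Keeping the tail preserves the most recent entries; whether nanobot
        # keeps the head or the tail is an assumption here.
        section = section[-_MAX_HISTORY_CHARS:]
    return section
```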
…h and history char cap

Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
@Re-bin (Collaborator) left a comment


This is the right kind of fix — it addresses a real cascading failure chain and puts the right guardrails in the right places.

What this PR does

Breaks a cascading failure where stuck consolidation (60-msg cap couldn't find a user turn) → mass raw_archive dumps → 1MB entries in history.jsonl → system prompt permanently exceeds context window. Four targeted changes:

  1. Remove _cap_consolidation_boundary: the 60-message cap was fundamentally flawed. With long tool chains (200+ assistant turns), it searched backward for a user turn that didn't exist within the cap range, returned None, and aborted the entire consolidation loop (a reconstruction sketch follows this list). pick_consolidation_boundary alone is the correct gatekeeper.
  2. Truncate archive() input — tiktoken-based truncation to the model's input token budget before sending to consolidation LLM.
  3. Truncate raw_archive() output — 16K char cap as a safety net on history.jsonl entries.
  4. Cap Recent History in build_system_prompt() — 32K char hard cap, defense-in-depth so accumulated entries can't bloat the system prompt.
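
A hedged reconstruction of the flaw in point 1; the signature, message shape, and return convention are assumptions, not the removed code:

```python
def cap_consolidation_boundary(messages: list[dict], boundary: int, cap: int = 60):
    """Hypothetical reconstruction of the removed helper's failure mode."""
    start = max(0, boundary - cap)
    # Scan backward inside the 60-message window for a user turn to cut on...
    for i in range(boundary - 1, start - 1, -1):
        if messages[i]["role"] == "user":
            return i + 1
    # ...but a 200+ message assistant/tool chain has no user turn inside the
    # window, so this returns None and the caller aborts the whole
    # consolidation loop.
    return None
```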

Diff scope

4 files: nanobot/agent/memory.py, nanobot/agent/context.py, tests/agent/test_consolidator.py, tests/agent/test_context_prompt_cache.py. +145/−45.

Testing

  • All 2363 tests pass (0 failures) after merging latest main.
  • I added 2 focused regression tests that were missing:
    • test_archive_truncates_via_tiktoken_with_positive_budget: exercises the tiktoken path with a realistic positive budget (the original two archive-truncation tests both hit a negative budget and its char-based fallback, leaving the production-primary tiktoken path uncovered); a rough sketch follows this list.
    • test_recent_history_truncated_at_max_chars — locks the new _MAX_HISTORY_CHARS boundary in build_system_prompt().
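
Roughly what the first of those tests might assert; the inline _truncate copies the truncation sketch from earlier in this thread, since the real test imports nanobot's private helper instead:

```python
import tiktoken

def _truncate(text: str, budget: int) -> str:
    # Inline copy of the truncation sketch from earlier in this thread.
    enc = tiktoken.get_encoding("cl100k_base")
    toks = enc.encode(text)
    return text if len(toks) <= budget else enc.decode(toks[:budget])

def test_truncate_to_token_budget_positive_budget():
    enc = tiktoken.get_encoding("cl100k_base")
    text = "alpha bravo charlie " * 500          # far more than 50 tokens
    out = _truncate(text, budget=50)
    assert len(enc.encode(out)) <= 50            # fits the budget
    assert out and text.startswith(out)          # a clean prefix survives
```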

Verdict

Direction is right, boundaries are clean, implementation solves the actual cascading problem without over-engineering. Ready to merge.

Re-bin merged commit 81a5af2 into main on Apr 23, 2026
8 checks passed
Re-bin deleted the fix/consolidation-raw-archive branch on April 23, 2026 19:58
Re-bin added a commit that referenced this pull request Apr 23, 2026
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:

- archive() success path appended uncapped LLM summaries to history.jsonl,
  so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
  archive() fell back to raw_archive, causing duplicate [RAW] dumps of
  the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
  each history entry without caps, so any legacy oversized record (or an
  unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
  one forgotten-cap-away from the same vector.

Changes:

- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
  before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
  it summarized or raw-archived — both outcomes materialize the chunk;
  still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
  Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in
  append_history with a once-per-store warning, so any new caller that
  forgets its own tighter cap is caught and stays observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
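
A minimal sketch of that layered-default idea, assuming append_history takes a per-call cap; the names mirror the commit text but the body is illustrative, not nanobot's implementation:

```python
import logging

logger = logging.getLogger("nanobot.memory")

_HISTORY_ENTRY_HARD_CAP = 64_000   # wide default; per-caller caps stay tighter
_capped_stores: set[str] = set()   # assumed mechanism for the once-per-store warning

def append_history(store: str, entry: str,
                   max_chars: int = _HISTORY_ENTRY_HARD_CAP) -> str:
    """Hypothetical sketch: default belt-and-suspenders cap on history entries."""
    if len(entry) > max_chars:
        if store not in _capped_stores:
            _capped_stores.add(store)
            logger.warning("history entry in %s exceeded %d chars; truncating",
                           store, max_chars)
        entry = entry[:max_chars]
    return entry  # the real function presumably writes to history.jsonl here
```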

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
yanghan-cyber added a commit to yanghan-cyber/nanobot that referenced this pull request Apr 25, 2026
Upstream changes:
- fix(agent): bound memory/history pollution paths (HKUDS#3412)
- fix(agent): cap recent history section in system prompt (32K chars)
- fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation
- test: add regression tests for tiktoken truncation and history char cap

Conflict resolution: none needed (auto-merged cleanly).
Post-merge fix: relaxed files_with_matches sort assertion to set comparison
(ripgrep does not guarantee mtime ordering in this mode).
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request Apr 26, 2026
…h and history char cap

JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request Apr 26, 2026
…#3412

