fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation #3412
Merged
…nsolidation

Root cause: when the consolidation LLM fails, raw_archive() dumped full message content (~1MB) into history.jsonl with no size limit. Since build_system_prompt() injects history.jsonl into every system prompt, all subsequent LLM calls exceeded the 200K context window with error 1261. Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), triggering the raw_archive fallback in the first place.

Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's input token budget before sending to the consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
Truncate the "Recent History" section injected by build_system_prompt() to 32K chars. Without this, many accumulated history.jsonl entries could still bloat the system prompt even with per-entry truncation in place.
…h and history char cap

Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
Re-bin (Collaborator) approved these changes — Apr 23, 2026
This is the right kind of fix — it addresses a real cascading failure chain and puts the right guardrails in the right places.
What this PR does
Breaks a cascading failure where stuck consolidation (60-msg cap couldn't find a user turn) → mass raw_archive dumps → 1MB entries in history.jsonl → system prompt permanently exceeds context window. Four targeted changes:
- Remove _cap_consolidation_boundary — the 60-message cap was fundamentally flawed: with long tool chains (200+ assistant turns), it searched backward for a user turn that didn't exist within the cap range, returned None, and aborted the entire consolidation loop. pick_consolidation_boundary alone is the correct gatekeeper.
- Truncate archive() input — tiktoken-based truncation to the model's input token budget before sending to the consolidation LLM.
- Truncate raw_archive() output — 16K char cap as a safety net on history.jsonl entries.
- Cap Recent History in build_system_prompt() — 32K char hard cap, defense-in-depth so accumulated entries can't bloat the system prompt.
Diff scope
4 files: nanobot/agent/memory.py, nanobot/agent/context.py, tests/agent/test_consolidator.py, tests/agent/test_context_prompt_cache.py. +145/−45.
Testing
- All 2363 tests pass (0 failures) after merging latest main.
- I added 2 focused regression tests that were missing:
  - test_archive_truncates_via_tiktoken_with_positive_budget — exercises the tiktoken path with a realistic positive budget (the original 2 archive truncation tests both hit negative budget → char-based fallback, leaving the production-primary tiktoken path uncovered).
  - test_recent_history_truncated_at_max_chars — locks the new _MAX_HISTORY_CHARS boundary in build_system_prompt().
Verdict
Direction is right, boundaries are clean, implementation solves the actual cascading problem without over-engineering. Ready to merge.
Re-bin added a commit that referenced this pull request — Apr 23, 2026
#3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers one forgotten cap away from the same vector.

Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and stays observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.

Made-with: Cursor
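The last bullet's "wide default plus once-per-store warning" pattern can be sketched as below. The class shape, logger name, and latch attribute are illustrative stand-ins for the real code in nanobot/agent/memory.py; only the 64K figure comes from the commit message.

```python
import logging

logger = logging.getLogger("nanobot.memory")

_HISTORY_ENTRY_HARD_CAP = 64_000  # wide default: catches regressions only

class HistoryStore:
    """Illustrative stand-in for the real history store (hypothetical shape)."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self._warned_oversize = False  # once-per-store warning latch

    def append_history(self, entry: str,
                       max_chars: int = _HISTORY_ENTRY_HARD_CAP) -> None:
        if len(entry) > max_chars:
            if not self._warned_oversize:
                # Warn once per store so a caller that forgot its own
                # tighter cap is observable without flooding the logs.
                logger.warning(
                    "history entry exceeded %d chars; truncating", max_chars)
                self._warned_oversize = True
            entry = entry[:max_chars]
        self.entries.append(entry)
```

The design point is that the default cap should almost never fire: per-caller caps (16K raw, 8K summary) handle expected payloads, and a triggered warning here signals a missing cap upstream rather than normal operation.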
yanghan-cyber added a commit to yanghan-cyber/nanobot that referenced this pull request — Apr 25, 2026
Upstream changes:
- fix(agent): bound memory/history pollution paths (HKUDS#3412)
- fix(agent): cap recent history section in system prompt (32K chars)
- fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation
- test: add regression tests for tiktoken truncation and history char cap

Conflict resolution: none needed (auto-merged cleanly).

Post-merge fix: relaxed files_with_matches sort assertion to set comparison (ripgrep does not guarantee mtime ordering in this mode).
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request — Apr 26, 2026
…h and history char cap

Cover two untested boundaries from HKUDS#3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request — Apr 26, 2026
…#3412

HKUDS#3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the HKUDS#3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers one forgotten cap away from the same vector.

Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and stays observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.

Made-with: Cursor
Summary
Four changes to prevent history.jsonl from poisoning the system prompt:
- Remove _cap_consolidation_boundary: the 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), because it would search backward for a user message that didn't exist within the cap range, return None, and abort the entire consolidation loop
- Truncate archive() input: use tiktoken to precisely cap formatted text to the model's input token budget before sending to the consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars as a safety net
- Cap build_system_prompt(): truncate the "Recent History" section to 32K chars so accumulated entries can't bloat the system prompt

Root Cause
Cascading failure chain: consolidation stuck (60-msg cap) → auto-compact mass-archive → consolidation LLM also fails → raw_archive() writes ~1MB untruncated dump to history.jsonl → build_system_prompt() injects it into every system prompt → all LLM calls permanently exceed the 200K context window (error 1261)

Test plan
- TestRawArchiveTruncation (3 tests): verifies raw_archive truncates large content, preserves small content, respects custom max_chars
- TestArchiveTruncation (2 tests): verifies archive truncates formatted text to token budget
- _cap_consolidation_boundary