fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation #3412
Merged
…nsolidation

Root cause: when the consolidation LLM fails, raw_archive() dumped full message content (~1MB) into history.jsonl with no size limit. Since build_system_prompt() injects history.jsonl into every system prompt, all subsequent LLM calls exceeded the 200K context window with error 1261. Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), triggering the raw_archive fallback in the first place.

Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's input token budget before sending to the consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
Truncate the "Recent History" section injected by build_system_prompt() to 32K chars. Without this, many accumulated history.jsonl entries could still bloat the system prompt even with per-entry truncation in place.
…h and history char cap

Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
Re-bin (Collaborator) approved these changes — Apr 23, 2026
This is the right kind of fix — it addresses a real cascading failure chain and puts the right guardrails in the right places.
What this PR does
Breaks a cascading failure where stuck consolidation (60-msg cap couldn't find a user turn) → mass raw_archive dumps → 1MB entries in history.jsonl → system prompt permanently exceeds context window. Four targeted changes:
- Remove _cap_consolidation_boundary — the 60-message cap was fundamentally flawed: with long tool chains (200+ assistant turns), it searched backward for a user turn that didn't exist within the cap range, returned None, and aborted the entire consolidation loop. pick_consolidation_boundary alone is the correct gatekeeper.
- Truncate archive() input — tiktoken-based truncation to the model's input token budget before sending to the consolidation LLM.
- Truncate raw_archive() output — 16K char cap as a safety net on history.jsonl entries.
- Cap Recent History in build_system_prompt() — 32K char hard cap, defense-in-depth so accumulated entries can't bloat the system prompt.
Diff scope
4 files: nanobot/agent/memory.py, nanobot/agent/context.py, tests/agent/test_consolidator.py, tests/agent/test_context_prompt_cache.py. +145/−45.
Testing
- All 2363 tests pass (0 failures) after merging latest main.
- I added 2 focused regression tests that were missing:
  - test_archive_truncates_via_tiktoken_with_positive_budget — exercises the tiktoken path with a realistic positive budget (the original 2 archive truncation tests both hit negative budget → char-based fallback, leaving the production-primary tiktoken path uncovered).
  - test_recent_history_truncated_at_max_chars — locks the new _MAX_HISTORY_CHARS boundary in build_system_prompt().
Verdict
Direction is right, boundaries are clean, implementation solves the actual cascading problem without over-engineering. Ready to merge.
Re-bin added a commit that referenced this pull request — Apr 23, 2026
#3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers one forgotten cap away from the same vector.

Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and stays observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.

Made-with: Cursor
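The last bullet's "wide default plus once-per-store warning" pattern can be sketched as below. The class shape, logger name, and latch attribute are illustrative stand-ins for the real code in nanobot/agent/memory.py; only the 64K figure comes from the commit message.

```python
import logging

logger = logging.getLogger("nanobot.memory")

_HISTORY_ENTRY_HARD_CAP = 64_000  # wide default: catches regressions only

class HistoryStore:
    """Illustrative stand-in for the real history store (hypothetical shape)."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self._warned_oversize = False  # once-per-store warning latch

    def append_history(self, entry: str,
                       max_chars: int = _HISTORY_ENTRY_HARD_CAP) -> None:
        if len(entry) > max_chars:
            if not self._warned_oversize:
                # Warn once per store so a caller that forgot its own
                # tighter cap is observable without flooding the logs.
                logger.warning(
                    "history entry exceeded %d chars; truncating", max_chars)
                self._warned_oversize = True
            entry = entry[:max_chars]
        self.entries.append(entry)
```

The design point is that the default cap should almost never fire: per-caller caps (16K raw, 8K summary) handle expected payloads, and a triggered warning here signals a missing cap upstream rather than normal operation.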
yanghan-cyber added a commit to yanghan-cyber/nanobot that referenced this pull request — Apr 25, 2026
Upstream changes:
- fix(agent): bound memory/history pollution paths (HKUDS#3412)
- fix(agent): cap recent history section in system prompt (32K chars)
- fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation
- test: add regression tests for tiktoken truncation and history char cap

Conflict resolution: none needed (auto-merged cleanly).

Post-merge fix: relaxed files_with_matches sort assertion to set comparison (ripgrep does not guarantee mtime ordering in this mode).
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request — Apr 26, 2026
…h and history char cap

Cover two untested boundaries from HKUDS#3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
JiajunBernoulli pushed a commit to JiajunBernoulli/nanobot that referenced this pull request — Apr 26, 2026
…#3412

HKUDS#3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the HKUDS#3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers one forgotten cap away from the same vector.

Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and stays observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.

Made-with: Cursor
Summary
Four changes to prevent history.jsonl from poisoning the system prompt:
- Remove _cap_consolidation_boundary: the 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), because it would search backward for a user message that didn't exist within the cap range, return None, and abort the entire consolidation loop
- Truncate archive() input: use tiktoken to precisely cap formatted text to the model's input token budget before sending to the consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars as a safety net
- Cap build_system_prompt(): truncate the "Recent History" section to 32K chars so accumulated entries can't bloat the system prompt

Root Cause
Cascading failure chain: consolidation stuck (60-msg cap) → auto-compact mass-archive → consolidation LLM also fails → raw_archive() writes ~1MB untruncated dump to history.jsonl → build_system_prompt() injects it into every system prompt → all LLM calls permanently exceed the 200K context window (error 1261)

Test plan
- TestRawArchiveTruncation (3 tests): verifies raw_archive truncates large content, preserves small content, respects custom max_chars
- TestArchiveTruncation (2 tests): verifies archive truncates formatted text to token budget
- _cap_consolidation_boundary