fix(cron): disable auto_save for cron agent jobs to prevent recursive memory bloat #5664
Cron agent jobs inherit the global `auto_save = true` config, which saves the enriched prompt (including the `[Memory context]` wrapper) back into `brain.db` as a Conversation memory. On the next cron run, the agent loop's `build_context()` recalls that entry (no Conversation category filter), wraps it in another `[Memory context]`, and saves an even larger entry. This creates exponential growth — observed entries of 2MB, 1MB, 533K in production, totalling 4.4MB across 15 context dumps.

The bloated context causes `estimated tokens exceed budget` on every cron run, aggressive trimming destroys the actual prompt, and the provider call fails with exit code 1.

Fix: set `auto_save = false` on the cloned config before calling `agent::run()` for cron jobs. Also add `[Memory context]` to the `should_skip_autosave_content` filter as defense-in-depth.
Agent Review — PR #5664
Triage Result: Skipped — High-Risk Path
Comprehension Summary: This PR disables …
Why skipped: This PR modifies …
Initial observations (for the maintainer who picks this up): …
Agent Review — Ready to Merge
Comprehension summary: This PR fixes recursive memory bloat in cron agent jobs. Cron jobs inherit …

Thank you, @guitaripod. This is an excellent bug fix with a thorough root cause analysis, production evidence, and a clean implementation.

What was reviewed and verified: …
Security/performance assessment: …
CI Status: All checks pass (both CI and Quality Gate workflows, including CI Required Gate).
Missing labels: No
PR template: Not filled out in standard template format, but the PR body contains equivalent information for all required sections (summary, validation, security, rollback, risks/mitigations, blast radius).

This PR is ready for maintainer merge.
Raw per-turn user messages are stored under `user_msg` / `user_msg_*` keys by the auto-save path. Without this fix, all three context-building callers were recalling and injecting these entries back into the LLM context window, causing exponential bloat: each new turn recalled the previous turn's full message (which itself contained all prior turns), growing unboundedly.

Wire `is_user_autosave_key()` (introduced in the preceding commit) into:

- `build_context()` in zeroclaw-runtime/src/agent/loop_.rs
- `DefaultMemoryLoader::load_context()` in memory_loader.rs
- `should_skip_memory_context_entry()` in the zeroclaw-channels orchestrator

Placement is consistent across all three callers: after `is_assistant_autosave_key` and before `should_skip_autosave_content`, maintaining the filter-ordering convention.

Also renames the pass-through key in the existing build_context test from `user_msg_real` (which would now be filtered) to `user_preference`, and adds three new tests — one per caller — verifying `user_msg_*` keys are excluded while non-prefixed semantic keys (`user_preference`, `user_fact`) pass through.

Complementary to zeroclaw-labs#5664 (disabled auto_save on the cron write path); this PR addresses the read path across all context-assembly callers.
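The filter chain described above can be sketched minimally as follows. Only the function names, the ordering convention, and the test keys come from the commit message; the function bodies (simple key-prefix checks) are assumptions about how the filters might work, not the actual zeroclaw implementation:

```rust
// Hypothetical reconstruction of the context-assembly key filters.
// Bodies are illustrative assumptions; names come from the commit message.
fn is_assistant_autosave_key(key: &str) -> bool {
    key == "assistant_msg" || key.starts_with("assistant_msg_")
}

fn is_user_autosave_key(key: &str) -> bool {
    key == "user_msg" || key.starts_with("user_msg_")
}

/// Mimics the filter ordering convention: assistant-autosave keys first,
/// then user-autosave keys, before any content-based filters run.
fn should_include_in_context(key: &str) -> bool {
    !is_assistant_autosave_key(key) && !is_user_autosave_key(key)
}

fn main() {
    // Raw per-turn autosave entries are excluded from recalled context...
    assert!(!should_include_in_context("user_msg_558a38ed"));
    // ...while non-prefixed semantic keys still pass through.
    assert!(should_include_in_context("user_preference"));
    assert!(should_include_in_context("user_fact"));
    println!("user_msg_* excluded; semantic keys pass through");
}
```

This mirrors the three new per-caller tests: `user_msg_*` keys dropped, `user_preference` and `user_fact` retained.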
Summary
Cron agent jobs inherit the global `auto_save = true` memory config. When the cron scheduler prepends `[Memory context]` to the prompt (from recalled memories), the agent loop's auto-save persists the entire enriched prompt — including the `[Memory context]` wrapper — back into `brain.db` as a Conversation memory.

On the next cron run, the agent loop's `build_context()` recalls that saved entry (it has no Conversation category filter, unlike the cron scheduler's own recall at L286), wraps it in another `[Memory context]`, and auto-saves an even larger entry. This creates exponential growth.

Root cause
The existing `should_skip_autosave_content()` guard catches messages starting with `[cron:`, but the cron scheduler prepends memory context before the cron prefix, so the skip check fails and auto-save proceeds.
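A minimal reconstruction of the failure mode, assuming the guard is a plain prefix check (the actual zeroclaw implementation may differ):

```rust
// Hypothetical minimal form of the pre-fix guard: a prefix check only.
fn should_skip_autosave_content(content: &str) -> bool {
    content.starts_with("[cron:")
}

fn main() {
    // A bare cron prompt is caught by the prefix check:
    assert!(should_skip_autosave_content("[cron: daily-report] run the job"));

    // But the scheduler prepends recalled memories first, so the enriched
    // prompt no longer starts with "[cron:" and the guard misses it:
    let enriched = "[Memory context]\n- prior note\n\n[cron: daily-report] run the job";
    assert!(!should_skip_autosave_content(enriched));

    println!("prefix check misses the enriched prompt");
}
```

This is why the fix also adds a `[Memory context]` check to the guard as defense-in-depth.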
Evidence from production
`brain.db` after ~2 weeks of daily cron jobs:

- `user_msg_558a38ed…`
- `user_msg_76f44c38…`
- `user_msg_3257a4b4…`
- `user_msg_cf57c25f…`

Each entry is roughly 2x the previous — classic exponential doubling from the recursive save-recall-save cycle.
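The doubling dynamic can be modeled with a toy calculation. The sizes below are illustrative, not the actual `brain.db` figures; the point is only that each saved entry contains the previous one plus new context, so sizes roughly double and the cumulative store grows geometrically:

```rust
// Toy model of the save-recall-save cycle (illustrative sizes only).
fn main() {
    let mut entry: u64 = 250_000; // hypothetical first saved entry, in bytes
    let mut total: u64 = 0;
    for run in 1..=4 {
        total += entry;
        println!("run {run}: saved entry ~{entry} B, cumulative ~{total} B");
        // The next run recalls this entry, wraps it in another
        // [Memory context], and saves roughly twice the size.
        entry *= 2;
    }
}
```

Four runs of doubling from a few hundred KB already lands in the multi-megabyte range observed in production.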
This caused the agent loop to log `estimated tokens exceed budget` on every cron run. After aggressive trimming, the actual cron prompt was destroyed and the provider call failed with exit code 1.
Fix
- `scheduler.rs`: Set `auto_save = false` on the cloned config before calling `agent::run()` for cron jobs. Cron prompts are synthetic — they should never be persisted as user conversation memories.
- `lib.rs` (defense-in-depth): Add `[Memory context]` to `should_skip_autosave_content()` so that even if another code path passes an enriched prompt to auto-save, the synthetic wrapper is caught.

Test plan

- `autosave_content_filter_drops_cron_and_distilled_noise` test updated with `[Memory context]` case
- `cargo clippy --workspace --features ci-all -- -D warnings` clean
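The scheduler-side change described in the Fix section can be sketched as follows. The struct and function names here are assumptions for illustration, not the actual zeroclaw-runtime API; the one load-bearing detail from the PR is cloning the global config and flipping `auto_save` off before invoking the agent:

```rust
// Hypothetical sketch of the cron scheduler fix; names are assumptions.
#[derive(Clone)]
struct MemoryConfig {
    auto_save: bool,
}

fn run_cron_job(global: &MemoryConfig, prompt: &str) {
    let mut cfg = global.clone();
    // Cron prompts are synthetic; never persist them as conversation memories.
    cfg.auto_save = false;
    agent_run(&cfg, prompt);
}

// Stand-in for agent::run(); here it just demonstrates the config it receives.
fn agent_run(cfg: &MemoryConfig, prompt: &str) {
    assert!(!cfg.auto_save, "cron jobs must never auto-save");
    println!("running agent (auto_save={}) for: {prompt}", cfg.auto_save);
}

fn main() {
    let global = MemoryConfig { auto_save: true };
    run_cron_job(&global, "[cron: daily-report] run the job");
    // The global config itself is untouched; only the per-job clone changes.
    assert!(global.auto_save);
}
```

Mutating a per-job clone rather than the global config keeps interactive sessions unaffected, which matches the narrow blast radius claimed in the review.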