fix: prevent irreversible context loss when compaction archive write fails#754
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical data loss vulnerability within the compaction process. Previously, if the archival write of a summary or context failed, the system would still truncate old conversation turns, leading to irreversible loss of conversational history. The changes ensure that turns are only truncated if the archival persistence is successful, thereby safeguarding against data loss in scenarios where storage might be unavailable or partially failing. This significantly improves the reliability of automatic context management. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Fixes a data-loss bug in the agent context compaction flow by ensuring old turns are only truncated after successful archival to the workspace, preventing irreversible history loss when persistence fails.
Changes:
- Make truncation conditional on successful workspace archival for both “summarize” and “move to workspace” strategies.
- Update warning logs to reflect that turns are preserved on archival failure.
- Add regression tests (libsql-gated) covering workspace write failure paths.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Generate summary | ||
| let summary = self.generate_summary(&to_summarize).await?; | ||
|
|
||
| // Write to workspace if available | ||
| let summary_written = if let Some(ws) = workspace { | ||
| // Write to workspace if available. | ||
| // If archival fails, preserve turns to avoid context loss. | ||
| let (summary_written, turns_removed) = if let Some(ws) = workspace { |
There was a problem hiding this comment.
In compact_with_summary, the summary is generated via an LLM call before attempting the workspace append. With the new behavior, a workspace write failure preserves turns (no truncation), which means this LLM call becomes pure cost/latency in the failure path and could repeat on subsequent compaction attempts while the workspace remains unhealthy. Consider adding a cheap workspace writability/migrations check before calling generate_summary(), or returning an error/explicit status when archival fails so callers can avoid retry loops and/or switch strategies without repeatedly invoking the LLM.
| // Write to workspace. If archival fails, preserve turns. | ||
| let (written, turns_removed) = match self.write_context_to_workspace(ws, &content).await { | ||
| Ok(()) => { | ||
| thread.truncate_turns(keep_recent); | ||
| (true, turns_to_remove) | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!( | ||
| "Compaction context write failed (turns will still be truncated): {}", | ||
| e | ||
| ); | ||
| false | ||
| tracing::warn!("Compaction context write failed (turns preserved): {}", e); | ||
| (false, 0) | ||
| } |
There was a problem hiding this comment.
When write_context_to_workspace fails, compact_to_workspace now returns Ok with turns_removed = 0 and leaves the thread unmodified. This makes the operation effectively a no-op (aside from formatting work) and may lead to repeated compaction attempts without any token reduction while storage remains unavailable. Consider propagating the error (while still preserving turns), or extending the result with an explicit archival-failed/compaction-skipped signal so callers can notify the user and/or choose a fallback strategy intentionally.
There was a problem hiding this comment.
Code Review
This pull request correctly addresses a critical data-loss bug by making context truncation conditional on the successful archival of conversation turns. The changes are well-implemented and include valuable regression tests to cover the failure paths. My feedback focuses on refactoring the new logic in compact_with_summary and compact_to_workspace to improve readability and maintainability by simplifying the control flow.
| let (summary_written, turns_removed) = if let Some(ws) = workspace { | ||
| match self.write_summary_to_workspace(ws, &summary).await { | ||
| Ok(()) => true, | ||
| Ok(()) => { | ||
| thread.truncate_turns(keep_recent); | ||
| (true, turns_to_remove) | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!( | ||
| "Compaction summary write failed (turns will still be truncated): {}", | ||
| e | ||
| ); | ||
| false | ||
| tracing::warn!("Compaction summary write failed (turns preserved): {}", e); | ||
| (false, 0) | ||
| } | ||
| } | ||
| } else { | ||
| false | ||
| thread.truncate_turns(keep_recent); | ||
| (false, turns_to_remove) | ||
| }; | ||
|
|
||
| // Truncate thread | ||
| thread.truncate_turns(keep_recent); | ||
|
|
||
| Ok(CompactionPartial { | ||
| turns_removed: turns_to_remove, | ||
| turns_removed, | ||
| summary_written, | ||
| summary: Some(summary), | ||
| }) |
There was a problem hiding this comment.
While the logic is correct, this block can be refactored to be more linear and less repetitive, improving readability. Using an early return on write failure can simplify the control flow.
let summary_written = if let Some(ws) = workspace {
if let Err(e) = self.write_summary_to_workspace(ws, &summary).await {
tracing::warn!("Compaction summary write failed (turns preserved): {}", e);
// On write failure, don't truncate. Return early with summary but no turns removed.
return Ok(CompactionPartial {
turns_removed: 0,
summary_written: false,
summary: Some(summary),
});
}
true // Summary was written successfully.
} else {
false // No workspace, so no summary written.
};
// Truncate turns if summary was successfully written, or if no workspace was provided.
thread.truncate_turns(keep_recent);
Ok(CompactionPartial {
turns_removed: turns_to_remove,
summary_written,
summary: Some(summary),
})| // Write to workspace. If archival fails, preserve turns. | ||
| let (written, turns_removed) = match self.write_context_to_workspace(ws, &content).await { | ||
| Ok(()) => { | ||
| thread.truncate_turns(keep_recent); | ||
| (true, turns_to_remove) | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!( | ||
| "Compaction context write failed (turns will still be truncated): {}", | ||
| e | ||
| ); | ||
| false | ||
| tracing::warn!("Compaction context write failed (turns preserved): {}", e); | ||
| (false, 0) | ||
| } | ||
| }; | ||
|
|
||
| // Truncate | ||
| thread.truncate_turns(keep_recent); | ||
|
|
||
| Ok(CompactionPartial { | ||
| turns_removed: turns_to_remove, | ||
| turns_removed, | ||
| summary_written: written, | ||
| summary: None, | ||
| }) |
There was a problem hiding this comment.
Similar to compact_with_summary, this block can be refactored for better readability and to reduce complexity. Using an early return on write failure simplifies the control flow.
| // Write to workspace. If archival fails, preserve turns. | |
| let (written, turns_removed) = match self.write_context_to_workspace(ws, &content).await { | |
| Ok(()) => { | |
| thread.truncate_turns(keep_recent); | |
| (true, turns_to_remove) | |
| } | |
| Err(e) => { | |
| tracing::warn!( | |
| "Compaction context write failed (turns will still be truncated): {}", | |
| e | |
| ); | |
| false | |
| tracing::warn!("Compaction context write failed (turns preserved): {}", e); | |
| (false, 0) | |
| } | |
| }; | |
| // Truncate | |
| thread.truncate_turns(keep_recent); | |
| Ok(CompactionPartial { | |
| turns_removed: turns_to_remove, | |
| turns_removed, | |
| summary_written: written, | |
| summary: None, | |
| }) | |
| // Write to workspace. If archival fails, preserve turns. | |
| if let Err(e) = self.write_context_to_workspace(ws, &content).await { | |
| tracing::warn!("Compaction context write failed (turns preserved): {}", e); | |
| return Ok(CompactionPartial::empty()); | |
| } | |
| // Truncate since write was successful. | |
| thread.truncate_turns(keep_recent); | |
| Ok(CompactionPartial { | |
| turns_removed: turns_to_remove, | |
| summary_written: true, | |
| summary: None, | |
| }) |
zmanian
left a comment
There was a problem hiding this comment.
Fix is correct, well-scoped, and addresses a genuine data-integrity bug. Making truncation conditional on successful archival is the right pattern. Two regression tests using real failure paths (unmigrated libsql backend) rather than mocks.
Minor notes (non-blocking):
- Tests are gated on
#[cfg(feature = "libsql")]-- won't run with default features, but pragmatic CompactionResult.summary_writtenfield name is misleading for MoveToWorkspace strategy (pre-existing)
|
CI is failing on formatting. Please run - .compact(&mut thread, CompactionStrategy::MoveToWorkspace, Some(&workspace))
+ .compact(
+ &mut thread,
+ CompactionStrategy::MoveToWorkspace,
+ Some(&workspace),
+ )Auto-merge is enabled and will trigger once CI passes. |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@zmanian Thanks a lot for the review and the thoughtful notes. Really appreciate it, and looking forward to contributing more here. Since |
zmanian
left a comment
There was a problem hiding this comment.
All CI checks passing after the formatting fix. Approving for auto-merge.
…fails (nearai#754) * fix(compaction): preserve turns when archival write fails * style: cargo fmt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Zaki <zaki@iqlusion.io> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…fails (nearai#754) * fix(compaction): preserve turns when archival write fails * style: cargo fmt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Zaki <zaki@iqlusion.io> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Compaction currently truncates old turns even when archival writes fail. That creates irreversible context loss exactly in the failure path and can silently destroy conversation history.
This PR makes truncation conditional on successful archival persistence.
What changed
Summarizecompaction now preserves turns whenwrite_summary_to_workspacefails.MoveToWorkspacecompaction now preserves turns whenwrite_context_to_workspacefails.Why this is severe
When storage is unavailable or partially failing, the prior behavior deleted context without backup. This is a data-loss bug in a reliability-critical path (automatic context management under pressure).
Validation
OPENSSL_DIR=/usr OPENSSL_LIB_DIR=/usr/lib/x86_64-linux-gnu OPENSSL_INCLUDE_DIR=/usr/include cargo test --lib agent::compaction::tests