
fix: prevent irreversible context loss when compaction archive write fails #754

Merged
zmanian merged 2 commits into nearai:main from pikaxinge:fix/compaction-no-data-loss
Mar 9, 2026


@pikaxinge (Contributor) commented Mar 9, 2026

Summary

Compaction currently truncates old turns even when archival writes fail. That creates irreversible context loss exactly in the failure path and can silently destroy conversation history.

This PR makes truncation conditional on successful archival persistence.

What changed

  • Summarize compaction now preserves turns when write_summary_to_workspace fails.
  • MoveToWorkspace compaction now preserves turns when write_context_to_workspace fails.
  • Updated warning logs to explicitly state turns are preserved on write failure.
  • Added regression tests covering both failure paths.
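The invariant this PR enforces (archive first, truncate only on success) can be shown in isolation. The sketch below is a minimal illustration, not the code in src/agent/compaction.rs: Thread, compact, and the archive closure are hypothetical stand-ins for the real thread type and for write_summary_to_workspace / write_context_to_workspace, and eprintln! stands in for tracing::warn!. Truncation happens only in the Ok arm, so a failed write leaves every turn in place.

```rust
/// Hypothetical stand-in for the agent's conversation thread.
#[derive(Debug, Clone, PartialEq)]
struct Thread {
    turns: Vec<String>,
}

impl Thread {
    /// Drop all but the most recent `keep_recent` turns.
    fn truncate_turns(&mut self, keep_recent: usize) {
        let excess = self.turns.len().saturating_sub(keep_recent);
        self.turns.drain(..excess);
    }
}

/// Archive the old turns first and truncate only if archival succeeded.
/// Returns the number of turns actually removed (0 when archival fails).
fn compact<F>(thread: &mut Thread, keep_recent: usize, mut archive: F) -> usize
where
    F: FnMut(&[String]) -> Result<(), String>,
{
    let turns_to_remove = thread.turns.len().saturating_sub(keep_recent);
    if turns_to_remove == 0 {
        return 0;
    }
    match archive(&thread.turns[..turns_to_remove]) {
        Ok(()) => {
            // Archival succeeded, so dropping the in-memory copy no longer loses data.
            thread.truncate_turns(keep_recent);
            turns_to_remove
        }
        Err(e) => {
            // Stand-in for tracing::warn!: the turns stay in memory.
            eprintln!("Compaction archive write failed (turns preserved): {e}");
            0
        }
    }
}

fn main() {
    let turns: Vec<String> = (1..=5).map(|i| format!("turn {i}")).collect();

    // Failure path: storage is unavailable, nothing is lost.
    let mut thread = Thread { turns: turns.clone() };
    let removed = compact(&mut thread, 2, |_| Err("workspace unavailable".to_string()));
    assert_eq!(removed, 0);
    assert_eq!(thread.turns.len(), 5);

    // Success path: the three oldest turns are archived, then dropped.
    let mut archived: Vec<String> = Vec::new();
    let mut thread = Thread { turns };
    let removed = compact(&mut thread, 2, |old| {
        archived.extend_from_slice(old);
        Ok(())
    });
    assert_eq!(removed, 3);
    assert_eq!(thread.turns, vec!["turn 4".to_string(), "turn 5".to_string()]);
    assert_eq!(archived.len(), 3);
}
```

The ordering is the whole fix: before this PR the truncation ran unconditionally after the write attempt, which is equivalent to moving truncate_turns out of the Ok arm.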

Why this is severe

When storage is unavailable or partially failing, the prior behavior deleted context without backup. This is a data-loss bug in a reliability-critical path (automatic context management under pressure).

Validation

  • OPENSSL_DIR=/usr OPENSSL_LIB_DIR=/usr/lib/x86_64-linux-gnu OPENSSL_INCLUDE_DIR=/usr/include cargo test --lib agent::compaction::tests
  • Result: 20 passed, 0 failed

Copilot AI review requested due to automatic review settings March 9, 2026 14:31
@github-actions github-actions Bot added scope: agent Agent core (agent loop, router, scheduler) size: M 50-199 changed lines labels Mar 9, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical data loss vulnerability within the compaction process. Previously, if the archival write of a summary or context failed, the system would still truncate old conversation turns, leading to irreversible loss of conversational history. The changes ensure that turns are only truncated if the archival persistence is successful, thereby safeguarding against data loss in scenarios where storage might be unavailable or partially failing. This significantly improves the reliability of automatic context management.

Highlights

  • Data Preservation: Compaction now preserves conversation turns when archival writes to the workspace fail, preventing irreversible context loss.
  • Improved Logging: Warning logs have been updated to explicitly state that turns are preserved when a write failure occurs during compaction.
  • Regression Tests: New regression tests were added to cover both summary and context archival write failure paths, ensuring the new behavior is correctly implemented.
Changelog
  • src/agent/compaction.rs
    • Modified the compact_with_summary function to conditionally truncate turns only after a successful summary write to the workspace.
    • Modified the compact_to_workspace function to conditionally truncate turns only after a successful context write to the workspace.
    • Updated warning messages to clearly indicate that turns are preserved when archival writes fail.
    • Added a new helper function make_unmigrated_workspace to facilitate testing of write failure scenarios.
    • Introduced test_compact_with_summary_preserves_turns_when_workspace_write_fails to validate summary compaction behavior during write failures.
    • Added test_compact_to_workspace_preserves_turns_when_workspace_write_fails to validate context compaction behavior during write failures.
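A regression test for the failure path has roughly the following shape. This is a hypothetical sketch: the PR's actual tests obtain a failing workspace from an unmigrated libsql backend via make_unmigrated_workspace, whereas here a hypothetical FailingWorkspace that always returns an error stands in for it, and compact_to_workspace is a simplified stand-in that returns the same (written, turns_removed) tuple as the PR diff.

```rust
/// Hypothetical archival target; the real code writes to a workspace backend.
trait Archive {
    fn write(&mut self, content: &str) -> Result<(), String>;
}

/// Stand-in for an unmigrated backend: every write fails.
struct FailingWorkspace;

impl Archive for FailingWorkspace {
    fn write(&mut self, _content: &str) -> Result<(), String> {
        Err("schema not migrated".to_string())
    }
}

/// In-memory target that records everything archived.
#[derive(Default)]
struct RecordingWorkspace {
    docs: Vec<String>,
}

impl Archive for RecordingWorkspace {
    fn write(&mut self, content: &str) -> Result<(), String> {
        self.docs.push(content.to_string());
        Ok(())
    }
}

/// Simplified compaction returning (written, turns_removed).
fn compact_to_workspace(
    turns: &mut Vec<String>,
    keep_recent: usize,
    ws: &mut dyn Archive,
) -> (bool, usize) {
    let turns_to_remove = turns.len().saturating_sub(keep_recent);
    let content = turns[..turns_to_remove].join("\n");
    match ws.write(&content) {
        Ok(()) => {
            turns.drain(..turns_to_remove);
            (true, turns_to_remove)
        }
        Err(_) => (false, 0), // preserve turns on archival failure
    }
}

#[cfg(test)]
mod compaction_regression_tests {
    use super::*;

    fn sample_turns() -> Vec<String> {
        (0..6).map(|i| format!("turn {i}")).collect()
    }

    #[test]
    fn preserves_turns_when_workspace_write_fails() {
        let mut turns = sample_turns();
        let result = compact_to_workspace(&mut turns, 2, &mut FailingWorkspace);
        assert_eq!(result, (false, 0));
        assert_eq!(turns, sample_turns()); // nothing was dropped
    }

    #[test]
    fn truncates_after_successful_write() {
        let mut turns = sample_turns();
        let mut ws = RecordingWorkspace::default();
        assert_eq!(compact_to_workspace(&mut turns, 2, &mut ws), (true, 4));
        assert_eq!(turns.len(), 2);
        assert_eq!(ws.docs.len(), 1);
    }
}

fn main() {
    let mut turns: Vec<String> = vec!["a".into(), "b".into(), "c".into()];
    assert_eq!(compact_to_workspace(&mut turns, 1, &mut FailingWorkspace), (false, 0));
    assert_eq!(turns.len(), 3);
}
```

These tests need no feature flag, but unlike the PR's tests they exercise a simulated failure rather than a real storage backend.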
Activity
  • 20 tests were run as part of the validation process.
  • All 20 tests passed successfully with 0 failures, 0 ignored, and 0 measured.
  • The test suite completed in 0.01 seconds.

@github-actions github-actions Bot added risk: medium Business logic, config, or moderate-risk modules contributor: new First-time contributor labels Mar 9, 2026
Copilot AI left a comment


Pull request overview

Fixes a data-loss bug in the agent context compaction flow by ensuring old turns are only truncated after successful archival to the workspace, preventing irreversible history loss when persistence fails.

Changes:

  • Make truncation conditional on successful workspace archival for both “summarize” and “move to workspace” strategies.
  • Update warning logs to reflect that turns are preserved on archival failure.
  • Add regression tests (libsql-gated) covering workspace write failure paths.


Comment thread src/agent/compaction.rs
Comment on lines 103 to +108
 // Generate summary
 let summary = self.generate_summary(&to_summarize).await?;

-// Write to workspace if available
-let summary_written = if let Some(ws) = workspace {
+// Write to workspace if available.
+// If archival fails, preserve turns to avoid context loss.
+let (summary_written, turns_removed) = if let Some(ws) = workspace {

Copilot AI Mar 9, 2026


In compact_with_summary, the summary is generated via an LLM call before attempting the workspace append. With the new behavior, a workspace write failure preserves turns (no truncation), which means this LLM call becomes pure cost/latency in the failure path and could repeat on subsequent compaction attempts while the workspace remains unhealthy. Consider adding a cheap workspace writability/migrations check before calling generate_summary(), or returning an error/explicit status when archival fails so callers can avoid retry loops and/or switch strategies without repeatedly invoking the LLM.
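One way to act on this suggestion, sketched under assumptions: the Workspace trait, its check_writable probe, and the Healthy/Unmigrated types are hypothetical (the merged PR adds no such probe), and generate_summary here only counts how often the expensive LLM call would run. This variant also returns an error on archival failure instead of Ok with zero turns removed, which is the other option the comment raises.

```rust
use std::cell::Cell;

/// Hypothetical workspace interface with a cheap writability probe.
trait Workspace {
    fn check_writable(&self) -> Result<(), String>;
    fn append(&self, text: &str) -> Result<(), String>;
}

/// Simulates a backend whose schema was never migrated.
struct Unmigrated;

impl Workspace for Unmigrated {
    fn check_writable(&self) -> Result<(), String> {
        Err("schema not migrated".to_string())
    }
    fn append(&self, _text: &str) -> Result<(), String> {
        Err("schema not migrated".to_string())
    }
}

/// Simulates a working backend.
struct Healthy;

impl Workspace for Healthy {
    fn check_writable(&self) -> Result<(), String> {
        Ok(())
    }
    fn append(&self, _text: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Stand-in for the LLM summarization call; counts invocations.
fn generate_summary(turns: &[String], llm_calls: &Cell<u32>) -> String {
    llm_calls.set(llm_calls.get() + 1);
    format!("summary of {} turns", turns.len())
}

fn compact_with_summary(
    turns: &mut Vec<String>,
    keep_recent: usize,
    ws: &dyn Workspace,
    llm_calls: &Cell<u32>,
) -> Result<usize, String> {
    // Cheap probe first: skip the LLM call if archival cannot succeed.
    ws.check_writable()?;
    let n = turns.len().saturating_sub(keep_recent);
    let summary = generate_summary(&turns[..n], llm_calls);
    // The real write can still fail after the probe, so truncation
    // remains conditional on it.
    ws.append(&summary)?;
    turns.drain(..n);
    Ok(n)
}

fn main() {
    let llm_calls = Cell::new(0);
    let mut turns: Vec<String> = (0..5).map(|i| format!("turn {i}")).collect();

    // Unhealthy workspace: error returned, no LLM call spent, no turns lost.
    assert!(compact_with_summary(&mut turns, 2, &Unmigrated, &llm_calls).is_err());
    assert_eq!(llm_calls.get(), 0);
    assert_eq!(turns.len(), 5);

    // Healthy workspace: one LLM call, three turns compacted.
    assert_eq!(compact_with_summary(&mut turns, 2, &Healthy, &llm_calls), Ok(3));
    assert_eq!(llm_calls.get(), 1);
    assert_eq!(turns.len(), 2);
}
```

The probe only saves cost; it does not replace the conditional truncation, because a write can still fail after a successful probe.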

Comment thread src/agent/compaction.rs
Comment on lines +167 to 176
+// Write to workspace. If archival fails, preserve turns.
+let (written, turns_removed) = match self.write_context_to_workspace(ws, &content).await {
+    Ok(()) => {
+        thread.truncate_turns(keep_recent);
+        (true, turns_to_remove)
+    }
     Err(e) => {
-        tracing::warn!(
-            "Compaction context write failed (turns will still be truncated): {}",
-            e
-        );
-        false
+        tracing::warn!("Compaction context write failed (turns preserved): {}", e);
+        (false, 0)
     }

Copilot AI Mar 9, 2026


When write_context_to_workspace fails, compact_to_workspace now returns Ok with turns_removed = 0 and leaves the thread unmodified. This makes the operation effectively a no-op (aside from formatting work) and may lead to repeated compaction attempts without any token reduction while storage remains unavailable. Consider propagating the error (while still preserving turns), or extending the result with an explicit archival-failed/compaction-skipped signal so callers can notify the user and/or choose a fallback strategy intentionally.
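An explicit archival-failed signal might look like the sketch below. CompactionOutcome, try_compact, and the Compactor retry policy are hypothetical names, not part of the merged PR, which reports this case only as turns_removed = 0 with summary_written = false. A distinct variant lets the caller count consecutive failures and deliberately stop retrying or switch strategy, rather than repeating a no-op compaction.

```rust
/// Hypothetical result type that separates "compacted" from
/// "skipped because archival failed".
#[derive(Debug, PartialEq)]
enum CompactionOutcome {
    Compacted { turns_removed: usize },
    ArchivalFailed { reason: String },
}

/// Archive old turns via `write`; truncate only on success.
fn try_compact<F>(turns: &mut Vec<String>, keep_recent: usize, write: F) -> CompactionOutcome
where
    F: FnOnce(&str) -> Result<(), String>,
{
    let n = turns.len().saturating_sub(keep_recent);
    let content = turns[..n].join("\n");
    match write(content.as_str()) {
        Ok(()) => {
            turns.drain(..n);
            CompactionOutcome::Compacted { turns_removed: n }
        }
        // Turns are preserved, and the caller is told why.
        Err(reason) => CompactionOutcome::ArchivalFailed { reason },
    }
}

/// Hypothetical caller policy: stop retrying after `max_failures`
/// consecutive archival failures instead of retrying every turn.
struct Compactor {
    consecutive_failures: u32,
    max_failures: u32,
}

impl Compactor {
    fn should_attempt(&self) -> bool {
        self.consecutive_failures < self.max_failures
    }

    fn record(&mut self, outcome: &CompactionOutcome) {
        match outcome {
            CompactionOutcome::Compacted { .. } => self.consecutive_failures = 0,
            CompactionOutcome::ArchivalFailed { .. } => self.consecutive_failures += 1,
        }
    }
}

fn main() {
    let mut turns: Vec<String> = (0..4).map(|i| format!("turn {i}")).collect();
    let mut compactor = Compactor { consecutive_failures: 0, max_failures: 2 };
    let mut attempts = 0;

    // Storage stays down: the loop gives up after two failed attempts.
    while compactor.should_attempt() {
        attempts += 1;
        let outcome = try_compact(&mut turns, 1, |_| Err("storage down".to_string()));
        compactor.record(&outcome);
    }
    assert_eq!(attempts, 2);
    assert_eq!(turns.len(), 4); // no turns were lost
}
```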

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly addresses a critical data-loss bug by making context truncation conditional on the successful archival of conversation turns. The changes are well-implemented and include valuable regression tests to cover the failure paths. My feedback focuses on refactoring the new logic in compact_with_summary and compact_to_workspace to improve readability and maintainability by simplifying the control flow.

Comment thread src/agent/compaction.rs
Comment on lines +108 to 128
+let (summary_written, turns_removed) = if let Some(ws) = workspace {
     match self.write_summary_to_workspace(ws, &summary).await {
-        Ok(()) => true,
+        Ok(()) => {
+            thread.truncate_turns(keep_recent);
+            (true, turns_to_remove)
+        }
         Err(e) => {
-            tracing::warn!(
-                "Compaction summary write failed (turns will still be truncated): {}",
-                e
-            );
-            false
+            tracing::warn!("Compaction summary write failed (turns preserved): {}", e);
+            (false, 0)
         }
     }
 } else {
-    false
+    thread.truncate_turns(keep_recent);
+    (false, turns_to_remove)
 };

-// Truncate thread
-thread.truncate_turns(keep_recent);
-
 Ok(CompactionPartial {
-    turns_removed: turns_to_remove,
+    turns_removed,
     summary_written,
     summary: Some(summary),
 })

Priority: medium

While the logic is correct, this block can be refactored to be more linear and less repetitive, improving readability. Using an early return on write failure can simplify the control flow.

        let summary_written = if let Some(ws) = workspace {
            if let Err(e) = self.write_summary_to_workspace(ws, &summary).await {
                tracing::warn!("Compaction summary write failed (turns preserved): {}", e);
                // On write failure, don't truncate. Return early with summary but no turns removed.
                return Ok(CompactionPartial {
                    turns_removed: 0,
                    summary_written: false,
                    summary: Some(summary),
                });
            }
            true // Summary was written successfully.
        } else {
            false // No workspace, so no summary written.
        };

        // Truncate turns if summary was successfully written, or if no workspace was provided.
        thread.truncate_turns(keep_recent);

        Ok(CompactionPartial {
            turns_removed: turns_to_remove,
            summary_written,
            summary: Some(summary),
        })

Comment thread src/agent/compaction.rs
Comment on lines +167 to 183
+// Write to workspace. If archival fails, preserve turns.
+let (written, turns_removed) = match self.write_context_to_workspace(ws, &content).await {
+    Ok(()) => {
+        thread.truncate_turns(keep_recent);
+        (true, turns_to_remove)
+    }
     Err(e) => {
-        tracing::warn!(
-            "Compaction context write failed (turns will still be truncated): {}",
-            e
-        );
-        false
+        tracing::warn!("Compaction context write failed (turns preserved): {}", e);
+        (false, 0)
     }
 };

-// Truncate
-thread.truncate_turns(keep_recent);
-
 Ok(CompactionPartial {
-    turns_removed: turns_to_remove,
+    turns_removed,
     summary_written: written,
     summary: None,
 })

Priority: medium

Similar to compact_with_summary, this block can be refactored for better readability and to reduce complexity. Using an early return on write failure simplifies the control flow.

Suggested change
        // Write to workspace. If archival fails, preserve turns.
        if let Err(e) = self.write_context_to_workspace(ws, &content).await {
            tracing::warn!("Compaction context write failed (turns preserved): {}", e);
            return Ok(CompactionPartial::empty());
        }

        // Truncate since write was successful.
        thread.truncate_turns(keep_recent);

        Ok(CompactionPartial {
            turns_removed: turns_to_remove,
            summary_written: true,
            summary: None,
        })

zmanian previously approved these changes Mar 9, 2026

@zmanian (Collaborator) left a comment

Fix is correct, well-scoped, and addresses a genuine data-integrity bug. Making truncation conditional on successful archival is the right pattern. The two regression tests use real failure paths (an unmigrated libsql backend) rather than mocks.

Minor notes (non-blocking):

  • Tests are gated on #[cfg(feature = "libsql")], so they won't run with default features, but that's pragmatic
  • CompactionResult.summary_written field name is misleading for MoveToWorkspace strategy (pre-existing)

@zmanian zmanian enabled auto-merge (squash) March 9, 2026 15:37
@zmanian (Collaborator) commented Mar 9, 2026

CI is failing on formatting. Please run cargo fmt. There's one line in src/agent/compaction.rs:665 where the .compact() call needs to be broken across multiple lines:

-            .compact(&mut thread, CompactionStrategy::MoveToWorkspace, Some(&workspace))
+            .compact(
+                &mut thread,
+                CompactionStrategy::MoveToWorkspace,
+                Some(&workspace),
+            )

Auto-merge is enabled and will trigger once CI passes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pikaxinge (Contributor, Author)

@zmanian Thanks a lot for the review and the thoughtful notes. Really appreciate it, and looking forward to contributing more here. Since those are non-blocking, would you prefer that I address them in this PR?

@zmanian (Collaborator) left a comment

All CI checks passing after the formatting fix. Approving for auto-merge.

@zmanian zmanian merged commit e86b372 into nearai:main Mar 9, 2026
22 checks passed
@zmanian zmanian mentioned this pull request Mar 9, 2026
@github-actions github-actions Bot mentioned this pull request Mar 10, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…fails (nearai#754)

* fix(compaction): preserve turns when archival write fails

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Zaki <zaki@iqlusion.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…fails (nearai#754)

* fix(compaction): preserve turns when archival write fails

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Zaki <zaki@iqlusion.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Labels

contributor: new First-time contributor risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) size: M 50-199 changed lines


3 participants