
refactor:improve checkpoint and ensure gc to improve disk space #3695


Open: gauravsaini wants to merge 2 commits into main

@gauravsaini gauravsaini commented May 17, 2025

Related GitHub Issue

Closes #3391
Closes #3348
Closes #3080

Description

Garbage collection and perf improvements
Upgrade to git 2.49.0 for additional benefits

Test Procedure

Unit tests added
Sanity test on local

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

Additional Notes


Important

Refactor ShadowCheckpointService to improve performance and disk space management by caching nested git paths, optimizing file staging, and introducing garbage collection.

  • Performance Improvements:
    • Cache nested git repo paths in ShadowCheckpointService to avoid repeated scans.
    • Optimize stageAll() to stage only the files that git status reports as changed, instead of running git add . on the whole tree.
    • Modify renameNestedGitRepos() to use cached paths.
  • Garbage Collection:
    • Run git repack during initialization if a shadow repo exists.
    • Periodically run git gc after every 20 checkpoints in saveCheckpoint().
    • Run git gc --prune=now after deleting a branch in deleteBranch().
  • Diff Calculation:
    • Improve getDiff() to use git diff --name-status for precise change tracking and handle file content retrieval more accurately.
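
As a rough illustration of the garbage-collection cadence summarized above (not the PR's actual code), the decision of when to run git gc can be reduced to a small pure function. The names GC_INTERVAL, gcArgsAfterCheckpoint, and gcArgsAfterBranchDelete are hypothetical; only the interval of 20 checkpoints and the git gc / git gc --prune=now commands come from the summary.

```typescript
// Hypothetical sketch: none of these names are taken from the PR's diff.
// The interval of 20 comes from the summary above.
const GC_INTERVAL = 20

// Returns the git arguments to run after the Nth checkpoint has been saved,
// or null when no housekeeping is due yet.
function gcArgsAfterCheckpoint(checkpointCount: number, interval: number = GC_INTERVAL): string[] | null {
  if (!Number.isInteger(interval) || interval <= 0) {
    throw new RangeError("interval must be a positive integer")
  }
  const due = checkpointCount > 0 && checkpointCount % interval === 0
  return due ? ["gc"] : null
}

// After a branch is deleted its objects become unreachable, so the summary
// describes pruning them immediately rather than waiting for the grace period.
function gcArgsAfterBranchDelete(): string[] {
  return ["gc", "--prune=now"]
}
```

Keeping the decision separate from the code that actually spawns git makes the cadence easy to unit test without touching a real repository.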

This description was created by Ellipsis for e72e069. You can customize this summary. It will automatically update as commits are pushed.


changeset-bot bot commented May 17, 2025

⚠️ No Changeset found

Latest commit: 4de36f9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 17, 2025
@hannesrudolph
Collaborator

@gauravsaini would you be able to convert the PR to draft and generate and link to an issue please?

@gauravsaini gauravsaini force-pushed the checkpoint-perf-improvement branch from e72e069 to 43387b1 Compare May 18, 2025 02:46
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@gauravsaini gauravsaini changed the title WIP refactor:improve checkpoint and ensure gc to improve disk space refactor:improve checkpoint and ensure gc to improve disk space May 18, 2025
@gauravsaini
Author

gauravsaini commented May 18, 2025

@hannesrudolph
Hey, I've added unit tests to make sure it works correctly. I've also listed the issues it might fix.
The Ellipsis dev bot gives a very good summary of the changes.

@hannesrudolph
Collaborator

@cte would you be able to review this before it gets outdated? Looks promising.

@hannesrudolph hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 20, 2025
@hannesrudolph hannesrudolph moved this from PR [Needs Review] to TEMP in Roo Code Roadmap May 26, 2025
@daniel-lxs daniel-lxs moved this from TEMP to PR [Needs Review] in Roo Code Roadmap May 27, 2025
@daniel-lxs
Collaborator

Hey @gauravsaini, thank you for your contribution. I really like your implementation.
I noticed some marker comments that should probably be removed, since they don't add useful information to the code itself:
// --- CHANGE START:
// --- CHANGE END:
// --- ADDITION:

// Copied
filesToAdd.push(filePath) // Add the new path
}
// Other statuses like 'U' (unmerged) might need specific handling if relevant
Collaborator

@daniel-lxs daniel-lxs May 27, 2025

Hey @gauravsaini, quick thought on the new stageAll logic: how does it handle files that were part of a merge conflict and have been resolved in the working directory but not yet git added? The if/else chain doesn't seem to explicitly cover 'U' (unmerged) statuses. Just wondering if this might mean resolved conflicts aren't always included in checkpoints?
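
To make the question above concrete, here is a minimal sketch of how the output of git status --porcelain (v1 format) could be classified so that unmerged paths are surfaced explicitly instead of falling through an if/else chain. This is an assumption about the general approach, not the PR's code: planStaging, StagePlan, and UNMERGED_CODES are hypothetical names, and what to do with the unmerged bucket is deliberately left to the caller.

```typescript
// Hypothetical sketch: classifies `git status --porcelain` (v1) lines.
// Limitation: paths that git quotes (special characters) are not unquoted
// here; parsing `--porcelain -z` output would be more robust.
interface StagePlan {
  toAdd: string[]
  toRemove: string[]
  unmerged: string[] // conflicted paths, reported instead of silently skipped
}

// The two-letter codes git documents for unmerged paths.
const UNMERGED_CODES = new Set(["DD", "AU", "UD", "UA", "DU", "AA", "UU"])

function planStaging(porcelain: string): StagePlan {
  const plan: StagePlan = { toAdd: [], toRemove: [], unmerged: [] }
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue // blank or malformed line
    const code = line.slice(0, 2) // X = index status, Y = work tree status
    let path = line.slice(3)
    if (code === "!!") continue // ignored file
    if (UNMERGED_CODES.has(code)) {
      plan.unmerged.push(path)
      continue
    }
    if (code[0] === "R" || code[0] === "C") {
      // Renames and copies are printed as "old -> new".
      const sep = path.indexOf(" -> ")
      if (sep !== -1) {
        const from = path.slice(0, sep)
        path = path.slice(sep + 4)
        if (code[0] === "R") plan.toRemove.push(from) // a copy keeps its source
      }
    }
    if (code[1] === "D" || code === "D ") plan.toRemove.push(path)
    else plan.toAdd.push(path) // modified, added, untracked, rename/copy target
  }
  return plan
}
```

With a separate unmerged bucket, the checkpoint code can choose a policy (stage the working-tree version, skip, or warn) rather than leaving resolved-but-unstaged conflicts out by accident.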

@daniel-lxs daniel-lxs moved this from PR [Needs Preliminary Review] to PR [Needs Review] in Roo Code Roadmap May 27, 2025
@KJ7LNW
Collaborator

KJ7LNW commented May 28, 2025

Question: Can .git/ for checkpoints be moved to .roo/checkpoints/.git?

It would be extremely useful to keep the out-of-band, per-workspace git tree at .roo/checkpoints/.git, because then users could do something like git remote add roo .roo/checkpoints and work with task branches accordingly. Future pull requests could allow branch names like roo/task/<TASK_ID> to be renamed by users in the task title bar, or the AI could come up with a reasonable branch name for each task, something like roo/task/<TASK_ID>-human-readable-name.
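
A minimal sketch of how such branch names could be derived, assuming the task ID is already safe to use in a git ref. The function names taskBranchName and slugify and the 40-character cap are hypothetical choices for illustration; only the roo/task/<TASK_ID>-human-readable-name shape comes from the proposal above.

```typescript
// Hypothetical sketch: builds "roo/task/<TASK_ID>-human-readable-name".
// Assumes taskId itself contains only characters valid in a git ref.
const MAX_SLUG_LENGTH = 40 // arbitrary cap to keep branch names manageable

// Reduce a free-form title to lowercase ASCII words joined by dashes, which
// avoids spaces and the other characters git forbids in ref names.
// Non-ASCII characters are dropped in this simple version.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, MAX_SLUG_LENGTH)
    .replace(/^-+|-+$/g, "")
}

function taskBranchName(taskId: string, title?: string): string {
  const base = `roo/task/${taskId}`
  const slug = title ? slugify(title) : ""
  // Fall back to the bare task branch when no usable title is available.
  return slug ? `${base}-${slug}` : base
}
```

For example, taskBranchName("123", "Fix Checkpoint GC!") yields roo/task/123-fix-checkpoint-gc, and a missing or symbol-only title falls back to roo/task/123.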

AI profiles could be given instructions to work with that repository, inspecting its changes and integrating them into the existing main repository. This would mostly be for cherry-picks, because it is only a partial file tree, but maybe other options as well. Here are some other really cool ideas (o4-mini).

I especially like the ability to ask the AI things like "search for the task that does X and integrate change Y into the current task". I cannot tell you how many times I have wanted to find something in my ancient task history, but it is seriously difficult because there is so much of it and the search tool really is not sufficient:

By giving Roo a small set of Git primitives over .roo/checkpoints.git plus these inspection commands, you get full “time-travel” and merge strategies for just the files Roo touched—and you can ask the AI to audit, summarize, or triage any task before it ever lands in your main codebase.
Here’s a much shorter sketch of how you’d teach Roo (and your VS Code extension) to work with a private “.roo/.git” alongside your main repo—and especially how to get it to inspect tasks before you ever merge anything:

  1. Core “Roo Git” Verbs
    • list-tasks
    – List all roo/task/* branches (with status: in-progress, done, abandoned).
    • diff-task
    – git diff main...roo/task/123 (or any two checkpoints on that branch).
    • cherry-pick-task
    – Apply one or all commits from roo/task/123 into your current work branch.
    • rebase-task
    – Rebase the task branch onto the latest main.
    • sandbox-task
    – Create a throw-away worktree: git worktree add .roo/sandbox/123 roo/task/123 for isolated testing.
    • prune-tasks
    – Delete or archive stale/merged/abandoned task branches automatically.

  2. AI Inspection Hooks
    Teach Roo to query and summarize any task branch before you merge or prune it:

    a. change-summary
    – “What’s the high-level purpose of task 123?”
    – AI parses the full diff and emits a bullet-list: feature added, bug fixed, files/modules touched.

    b. impact-analysis
    – “Which public APIs, config files, or documentation does task 123 affect?”
    – AI reads filenames/AST to highlight potential ripple effects.

    c. conflict-risk
    – “Where might merging task 123 into main conflict?”
    – AI scans overlapping hunks or related modules and flags hotspots.

    d. quality-audit
    – “Run lint/tests on sandbox 123 and summarize failures or coverage gaps.”
    – AI invokes your CI locally, collates errors, suggests fixes.

    e. security/perf review
    – “Any new TODOs, insecure patterns, or performance regressions in task 123?”
    – AI greps for e.g. raw SQL, disables asserts, expensive loops, etc., then reports.

  3. Common Inspection Scenarios
    • Before merging a big refactor—get a natural-language summary + risk map so you know where to spot-check.
    • On an abandoned branch—ask “Why did I abandon task 234?” and AI reviews commits/comments to surface the blocker.
    • When parallel tasks touch similar modules—have AI compare two branches and advise which to cherry-pick first.
    • After an AI-generated feature—“Generate a PR description and list missing tests” so you can complete the review.
    • During cleanup—“Which abandoned tasks haven’t run tests in 30 days?” or “Which feature tasks never got finished?”
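
The verbs in item 1 could be sketched as a thin mapping onto ordinary git subcommands. This is only an illustration under stated assumptions: RooGitVerb and rooGitArgs are hypothetical names, "main" is assumed to exist as a branch in the checkpoint repository, and task status (in-progress, done, abandoned) is not something git tracks, so it would have to come from elsewhere. Automatic detection of stale branches is also out of scope; the sketch deletes one named task branch.

```typescript
// Hypothetical sketch: maps the proposed "Roo Git" verbs to git argument lists.
// The verb names follow the list above; everything else is an assumption.
type RooGitVerb =
  | { verb: "list-tasks" }
  | { verb: "diff-task"; taskId: string }
  | { verb: "cherry-pick-task"; taskId: string }
  | { verb: "rebase-task"; taskId: string }
  | { verb: "sandbox-task"; taskId: string }
  | { verb: "prune-task"; taskId: string }

const BASE_BRANCH = "main" // assumed base branch name
const taskBranch = (taskId: string): string => `roo/task/${taskId}`

function rooGitArgs(cmd: RooGitVerb): string[] {
  switch (cmd.verb) {
    case "list-tasks":
      return ["branch", "--list", "roo/task/*"]
    case "diff-task":
      // Three-dot form: changes on the task branch since it diverged from base.
      return ["diff", `${BASE_BRANCH}...${taskBranch(cmd.taskId)}`]
    case "cherry-pick-task":
      // Two-dot range: every commit on the task branch that base lacks.
      return ["cherry-pick", `${BASE_BRANCH}..${taskBranch(cmd.taskId)}`]
    case "rebase-task":
      return ["rebase", BASE_BRANCH, taskBranch(cmd.taskId)]
    case "sandbox-task":
      return ["worktree", "add", `.roo/sandbox/${cmd.taskId}`, taskBranch(cmd.taskId)]
    case "prune-task":
      return ["branch", "-D", taskBranch(cmd.taskId)]
  }
}
```

Returning argument arrays rather than shell strings keeps the mapping testable and avoids quoting problems when the arguments are later passed to a process spawner.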

Labels
PR - Needs Review size:L This PR changes 100-499 lines, ignoring generated files.
Projects
Status: PR [Needs Review]
Development

Successfully merging this pull request may close these issues.

  • tasks vanished (from log aswell)
  • Tasks not loading
  • Checkpoints creating excessive disk usage (40GB+) in VSCode global storage
4 participants