Commit-graph: Write incremental files #184
/submit
Submitted as [email protected]
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
/submit
Submitted as [email protected]
/submit
Submitted as [email protected]
This patch series was integrated into pu via git@176fcc2.
Submitted as [email protected]
This patch series was integrated into pu via git@3e97402.
This patch series was integrated into pu via git@28e1fac.
This patch series was integrated into pu via git@fcb5eb2.
This patch series was integrated into pu via git@55a58a6.
This patch series was integrated into pu via git@34048e9.
This patch series was integrated into pu via git@bbcd30c.
This patch series was integrated into pu via git@67a44b2.
This patch series was integrated into pu via git@a389eb0.
This patch series was integrated into pu via git@adf5366.
This patch series was integrated into pu via git@76acd85.
This patch series was integrated into pu via git@eff1a8c.
This patch series was integrated into pu via git@4ba9254.
This patch series was integrated into next via git@5dee5ed.
This patch series was integrated into pu via git@63a594c.
This patch series was integrated into pu via git@0371b27.
This patch series was integrated into pu via git@21fb063.
This patch series was integrated into pu via git@d10b1d6.
This patch series was integrated into pu via git@92b1ea6.
This patch series was integrated into next via git@92b1ea6.
This patch series was integrated into master via git@92b1ea6.
Closed via 92b1ea6.
This version is now ready for review.
The commit-graph is a valuable performance feature for repos with large commit histories, but suffers from the same problem as git repack: it rewrites the entire file every time. This can be slow when there are millions of commits, especially after we stopped reading from the commit-graph file during a write in 43d3561 (commit-graph write: don't die if the existing graph is corrupt).
Instead, create a "chain" of commit-graphs in the .git/objects/info/commit-graphs folder with name graph-{hash}.graph. The list of hashes is given by the commit-graph-chain file, and also in a "base graph chunk" in the commit-graph format. As we read a chain, we can verify that each listed hash matches the trailing hash of the commit-graph file we read, and that the hashes below a level are the ones that graph file expects.
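To make the chain layout concrete, here is a minimal Python sketch of reading the chain and checking trailing hashes. It is not the C code in this series. It assumes a SHA-1 repository (20-byte trailer), one hex hash per line in commit-graph-chain with the base layer first, and that the trailer is the hash of all preceding bytes in the file. The function names are hypothetical, and the check against the "base graph chunk" is left out.

```python
import hashlib
from pathlib import Path

HASH_LEN = 20  # assumption: SHA-1 repository, so a 20-byte trailing hash


def read_chain(object_dir):
    """Return (hash, path) pairs named by commit-graph-chain, in file order."""
    graphs_dir = Path(object_dir) / "info" / "commit-graphs"
    hashes = (graphs_dir / "commit-graph-chain").read_text().split()
    return [(h, graphs_dir / f"graph-{h}.graph") for h in hashes]


def trailing_hash_ok(expected_hex, path):
    """Check that a graph file ends with the hash its file name promises."""
    path = Path(path)
    if not path.is_file():
        return False
    data = path.read_bytes()
    if len(data) <= HASH_LEN:
        return False
    body, trailer = data[:-HASH_LEN], data[-HASH_LEN:]
    # The trailer must match the name and the hash of the file body.
    return trailer.hex() == expected_hex and hashlib.sha1(body).digest() == trailer


def verify_chain(object_dir):
    """True when every layer listed in the chain file passes the trailer check."""
    return all(trailing_hash_ok(h, p) for h, p in read_chain(object_dir))
```

Calling verify_chain(".git/objects") would walk every layer listed in the chain file; a missing or truncated layer makes the whole chain fail the check.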
When writing, we don't always want to add a new level to the stack. This would eventually result in performance degradation, especially when searching for a commit (before we know its graph position). We decide to merge levels of the stack when the number of new commits we will write is more than half the number of commits in the current tip level. This can be tweaked by the --size-multiple and --max-commits options.
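The merge decision can be sketched in a few lines of Python. This is an illustration, not the implementation in the series. It follows the rule as documented for git commit-graph write --split: with size multiple X (default 2), the new tip is merged into the previous tip when X times the new commit count exceeds the previous tip's count, or when a non-zero max-commits limit is exceeded. The function name and the list-of-counts representation are hypothetical.

```python
def plan_layers(existing, new_commits, size_multiple=2, max_commits=0):
    """Return layer sizes (base first, tip last) after writing new_commits.

    existing      -- commit counts of the current layers, base first
    size_multiple -- stands in for --size-multiple
    max_commits   -- stands in for --max-commits; 0 disables the limit
    """
    layers = list(existing)
    count = new_commits
    # Keep absorbing the current tip while either merge condition holds.
    while layers and (
        size_multiple * count > layers[-1]
        or (max_commits and count > max_commits)
    ):
        count += layers.pop()
    layers.append(count)
    return layers
```

For example, plan_layers([1000, 100], 10) keeps three layers, while plan_layers([1000, 100], 60) folds the new commits into the old tip and yields [1000, 160]. Because each surviving layer is at least size_multiple times larger than the one above it, the number of layers stays logarithmic in the total commit count.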
The performance is necessarily amortized across multiple writes, so I tested by writing commit-graphs from the (non-rc) tags in the Linux repo. My test included 72 tags, and wrote everything reachable from the tag using --stdin-commits. Here are the overall perf numbers:
Updates in V3:
git commit-graph verify now works on commit-graph chains. We do a simple test to check the behavior of a new --shallow option.
When someone writes a flat commit-graph, we now expire the old chain according to the expire time.
The "max commits" limit is no longer enabled by default, but instead is enabled by a --max-commits=<n> option. Ignored if n=0.
Updates in V4:
Johannes pointed out some test failures on the Windows platform. We found that the tests were not running on Windows in the gitgitgadget PR builds, which is now resolved.
We need to close commit-graphs recursively down the chain. Not doing so left an open handle, which prevented an unlink() from working.
Creating the alternates file used a path-specification that didn't work on Windows.
Renaming a file to the same name failed, but is probably related to the unlink() error mentioned above.
Updates in V5:
Responding to multiple items of feedback. Thanks Philip, Junio, and Ramsay!
Used the test coverage report to find holes in the test coverage. While adding tests, I found a bug in octopus merges. The fix is in the rewrite of "deduplicate_commits()" as "sort_and_scan_merged_commits()" and covered by the new tests.
Updates in V6:
Rebased onto ds/close-object-store and resolved conflicts around close_commit_graph().
Updated path normalization to be resilient to double-slashes and trailing slashes.
Added a prepare_alt_odb() call in load_commit_graph_one() for cross-alternate graph loads during 'verify' subcommands.
Thanks,
-Stolee
[1] git@43d3561 commit-graph write: don't die if the existing graph is corrupt