
Commit 8d87826

derrickstolee authored and dscho committed
Add path walk API and its use in 'git pack-objects' (#5171)
This is a follow-up to #5157 and is also motivated by the RFC in gitgitgadget#1786.

We have ways of walking all objects, but they focus on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute, since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
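The batching described above can be illustrated with a small, self-contained C sketch. It assumes a hypothetical flattened model in which each commit's tree is a list of (path, object ID) pairs. Real trees are nested, and the actual path walk API also emits tags, commits and trees, none of which is modeled here. `struct entry`, `struct batch`, `add_to_batch()` and `walk_by_path()` are illustrative names, not part of Git. Batches are sorted by path only to make the result deterministic; this does not reproduce the API's depth-first order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened tree entry: a full path and an abbreviated
 * object ID. Real Git trees are nested; flattening keeps this short. */
struct entry {
	const char *path;
	const char *oid;
};

#define MAX_BATCHES 16
#define MAX_OIDS 16

/* One batch: every distinct object seen at a single path. */
struct batch {
	const char *path;
	const char *oids[MAX_OIDS];
	size_t nr;
};

static struct batch batches[MAX_BATCHES];
static size_t nr_batches;

static struct batch *find_or_add_batch(const char *path)
{
	for (size_t i = 0; i < nr_batches; i++)
		if (!strcmp(batches[i].path, path))
			return &batches[i];
	assert(nr_batches < MAX_BATCHES);
	batches[nr_batches].path = path;
	batches[nr_batches].nr = 0;
	return &batches[nr_batches++];
}

/* Record an object under its path, skipping objects already seen there
 * (an unchanged file reuses the same blob in every commit). */
static void add_to_batch(const char *path, const char *oid)
{
	struct batch *b = find_or_add_batch(path);

	for (size_t i = 0; i < b->nr; i++)
		if (!strcmp(b->oids[i], oid))
			return;
	assert(b->nr < MAX_OIDS);
	b->oids[b->nr++] = oid;
}

static int cmp_batch_path(const void *a, const void *b)
{
	return strcmp(((const struct batch *)a)->path,
		      ((const struct batch *)b)->path);
}

/* Visit every commit's entries, then order the batches by path so each
 * batch holds all versions of one file. */
static void walk_by_path(const struct entry *const *commits,
			 const size_t *sizes, size_t nr_commits)
{
	nr_batches = 0;
	for (size_t c = 0; c < nr_commits; c++)
		for (size_t i = 0; i < sizes[c]; i++)
			add_to_batch(commits[c][i].path, commits[c][i].oid);
	qsort(batches, nr_batches, sizeof(batches[0]), cmp_batch_path);
}
```

With two commits where `src/main.c` changes and `Makefile` does not, this sketch yields one batch per path: `Makefile` holds a single blob, while `src/main.c` holds both versions, which are exactly the objects worth comparing for deltas.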
2 parents 7da37e6 + 56a2d8d commit 8d87826

23 files changed, 534 lines added, 41 lines removed

Documentation/config/feature.adoc

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ walking fewer objects.
 +
 * `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
 reusing objects from multiple packs instead of just one.
++
+* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
+significantly smaller in the presence of certain filename collisions with Git's
+default name-hash.
 
 feature.manyFiles::
 Enable config options that optimize for repos with many files in the

Documentation/config/pack.adoc

Lines changed: 8 additions & 0 deletions
@@ -155,6 +155,14 @@ pack.useSparse::
 commits contain certain types of direct renames. Default is
 `true`.
 
+pack.usePathWalk::
+When true, git will default to using the '--path-walk' option in
+'git pack-objects' when the '--revs' option is present. This
+algorithm groups objects by path to maximize the ability to
+compute delta chains across historical versions of the same
+object. This may disable other options, such as using bitmaps to
+enumerate objects.
+
 pack.preferBitmapTips::
 When selecting which commits will receive bitmaps, prefer a
 commit at the tip of any reference that is a suffix of any value
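The precedence implied by `pack.usePathWalk` can be modeled as a small decision function. Only two rules come from the documentation in this commit: the config default applies when `--revs` is present, and path walk may disable bitmaps. The rest are assumptions: an explicit command-line choice is assumed to override configuration, and `feature.experimental` is assumed to supply a default only when `pack.usePathWalk` is unset. All type and function names are hypothetical, and this is not Git's actual option-parsing code.

```c
#include <stdbool.h>

/* Tri-state for settings that may be left unset. */
enum tristate { TRI_UNSET = -1, TRI_OFF = 0, TRI_ON = 1 };

struct path_walk_inputs {
	enum tristate cli_path_walk;      /* explicit command-line choice, if any */
	enum tristate pack_use_path_walk; /* pack.usePathWalk, if set */
	bool feature_experimental;        /* feature.experimental=true */
	bool revs;                        /* --revs was given */
	bool use_bitmaps;                 /* bitmaps would otherwise be used */
};

struct path_walk_result {
	bool path_walk;
	bool use_bitmaps;
};

static struct path_walk_result resolve_path_walk(struct path_walk_inputs in)
{
	struct path_walk_result out = { false, in.use_bitmaps };

	if (in.cli_path_walk != TRI_UNSET) {
		/* Assumption: an explicit command-line choice always wins. */
		out.path_walk = in.cli_path_walk == TRI_ON;
	} else if (in.revs) {
		/* Documented: the config default applies only with --revs. */
		if (in.pack_use_path_walk != TRI_UNSET) {
			out.path_walk = in.pack_use_path_walk == TRI_ON;
		} else {
			/* Assumption: feature.experimental supplies the default. */
			out.path_walk = in.feature_experimental;
		}
	}

	/* Documented: path walk "may disable" bitmaps; this sketch always does. */
	if (out.path_walk)
		out.use_bitmaps = false;
	return out;
}
```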

Documentation/git-pack-objects.adoc

Lines changed: 11 additions & 1 deletion
@@ -16,7 +16,7 @@ SYNOPSIS
 [--cruft] [--cruft-expiration=<time>]
 [--stdout [--filter=<filter-spec>] | <base-name>]
 [--shallow] [--keep-true-parents] [--[no-]sparse]
-[--name-hash-version=<n>] < <object-list>
+[--name-hash-version=<n>] [--path-walk] < <object-list>
 
 
 DESCRIPTION
@@ -375,6 +375,16 @@ many different directories. At the moment, this version is not allowed
 when writing reachability bitmap files with `--write-bitmap-index` and it
 will be automatically changed to version `1`.
 
+--path-walk::
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`,
+`--shallow`, or `--filter`.
 
 DELTA ISLANDS
 -------------
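The name-hash collisions mentioned above are easy to reproduce. The sketch below is modeled on Git's version-1 name-hash (`pack_name_hash()` in `pack-objects.h`). It was written from memory and is meant as an illustration, not a faithful copy. Each character enters the top byte and pushes earlier characters two bits to the right, so roughly only the last sixteen non-whitespace characters of a path affect the result.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch modeled on Git's version-1 name-hash: each new character is
 * placed in the top byte while earlier characters shift right by two
 * bits, so the end of the path counts most and, roughly, only the last
 * sixteen non-whitespace characters survive. */
static uint32_t name_hash_v1(const char *name)
{
	uint32_t c, hash = 0;

	if (name == NULL)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue; /* whitespace is ignored entirely */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

Under this sketch, `a/include/config.h` and `b/include/config.h` receive the same hash, so a hash-sorted walk can place these two unrelated files next to each other in the delta window. A path walk keeps them in separate batches regardless of their hash.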

Documentation/git-repack.adoc

Lines changed: 13 additions & 1 deletion
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
 [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
-[--write-midx] [--name-hash-version=<n>]
+[--write-midx] [--name-hash-version=<n>] [--path-walk]
 
 DESCRIPTION
 -----------
@@ -258,6 +258,18 @@ linkgit:git-multi-pack-index[1]).
 Provide this argument to the underlying `git pack-objects` process.
 See linkgit:git-pack-objects[1] for full details.
 
+--path-walk::
+This option passes the `--path-walk` option to the underlying
+`git pack-options` process (see linkgit:git-pack-objects[1]).
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`
+or `--filter`.
 
 CONFIGURATION
 -------------
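Because `git repack --path-walk` forwards the flag to `git pack-objects`, the pass-through amounts to appending one argument. This is a minimal sketch assuming a hypothetical fixed-size argument vector. `struct args`, `args_push()`, `args_contains()` and `build_pack_objects_args()` are illustrative names, not Git's internals, and nothing is actually spawned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8

/* Hypothetical argument vector for a child process. */
struct args {
	const char *v[MAX_ARGS + 1]; /* always NULL-terminated */
	size_t nr;
};

static void args_push(struct args *a, const char *arg)
{
	assert(a->nr < MAX_ARGS);
	a->v[a->nr++] = arg;
	a->v[a->nr] = NULL;
}

/* Return true if ARG is already present in the vector. */
static bool args_contains(const struct args *a, const char *arg)
{
	for (size_t i = 0; i < a->nr; i++)
		if (!strcmp(a->v[i], arg))
			return true;
	return false;
}

/* Assemble the command line for the child pack-objects process. */
static void build_pack_objects_args(struct args *a, bool path_walk,
				    const char *base_name)
{
	a->nr = 0;
	a->v[0] = NULL;
	args_push(a, "pack-objects");
	/* In this sketch the flag is only forwarded, never interpreted. */
	if (path_walk)
		args_push(a, "--path-walk");
	args_push(a, base_name);
}
```

In this sketch repack does not check for conflicting options such as `--delta-islands` or `--filter` itself. It leaves that to `git pack-objects`.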

Documentation/technical/api-path-walk.adoc

Lines changed: 1 addition & 0 deletions
@@ -70,3 +70,4 @@ Examples
 See example usages in:
 `t/helper/test-path-walk.c`,
 `builtin/backfill.c`
+`builtin/pack-objects.c`
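`builtin/pack-objects.c` is now listed as a consumer of the path walk API. The sketch below shows the general shape of a batch-callback consumer. It assumes a hypothetical callback type `batch_fn` that receives one path and the objects found there. The consumer proposes delta pairs only between objects in the same batch, which is the grouping behavior the commit message describes. The real API's callback signature and pack-objects' delta search are more involved and are not reproduced here. `struct path_batch`, `for_each_batch()` and `count_delta_pairs()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical callback: called once per path with every object found
 * at that path. Returning nonzero stops the walk. */
typedef int (*batch_fn)(const char *path, const char **oids, size_t nr,
			void *data);

struct path_batch {
	const char *path;
	const char **oids;
	size_t nr;
};

/* Hand each precomputed batch to the callback, in order. */
static int for_each_batch(const struct path_batch *batches, size_t nr,
			  batch_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(batches[i].path, batches[i].oids,
			     batches[i].nr, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Consumer state for a sliding delta window. */
struct delta_stats {
	size_t window; /* compare each object with up to this many predecessors */
	size_t pairs;  /* number of candidate pairs proposed so far */
};

/* Count candidate pairs within one path's batch only: object i is
 * compared with at most `window` earlier objects from the same path. */
static int count_delta_pairs(const char *path, const char **oids, size_t nr,
			     void *data)
{
	struct delta_stats *stats = data;

	(void)path;
	(void)oids;
	for (size_t i = 1; i < nr; i++)
		stats->pairs += i < stats->window ? i : stats->window;
	return 0;
}
```

Because the window never spans two batches, objects from different paths are never proposed as a pair, however their names would hash.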
