Commit fd467d3

derrickstolee authored and dscho committed
Add path walk API and its use in 'git pack-objects' (#5171)
This is a follow up to #5157 as well as motivated by the RFC in gitgitgadget#1786.

We have ways of walking all objects, but it is focused on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons. This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
2 parents df498b6 + 57d7866 commit fd467d3
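
As a rough illustration of the batched walk the commit message describes, the Python sketch below groups trees and blobs by path over a toy in-memory object store. Everything in it (`TREES`, `COMMITS`, `path_walk`) is hypothetical and invented for this example; it omits annotated tags and is not the C API added by this commit.

```python
from collections import defaultdict

# Hypothetical toy object store: tree oid -> {entry name: (kind, oid)}.
TREES = {
    "t1": {"README": ("blob", "b1"), "src": ("tree", "t2")},
    "t2": {"main.c": ("blob", "b2")},
    "t3": {"README": ("blob", "b1"), "src": ("tree", "t4")},
    "t4": {"main.c": ("blob", "b3")},
}
# Hypothetical history: commit oid -> root tree oid.
COMMITS = {"c1": "t1", "c2": "t3"}


def path_walk(commits, trees):
    """Return (kind, path, [oids]) batches: commits, root trees, then a
    depth-first walk over paths, one batch per path."""
    batches = [("commit", "", list(commits))]
    seen = set()
    roots = []
    for root in commits.values():
        if root not in seen:
            seen.add(root)
            roots.append(root)

    stack = [("tree", "", roots)]  # the root trees share the empty path
    while stack:
        kind, path, oids = stack.pop()
        batches.append((kind, path, oids))
        if kind != "tree":
            continue
        # Collect every not-yet-seen child of this batch, keyed by its path.
        children = defaultdict(list)
        for tree_oid in oids:
            for name, (child_kind, child_oid) in trees[tree_oid].items():
                if child_oid in seen:
                    continue
                seen.add(child_oid)
                suffix = "/" if child_kind == "tree" else ""
                children[(child_kind, path + name + suffix)].append(child_oid)
        # Push in reverse so batches pop in sorted order (blobs before trees).
        for (child_kind, child_path), child_oids in sorted(
            children.items(), reverse=True
        ):
            stack.append((child_kind, child_path, child_oids))
    return batches


if __name__ == "__main__":
    for batch in path_walk(COMMITS, TREES):
        print(batch)
```

In this toy history both versions of `src/main.c` end up in the same batch, which is the kind of path locality the commit message says `git pack-objects --path-walk` relies on when looking for delta candidates.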

23 files changed: +537 -42 lines changed

Documentation/config/feature.adoc

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@ walking fewer objects.
 +
 * `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
 reusing objects from multiple packs instead of just one.
++
+* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
+significantly smaller in the presence of certain filename collisions with Git's
+default name-hash.

 feature.manyFiles::
 Enable config options that optimize for repos with many files in the

Documentation/config/pack.adoc

Lines changed: 8 additions & 0 deletions
@@ -155,6 +155,14 @@ pack.useSparse::
 commits contain certain types of direct renames. Default is
 `true`.

+pack.usePathWalk::
+When true, git will default to using the '--path-walk' option in
+'git pack-objects' when the '--revs' option is present. This
+algorithm groups objects by path to maximize the ability to
+compute delta chains across historical versions of the same
+object. This may disable other options, such as using bitmaps to
+enumerate objects.
+
 pack.preferBitmapTips::
 When selecting which commits will receive bitmaps, prefer a
 commit at the tip of any reference that is a suffix of any value

Documentation/git-pack-objects.adoc

Lines changed: 11 additions & 1 deletion
@@ -16,7 +16,7 @@ SYNOPSIS
 [--cruft] [--cruft-expiration=<time>]
 [--stdout [--filter=<filter-spec>] | <base-name>]
 [--shallow] [--keep-true-parents] [--[no-]sparse]
-[--full-name-hash] < <object-list>
+[--full-name-hash] [--path-walk] < <object-list>


 DESCRIPTION
@@ -346,6 +346,16 @@ raise an error.
 Restrict delta matches based on "islands". See DELTA ISLANDS
 below.

+--path-walk::
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`,
+`--shallow`, or `--filter`.

 DELTA ISLANDS
 -------------
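
To make the phrase "collisions in Git's default name-hash algorithm" concrete, here is a Python transcription of that hash as it is commonly described: a 32-bit value dominated by roughly the last sixteen non-whitespace characters of the path. The transcription is an assumption made for illustration (ASCII paths only) and is not taken from this commit; the example paths are invented.

```python
WHITESPACE = b" \t\n\v\f\r"  # characters the hash is assumed to skip


def pack_name_hash(name: str) -> int:
    """Assumed shape of Git's default name-hash, for ASCII paths only.

    Each character shifts the running value right by two bits and adds
    itself in the top byte, so older characters fade out and the end of
    the path dominates the result.
    """
    h = 0
    for byte in name.encode("ascii"):
        if byte in WHITESPACE:
            continue
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    # Two unrelated files that only differ early in the path.
    a = "docs/en/CHANGELOG-and-notes.md"
    b = "docs/fr/CHANGELOG-and-notes.md"
    print(hex(pack_name_hash(a)), hex(pack_name_hash(b)))
```

Under this assumed hash, paths in different directories that end in the same long filename get identical or adjacent values, so a delta search ordered by name-hash treats them as neighbors even though they are unrelated. Grouping by full path, as `--path-walk` does, avoids that mixing.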

Documentation/git-repack.adoc

Lines changed: 14 additions & 1 deletion
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
 [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
-[--write-midx] [--full-name-hash]
+[--write-midx] [--full-name-hash] [--path-walk]

 DESCRIPTION
 -----------
@@ -251,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
 Write a multi-pack index (see linkgit:git-multi-pack-index[1])
 containing the non-redundant packs.

+--path-walk::
+This option passes the `--path-walk` option to the underlying
+`git pack-options` process (see linkgit:git-pack-objects[1]).
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`
+or `--filter`.
+
 CONFIGURATION
 -------------

Documentation/technical/api-path-walk.txt

Lines changed: 2 additions & 1 deletion
@@ -60,4 +60,5 @@ Examples
 --------

 See example usages in:
-`t/helper/test-path-walk.c`
+`t/helper/test-path-walk.c`,
+`builtin/pack-objects.c`
