Commit d33fd18

Merge pull request git-for-windows#24 Match multi-pack-index feature from upstream
This includes commits that fixup!-revert all the midx-related commits from our GVFS branch and replace them with the exact commits that are being merged upstream. This should automatically remove the commits during our next version rebase-and-merge action.

Changes upstream:

- The builtin is called 'git multi-pack-index'.
- The command-line takes a 'write' verb and an '--object-dir' parameter.
- We no longer have a 'midx-head' or '*.midx' files.
- Instead, we have a 'multi-pack-index' file in the pack-dir.
- It no longer makes sense to specify '--update-head'.

2 parents f522f1b + 9d0f2de commit d33fd18

33 files changed: +2125 -67 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -102,8 +102,9 @@
 /git-mergetool--lib
 /git-mktag
 /git-mktree
-/git-name-rev
+/git-multi-pack-index
 /git-mv
+/git-name-rev
 /git-notes
 /git-p4
 /git-pack-redundant

Documentation/config.txt

Lines changed: 5 additions & 0 deletions
@@ -966,6 +966,11 @@ core.useReplaceRefs::
 	option was given on the command line. See linkgit:git[1] and
 	linkgit:git-replace[1] for more information.
 
+core.multiPackIndex::
+	Use the multi-pack-index file to track multiple packfiles using a
+	single index. See link:technical/multi-pack-index.html[the
+	multi-pack-index design document].
+
 core.gvfs::
 	Enable the features needed for GVFS. This value can be set to true
 	to indicate all features should be turned on or the bit values listed
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+git-multi-pack-index(1)
+=======================
+
+NAME
+----
+git-multi-pack-index - Write and verify multi-pack-indexes
+
+
+SYNOPSIS
+--------
+[verse]
+'git multi-pack-index' [--object-dir=<dir>] <verb>
+
+DESCRIPTION
+-----------
+Write or verify a multi-pack-index (MIDX) file.
+
+OPTIONS
+-------
+
+--object-dir=<dir>::
+	Use given directory for the location of Git objects. We check
+	`<dir>/pack/multi-pack-index` for the current MIDX file, and
+	`<dir>/pack` for the pack-files to index.
+
+write::
+	When given as the verb, write a new MIDX file to
+	`<dir>/pack/multi-pack-index`.
+
+verify::
+	When given as the verb, verify the contents of the MIDX file
+	at `<dir>/pack/multi-pack-index`.
+
+
+EXAMPLES
+--------
+
+* Write a MIDX file for the packfiles in the current .git folder.
++
+-----------------------------------------------
+$ git multi-pack-index write
+-----------------------------------------------
+
+* Write a MIDX file for the packfiles in an alternate object store.
++
+-----------------------------------------------
+$ git multi-pack-index --object-dir <alt> write
+-----------------------------------------------
+
+* Verify the MIDX file for the packfiles in the current .git folder.
++
+-----------------------------------------------
+$ git multi-pack-index verify
+-----------------------------------------------
+
+
+SEE ALSO
+--------
+See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
+Document] and link:technical/pack-format.html[The Multi-Pack-Index
+Format] for more information on the multi-pack-index feature.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+Multi-Pack-Index (MIDX) Design Notes
+====================================
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to lookup objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix with its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+  - A value j referring to the jth packfile.
+  - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+  offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--------------
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+  .git/objects/pack directory. This could be stored in the pack
+  directory of an alternate. It refers only to packfiles in that
+  same directory.
+
+- The core.multiPackIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+  function, so a future change of hash algorithm does not require
+  a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+  in multiple packfiles, then the MIDX selects the copy in the most-
+  recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+  the MIDX, then those packfiles are loaded into the `packed_git`
+  list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+  can delete the MIDX file, set core.multiPackIndex to false, or
+  downgrade without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+  commit-graph file) that allows optional data to be added.
+
+Future Work
+-----------
+
+- Add a 'verify' subcommand to the 'git multi-pack-index' builtin to
+  verify the contents of the multi-pack-index file match the offsets
+  listed in the corresponding pack-indexes.
+
+- The multi-pack-index allows many packfiles, especially in a context
+  where repacking is expensive (such as a very large repo), or
+  unexpected maintenance time is unacceptable (such as a high-demand
+  build machine). However, the multi-pack-index needs to be rewritten
+  in full every time. We can extend the format to be incremental, so
+  writes are fast. By storing a small "tip" multi-pack-index that
+  points to large "base" MIDX files, we can keep writes fast while
+  still reducing the number of binary searches required for object
+  lookups.
+
+- The reachability bitmap is currently paired directly with a single
+  packfile, using the pack-order as the object order to hopefully
+  compress the bitmaps well using run-length encoding. This could be
+  extended to pair a reachability bitmap with a multi-pack-index. If
+  the multi-pack-index is extended to store a "stable object order"
+  (a function Order(hash) = integer that is constant for a given hash,
+  even as the multi-pack-index is updated) then a reachability bitmap
+  could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+  the initial name but replace ".pack" with ".keep" or ".promisor".
+  We can add an optional chunk of data to the multi-pack-index that
+  records flags of information about the packfiles. This allows new
+  states, such as 'repacked' or 'redeltified', that can help with
+  pack maintenance in a multi-pack environment. It may also be
+  helpful to organize packfiles by object type (commit, tree, blob,
+  etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+  may point to objects that are not stored locally, but available
+  on request to a server. The multi-pack-index does not currently
+  track these promisor packs.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=6
+    Chromium work item for: Multi-Pack Index (MIDX)
+
+[1] https://public-inbox.org/git/[email protected]/
+    An earlier RFC for the multi-pack-index feature
+
+[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
+    Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)
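
The O(log N) lookup claimed in the design notes above can be illustrated with a minimal sketch. This is not Git's implementation: `struct midx_view`, `struct midx_entry`, and `midx_find` are hypothetical names introduced only for illustration, and the sketch assumes 20-byte SHA-1 object IDs held in one flat, sorted array with a parallel array of (pack, offset) records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OID_LEN 20 /* assumption: SHA-1 object IDs */

/* Hypothetical record: which pack holds the object, and where. */
struct midx_entry {
	uint32_t pack_id; /* j: index into the list of packfile names */
	uint64_t offset;  /* position of the object inside pack j */
};

/* Hypothetical in-memory view of a MIDX; not Git's struct layout. */
struct midx_view {
	size_t nr_objects;
	const unsigned char *oids;        /* nr_objects * OID_LEN bytes, sorted */
	const struct midx_entry *entries; /* entries[i] describes the ith OID */
};

/*
 * Binary search over the sorted OID list. One search covers every
 * pack listed in the MIDX, so the cost does not grow with the number
 * of packfiles. Returns NULL when the object is not in the MIDX.
 */
static const struct midx_entry *midx_find(const struct midx_view *m,
					  const unsigned char *oid)
{
	size_t lo = 0, hi;

	assert(m != NULL && oid != NULL);
	hi = m->nr_objects;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, m->oids + mid * OID_LEN, OID_LEN);

		if (!cmp)
			return &m->entries[mid];
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

A caller that gets NULL back would fall through to any packfiles not registered in the MIDX, as the design details describe.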

Documentation/technical/pack-format.txt

Lines changed: 77 additions & 0 deletions
@@ -252,3 +252,80 @@ Pack file entry: <+
 corresponding packfile.
 
 20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+	4-byte signature:
+	    The signature is: {'M', 'I', 'D', 'X'}
+
+	1-byte version number:
+	    Git only writes or recognizes version 1.
+
+	1-byte Object Id Version
+	    Git only writes or recognizes version 1 (SHA1).
+
+	1-byte number of "chunks"
+
+	1-byte number of base multi-pack-index files:
+	    This value is currently always zero.
+
+	4-byte number of pack files
+
+CHUNK LOOKUP:
+
+	(C + 1) * 12 bytes providing the chunk offsets:
+	    First 4 bytes describe chunk id. Value 0 is a terminating label.
+	    Other 8 bytes provide offset in current file for chunk to start.
+	    (Chunks are provided in file-order, so you can infer the length
+	    using the next chunk position if necessary.)
+
+	The remaining data in the body is described one chunk at a time, and
+	these chunks may be given in any order. Chunks are required unless
+	otherwise specified.
+
+CHUNK DATA:
+
+	Packfile Names (ID: {'P', 'N', 'A', 'M'})
+	    Stores the packfile names as concatenated, null-terminated strings.
+	    Packfiles must be listed in lexicographic order for fast lookups by
+	    name. This is the only chunk not guaranteed to be a multiple of four
+	    bytes in length, so should be the last chunk for alignment reasons.
+
+	OID Fanout (ID: {'O', 'I', 'D', 'F'})
+	    The ith entry, F[i], stores the number of OIDs with first
+	    byte at most i. Thus F[255] stores the total
+	    number of objects.
+
+	OID Lookup (ID: {'O', 'I', 'D', 'L'})
+	    The OIDs for all objects in the MIDX are stored in lexicographic
+	    order in this chunk.
+
+	Object Offsets (ID: {'O', 'O', 'F', 'F'})
+	    Stores two 4-byte values for every object.
+	    1: The pack-int-id for the pack storing this object.
+	    2: The offset within the pack.
+		If all offsets are less than 2^31, then the large offset chunk
+		will not exist and offsets are stored as in IDX v1.
+		If there is at least one offset value larger than 2^32-1, then
+		the large offset chunk must exist. If the large offset chunk
+		exists and the 31st bit is on, then removing that bit reveals
+		the row in the large offsets containing the 8-byte offset of
+		this object.
+
+	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+	    8-byte offsets into large packfiles.
+
+TRAILER:
+
+	20-byte SHA1-checksum of the above contents.
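
The header, chunk lookup table, and offset encoding described in this hunk can be sketched as follows. This is a minimal illustration rather than the code in midx.c: every function and macro name below is hypothetical, chunk IDs are assumed to be compared as 4-byte big-endian integers, the 8-byte chunk offsets are assumed to be big-endian as well, and "the 31st bit" is assumed to mean the most significant bit of the 4-byte offset (0x80000000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIDX_HEADER_SIZE 12    /* 4 + 1 + 1 + 1 + 1 + 4 bytes */
#define MIDX_CHUNK_ROW_SIZE 12 /* 4-byte chunk id + 8-byte file offset */
#define MIDX_LARGE_OFFSET_FLAG 0x80000000u /* assumed "31st bit" */

/* Hypothetical parsed form of the fixed-size header. */
struct midx_header {
	uint8_t version;
	uint8_t oid_version;
	uint8_t nr_chunks;
	uint8_t nr_base_midx;
	uint32_t nr_packs;
};

/* Read a 4-byte value stored in network (big-endian) order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian value (byte order is an assumption). */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Returns 0 on success, -1 on a short buffer or a bad signature. */
static int midx_parse_header(const unsigned char *buf, size_t len,
			     struct midx_header *out)
{
	assert(buf != NULL && out != NULL);
	if (len < MIDX_HEADER_SIZE || memcmp(buf, "MIDX", 4))
		return -1;
	out->version = buf[4];
	out->oid_version = buf[5];
	out->nr_chunks = buf[6];
	out->nr_base_midx = buf[7];
	out->nr_packs = read_be32(buf + 8);
	return 0;
}

/*
 * Scan the chunk lookup table that follows the header. Returns the
 * file offset of the chunk with the given id, or 0 if it is absent
 * (0 cannot be a real chunk offset because the header lives there).
 */
static uint64_t midx_find_chunk(const unsigned char *buf, size_t len,
				uint8_t nr_chunks, uint32_t id)
{
	size_t i;

	for (i = 0; i < nr_chunks; i++) {
		size_t row = MIDX_HEADER_SIZE + i * MIDX_CHUNK_ROW_SIZE;

		if (row + MIDX_CHUNK_ROW_SIZE > len)
			break;
		if (read_be32(buf + row) == id)
			return read_be64(buf + row + 4);
	}
	return 0;
}

/*
 * Decode one 4-byte value from the Object Offsets chunk. Pass NULL
 * for large_offsets when the optional LOFF chunk does not exist.
 */
static uint64_t midx_decode_offset(uint32_t raw,
				   const unsigned char *large_offsets)
{
	if (large_offsets && (raw & MIDX_LARGE_OFFSET_FLAG)) {
		uint32_t row = raw & ~MIDX_LARGE_OFFSET_FLAG;

		return read_be64(large_offsets + (size_t)row * 8);
	}
	return raw;
}
```

Under these assumptions, the Packfile Names chunk would be located with `midx_find_chunk(buf, len, hdr.nr_chunks, 0x504e414d)`, where the constant is the bytes 'P', 'N', 'A', 'M' read as a big-endian integer.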

Makefile

Lines changed: 3 additions & 0 deletions
@@ -726,6 +726,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
+TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
@@ -905,6 +906,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += midx.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += negotiator/default.o
 LIB_OBJS += negotiator/skipping.o
@@ -1078,6 +1080,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/mktag.o
 BUILTIN_OBJS += builtin/mktree.o
+BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o

builtin.h

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
 extern int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mktag(int argc, const char **argv, const char *prefix);
 extern int cmd_mktree(int argc, const char **argv, const char *prefix);
+extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);

builtin/count-objects.c

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		struct strbuf pack_buf = STRBUF_INIT;
 		struct strbuf garbage_buf = STRBUF_INIT;
 
-		for (p = get_packed_git(the_repository); p; p = p->next) {
+		for (p = get_all_packs(the_repository); p; p = p->next) {
 			if (!p->pack_local)
 				continue;
 			if (open_pack_index(p))

builtin/fsck.c

Lines changed: 21 additions & 2 deletions
@@ -49,6 +49,7 @@ static int name_objects;
 #define ERROR_PACK 04
 #define ERROR_REFS 010
 #define ERROR_COMMIT_GRAPH 020
+#define ERROR_MULTI_PACK_INDEX 040
 
 static const char *describe_object(struct object *obj)
 {
@@ -740,7 +741,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		struct progress *progress = NULL;
 
 		if (show_progress) {
-			for (p = get_packed_git(the_repository); p;
+			for (p = get_all_packs(the_repository); p;
 			     p = p->next) {
 				if (open_pack_index(p))
 					continue;
@@ -749,7 +750,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 			progress = start_progress(_("Checking objects"), total);
 		}
-		for (p = get_packed_git(the_repository); p;
+		for (p = get_all_packs(the_repository); p;
 		     p = p->next) {
 			/* verify gives error messages itself */
 			if (verify_pack(p, fsck_obj_buffer,
@@ -848,5 +849,23 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (!git_config_get_bool("core.multipackindex", &i) && i) {
+		struct child_process midx_verify = CHILD_PROCESS_INIT;
+		const char *midx_argv[] = { "multi-pack-index", "verify", NULL, NULL, NULL };
+
+		midx_verify.argv = midx_argv;
+		midx_verify.git_cmd = 1;
+		if (run_command(&midx_verify))
+			errors_found |= ERROR_MULTI_PACK_INDEX;
+
+		prepare_alt_odb(the_repository);
+		for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			midx_argv[2] = "--object-dir";
+			midx_argv[3] = alt->path;
+			if (run_command(&midx_verify))
+				errors_found |= ERROR_MULTI_PACK_INDEX;
+		}
+	}
+
 	return errors_found;
 }
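
The fsck change above reports a failed `multi-pack-index verify` by OR-ing a new octal flag into `errors_found`. A minimal sketch of that accumulation pattern follows; the four `ERROR_*` values are copied from the hunk, while `record_result` is a hypothetical helper that does not exist in fsck.c.

```c
#include <assert.h>

/* Octal flag values as they appear in the hunk above. */
#define ERROR_PACK 04
#define ERROR_REFS 010
#define ERROR_COMMIT_GRAPH 020
#define ERROR_MULTI_PACK_INDEX 040

/*
 * Hypothetical helper: fold the outcome of one check into the
 * running status word. Each flag occupies its own bit, so several
 * failures can be reported in a single return value.
 */
static int record_result(int errors_found, int check_failed, int flag)
{
	assert(flag != 0 && (flag & (flag - 1)) == 0); /* exactly one bit */
	return check_failed ? (errors_found | flag) : errors_found;
}
```

Because 040 octal is 32 decimal, a run in which only the MIDX check fails would yield 32 under this sketch, and one in which both the pack check and the MIDX check fail would yield 36.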

builtin/gc.c

Lines changed: 2 additions & 2 deletions
@@ -183,7 +183,7 @@ static struct packed_git *find_base_packs(struct string_list *packs,
 {
 	struct packed_git *p, *base = NULL;
 
-	for (p = get_packed_git(the_repository); p; p = p->next) {
+	for (p = get_all_packs(the_repository); p; p = p->next) {
 		if (!p->pack_local)
 			continue;
 		if (limit) {
@@ -208,7 +208,7 @@ static int too_many_packs(void)
 	if (gc_auto_pack_limit <= 0)
 		return 0;
 
-	for (cnt = 0, p = get_packed_git(the_repository); p; p = p->next) {
+	for (cnt = 0, p = get_all_packs(the_repository); p; p = p->next) {
 		if (!p->pack_local)
 			continue;
 		if (p->pack_keep)
