Commit 129e261

Merge branch 'jt/fetch-cdn-offload' into pu
WIP for allowing a response to "git fetch" to instruct the bulk of the pack contents to be instead taken from elsewhere (aka CDN).

* jt/fetch-cdn-offload:
  SQUASH???
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http: improve documentation of http_pack_request
  http: use --stdin when getting dumb HTTP pack
2 parents 279bb9f + 1eba6eb commit 129e261

17 files changed, 669 insertions(+), 132 deletions(-)

Documentation/git-http-fetch.txt

Lines changed: 7 additions & 1 deletion
@@ -9,7 +9,7 @@ git-http-fetch - Download from a remote Git repository via HTTP
 SYNOPSIS
 --------
 [verse]
-'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin] <commit> <url>
+'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,12 @@ commit-id::
 
 		<commit-id>['\t'<filename-as-in--w>]
 
+--packfile::
+	Instead of a commit id on the command line (which is not expected in
+	this case), 'git http-fetch' fetches the packfile directly at the given
+	URL and uses index-pack to generate corresponding .idx and .keep files.
+	The output of index-pack is printed to stdout.
+
 --recover::
 	Verify that everything reachable from target is fetched. Used after
 	an earlier fetch is interrupted.
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients then should understand that the returned packfile could be incomplete,
+and that it needs to download all the given URIs before the fetch or clone is
+complete.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-uris` parameter) by passing --no-packfile-uris to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
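
The "Client design" rule in the document above (the fetch is not complete until every advertised URI has been downloaded as a packfile) can be sketched in a few lines. This is only an illustration under stated assumptions: `struct pending_packfile_uri` and `fetch_is_complete()` are hypothetical names that do not appear in this commit, and the real client tracks this state inside fetch-pack rather than in a standalone helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping record for one URI from the packfile-uris section. */
struct pending_packfile_uri {
	const char *uri;
	int downloaded; /* nonzero once the pack at `uri` was fetched and indexed */
};

/*
 * Hypothetical helper: returns 1 only when the inline packfile has arrived
 * AND every advertised URI has been downloaded; returns 0 otherwise.
 */
static int fetch_is_complete(int inline_pack_received,
			     const struct pending_packfile_uri *uris,
			     size_t nr)
{
	size_t i;

	assert(nr == 0 || uris);
	if (!inline_pack_received)
		return 0;
	for (i = 0; i < nr; i++)
		if (!uris[i].downloaded)
			return 0;
	return 1;
}
```

A response without a `packfile-uris` section corresponds to `nr == 0`, in which case completeness depends only on the inline packfile.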

Documentation/technical/protocol-v2.txt

Lines changed: 34 additions & 10 deletions
@@ -323,13 +323,26 @@ included in the client's request:
 	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
 	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
 
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
-header.
+header. Most sections are sent only when the packfile is sent.
 
-    output = *section
-    section = (acknowledgments | shallow-info | wanted-refs | packfile)
-	      (flush-pkt | delim-pkt)
+    output = acknowledgements flush-pkt |
+	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -347,13 +360,17 @@ header.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
     acknowledgments section
-	* If the client determines that it is finished with negotiations
-	  by sending a "done" line, the acknowledgments sections MUST be
-	  omitted from the server's response.
+	* If the client determines that it is finished with negotiations by
+	  sending a "done" line (thus requiring the server to send a packfile),
+	  the acknowledgments sections MUST be omitted from the server's
+	  response.
 
 	* Always begins with the section header "acknowledgments"
 
@@ -404,9 +421,6 @@ header.
 	  which the client has not indicated was shallow as a part of
 	  its request.
 
-	* This section is only included if a packfile section is also
-	  included in the response.
-
     wanted-refs section
 	* This section is only included if the client has requested a
 	  ref using a 'want-ref' line and if a packfile section is also
@@ -420,6 +434,16 @@ header.
 	* The server MUST NOT send any refs which were not requested
 	  using 'want-ref' lines.
 
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
     packfile section
 	* This section is only included if the client has sent 'want'
 	  lines in its request and either requested that no more
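
The `packfile-uri` grammar added in this file (40 hex digits, a space, then URI bytes in the range %x20-ff, then LF) is compact enough to check by hand. The following is a minimal sketch of such a check, not the parser used by the commit: `parse_packfile_uri_line()` and `PACK_HASH_HEXLEN` are hypothetical names, and the input is assumed to be the payload of one PKT-LINE with the 4-byte length prefix already removed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define PACK_HASH_HEXLEN 40 /* the grammar fixes the hash at 40 hex digits */

/*
 * Hypothetical helper: split "<40 hex digits> SP <uri> LF" into its fields.
 * Returns 0 on success and -1 if the payload does not match the grammar.
 * On success, hash_out holds the NUL-terminated hash, and *uri_out and
 * *uri_len describe the URI bytes inside `line` (LF excluded, not
 * NUL-terminated).
 */
static int parse_packfile_uri_line(const char *line, size_t len,
				   char hash_out[PACK_HASH_HEXLEN + 1],
				   const char **uri_out, size_t *uri_len)
{
	size_t i;

	assert(line && hash_out && uri_out && uri_len);

	/* Need at least the hash, the separating space, and the trailing LF. */
	if (len < PACK_HASH_HEXLEN + 2)
		return -1;
	for (i = 0; i < PACK_HASH_HEXLEN; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (line[PACK_HASH_HEXLEN] != ' ' || line[len - 1] != '\n')
		return -1;

	/* The URI may only contain bytes in the range %x20-ff. */
	for (i = PACK_HASH_HEXLEN + 1; i < len - 1; i++)
		if ((unsigned char)line[i] < 0x20)
			return -1;

	memcpy(hash_out, line, PACK_HASH_HEXLEN);
	hash_out[PACK_HASH_HEXLEN] = '\0';
	*uri_out = line + PACK_HASH_HEXLEN + 1;
	*uri_len = len - PACK_HASH_HEXLEN - 2;
	return 0;
}
```

Because the grammar writes the URI as `*%x20-ff`, this sketch accepts an empty URI; a real client would presumably reject that at a later stage.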

builtin/fetch-pack.c

Lines changed: 11 additions & 6 deletions
@@ -48,8 +48,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct ref **sought = NULL;
 	int nr_sought = 0, alloc_sought = 0;
 	int fd[2];
-	char *pack_lockfile = NULL;
-	char **pack_lockfile_ptr = NULL;
+	struct string_list pack_lockfiles = STRING_LIST_INIT_DUP;
+	struct string_list *pack_lockfiles_ptr = NULL;
 	struct child_process *conn;
 	struct fetch_pack_args args;
 	struct oid_array shallow = OID_ARRAY_INIT;
@@ -138,7 +138,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		}
 		if (!strcmp("--lock-pack", arg)) {
 			args.lock_pack = 1;
-			pack_lockfile_ptr = &pack_lockfile;
+			pack_lockfiles_ptr = &pack_lockfiles;
 			continue;
 		}
 		if (!strcmp("--check-self-contained-and-connected", arg)) {
@@ -239,10 +239,15 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	}
 
 	ref = fetch_pack(&args, fd, ref, sought, nr_sought,
-			 &shallow, pack_lockfile_ptr, version);
-	if (pack_lockfile) {
-		printf("lock %s\n", pack_lockfile);
+			 &shallow, pack_lockfiles_ptr, version);
+	if (pack_lockfiles.nr) {
+		int i;
+
+		printf("lock %s\n", pack_lockfiles.items[0].string);
 		fflush(stdout);
+		for (i = 1; i < pack_lockfiles.nr; i++)
+			warning(_("Lockfile created but not reported: %s"),
+				pack_lockfiles.items[i].string);
 	}
 	if (args.check_self_contained_and_connected &&
 	    args.self_contained_and_connected) {

builtin/pack-objects.c

Lines changed: 76 additions & 0 deletions
@@ -115,6 +115,8 @@ static unsigned long window_memory_limit = 0;
 
 static struct list_objects_filter_options filter_options;
 
+static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;
+
 enum missing_action {
 	MA_ERROR = 0,      /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY,      /* silently allow ALL missing objects */
@@ -123,6 +125,15 @@ enum missing_action {
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
 
+struct configured_exclusion {
+	struct oidmap_entry e;
+	char *pack_hash_hex;
+	char *uri;
+};
+static struct oidmap configured_exclusions;
+
+static struct oidset excluded_by_config;
+
 /*
  * stats
  */
@@ -837,6 +848,25 @@ static off_t write_reused_pack(struct hashfile *f)
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static void write_excluded_by_configs(void)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	oidset_iter_init(&excluded_by_config, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+
+		if (!ex)
+			BUG("configured exclusion wasn't configured");
+		write_in_full(1, ex->pack_hash_hex, strlen(ex->pack_hash_hex));
+		write_in_full(1, " ", 1);
+		write_in_full(1, ex->uri, strlen(ex->uri));
+		write_in_full(1, "\n", 1);
+	}
+}
+
 static const char no_split_warning[] = N_(
 "disabling bitmap writing, packs are split due to pack.packSizeLimit"
 );
@@ -1133,6 +1163,25 @@ static int want_object_in_pack(const struct object_id *oid,
 		}
 	}
 
+	if (uri_protocols.nr) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+		int i;
+		const char *p;
+
+		if (ex) {
+			for (i = 0; i < uri_protocols.nr; i++) {
+				if (skip_prefix(ex->uri,
+						uri_protocols.items[i].string,
+						&p) &&
+				    *p == ':') {
+					oidset_insert(&excluded_by_config, oid);
+					return 0;
+				}
+			}
+		}
+	}
+
 	return 1;
 }
 
@@ -2734,6 +2783,29 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
+		const char *oid_end, *pack_end;
+		/*
+		 * Stores the pack hash. This is not a true object ID, but is
+		 * of the same form.
+		 */
+		struct object_id pack_hash;
+
+		if (parse_oid_hex(v, &ex->e.oid, &oid_end) ||
+		    *oid_end != ' ' ||
+		    parse_oid_hex(oid_end + 1, &pack_hash, &pack_end) ||
+		    *pack_end != ' ')
+			die(_("value of uploadpack.blobpackfileuri must be "
+			      "of the form '<object-hash> <pack-hash> <uri>' (got '%s')"), v);
+		if (oidmap_get(&configured_exclusions, &ex->e.oid))
+			die(_("object already configured in another "
+			      "uploadpack.blobpackfileuri (got '%s')"), v);
+		ex->pack_hash_hex = xcalloc(1, pack_end - oid_end);
+		memcpy(ex->pack_hash_hex, oid_end + 1, pack_end - oid_end - 1);
+		ex->uri = xstrdup(pack_end + 1);
+		oidmap_put(&configured_exclusions, ex);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -3331,6 +3403,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("do not pack objects in promisor packfiles")),
 		OPT_BOOL(0, "delta-islands", &use_delta_islands,
 			 N_("respect islands during delta compression")),
+		OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
+				N_("protocol"),
+				N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
 		OPT_END(),
 	};
 
@@ -3519,6 +3594,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 					 the_repository);
 	}
 
+	write_excluded_by_configs();
 	trace2_region_enter("pack-objects", "write-pack-file", the_repository);
 	write_pack_file();
 	trace2_region_leave("pack-objects", "write-pack-file", the_repository);
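
The protocol check added to `want_object_in_pack()` excludes an object only when its configured URI starts with one of the client's `--uri-protocol` values followed directly by `:`. The sketch below restates that test outside of Git so it can be tried in isolation. It assumes plain C strings in place of Git's `string_list`, uses `strncmp` where the commit uses `skip_prefix`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the `skip_prefix(...) && *p == ':'` test:
 * returns 1 if `uri` starts with `protocol` immediately followed by ':'.
 */
static int uri_has_protocol(const char *uri, const char *protocol)
{
	size_t n;

	assert(uri && protocol);
	n = strlen(protocol);
	/* If the first n bytes match, uri[n] is at worst the terminating NUL. */
	return !strncmp(uri, protocol, n) && uri[n] == ':';
}

/* Returns 1 if `uri` uses any of the `nr` protocols the client offered. */
static int uri_allowed_by_client(const char *uri,
				 const char *const *protocols, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (uri_has_protocol(uri, protocols[i]))
			return 1;
	return 0;
}
```

Requiring the `:` right after the prefix is what keeps a client that offered only `http` from being handed an `https:` URI, even though `http` is a string prefix of `https`.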

connected.c

Lines changed: 5 additions & 3 deletions
@@ -42,10 +42,12 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 
 	if (transport && transport->smart_options &&
 	    transport->smart_options->self_contained_and_connected &&
-	    transport->pack_lockfile &&
-	    strip_suffix(transport->pack_lockfile, ".keep", &base_len)) {
+	    transport->pack_lockfiles.nr == 1 &&
+	    strip_suffix(transport->pack_lockfiles.items[0].string,
+			 ".keep", &base_len)) {
 		struct strbuf idx_file = STRBUF_INIT;
-		strbuf_add(&idx_file, transport->pack_lockfile, base_len);
+		strbuf_add(&idx_file, transport->pack_lockfiles.items[0].string,
+			   base_len);
 		strbuf_addstr(&idx_file, ".idx");
 		new_pack = add_packed_git(idx_file.buf, idx_file.len, 1);
 		strbuf_release(&idx_file);
