Skip to content

Commit 87a4fa3

Browse files
derrickstoleedscho
authored andcommitted
push: don't reuse deltas with path walk
The --path-walk option in `git pack-objects` is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within `git push` specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via `git repack -Ad` helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set `gc.auto=0` before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in `git push`, this change enforces the spawned `git pack-objects` process to use `--no-reuse-delta`. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). This commit also adds a test case that demonstrates that `git -c pack.usePathWalk=true push` now avoids reusing deltas. To do this, the test case constructs a pack with a horrendously inefficient delta object, then verifies that the pack on the receiving side of the `push` fails to have such an inefficient delta. The test case would probably be a lot more readable if hex numbers were used instead of octal numbers, but alas, `printf "\x<hex>"` is not portable, only `printf "\<octal>"` is. For example, dash's built-in `printf` function simply prints `\x` verbatim while bash's built-in happily converts this construct to the corresponding byte. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
1 parent 2c95417 commit 87a4fa3

File tree

7 files changed

+119
-0
lines changed

7 files changed

+119
-0
lines changed

builtin/push.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -619,6 +619,10 @@ int cmd_push(int argc,
619619
else if (recurse_submodules == RECURSE_SUBMODULES_ONLY)
620620
flags |= TRANSPORT_RECURSE_SUBMODULES_ONLY;
621621

622+
prepare_repo_settings(the_repository);
623+
if (the_repository->settings.pack_use_path_walk)
624+
flags |= TRANSPORT_PUSH_NO_REUSE_DELTA;
625+
622626
if (tags)
623627
refspec_append(&rs, "refs/tags/*");
624628

send-pack.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
9292
strvec_push(&po.args, "--shallow");
9393
if (args->disable_bitmaps)
9494
strvec_push(&po.args, "--no-use-bitmap-index");
95+
if (args->no_reuse_delta)
96+
strvec_push(&po.args, "--no-reuse-delta");
9597
po.in = -1;
9698
po.out = args->stateless_rpc ? -1 : fd;
9799
po.git_cmd = 1;

send-pack.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ struct send_pack_args {
2222
force_update:1,
2323
use_thin_pack:1,
2424
use_ofs_delta:1,
25+
no_reuse_delta:1,
2526
dry_run:1,
2627
/* One of the SEND_PACK_PUSH_CERT_* constants. */
2728
push_cert:2,

t/meson.build

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -710,6 +710,7 @@ integration_tests = [
710710
't5582-fetch-negative-refspec.sh',
711711
't5583-push-branches.sh',
712712
't5584-vfs.sh',
713+
't5590-push-path-walk.sh',
713714
't5600-clone-fail-cleanup.sh',
714715
't5601-clone.sh',
715716
't5602-clone-remote-exec.sh',

t/t5590-push-path-walk.sh

Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,109 @@
1+
#!/bin/sh
2+
3+
test_description='verify that push respects `pack.usePathWalk`'
4+
5+
TEST_PASSES_SANITIZE_LEAK=true
6+
. ./test-lib.sh
7+
. "$TEST_DIRECTORY"/lib-pack.sh
8+
9+
test_expect_success 'setup bare repository and clone' '
10+
git init --bare -b main bare.git &&
11+
git --git-dir=bare.git config receive.unpackLimit 0 &&
12+
git --git-dir bare.git commit-tree -m initial $EMPTY_TREE >head_oid &&
13+
git --git-dir bare.git update-ref refs/heads/main $(cat head_oid) &&
14+
git clone --bare bare.git clone.git
15+
'
16+
test_expect_success 'avoid reusing deltified objects' '
17+
# construct two commits, one containing a file with the hex digits
18+
# repeated 16 times, the next reducing that to 8 times. The crucial
19+
# part is that the blob of the second commit is deltified _really_
20+
# badly and it is therefore easy to detect if a `git push` reused that
21+
# delta.
22+
x="0123456789abcdef" &&
23+
printf "$x$x$x$x$x$x$x$x" >x128 &&
24+
printf "$x$x$x$x$x$x$x$x$x$x$x$x$x$x$x$x" >x256 &&
25+
26+
pack=clone.git/objects/pack/pack-tmp.pack &&
27+
pack_header 2 >$pack &&
28+
29+
# add x256 as a non-deltified object, using an uncompressed zlib stream
30+
# for simplicity
31+
# 060 = OBJ_BLOB << 4, 0200 = size larger than 15,
32+
# 0 = lower 4 bits of size, 020 = bits 5-9 of size (size = 256)
33+
printf "\260\020" >>$pack &&
34+
# Uncompressed zlib stream always starts with 0170 1 1, followed
35+
# by two bytes encoding the size, little endian, then two bytes with
36+
# the bitwise-complement of that size, then the payload, and then the
37+
# Adler32 checksum. For some reason, the checksum is in big-endian
38+
# format.
39+
printf "\170\001\001\0\001\377\376" >>$pack &&
40+
cat x256 >>$pack &&
41+
# Manually-computed Adler32 checksum: 0xd7ae4621
42+
printf "\327\256\106\041" >>$pack &&
43+
44+
# add x128 as a very badly deltified object
45+
# 0120 = OBJ_OFS_DELTA << 4, 0200 = total size larger than 15,
46+
# 4 = lower 4 bits of size, 030 = bits 5-9 of size
47+
# (size = 128 * 3 + 2 + 2)
48+
printf "\344\030" >>$pack &&
49+
# 0415 = size (i.e. the relative negative offset) of the previous
50+
# object (x256, used as base object)
51+
# encoded as 0200 | ((0415 >> 7) - 1), 0415 & 0177
52+
printf "\201\015" >>$pack &&
53+
# Uncompressed zlib stream, as before, size = 2 + 2 + 128 * 3 (i.e.
54+
# 0604)
55+
printf "\170\001\001\204\001\173\376" >>$pack &&
56+
# base object size = 0400 (encoded as 0200 | (0400 & 0177),
57+
# 0400 >> 7)
58+
printf "\200\002" >>$pack &&
59+
# object size = 0200 (encoded as 0200 | (0200 & 0177), 0200 >> 7
60+
printf "\200\001" >>$pack &&
61+
# massively badly-deltified object: copy every single byte individually
62+
# 0200 = copy, 1 = use 1 byte to encode the offset (counter),
63+
# 020 = use 1 byte to encode the size (1)
64+
printf "$(printf "\\\\221\\\\%03o\\\\001" $(test_seq 0 127))" >>$pack &&
65+
# Manually-computed Adler32 checksum: 0x99c369c4
66+
printf "\231\303\151\304" >>$pack &&
67+
68+
pack_trailer $pack &&
69+
git index-pack -v $pack &&
70+
71+
oid256=$(git hash-object x256) &&
72+
printf "100755 blob $oid256\thex\n" >tree &&
73+
tree_oid="$(git --git-dir=clone.git mktree <tree)" &&
74+
commit_oid=$(git --git-dir=clone.git commit-tree \
75+
-p $(git --git-dir=clone.git rev-parse main) \
76+
-m 256 $tree_oid) &&
77+
78+
oid128=$(git hash-object x128) &&
79+
printf "100755 blob $oid128\thex\n" >tree &&
80+
tree_oid="$(git --git-dir=clone.git mktree <tree)" &&
81+
commit_oid=$(git --git-dir=clone.git commit-tree \
82+
-p $commit_oid \
83+
-m 128 $tree_oid) &&
84+
85+
# Verify that the on-disk size of the delta object is suboptimal in the
86+
# clone (see below why 18 bytes or smaller is the optimal size):
87+
git index-pack --verify-stat clone.git/objects/pack/pack-*.pack >verify &&
88+
size="$(sed -n "s/^$oid128 blob *\([^ ]*\).*/\1/p" <verify)" &&
89+
test $size -gt 18 &&
90+
91+
git --git-dir=clone.git update-ref refs/heads/main $commit_oid &&
92+
git --git-dir=clone.git -c pack.usePathWalk=true push origin main &&
93+
git index-pack --verify-stat bare.git/objects/pack/pack-*.pack >verify &&
94+
size="$(sed -n "s/^$oid128 blob *\([^ ]*\).*/\1/p" <verify)" &&
95+
# The on-disk size of the delta object should be smaller than, or equal
96+
# to, 18 bytes, as that would be the size if storing the payload
97+
# uncompressed:
98+
# 3 bytes: 0170 01 01
99+
# + 2 bytes: zlib stream size
100+
# + 2 bytes: but-wise complement of the zlib stream size
101+
# + 7 bytes: payload
102+
# (= 2 bytes for the size of tbe base object
103+
# + 2 bytes for the size of the delta command
104+
# + 3 bytes for the copy command)
105+
# + 2 + 2 bytes: Adler32 checksum
106+
test $size -le 18
107+
'
108+
109+
test_done

transport.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -916,6 +916,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
916916
args.dry_run = !!(flags & TRANSPORT_PUSH_DRY_RUN);
917917
args.porcelain = !!(flags & TRANSPORT_PUSH_PORCELAIN);
918918
args.atomic = !!(flags & TRANSPORT_PUSH_ATOMIC);
919+
args.no_reuse_delta = !!(flags & TRANSPORT_PUSH_NO_REUSE_DELTA);
919920
args.push_options = transport->push_options;
920921
args.url = transport->url;
921922

transport.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -158,6 +158,7 @@ struct transport {
158158
#define TRANSPORT_RECURSE_SUBMODULES_ONLY (1<<15)
159159
#define TRANSPORT_PUSH_FORCE_IF_INCLUDES (1<<16)
160160
#define TRANSPORT_PUSH_AUTO_UPSTREAM (1<<17)
161+
#define TRANSPORT_PUSH_NO_REUSE_DELTA (1<<18)
161162

162163
int transport_summary_width(const struct ref *refs);
163164

0 commit comments

Comments
 (0)