-
Notifications
You must be signed in to change notification settings - Fork 140
Szeder's commit-graph cleanups #650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Szeder's commit-graph cleanups #650
Conversation
Submodules should be handled the same as regular directories with respect to the presence of a trailing slash, i.e. commands like: git diff rev1 rev2 -- $path git rev-list HEAD -- $path should produce the same output whether $path is 'submod' or 'submod/'. This has been fixed in commit 74b4f7f (tree-walk.c: ignore trailing slash on submodule in tree_entry_interesting(), 2014-01-23). Unfortunately, that commit had the unintended side effect to handle 'submod/anything' the same as 'submod' and 'submod/' as well, e.g.: $ git log --oneline --name-only -- sha1collisiondetection/whatever 4125f78 sha1dc: update from upstream sha1collisiondetection 07a20f5 Makefile: fix unaligned loads in sha1dc with UBSan sha1collisiondetection 23e37f8 sha1dc: update from upstream sha1collisiondetection 86cfd61 sha1dc: optionally use sha1collisiondetection as a submodule sha1collisiondetection Fix this by rejecting submodules as partial pathnames when their trailing slash is followed by anything. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
d06e038
to
6e5f313
Compare
The commit-graph file format specifies that the chunks may be in any order. However, if the OID Lookup chunk happens to be the last one in the file, then any command attempting to access the commit-graph data will fail with: fatal: invalid commit position. commit-graph is likely corrupt In this case the error is wrong, the commit-graph file does conform to the specification, but the parsing of the Chunk Lookup table is a bit buggy, and leaves the field holding the number of commits in the commit-graph zero-initialized. The number of commits in the commit-graph is determined while parsing the Chunk Lookup table, by dividing the size of the OID Lookup chunk with the hash size. However, the Chunk Lookup table doesn't actually store the size of the chunks, but it stores their starting offset. Consequently, the size of a chunk can only be calculated by subtracting the starting offsets of that chunk from the offset of the subsequent chunk, or in case of the last chunk from the offset recorded in the terminating label. This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the ID of each chunk and store its starting offset, then we check the ID of the last seen chunk and calculate its size using its previously saved offset if necessary (at the moment it's only necessary for the OID Lookup chunk). Alas, while parsing the Chunk Lookup table we only interate through the "real" chunks, but never look at the terminating label, thus don't even check whether it's necessary to calulate the size of the last chunk. Consequently, if the OID Lookup chunk is the last one, then we don't calculate its size and turn don't run the piece of code determining the number of commits in the commit graph, leaving the field holding that number unchanged (i.e. zero-initialized), eventually triggering the sanity check in load_oid_from_graph(). Fix this by iterating through all entries in the Chunk Lookup table, including the terminating label. Note that this is the minimal fix, suitable for the maintenance track. A better fix would be to simplify how the chunk sizes are calculated, but that is a more invasive change, less suitable for 'maint', so that will be done in later patches. This additional flexibility of scanning more chunks breaks a test for "git commit-graph verify" so alter that test to mutate the commit-graph to have an even lower chunk count. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
…rder The commit-graph format specifies that "All 4-byte numbers are in network order", but the commit-graph contains 8-byte integers as well (file offsets in the Chunk Lookup table), and their byte order is unspecified. Clarify that all multi-byte integers are in network byte order. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
clear_##slabname() frees only the memory allocated for a commit slab itself, but entries in the commit slab might own additional memory outside the slab that should be freed as well. We already have (at least) one such commit slab, and this patch series is about to add one more. To free all additional memory owned by entries on the commit slab the user of such a slab could iterate over all commits it knows about, peek whether there is a valid entry associated with each commit, and free the additional memory, if any. Or it could rely on intimate knowledge about the internals of the commit slab implementation, and could itself iterate directly through all entries in the slab, and free the additional memory. Or it could just leak the additional memory... Introduce deep_clear_##slabname() to allow releasing memory owned by commit slab entries by invoking the 'void free_fn(elemtype *ptr)' function specified as parameter for each entry in the slab. Use it in get_shallow_commits() in 'shallow.c' to replace an open-coded iteration over a commit slab's entries. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
ll_diff_tree_oid() has only ever returned 0 [1], so it's return value is basically useless. It's only caller diff_tree_oid() has only ever returned the return value of ll_diff_tree_oid() as-is [2], so its return value is just as useless. Most of diff_tree_oid()'s callers simply ignore its return value, except: - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and returns with its return value, but all of diff_root_tree_oid()'s callers ignore its return value. - rev_compare_tree() and rev_same_tree_as_empty() do look at the return value in a condition, but, since the return value is always 0, the former's < 0 condition is never fulfilled, while the latter's >= 0 condition is always fulfilled. So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid() and diff_root_tree_oid(), and drop those conditions from rev_compare_tree() and rev_same_tree_as_empty() as well. [1] ll_diff_tree_oid() and its ancestors have been returning only 0 ever since it was introduced as diff_tree() in 9174026 (Add "diff-tree" program to show which files have changed between two trees., 2005-04-09). [2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026 as well. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
Our CodingGuidelines says that it's sufficient to include one of 'git-compat-util.h' and 'cache.h', but both 'commit-graph.c' and 'commit-graph.h' include both. Let's include only 'git-compat-util.h' to loose a bunch of unnecessary dependencies; but include 'hash.h', because 'commit-graph.h' does require the definition of 'struct object_id'. 'commit-graph.h' explicitly includes 'repository.h' and 'string-list.h', but only needs the declaration of a few structs from them. Drop these includes and forward-declare the necessary structs instead. 'commit-graph.c' includes 'dir.h', but doesn't actually use anything from there, so let's drop that #include as well. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
While we iterate over all entries of the Chunk Lookup table we make sure that we don't attempt to read past the end of the mmap-ed commit-graph file, and check in each iteration that the chunk ID and offset we are about to read is still within the mmap-ed memory region. However, these checks in each iteration are not really necessary, because the number of chunks in the commit-graph file is already known before this loop from the just parsed commit-graph header. So let's check that the commit-graph file is large enough for all entries in the Chunk Lookup table before we start iterating over those entries, and drop those per-iteration checks. While at it, take into account the size of everything that is necessary to have a valid commit-graph file, i.e. the size of the header, the size of the mandatory OID Fanout chunk, and the size of the signature in the trailer as well. Note that this necessitates the change of the error message as well, and, consequently, have to update the 'detect incorrect chunk count' test in 't5318-commit-graph.sh' as well. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
The Chunk Lookup table stores the chunks' starting offset in the commit-graph file, not their sizes. Consequently, the size of a chunk can only be calculated by subtracting its offset from the offset of the subsequent chunk (or that of the terminating label). This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the id of each chunk and store its starting offset, then we check the id of the last seen chunk and calculate its size using its previously saved offset. At the moment there is only one chunk for which we calculate its size, but this patch series will add more, and the repeated chunk id checks are not that pretty. Instead let's read ahead the offset of the next chunk on each iteration, so we can calculate the size of each chunk right away, right where we store its starting offset. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
In write_commit_graph_file() one block of code fills the array of chunk IDs, another block of code fills the array of chunk offsets, then the chunk IDs and offsets are written to the Chunk Lookup table, and finally a third block of code writes the actual chunks. In case of optional chunks like Extra Edge List and Base Graphs List there is also a condition checking whether that chunk is necessary/desired, and that same condition is repeated in all those three blocks of code. This patch series is about to add more optional chunks, so there would be even more repeated conditions. Those chunk offsets are relative to the beginning of the file, so they inherently depend on the size of the Chunk Lookup table, which in turn depends on the number of chunks that are to be written to the commit-graph file. IOW at the time we set the first chunk's ID we can't yet know its offset, because we don't yet know how many chunks there are. Simplify this by initially filling an array of chunk sizes, not offsets, and calculate the offsets based on the chunk sizes only later, while we are writing the Chunk Lookup table. This way we can fill the arrays of chunk IDs and sizes in one go, eliminating one set of repeated conditions. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
Unify the 'chunk_ids' and 'chunk_sizes' arrays into an array of 'struct chunk_info'. This will allow more cleanups in the following patches. Signed-off-by: SZEDER Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
6e5f313
to
5984fb0
Compare
/submit |
Submitted as [email protected] |
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
This branch is now known as |
This patch series was integrated into pu via git@e6c557e. |
This patch series was integrated into pu via git@ac8bbc1. |
This patch series was integrated into pu via git@456223b. |
This patch series was integrated into pu via git@37b9fd1. |
This patch series was integrated into pu via git@451d9cf. |
This patch series was integrated into pu via git@bea46bd. |
This patch series was integrated into pu via git@cacad19. |
This patch series was integrated into pu via git@7ac860a. |
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
This patch series was integrated into pu via git@7fd559e. |
@@ -32,6 +32,7 @@ struct slabname { \ | |||
void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, René Scharfe wrote (reply to this):
Am 05.06.20 um 15:00 schrieb SZEDER Gábor via GitGitGadget:
> From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <[email protected]>
>
> clear_##slabname() frees only the memory allocated for a commit slab
> itself, but entries in the commit slab might own additional memory
> outside the slab that should be freed as well. We already have (at
> least) one such commit slab, and this patch series is about to add one
> more.
>
> To free all additional memory owned by entries on the commit slab the
> user of such a slab could iterate over all commits it knows about,
> peek whether there is a valid entry associated with each commit, and
> free the additional memory, if any. Or it could rely on intimate
> knowledge about the internals of the commit slab implementation, and
> could itself iterate directly through all entries in the slab, and
> free the additional memory. Or it could just leak the additional
> memory...
>
> Introduce deep_clear_##slabname() to allow releasing memory owned by
> commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
> function specified as parameter for each entry in the slab.
Adding a new function instead of extending the existing ones makes
sense, as this is a rare requirement.
>
> Use it in get_shallow_commits() in 'shallow.c' to replace an
> open-coded iteration over a commit slab's entries.
>
> Signed-off-by: SZEDER Gábor <[email protected]>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> commit-slab-decl.h | 1 +
> commit-slab-impl.h | 13 +++++++++++++
> commit-slab.h | 10 ++++++++++
> shallow.c | 14 +++++---------
> 4 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/commit-slab-decl.h b/commit-slab-decl.h
> index adc7b46c83b..286164b7e27 100644
> --- a/commit-slab-decl.h
> +++ b/commit-slab-decl.h
> @@ -32,6 +32,7 @@ struct slabname { \
> void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
> void init_ ##slabname(struct slabname *s); \
> void clear_ ##slabname(struct slabname *s); \
> +void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
> elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
> elemtype *slabname## _at(struct slabname *s, const struct commit *c); \
> elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
> diff --git a/commit-slab-impl.h b/commit-slab-impl.h
> index 5c0eb91a5d1..557738df271 100644
> --- a/commit-slab-impl.h
> +++ b/commit-slab-impl.h
> @@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s) \
> FREE_AND_NULL(s->slab); \
> } \
> \
> +scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
> +{ \
> + unsigned int i; \
> + for (i = 0; i < s->slab_count; i++) { \
> + unsigned int j; \
> + if (!s->slab[i]) \
> + continue; \
> + for (j = 0; j < s->slab_size; j++) \
> + free_fn(&s->slab[i][j * s->stride]); \
> + } \
> + clear_ ##slabname(s); > +} \
Why pass an elemtype pointer to the callback function instead of
a plain elemtype? Because it matches the return type of _at() and
_peek(). Consistency, good. Handing it a pointer allows the
callback to pass it on to free(), though, which would be bad,
since we do that in clear_() as well. Hmm.
> + \
> scope elemtype *slabname## _at_peek(struct slabname *s, \
> const struct commit *c, \
> int add_if_missing) \
> diff --git a/commit-slab.h b/commit-slab.h
> index 05b3f2804e7..8e72a305365 100644
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -47,6 +47,16 @@
> *
> * Call this function before the slab falls out of scope to avoid
> * leaking memory.
> + *
> + * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
> + *
> + * Empties the slab, similar to clear_indegree(), but in addition it
> + * calls the given 'free_fn' for each slab entry to release any
> + * additional memory that might be owned by the entry (but not the
> + * entry itself!).
> + * Note that 'free_fn' might be called even for entries for which no
> + * indegree_at() call has been made; in this case 'free_fn' is invoked
> + * with a pointer to a zero-initialized location.
> */
>
> #define define_commit_slab(slabname, elemtype) \
> diff --git a/shallow.c b/shallow.c
> index 7fd04afed19..c4ac8a73273 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
> * supports a "valid" flag.
> */
> define_commit_slab(commit_depth, int *);
> +static void free_depth_in_slab(int **ptr)
> +{
> + FREE_AND_NULL(*ptr);
> +}
Why FREE_AND_NULL? The original loop below called free(). The slabs
are all released by deep_clear_() immediately after the callbacks are
done anyway, so what's the point in zeroing these pointers?
> struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
> int shallow_flag, int not_shallow_flag)
> {
> @@ -150,15 +154,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
> }
> }
> }
> - for (i = 0; i < depths.slab_count; i++) {
> - int j;
> -
> - if (!depths.slab[i])
> - continue;
> - for (j = 0; j < depths.slab_size; j++)
> - free(depths.slab[i][j]);
> - }
> - clear_commit_depth(&depths);
> + deep_clear_commit_depth(&depths, free_depth_in_slab);
What a relief!
>
> return result;
> }
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 6/18/2020 4:59 PM, René Scharfe wrote:
> Am 05.06.20 um 15:00 schrieb SZEDER Gábor via GitGitGadget:
>> From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <[email protected]>
>>
>> clear_##slabname() frees only the memory allocated for a commit slab
>> itself, but entries in the commit slab might own additional memory
>> outside the slab that should be freed as well. We already have (at
>> least) one such commit slab, and this patch series is about to add one
>> more.
>>
>> To free all additional memory owned by entries on the commit slab the
>> user of such a slab could iterate over all commits it knows about,
>> peek whether there is a valid entry associated with each commit, and
>> free the additional memory, if any. Or it could rely on intimate
>> knowledge about the internals of the commit slab implementation, and
>> could itself iterate directly through all entries in the slab, and
>> free the additional memory. Or it could just leak the additional
>> memory...
>>
>> Introduce deep_clear_##slabname() to allow releasing memory owned by
>> commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
>> function specified as parameter for each entry in the slab.
>
> Adding a new function instead of extending the existing ones makes
> sense, as this is a rare requirement.
>
>>
>> Use it in get_shallow_commits() in 'shallow.c' to replace an
>> open-coded iteration over a commit slab's entries.
>>
>> Signed-off-by: SZEDER Gábor <[email protected]>
>> Signed-off-by: Derrick Stolee <[email protected]>
>> ---
>> commit-slab-decl.h | 1 +
>> commit-slab-impl.h | 13 +++++++++++++
>> commit-slab.h | 10 ++++++++++
>> shallow.c | 14 +++++---------
>> 4 files changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/commit-slab-decl.h b/commit-slab-decl.h
>> index adc7b46c83b..286164b7e27 100644
>> --- a/commit-slab-decl.h
>> +++ b/commit-slab-decl.h
>> @@ -32,6 +32,7 @@ struct slabname { \
>> void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
>> void init_ ##slabname(struct slabname *s); \
>> void clear_ ##slabname(struct slabname *s); \
>> +void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
>> elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
>> elemtype *slabname## _at(struct slabname *s, const struct commit *c); \
>> elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
>> diff --git a/commit-slab-impl.h b/commit-slab-impl.h
>> index 5c0eb91a5d1..557738df271 100644
>> --- a/commit-slab-impl.h
>> +++ b/commit-slab-impl.h
>> @@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s) \
>> FREE_AND_NULL(s->slab); \
>> } \
>> \
>> +scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
>> +{ \
>> + unsigned int i; \
>> + for (i = 0; i < s->slab_count; i++) { \
>> + unsigned int j; \
>> + if (!s->slab[i]) \
>> + continue; \
>> + for (j = 0; j < s->slab_size; j++) \
>> + free_fn(&s->slab[i][j * s->stride]); \
>> + } \
>> + clear_ ##slabname(s); > +} \
>
>
> Why pass an elemtype pointer to the callback function instead of
> a plain elemtype? Because it matches the return type of _at() and
> _peek(). Consistency, good. Handing it a pointer allows the
> callback to pass it on to free(), though, which would be bad,
> since we do that in clear_() as well. Hmm.
>
>> + \
>> scope elemtype *slabname## _at_peek(struct slabname *s, \
>> const struct commit *c, \
>> int add_if_missing) \
>> diff --git a/commit-slab.h b/commit-slab.h
>> index 05b3f2804e7..8e72a305365 100644
>> --- a/commit-slab.h
>> +++ b/commit-slab.h
>> @@ -47,6 +47,16 @@
>> *
>> * Call this function before the slab falls out of scope to avoid
>> * leaking memory.
>> + *
>> + * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
>> + *
>> + * Empties the slab, similar to clear_indegree(), but in addition it
>> + * calls the given 'free_fn' for each slab entry to release any
>> + * additional memory that might be owned by the entry (but not the
>> + * entry itself!).
>> + * Note that 'free_fn' might be called even for entries for which no
>> + * indegree_at() call has been made; in this case 'free_fn' is invoked
>> + * with a pointer to a zero-initialized location.
>> */
>>
>> #define define_commit_slab(slabname, elemtype) \
>> diff --git a/shallow.c b/shallow.c
>> index 7fd04afed19..c4ac8a73273 100644
>> --- a/shallow.c
>> +++ b/shallow.c
>> @@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
>> * supports a "valid" flag.
>> */
>> define_commit_slab(commit_depth, int *);
>> +static void free_depth_in_slab(int **ptr)
>> +{
>> + FREE_AND_NULL(*ptr);
>> +}
>
> Why FREE_AND_NULL? The original loop below called free(). The slabs
> are all released by deep_clear_() immediately after the callbacks are
> done anyway, so what's the point in zeroing these pointers?
I think the point was that a later change was going to free
elements in the slab on a one-by-one basis while computing
the filters, to save memory overall. To be future-proof
against such a change, we need to NULL the pointers here.
Perhaps that viewpoint also answers your other comment about
"why pass the pointer?"
Thanks,
-Stolee
This patch series was integrated into pu via git@6c75447. |
This patch series was integrated into pu via git@518b032. |
This patch series was integrated into seen via git@5d8ca57. |
This patch series was integrated into next via git@15c9d77. |
This patch series was integrated into seen via git@68879df. |
This patch series was integrated into seen via git@82e6359. |
This patch series was integrated into seen via git@ad94391. |
This patch series was integrated into seen via git@0dbb27a. |
This patch series was integrated into seen via git@c73f03f. |
This patch series was integrated into seen via git@8b70291. |
This patch series was integrated into seen via git@8cbb10d. |
This patch series was integrated into seen via git@c175d11. |
This patch series was integrated into seen via git@f90ca06. |
This patch series was integrated into seen via git@6b5fc67. |
This patch series was integrated into seen via git@e7bdaef. |
This patch series was integrated into seen via git@c5ce51c. |
This patch series was integrated into seen via git@91640a2. |
This patch series was integrated into seen via git@40fe550. |
This patch series was integrated into seen via git@1c79681. |
This patch series was integrated into seen via git@2c2adee. |
This patch series was integrated into seen via git@283ffd4. |
This patch series was integrated into seen via git@6c304a1. |
This patch series was integrated into seen via git@fc00109. |
This patch series was integrated into seen via git@cc3c145. |
This patch series was integrated into seen via git@de6dda0. |
This patch series was integrated into next via git@de6dda0. |
This patch series was integrated into master via git@de6dda0. |
Closed via de6dda0. |
This is based on ds/line-log-on-bloom.
Since Szeder so kindly shared his alternate Bloom filter implementation [1], I thought it worth my time to start the process of updating the patches to apply to more recent code in Git. Here is the effort to update the almost obviously-good commit-graph cleanups that he presented in that series.
[1] https://lore.kernel.org/git/[email protected]/
The range-diff below was created by applying his entire series onto v2.25.0 and then doing cherry-picks as appropriate onto ds/line-log-on-bloom and correcting conflicts. I have Szeder's original series available as "szeder-bloom" on my fork [2].
[2] https://github.com/derrickstolee/git/tree/szeder-bloom
As expected, the write_commit_graph_file() cleanups were the most difficult, in part because we added extra chunks with the changed-path Bloom filters.
I did not include this commit since we already handle it mostly with the MAX_NUM_CHUNKS macro from 08fd81c (commit-graph: implement write_commit_graph(), 2018-04-02):
There were a few cleanups that I did not apply because they are more involved to handle the conflicts with the changed-path Bloom filters, especially because we pass a "struct bloom_filter_settings" in the method signatures for the write_graph_chunk_bloom_*() functions:
I'm not saying that we shouldn't do these changes. I'm just saying that they are more involved and can wait for a second series. No need to rush things.
The rest will need to be completely re-implemented to keep the other things in mind, like split commit-graphs and the existing changed-path Bloom filters. However, the following commits would be particularly interesting to have equivalents on top of our existing Bloom filter implementation. That would allow a more fair comparison between the two options:
Full range diff:
I'm stepping out of the range-diff to point out that this change is mostly stylistic.
Thanks,
-Stolee
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]