-
Notifications
You must be signed in to change notification settings - Fork 140
Optimize run_diff_files()' rename detection #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Teach register_rename_src() to see if new file pair can simply be appended to the rename_src[] array before performing the binary search to find the proper insertion point. This is a performance optimization. This routine is called during run_diff_files in status and the caller is iterating over the sorted index, so we should expect to be able to append in the normal case. The existing insert logic is preserved so we don't have to assume that, but simply take advantage of it if possible. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
/submit |
Submitted as [email protected] |
@@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, SZEDER Gábor wrote (reply to this):
On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <[email protected]>
>
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.
>
> This is a performance optimization. This routine is called
> during run_diff_files in status and the caller is iterating
> over the sorted index, so we should expect to be able to
> append in the normal case. The existing insert logic is
> preserved so we don't have to assume that, but simply take
> advantage of it if possible.
Could you add a command and performance figures to the commit message
to show off the benefits of this patch?
From the description it's clear that it's a performance optimization,
but it's unclear whether it has a noticeable, or at least measurable
effect, or it's "only" a micro-optimization.
I tried something more substantial than a simple 'git status':
# without this patch
$ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
[ ... ]
real 2m4.320s
user 2m0.913s
sys 0m2.284s
$ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
52.40% 0.79% git git [.] diffcore_rename
0.01% 0.01% git git [.] register_rename_src
but it looks like that while more than half of the considerable
runtime is spent detecting renames, the time spent in
register_rename_src(), the function optimized in this patch, is
negligible.
> Signed-off-by: Jeff Hostetler <[email protected]>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
> diffcore-rename.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..5bfc5f6c22 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>
> first = 0;
> last = rename_src_nr;
> +
> + if (last > 0) {
> + struct diff_rename_src *src = &(rename_src[last-1]);
> + int cmp = strcmp(one->path, src->p->one->path);
> + if (!cmp)
> + return src;
> + if (cmp > 0) {
> + first = last;
> + goto append_it;
> + }
> + }
> +
> while (last > first) {
> int next = (last + first) >> 1;
> struct diff_rename_src *src = &(rename_src[next]);
> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
> first = next+1;
> }
>
> +append_it:
> /* insert to make it at "first" */
> ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
> rename_src_nr++;
> --
> gitgitgadget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff Hostetler wrote (reply to this):
On 6/10/2019 8:26 AM, SZEDER Gábor wrote:
> On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <[email protected]>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization. This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case. The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
>
> Could you add a command and performance figures to the commit message
> to show off the benefits of this patch?
>
> From the description it's clear that it's a performance optimization,
> but it's unclear whether it has a noticeable, or at least measurable
> effect, or it's "only" a micro-optimization.
>
> I tried something more substantial than a simple 'git status':
>
> # without this patch
> $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
> [ ... ]
>
> real 2m4.320s
> user 2m0.913s
> sys 0m2.284s
> $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
> 52.40% 0.79% git git [.] diffcore_rename
> 0.01% 0.01% git git [.] register_rename_src
>
> but it looks like that while more than half of the considerable
> runtime is spent detecting renames, the time spent in
> register_rename_src(), the function optimized in this patch, is
> negligible.
>
I just wanted to send an ACK. I'll include perf numbers
and more clarity after I dig up my notes on this.
Jeff
On the Git mailing list, Jeff Hostetler wrote (reply to this):
|
@jeffhostetler maybe just after v2.24.0 (i.e. on or around November 4th) would be a good time to revisit this? |
Seems that this was made obsolete in v2.31.0, via b970b4e. |
Just another patch from Git for Windows' branch thicket...