Skip to content

Optimize run_diff_files()' rename detection #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions diffcore-rename.c
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <[email protected]>
> 
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.
> 
> This is a performance optimization.  This routine is called
> during run_diff_files in status and the caller is iterating
> over the sorted index, so we should expect to be able to
> append in the normal case.  The existing insert logic is
> preserved so we don't have to assume that, but simply take
> advantage of it if possible.

Could you add a command and performance figures to the commit message
to show off the benefits of this patch?

From the description it's clear that it's a performance optimization,
but it's unclear whether it has a noticeable, or at least measurable
effect, or it's "only" a micro-optimization.

I tried something more substantial than a simple 'git status':

  # without this patch
  $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
  [ ... ]
  
  real    2m4.320s
  user    2m0.913s
  sys     0m2.284s
  $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
      52.40%     0.79%  git      git                 [.] diffcore_rename
       0.01%     0.01%  git      git                 [.] register_rename_src

but it looks like that while more than half of the considerable
runtime is spent detecting renames, the time spent in
register_rename_src(), the function optimized in this patch, is
negligible.


> Signed-off-by: Jeff Hostetler <[email protected]>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
>  diffcore-rename.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..5bfc5f6c22 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  
>  	first = 0;
>  	last = rename_src_nr;
> +
> +	if (last > 0) {
> +		struct diff_rename_src *src = &(rename_src[last-1]);
> +		int cmp = strcmp(one->path, src->p->one->path);
> +		if (!cmp)
> +			return src;
> +		if (cmp > 0) {
> +			first = last;
> +			goto append_it;
> +		}
> +	}
> +
>  	while (last > first) {
>  		int next = (last + first) >> 1;
>  		struct diff_rename_src *src = &(rename_src[next]);
> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  		first = next+1;
>  	}
>  
> +append_it:
>  	/* insert to make it at "first" */
>  	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>  	rename_src_nr++;
> -- 
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 6/10/2019 8:26 AM, SZEDER Gábor wrote:
> On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <[email protected]>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization.  This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case.  The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
> 
> Could you add a command and performance figures to the commit message
> to show off the benefits of this patch?
> 
>  From the description it's clear that it's a performance optimization,
> but it's unclear whether it has a noticeable, or at least measurable
> effect, or it's "only" a micro-optimization.
> 
> I tried something more substantial than a simple 'git status':
> 
>    # without this patch
>    $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
>    [ ... ]
>    
>    real    2m4.320s
>    user    2m0.913s
>    sys     0m2.284s
>    $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
>        52.40%     0.79%  git      git                 [.] diffcore_rename
>         0.01%     0.01%  git      git                 [.] register_rename_src
> 
> but it looks like that while more than half of the considerable
> runtime is spent detecting renames, the time spent in
> register_rename_src(), the function optimized in this patch, is
> negligible.
> 

I just wanted to send an ACK.  I'll include perf numbers
and more clarity after I dig up my notes on this.

Jeff

first = 0;
last = rename_src_nr;

if (last > 0) {
struct diff_rename_src *src = &(rename_src[last-1]);
int cmp = strcmp(one->path, src->p->one->path);
if (!cmp)
return src;
if (cmp > 0) {
first = last;
goto append_it;
}
}

while (last > first) {
int next = (last + first) >> 1;
struct diff_rename_src *src = &(rename_src[next]);
Expand All @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
first = next+1;
}

append_it:
/* insert to make it at "first" */
ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
rename_src_nr++;
Expand Down