Skip to content

Optimize run_diff_files()' rename detection #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Feb 25, 2019

Just another patch from Git for Windows' branch thicket...

Teach register_rename_src() to see if new file pair
can simply be appended to the rename_src[] array before
performing the binary search to find the proper insertion
point.

This is a performance optimization.  This routine is called
during run_diff_files in status and the caller is iterating
over the sorted index, so we should expect to be able to
append in the normal case.  The existing insert logic is
preserved so we don't have to assume that, but simply take
advantage of it if possible.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho added the ready to submit Has commits that have not been submitted yet label Feb 25, 2019
@dscho
Copy link
Member Author

dscho commented Jun 8, 2019

/submit

@dscho dscho removed the ready to submit Has commits that have not been submitted yet label Jun 8, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 8, 2019

Submitted as [email protected]

@@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <[email protected]>
> 
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.
> 
> This is a performance optimization.  This routine is called
> during run_diff_files in status and the caller is iterating
> over the sorted index, so we should expect to be able to
> append in the normal case.  The existing insert logic is
> preserved so we don't have to assume that, but simply take
> advantage of it if possible.

Could you add a command and performance figures to the commit message
to show off the benefits of this patch?

From the description it's clear that it's a performance optimization,
but it's unclear whether it has a noticeable, or at least measurable
effect, or it's "only" a micro-optimization.

I tried something more substantial than a simple 'git status':

  # without this patch
  $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
  [ ... ]
  
  real    2m4.320s
  user    2m0.913s
  sys     0m2.284s
  $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
      52.40%     0.79%  git      git                 [.] diffcore_rename
       0.01%     0.01%  git      git                 [.] register_rename_src

but it looks like that while more than half of the considerable
runtime is spent detecting renames, the time spent in
register_rename_src(), the function optimized in this patch, is
negligible.


> Signed-off-by: Jeff Hostetler <[email protected]>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
>  diffcore-rename.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..5bfc5f6c22 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  
>  	first = 0;
>  	last = rename_src_nr;
> +
> +	if (last > 0) {
> +		struct diff_rename_src *src = &(rename_src[last-1]);
> +		int cmp = strcmp(one->path, src->p->one->path);
> +		if (!cmp)
> +			return src;
> +		if (cmp > 0) {
> +			first = last;
> +			goto append_it;
> +		}
> +	}
> +
>  	while (last > first) {
>  		int next = (last + first) >> 1;
>  		struct diff_rename_src *src = &(rename_src[next]);
> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  		first = next+1;
>  	}
>  
> +append_it:
>  	/* insert to make it at "first" */
>  	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>  	rename_src_nr++;
> -- 
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 6/10/2019 8:26 AM, SZEDER Gábor wrote:
> On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <[email protected]>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization.  This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case.  The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
> 
> Could you add a command and performance figures to the commit message
> to show off the benefits of this patch?
> 
>  From the description it's clear that it's a performance optimization,
> but it's unclear whether it has a noticeable, or at least measurable
> effect, or it's "only" a micro-optimization.
> 
> I tried something more substantial than a simple 'git status':
> 
>    # without this patch
>    $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
>    [ ... ]
>    
>    real    2m4.320s
>    user    2m0.913s
>    sys     0m2.284s
>    $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
>        52.40%     0.79%  git      git                 [.] diffcore_rename
>         0.01%     0.01%  git      git                 [.] register_rename_src
> 
> but it looks like that while more than half of the considerable
> runtime is spent detecting renames, the time spent in
> register_rename_src(), the function optimized in this patch, is
> negligible.
> 

I just wanted to send an ACK.  I'll include perf numbers
and more clarity after I dig up my notes on this.

Jeff

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 10, 2019

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 6/8/2019 10:42 AM, Johannes Schindelin via GitGitGadget wrote:
> Just another patch from Git for Windows' branch thicket...
> 
> Jeff Hostetler (1):
>    diffcore-rename: speed up register_rename_src
> 
>   diffcore-rename.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> 
> base-commit: 8104ec994ea3849a968b4667d072fedd1e688642
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-142%2Fdscho%2Fregister_rename_src-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-142/dscho/register_rename_src-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/142
> 

For the sake of full disclosure, we added this patch
to Git for Windows in December 2016.

It was discussed on the mailing list in April 2017 [1] but
it was shelved for various reasons.

Let me put this one on hold while I dig up my notes from 2016
and re-review of the original mailing list suggestions and
see where I want to take this patch going forward.

Jeff


[1] 
https://public-inbox.org/git/[email protected]/

@dscho
Copy link
Member Author

dscho commented Oct 27, 2019

@jeffhostetler maybe just after v2.24.0 (i.e. on or around November 4th) would be a good time to revisit this?

@dscho
Copy link
Member Author

dscho commented Feb 26, 2021

Seems that this was made obsolete in v2.31.0, via b970b4e.

@dscho dscho closed this Feb 26, 2021
@dscho dscho deleted the register_rename_src branch February 26, 2021 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants