Skip to content

[RFC] Revert/delay performance regression in 'git checkout -b' #317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

derrickstolee
Copy link

As we were integrating Git 2.23.0 into VFS for Git, we discovered that "git checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo. This was an intentional change when writing the "git switch" builtin. Here is the commit message for 65f099b ("switch: no worktree status unless real branch switch happens" 2019-03-29):

When we switch from one branch to another, it makes sense to show a
summary of local changes since there could be conflicts, or some files
left modified.... When switch is used solely for creating a new
branch (and "switch" to the same commit) or detaching, we don't really
need to show anything.

"git checkout" does it anyway for historical reasons. But we can start
with a clean slate with switch and don't have to.

This essentially reverts fa655d8411 (checkout: optimize "git checkout
-b <new_branch>" - 2018-08-16) and make it default for switch,
but also for -B and --detach. Users of big repos are encouraged to
move to switch.

I was considering doing a full, long-term revert of this change to get the performance back to normal, but I also saw this feedback on the list for this patch:

I like this last bit. The skip_merge_working_tree() function which
this removes was ugly, difficult to maintain, and difficult to get
just right (and easy to break -- even by changing parts of the system
which one might not expect to impact it).

So, the goal is to reduce the complication given by skip_merge_working_tree() by recommending that users use 'git switch -c'. The only problem is: users will take a while to move, unless prompted.

This series does the following:

  1. Reverts the change that makes 'git checkout -b' slow again.
  2. Creates a warning that recommends users start using 'git switch -c' instead.

This allows us to strip out this performance feature after users have had time to adopt the new way of doing things.

Cc: [email protected], [email protected]

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 21, 2019

Submitted as [email protected]

@@ -16,3 +16,11 @@ will checkout the '<something>' branch on another remote,
and by linkgit:git-worktree[1] when 'git worktree add' refers to a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Wed, Aug 21, 2019 at 12:18:33PM -0700, Derrick Stolee via GitGitGadget wrote:
> This reverts commit 65f099b3988198f0fdf3ef7a21dc01c556d21fff, which
> removed logic for avoiding extra cost in "git checkout -b" in favor
> of the new "git switch -c". This will cause a performance issue for
> users in large repos.

It always makes me sad when I see a pull request, where all the
relevant information is only included in the PR, which will not be
part of the history, but not in the commit messages...


>  builtin/.checkout.c.swp           | Bin 0 -> 77824 bytes

> diff --git a/builtin/.checkout.c.swp b/builtin/.checkout.c.swp
> new file mode 100644
> index 0000000000000000000000000000000000000000..f6dad4abb02c265ee66b3f6f76d00d59b9b524a4
> GIT binary patch
> literal 77824
> zcmeIb37F(pRrf#a46<X03c_z1GgCe3>dYjdA(>2=r6*}KnI2}zLK2GV?&_Y(bXPT1
> z)icux34uRC00rR{;pJ5jL_ico7FiV85d;D8CG1O3By6%HioE<k-*fK$ZB^AXi9sKq

Uh-oh.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/21/2019 6:15 PM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 12:18:33PM -0700, Derrick Stolee via GitGitGadget wrote:
>> This reverts commit 65f099b3988198f0fdf3ef7a21dc01c556d21fff, which
>> removed logic for avoiding extra cost in "git checkout -b" in favor
>> of the new "git switch -c". This will cause a performance issue for
>> users in large repos.
> 
> It always makes me sad when I see a pull request, where all the
> relevant information is only included in the PR, which will not be
> part of the history, but not in the commit messages...

An issue with the RFC quality at the moment. I'll do a better job
if we decide to move this direction.
 
>>  builtin/.checkout.c.swp           | Bin 0 -> 77824 bytes
> 
>> diff --git a/builtin/.checkout.c.swp b/builtin/.checkout.c.swp
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..f6dad4abb02c265ee66b3f6f76d00d59b9b524a4
>> GIT binary patch
>> literal 77824
>> zcmeIb37F(pRrf#a46<X03c_z1GgCe3>dYjdA(>2=r6*}KnI2}zLK2GV?&_Y(bXPT1
>> z)icux34uRC00rR{;pJ5jL_ico7FiV85d;D8CG1O3By6%HioE<k-*fK$ZB^AXi9sKq
> 
> Uh-oh.

Whoops! I thought I removed this in time to submit, but I guess I
did not.

-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 21, 2019

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Wed, Aug 21, 2019 at 12:18:32PM -0700, Derrick Stolee via GitGitGadget wrote:
> As we were integrating Git 2.23.0 into VFS for Git, we discovered that "git
> checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo.

Does this slowdown only affect the Windows OS repo with VFS for Git,
or other biggish repos without VFS for Git as well?

> This series does the following:
> 
>  1. Reverts the change that makes 'git checkout -b' slow again.
>  2. Creates a warning that recommends users start using 'git switch -c'
>     instead.

'git help switch' says loudly and clearly that "THIS COMMAND IS
EXPERIMENTAL. THE BEHAVIOR MAY CHANGE."  It's too early to recommend
it this aggressively, and to deprecate any parts of 'git checkout'.


@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi,

On Wed, Aug 21, 2019 at 12:21 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> As we were integrating Git 2.23.0 into VFS for Git, we discovered that "git
> checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo. This
> was an intentional change when writing the "git switch" builtin. Here is the
> commit message for 65f099b ("switch: no worktree status unless real branch
> switch happens" 2019-03-29):
>
> When we switch from one branch to another, it makes sense to show a
> summary of local changes since there could be conflicts, or some files
> left modified.... When switch is used solely for creating a new
> branch (and "switch" to the same commit) or detaching, we don't really
> need to show anything.
>
> "git checkout" does it anyway for historical reasons. But we can start
> with a clean slate with switch and don't have to.
>
> This essentially reverts fa655d8411 (checkout: optimize "git checkout
> -b <new_branch>" - 2018-08-16) and make it default for switch,
> but also for -B and --detach. Users of big repos are encouraged to
> move to switch.
>
> I was considering doing a full, long-term revert of this change to get the
> performance back to normal, but I also saw this feedback on the list for
> this patch:
>
> I like this last bit. The skip_merge_working_tree() function which
> this removes was ugly, difficult to maintain, and difficult to get
> just right (and easy to break -- even by changing parts of the system
> which one might not expect to impact it).

Instead of restoring this easy-to-break code, could we instead
simplify it and make it more robust?  As per the original commit
message, the whole point of the patch is that when you have a huge
index, operations take a while.  But in the special case of "git
checkout -b <new_branch>", there's no need to even check the index.
So, we could:

  * Check if the user is running "git checkout -b <new_branch>"
  * If so, use the performance hack to skip touching the index at all.

This would be much better than what the patch currently does:

  * Check if the user has specified -m, if so they clearly didn't just
specify "git checkout -b <new_branch>"
  * Check if the user has specified -f, if so they clearly didn't just
specify "git checkout -b <new_branch>"
  * Check if the user has specified --detach, if so they clearly
didn't just specify "git checkout -b <new_branch>"
  * ...<lots of other similar steps>...
  * If we got here, since we've checked all other cases (assuming
other people who have touched checkout remembered to add the necessary
checks for each and every new flag), then by deduction the user must
have specified "git checkout -b <new_branch>", so...
  * Use the performance hack to skip touching the index at all.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Aug 21, 2019 at 5:01 PM SZEDER G=C3=A1bor <[email protected]> wr=
ote:
>
> On Wed, Aug 21, 2019 at 12:18:32PM -0700, Derrick Stolee via GitGitGadget=
 wrote:
> > As we were integrating Git 2.23.0 into VFS for Git, we discovered that =
"git
> > checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo.
>
> Does this slowdown only affect the Windows OS repo with VFS for Git,
> or other biggish repos without VFS for Git as well?

It will also affect other biggish repos without VFS for Git; see the
"does not seem to be GFVS-specific in any way" paragraph of
https://public-inbox.org/git/CABPp-BGir_5xyqEfwytDog0rZDydPHXjuqXCpNKk67dVP=
[email protected]/

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/21/2019 6:31 PM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 12:18:32PM -0700, Derrick Stolee via GitGitGadget wrote:
>> As we were integrating Git 2.23.0 into VFS for Git, we discovered that "git
>> checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo.
> 
> Does this slowdown only affect the Windows OS repo with VFS for Git,
> or other biggish repos without VFS for Git as well?
> 
>> This series does the following:
>>
>>  1. Reverts the change that makes 'git checkout -b' slow again.
>>  2. Creates a warning that recommends users start using 'git switch -c'
>>     instead.
> 
> 'git help switch' says loudly and clearly that "THIS COMMAND IS
> EXPERIMENTAL. THE BEHAVIOR MAY CHANGE."  It's too early to recommend
> it this aggressively, and to deprecate any parts of 'git checkout'.

Even more reason why the performance boost should not have been
removed from 'git checkout -b' so quickly.

Thanks,
-Stolee


@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2019

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/21/2019 11:16 PM, Elijah Newren wrote:
> Hi,
> 
> On Wed, Aug 21, 2019 at 12:21 PM Derrick Stolee via GitGitGadget
> <[email protected]> wrote:
>>
>> As we were integrating Git 2.23.0 into VFS for Git, we discovered that "git
>> checkout -b new-branch" went from 0.3s to 10+s on the Windows OS repo. This
>> was an intentional change when writing the "git switch" builtin. Here is the
>> commit message for 65f099b ("switch: no worktree status unless real branch
>> switch happens" 2019-03-29):
>>
>> When we switch from one branch to another, it makes sense to show a
>> summary of local changes since there could be conflicts, or some files
>> left modified.... When switch is used solely for creating a new
>> branch (and "switch" to the same commit) or detaching, we don't really
>> need to show anything.
>>
>> "git checkout" does it anyway for historical reasons. But we can start
>> with a clean slate with switch and don't have to.
>>
>> This essentially reverts fa655d8411 (checkout: optimize "git checkout
>> -b <new_branch>" - 2018-08-16) and make it default for switch,
>> but also for -B and --detach. Users of big repos are encouraged to
>> move to switch.
>>
>> I was considering doing a full, long-term revert of this change to get the
>> performance back to normal, but I also saw this feedback on the list for
>> this patch:
>>
>> I like this last bit. The skip_merge_working_tree() function which
>> this removes was ugly, difficult to maintain, and difficult to get
>> just right (and easy to break -- even by changing parts of the system
>> which one might not expect to impact it).
> 
> Instead of restoring this easy-to-break code, could we instead
> simplify it and make it more robust?  As per the original commit
> message, the whole point of the patch is that when you have a huge
> index, operations take a while.  But in the special case of "git
> checkout -b <new_branch>", there's no need to even check the index.
> So, we could:
> 
>   * Check if the user is running "git checkout -b <new_branch>"
>   * If so, use the performance hack to skip touching the index at all.
> 
> This would be much better than what the patch currently does:
> 
>   * Check if the user has specified -m, if so they clearly didn't just
> specify "git checkout -b <new_branch>"
>   * Check if the user has specified -f, if so they clearly didn't just
> specify "git checkout -b <new_branch>"
>   * Check if the user has specified --detach, if so they clearly
> didn't just specify "git checkout -b <new_branch>"
>   * ...<lots of other similar steps>...
>   * If we got here, since we've checked all other cases (assuming
> other people who have touched checkout remembered to add the necessary
> checks for each and every new flag), then by deduction the user must
> have specified "git checkout -b <new_branch>", so...
>   * Use the performance hack to skip touching the index at all.

I can look into a simpler implementation of this direction. I'm
getting the feeling that this immediate recommendation of "git switch -c"
is too hasty, so the best thing to do is to create a simpler way
to detect "git checkout -b" and use the fast method.

Thanks,
-Stolee

This reverts commit 65f099b, which
removed logic for avoiding extra cost in "git checkout -b" in favor
of the new "git switch -c". The new switch builtin is still an
experimental feature, so stating "large repos should use 'git switch'"
is too aggressive.

When a user runs "git checkout -b <branchname>" they are pretty
clear in stating that they only want to create a branch and not
update the working directory. The difficulty comes in detecting
that we are actually in that case. That is why the
skip_merge_working_tree() method is so detailed: there are many
settings to check to ensure the arguments match as expected.

Signed-off-by: Derrick Stolee <[email protected]>
@derrickstolee
Copy link
Author

Replaced by #325

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants