Skip to content

[DO NOT MERGE YET] builtin rebase and stash #1800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Aug 23, 2018

The Google Summer of Code 2018 was particularly useful for Windows users: all three accepted students worked on converting slow Unix shell scripts to fast Git builtins. Specifically:

  • git stash
  • git rebase
  • git rebase -i

The latter looks as if it should be part of git rebase, but that is not the case: rebase is such a complex command that it consisted of 5 separate shell scripts. This PR also takes care of the default git rebase backend, which I converted this afternoon, for fun.

NOTE: it might be too risky to go "full in" with the builtins, as the patches are still being reviewed/developed, and have not seen a whole lot of real-world usage. We can help with that real-world usage in Git for Windows, but it may be a tall order to force potentially buggy software onto end users, so I am considering the following additional patches:

  • we still have that rebase.useBuiltin config variable, which defaults to true in this PR, but the Git for Windows installer could override that and default to false, just like we did it with the builtin difftool
  • we could easily introduce a stash.useBuiltin config variable, and re-introduce the shell-scripted version as git-legacy-stash.sh, and also let the Git for Windows default to false here, and
  • we should then re-introduce the rebase -i backend (under a slightly different name) and call it from the git-legacy-rebase.sh (which is only called if rebase.useBuiltin is set to false).

Note also: I used the current branches as per the Git maintainer's current state, but I know that a couple of updates are still planned, so I will have to incorporate them accordingly.

Philip Oakley and others added 30 commits August 21, 2018 17:42
This constant is not defined in MSVC's headers.

In UCRT's fcntl.h, _O_RDONLY, _O_WRONLY and _O_RDWR are defined as 0, 1
and 2, respectively. Yes, that means that UCRT breaks with the tradition
that O_RDWR == O_RDONLY | O_WRONLY.

It is a perfectly legal way to define those constants, though, therefore
we need to take care of defining O_ACCMODE accordingly.

This is particularly important in order to keep our "open() can set
errno to EISDIR" emulation working: it tests that (flags & O_ACCMODE) is
not identical to O_RDONLY before going on to test specifically whether
the file for which open() reported EACCES is, in fact, a directory.

Signed-off-by: Philip Oakley <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
GCC and MSVC disagree about using the GCC extension _ANONYMOUS_UNION.
Simply skip that offending keyword when compiling with MSVC.

Signed-off-by: Philip Oakley <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
This adds MSVC versions of getenv() and friends. These take UTF-8
arguments and return UTF-8 values, but use the UNICODE versions
of the CRT routines.  This avoids the need to write to __environ
(which is only visible if you statically link to the CRT).  This
also avoids the CP_ACP conversions performed inside the CRT.
It also avoids various memory leaks and problems.

Signed-off-by: Jeff Hostetler <[email protected]>
VS2015 complains when using a const pointer in memcpy()/free().

Signed-off-by: Jeff Hostetler <[email protected]>
This special-cases various signals that are not supported on Windows,
such as SIGPIPE. These cause the UCRT to throw asserts (at least in
debug mode).

Signed-off-by: Jeff Hostetler <[email protected]>
The MINGW version of the main() wrapper gets away with declaring symbols
that were intentionally not exported. However, some of these symbols do
not actually exist in MSVC's UCRT.

So let's add an MSVC version of the main() wrapper that uses wmain() and
imports the UNICODE argv and environment. While at it, we pass our UTF-8
version of ARGV to the real main -- rather than overwriting __argv as is
done in the MINGW Version.

Signed-off-by: Jeff Hostetler <[email protected]>
VS2015's headers already declare that struct.

Signed-off-by: Jeff Hostetler <[email protected]>
It is just called different in MSVC's headers.

Signed-off-by: Jeff Hostetler <[email protected]>
The ntstatus.h header is only available in MINGW.

Signed-off-by: Jeff Hostetler <[email protected]>
In MSVC, the DEBUG constant is set automatically whenever compiling with
debug information.

This is clearly not what was intended in cache-tree.c, so let's use a less
ambiguous constant there.

Signed-off-by: Jeff Hostetler <[email protected]>
Install required third-party DLLs next to EXEs.

Build and install release mode PDBs for git
executables allowing detailed stack traces
in the event of crash.

Signed-off-by: Jeff Hostetler <[email protected]>
Just like 1e2ce1d (sha1: Use OpenSSL SHA1 routines on MINGW, 2016-10-12),
we now use OpenSSL's SHA-1 routines instead of Git's own because OpenSSL
is substantially faster as of version 1.0.2: it now uses hardware
acceleration on Intel processors much more effectively.

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
When compiling with MSVC, we rely on NuPkgs to provide the binaries of
dependencies such as libiconv. The libiconv 1.14.0.11 package available
from https://www.nuget.org/packages/libiconv seems to have a bug where
it does not set errno (when we would expect it to be E2BIG).

Let's simulate the error condition by taking less than 16 bytes
remaining in the out buffer as an indicator that we ran out of space.
While 16 might seem a bit excessive (when converting from, say, any
encoding to UTF-8, 8 bytes should be fine), it is designed to be a safe
margin.

Signed-off-by: Johannes Schindelin <[email protected]>
When passing a command-line to call an external diff command to the
difftool, we must be prepared for paths containing special characters,
e.g. backslashes in the temporary directory's path on Windows.

This has been caught by running the test suite with an MSVC-built Git:
in contrast to the MINGW one, it does not rewrite `$TMP` to use forward
slashes instead of backslashes.

Signed-off-by: Johannes Schindelin <[email protected]>
Set -O2 (maximize speed) rather than -Os (favor small code)
for non-debug builds.

Signed-off-by: Jeff Hostetler <[email protected]>
Teach main Makefile to also delete the generated PDB files
as well as the PDB files for the various EXE files during
"make MSVC=1 clean".

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Remove todo list item for vs2015 build notes regarding
the third party DLLs.

Signed-off-by: Jeff Hostetler <[email protected]>
In Git for Windows' SDK, there is already a script to package Git for
Windows as a NuGet package, downloading nuget.exe if needed.

Let's just fall back to using that executable (if it is there) if
nuget.exe was not found in the PATH.

Signed-off-by: Johannes Schindelin <[email protected]>
MS Visual C suggests that the construct

	condition ? (int) i : (ptrdiff_t) d

is incorrect. Let's fix this by casting to ptrdiff_t also for the
positive arm of the conditional.

Signed-off-by: Johannes Schindelin <[email protected]>
We do have a lovely Makefile option to state that that header file is
not available. Let's use it everywhere...

Signed-off-by: Johannes Schindelin <[email protected]>
When compiling with Visual Studio, the projects' names are identical to
the executables modulo the extensions. Which means that the bin-wrappers
*need* to target the .exe files lest they try to execute directories.

Signed-off-by: Johannes Schindelin <[email protected]>
Add the Microsoft .manifest pattern, and correct the generic 'Debug'
and 'Release' directory patterns which were mechanically adjusted way
back in c591d5f (gitignore: root most patterns at the top-level directory,
2009-10-26) to allow multi-level projects within the Git suite.

Signed-off-by: Philip Oakley <[email protected]>
Git's source code wants to be able to close() the same file descriptor
multiple times, ignoring the error returned by the second call (and the
ones after that), or to access the osfhandle of an already-closed stdout,
among other things that the UCRT does not like.

Simply linking invalidcontinue.obj allows such usage without resorting to
Debug Assertions (or exiting with exit code 9 in Release Mode).

Let's add a note so we don't forget, as suggested by Jeff Hostetler.

See https://msdn.microsoft.com/en-us/library/ms235330.aspx for more
details.

Signed-off-by: Johannes Schindelin <[email protected]>
It is a real old anachronism from the Cogito days to have a
.git/branches/ directory. And to have tests that ensure that Cogito
users can migrate away from using that directory.

But so be it, let's continue testing it.

Let's make sure, however, that git init does not need to create that
directory.

This bug was noticed when testing with templates that had been
pre-committed, skipping the empty branches/ directory of course because
Git does not track empty directories.

Signed-off-by: Johannes Schindelin <[email protected]>
The error message talked about a "lib option", but it clearly referred
to a link option.

Signed-off-by: Philip Oakley <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
The engine.pl script expects file names not to contain spaces. However,
paths with spaces are quite prevalent on Windows. Use shellwords() rather
than split() to parse them correctly.

Helped-by: Junio C Hamano <[email protected]>
Signed-off-by: Philip Oakley <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
dscho added 9 commits August 23, 2018 22:24
This merges the builtin stash.

Upstream Git did not integrate it into any stable integration branch
yet, but the performance improvements are substantial enough,
especially on Windows, that we really, really, really want to have it
early.

Signed-off-by: Johannes Schindelin <[email protected]>
This is the first batch of the patches that turn `git rebase` into
a builtin.

This not only helps performance on Windows, but *especially* makes
things more robust, as no MSYS2 Bash will be required to run this
command any longer.

Signed-off-by: Johannes Schindelin <[email protected]>
This is the second wave of patches to bring us closer to a builtin `git
rebase`.

Signed-off-by: Johannes Schindelin <[email protected]>
This set of patches implements the actions (such as --continue, --skip,
etc) in the builtin rebase.

Signed-off-by: Johannes Schindelin <[email protected]>
This wave of built rebase patches implements the remaining rebase
options in the builtin rebase.

Signed-off-by: Johannes Schindelin <[email protected]>
This fifth batch of builtin rebase patches concludes the conversion: the
builtin rebase is now feature-complete.

Signed-off-by: Johannes Schindelin <[email protected]>
This branch first merges the builtin interactive rebase, and then
teaches the builtin rebase to hand off interactive rebases to the
builtin backend correctly.

Signed-off-by: Johannes Schindelin <[email protected]>
This final patch flips the switch and makes the builtin rebase the
default. The old, Unix shell scripted version can still be called via

	git -c rebase.useBuiltin=false rebase [...]

Signed-off-by: Johannes Schindelin <[email protected]>
This patch teaches the builtin rebase to avoid the scripted --am backend
and call `git format-patch` and `git am` directly.

Meaning: apart from the --merge and the --preserve-merges backends, `git
rebase` is now implemented in pure C, with no need to ask the Unix shell
interpreter for help.

This brings us really close to a fully builtin `git rebase`: the
--preserve-merges mode is about to be deprecated (as soon as the
--rebase-merges mode has proven stable and robust enough), and there are
plans to scrap the `git-rebase--merge` backend in favor of teaching the
interactive rebase enough tricks to run the --merge mode, too.

Signed-off-by: Johannes Schindelin <[email protected]>
@tgummerer
Copy link

we could easily introduce a stash.useBuiltin config variable, and re-introduce the shell-scripted version as git-legacy-stash.sh, and also let the Git for Windows default to false here, and

I think having at least the config variable would be a good thing I think. I'm having some seconds thoughts on what I said yesterday on IRC:

yeah I gave the stash patches a look, and I didn't find anything that seemed seriously wrong with them

I did spot a few bugs, especially in the last patch in the series, which breaks stash in a worktree I think. Also the configuration variables are not respected correctly. This is the two top ones that I remember of the top of my head. I can help trying to sort through my review feedback to see what I consider broken vs. possible improvements in the code, but that's not going to happen any earlier than the weekend.

dscho and others added 10 commits August 24, 2018 15:14
This simply copies the version as of v2.19.0-rc0 verbatim. As of now,
it is not hooked up (because it needs a couple more changes to work);
The next commit will use the scripted interactive rebase backend from
`git rebase` again when `rebase.useBuiltin=false`.

Signed-off-by: Johannes Schindelin <[email protected]>
We recently converted both the `git rebase` and the `git rebase -i`
command from Unix shell scripts to builtins.

The former has a safety valve allowing to fall back to the scripted
`rebase`, just in case that there is a bug in the builtin `rebase`:
setting the config variable `rebase.useBuiltin` to `false` will
fall back to using the scripted version.

The latter did not have such a safety hatch.

Let's reinstate the scripted interactive rebase backend so that `rebase.useBuiltin=false` will not use the builtin interactive rebase,
just in case that an end user runs into a bug with the builtin version
and needs to get out of the fix really quickly.

This is necessary because Git for Windows wants to ship the builtin
rebase/interactive rebase earlier than core Git: Git for Windows
v2.19.0 will come with the option of a drastically faster (if a lot
less battle-tested) `git rebase`/`git rebase -i`.

As the file name `git-rebase--interactive` is already in use, let's
rename the scripted backend to `git-legacy-rebase--interactive`.

A couple of additional touch-ups are needed (such as teaching the
builtin `rebase--interactive`, which assumed the role of the
`rebase--helper`, to perform the two tricks to skip the unnecessary
picks and to generate a new todo list) to make things work again.

Signed-off-by: Johannes Schindelin <[email protected]>
Add a test demonstrating a problem with long submodule paths

[jes: adusted test number to avoid conflicts, fixed non-portable use of
the 'export' statement, fixed broken && chain]

Signed-off-by: Doug Kelly <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
This simply copies the version as of v2.19.0-rc0 verbatim. As of now,
it is not hooked up.

The next commit will change the builtin `stash` to hand off to the
scripted `git stash` when `stash.useBuiltin=false`.

Signed-off-by: Johannes Schindelin <[email protected]>
We recently converted the `git stash` command from Unix shell scripts
to builtins.

Just like we have `rebase.useBuiltin` to fall back to the scripted
rebase, to give end users a way out when they discover a bug in the
builtin command, this commit adds support for `stash.useBuiltin`.

This is necessary because Git for Windows wants to ship the builtin
stash earlier than core Git: Git for Windows v2.19.0 will come with
the option of a drastically faster (if a lot less battle-tested)
`git stash`.

As the file name `git-stash` is already in use, let's rename the
scripted backend to `git-legacy-stash`.

To make the test suite pass with `stash.useBuiltin=false`, this commit
also backports rudimentary support for `-q` (but only *just* enough
to appease the test suite), and adds a super-ugly hack to force exit
code 129 for `git stash -h`.

Signed-off-by: Johannes Schindelin <[email protected]>
The upcoming Git for Windows v2.19.0 wants to ship with the builtin
versions of stash, rebase and rebase -i. The reason: these are just *so
much faster*: t3400 and t3404 run about 60-70 percent faster, and t3903
even more than 80% faster.

However, these are still all pretty fresh, still being reviewed and
iterated on the Git mailing list.

So let's try to give users a way to test these (or to boldly use them
for their mission-critical tasks, as this here developer plans on
doing), but stay with the safe option by default: use the scripted
versions (which might be slow, but they are well tested).

Signed-off-by: Johannes Schindelin <[email protected]>
This *would* be a fixup commit, except that we want to avoid rewriting
commits that we merged from upstream's `pu` branch. Instead, we want to
send a new iteration, and then re-merge the new iteration once it made
it into the `pu` branch.

Signed-off-by: Johannes Schindelin <[email protected]>
A couple of fixes that should be squashed during the next merging
rebase of Git for Windows.

Signed-off-by: Johannes Schindelin <[email protected]>
This branch adds back the scripted versions, then adds the option to
use the builtin versions of `stash` and `rebase` by setting
`stash.useBuiltin=true` and `rebase.useBuiltin=true`, respectively,
(the latter already worked for the top-level `git rebase` command and
the `--am` backend, and now it also works for the interactive backend).

Signed-off-by: Johannes Schindelin <[email protected]>
@prertik
Copy link

prertik commented Aug 25, 2018

Whoa! Awesome!! You implemented rebase --am too, cool! Excited about the new release incorporating these. 🎉

@peolsson
Copy link

I use both rebase -i and stash quite often. I've tested this new release, and so far I'm amazed by the speed improvement! I will continue running this version and report back if I find any issues with it.

@dscho dscho mentioned this pull request Aug 28, 2018
@dscho
Copy link
Member Author

dscho commented Aug 28, 2018

Superseded by #1806

@dscho dscho closed this Aug 28, 2018
@dscho dscho deleted the g4w-and-builtin-rebase-and-stash branch August 28, 2018 21:16
@dscho
Copy link
Member Author

dscho commented Aug 28, 2018

Superseded by #1806

(This will hopefully be published as Git for Windows v2.19.0-rc1 soon. And it does have the built-in stash and rebase as experimental options, opt-in, as recommended by @tgummerer.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.