Get the BusyBox-based MinGit production-ready #1439

Open
dscho opened this issue Jan 20, 2018 · 32 comments

@dscho
Member

dscho commented Jan 20, 2018

Cygwin (and hence MSYS2, which is a derivative) tries to emulate POSIX functionality on top of the Win32 API. When spawning child processes, this means that fork() needs to be emulated, which is hard, and requires the MSYS2 runtime's address range to be pinned. This leads to many a problem on many, many setups, sometimes only after unrelated software is upgraded!

Git for Windows uses the MSYS2 runtime essentially for two things: SSH and Unix shell/Perl scripting.

In MinGit, we already do not include any Perl scripts. But plenty of Unix shell scripts. Ideally, those would be converted into "builtins", i.e. pure, portable, performant C, which would make everything quite a bit more robust, not to mention fast. Sadly, this is no priority of core Git's developers/maintainer and it is a lot of work.

To side-step this, we put in quite an effort last year to ship a "BusyBox-based" variant of MinGit. BusyBox is an executable that offers minimal versions of many Unix tools, such as a Unix shell, sed, awk, etc, in a single binary (much like git.exe includes many subcommands as "builtins"), and there exists a pure Win32 version of BusyBox that we helped along until it could run Git's Unix shell scripts and test suite.

It is time to get this BusyBox-based MinGit to a point where it is robust enough to be the default MinGit.

@dscho dscho self-assigned this Jan 20, 2018
@shiftkey

@dscho I'm interested in this, especially to reduce/avoid the dependency on MSYS2/Cygwin. How can others help?

@dscho
Member Author

dscho commented Jan 22, 2018

How can others help?

Mostly by testing in as close to production as you dare ;-)

I vaguely remember that there have been some inexplicable hangs when I tried to run the test suite (I used GIT_TEST_ARGS=--quiet GIT_TEST_INSTALLED=/path/to/mingw64/bin busybox xargs -P15 ./test-[0-9]*.sh after copying the test artifacts (which are now bundled as Pacman package, too) into a clone of git-for-windows/git). That's the main part I want to address in the near future.

@shiftkey

shiftkey commented Jan 22, 2018

Mostly by testing in as close to production as you dare ;-)

@dscho
Member Author

dscho commented Feb 13, 2018

Hey everyone it been hard to get anything out of this

I am sorry to hear that. But I also have to say that I have a hard time understanding what exactly the problem is. Care to explain in any sort of detail?

@mingwandroid

mingwandroid commented Apr 11, 2018

The Anaconda Distribution has been trying to use busybox Git for Windows and ran into a problem with git submodule hanging:

C:\gfw\cmd\git.exe submodule update --init --recursive
10:53:00.229338 git.c:576               trace: exec: git-submodule update --init --recursive
10:53:00.229338 run-command.c:640       trace: run_command: git-submodule update --init --recursive

I added set -x and a few echo's to C:\gfw\mingw64\libexec\git-core\git-submodule to see what I could see and I saw two problems:

  1. It uses basename and that doesn't seem to be backslash happy:
+ basename 'C:\gfw\mingw64/libexec/git-core\git-submodule'
+ echo 'git-core\git-submodule'
git-core\git-submodule
+ basename 'C:\gfw\mingw64/libexec/git-core\git-submodule'
  2. It hangs at the call to sed. It seems that another busybox got spawned ok and is pegging one of my CPU threads but I guess they're not chatting to each other?
+ sed -e 's/-/ /'

@mingwandroid

I see a commandline of sh --forkshell 00000000000002CC for the last spawned, busy busybox.

@mingwandroid

mingwandroid commented Apr 11, 2018

Switching the busybox.exe in the latest release to this one allows this to work (or at least to not hang).

@dscho
Member Author

dscho commented Aug 25, 2018

Sorry for the late reply. I am still struggling to find any time for any serious work on BusyBox. This comment makes me believe that the hang is caused by the patches I introduced on top of BusyBox-w32, and that's what most likely also causes the so far unexplained hangs in the test suite I observed.

@shiftkey

@dscho is there a particular build we should be helping with testing if this has accidentally resolved itself, or should we hold off for the moment?

@mingwandroid

No problem at all @dscho there's only so many hours in a day.

@dscho
Member Author

dscho commented Aug 27, 2018

is there a particular build we should be helping with testing if this has accidentally resolved itself, or should we hold off for the moment?

@shiftkey Sadly, there is no chance of this being resolved accidentally...

@mingwandroid thanks for understanding!

@leonyu

leonyu commented Sep 14, 2018

I actually found busybox mingit to be more usable by end-user than regular mingit, as you can simply run busybox sh to get a working shell, while bash from regular mingit is basically unusable.

@shiftkey

while bash from regular mingit is basically unusable.

I don't believe bash.exe ships in the vanilla MinGit environment. sh.exe does, and if you've seen issues with that I'd love to hear more.

@leonyu

leonyu commented Sep 15, 2018

it has bash as sh. At least that's what --version tells me.

However, it is not usable as an interactive shell, because there is no readline, and the PATH is not automatically set to include /usr/bin, so it doesn't resolve those unix commands by default, like busybox does. (Or am I missing something?)


Update: the path is set if I call sh.exe --login. However, readline is still pretty broken.

In addition, mingit busybox has broken Unicode support (probably expected given it took a few versions to get unicode support in MSYS).

@dscho
Member Author

dscho commented Sep 28, 2018

MinGit is not intended to be used interactively, skipping a ton of parts in the quest to minimize the footprint for applications which want to ship with Git. Any interactive functionality you use in MinGit (BusyBox variant or not) might go away at any stage, without prior warning (apart from this here stern one).

@drizzd

drizzd commented Apr 13, 2019

I am increasingly annoyed by the fact that Git-bash behavior is so special that commands which are intended to work both in cmd.exe and in a real POSIX shell always need special handling in Git-bash. For example, I frequently use winpty and MSYS2_ARG_CONV_EXCL='*' to run simple commands like winpty docker run -it ubuntu ls //bin. And when I manage to do this, some commands crash anyway or produce strange output because somehow the pseudo terminal emulation works differently.

As far as I am concerned, I don't need the full power of bash on Windows, or the path conversion, or the terminal emulation. I only want to run native Windows applications in a native Windows terminal. On the other hand, I just can't get used to cmd.exe or PowerShell because I am so addicted to basic command line editing shortcuts like ctrl-p, ctrl-n, ctrl-a, ctrl-e, alt-b, alt-f, alt-w, alt-d, and maybe also ctrl-z and ctrl-r.

I think it would be huge if we can rid Git-for-Windows of the complexity induced by the MSYS2/Cygwin emulation layer (both runtime and terminal emulation) and replace it with Windows native code. Why do we consider BusyBox only for MinGit, but not as a replacement for MSYS2/Cygwin/Git-bash? Ignoring backwards compatibility for now, what would be the minimum requirements for such a replacement?

Do we need more than the following:

  • posix shell script support
  • posix shell command line
  • readline support (for ctrl-a, ctrl-u commands etc.)
  • command completion
  • pager (I am perfectly fine with less, but most people barely manage exiting from less, let alone navigate in it or make case-insensitive searches)

@dscho
Member Author

dscho commented Apr 14, 2019

Note that BusyBox comes with its own less and its own readline lookalike. Having said that, we are currently very reliant on MSYS2. Off the top of my head:

  • Perl. git svn and git send-email still require Perl. (So does git add -i, but I already have a version running locally that addresses that; see PRs gitgitgadget/git#170 ("git add -i: add a rudimentary version in C (supporting only status and help so far)") through gitgitgadget/git#175 ("built-in add -p: add support for the same config settings as the Perl version") for details.)

  • OpenSSH. The native OpenSSH is getting there, but it is Windows 10 only, and it still has some kinks in some corner cases (and with a user base of over 3 million, a single maintainer could easily be overwhelmed if even only as many as 0.01% hit those corner cases and report those bugs and demand them to be fixed).

  • We are relying on MinTTY to provide a better terminal window, at least on older Windows versions (traditionally, prior to Windows 10, the CMD window is seriously limited, compared to what one is used to from Linux and even to a certain extent from macOS). BusyBox still needs to learn about MinTTY's pseudo terminals, so that it correctly detects that it is running in a terminal.

  • Quite likely a lot of other things that I can't think of right now.

Also, please note that BusyBox supports Ctrl+R, but not Ctrl+Z.

Further, BusyBox-w32 has troublesome performance issues, at least currently. In theory it should be a lot faster to execute shell scripts using BusyBox' ash than using MSYS2's Bash. In practice, it seems to be the opposite. My guess is that the way the forkshell emulation is implemented is suboptimal, and leaves a lot of room for improvement.

Finally, let's not forget that Git advertises scripting as the way to make things work. Hooks are strongly expected to be shell scripts. And those shell scripts are definitely outside of the control of Git's own source code, so we would quite likely break power users' scripts by simply switching to BusyBox, as most of its commands/options are noticeably limited compared to the full commands.

So I think that the best we can do is to offer an opt-in to BusyBox. After making it work. Robustly so.

@drizzd

drizzd commented Feb 8, 2020

FWIW, I am now happy with Clink, which effectively adds readline capabilities to cmd.exe.

@ur4t

ur4t commented Jun 22, 2021

There must be a compatible /usr/bin/sh.exe for ssh to work correctly. Details: #3285.

@ur4t

ur4t commented Jun 22, 2021

Diffutils shipped with mingit-busybox $GIT_INSTALL_ROOT/usr/bin/{cmp,diff,diff3}.exe needs libiconv msys-iconv-2.dll (1000KB) and libintl msys-intl-8.dll (43KB) to work properly.
Nothing malfunctions in my use cases, and busybox provides cmp and diff. I think these three executables can be excluded.

@ur4t

ur4t commented Jun 22, 2021

Some helper scripts like git-difftool--helper work with mingit-busybox and mingit, which is important for git difftool to work properly.

@dscho
Member Author

dscho commented Jun 22, 2021

There must be a compatible /usr/bin/sh.exe for ssh to work correctly. Details: #3285.

I hope that we will get a chance to resolve this more elegantly, e.g. by some magic /etc/ssh/config setting that hopefully exists to that end.

Diffutils shipped with mingit-busybox $GIT_INSTALL_ROOT/usr/bin/{cmp,diff,diff3}.exe needs libiconv msys-iconv-2.dll (1000KB) and libintl msys-intl-8.dll (43KB) to work properly.
Nothing malfunctions in my use cases, and busybox provides cmp and diff. I think these three executables can be excluded.

Right!

We do have a script to check for missing .dll files, and an Azure Pipeline that runs it regularly to verify that Git for Windows and MinGit do not ship with .exe files with broken links to .dll files. I think the two tasks here are:

  • open a PR to adjust make-file-list.sh as you suggested, and then
  • adjust the Azure Pipeline (or even better: port the Pipeline to a GitHub workflow that runs in git-sdk-32/git-sdk-64).

Porting the Pipeline should be relatively easy: just use the setup-git-for-windows-sdk Action and then execute this shell script snippet:

  printf 'Checking full file list\n\n' >&2
  ./check-for-missing-dlls.sh | tee full.txt
  printf '\n\nChecking MinGit file list\n\n' >&2
  MINIMAL_GIT=1 ./check-for-missing-dlls.sh | tee min.txt
  ! grep ' is missing ' full.txt min.txt
  ! grep -i 'unused ' min.txt

Some helper scripts like git-difftool--helper work with mingit-busybox and mingit, which is important for git difftool to work properly.

I am not that sure about git difftool... MinGit is intended to be used by applications, not directly by humans. And difftool strikes me as a particularly human-oriented Git command, i.e. it is not low-level enough to truly fit into MinGit's mission.

@ur4t

ur4t commented Jun 23, 2021

I've found two large executables with the same content. Can one of them be a wrapper for the other?

~/mingit-busybox/mingw64/bin $ git --version --build-options
git version 2.32.0.windows.1
cpu: x86_64
built from commit: 4c204998d0e156d13d81abe1d1963051b1418fc0
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
~/mingit-busybox/mingw64/bin $ ls -al git-remote-http*
-rwxrwxr-x    1 ur4t     ur4t       2092048 Jun 07 20:27 git-remote-http.exe
-rwxrwxr-x    1 ur4t     ur4t       2092048 Jun 07 20:27 git-remote-https.exe
~/mingit-busybox/mingw64/bin $ sha512sum git-remote-http*
7eaf2ad99a1cc9f0d95286ebedea11bd700249b5618529468bf8d65dfeeccde0  git-remote-http.exe
7eaf2ad99a1cc9f0d95286ebedea11bd700249b5618529468bf8d65dfeeccde0  git-remote-https.exe

@dscho
Member Author

dscho commented Jun 23, 2021

I've found two large executables with the same content. Can one of them be a wrapper for the other?

We would have to teach the git-wrapper.c that trick (most likely to redirect from git-remote-http.exe to git-remote-https.exe, so as to not punish the more common case, https://), but yes it would be totally doable, and it would save on .zip size. Good find!
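
A quick way to look for other byte-identical executables is to group files by checksum. The sketch below is not part of Git for Windows: `list_duplicates` is a hypothetical helper name, and it assumes that `sha256sum` and `awk` are available and that file names contain no whitespace.

```sh
# Hypothetical helper: print pairs of files in a directory that share a checksum.
# Assumes sha256sum and awk exist and that file names contain no whitespace.
list_duplicates () {
	sha256sum "$1"/* 2>/dev/null |
	awk '{
		file = $2
		sub(/^\*/, "", file)	# binary mode prefixes the name with "*"
		if ($1 in first)
			print first[$1] " == " file
		else
			first[$1] = file
	}'
}
```

Running `list_duplicates mingw64/bin` would print one line per file whose content matches an earlier file in the listing, which makes candidates for wrapper redirection easy to spot.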

@ur4t

ur4t commented Jun 24, 2021

In mingit-busybox, the msys executables inside /usr/bin/ belong to these packages:
diffutils, libfido2, msys2-runtime, openssh, tcl, rebase.

  • diffutils is completely broken, so it can be removed.
  • openssh is called via the ssh executable, which means it can be replaced with standalone openssh builds, such as Win32-OpenSSH, while libfido2 is supplemental for openssh.
  • rebase is used for adjusting PE files so that fork() works correctly, which should be done during the build process. msys2-runtime: nothing special to talk about.

So here comes a crazy idea: completely remove the msys part, provide Win32-OpenSSH or even ask users to download it themselves (since 1809, Windows 10 bundles it).

The only key point is how much software depends on msys tcl.

@dscho
Member Author

dscho commented Jun 24, 2021

The only key point is how much software depends on msys tcl.

That looks like an oversight on my part, too. There are parts of Git that depend on Tcl, but they also depend on Tk because they are gitk and Git GUI, both of which are inappropriate for bundling inside MinGit.

diffutils is completely broken, so it can be removed.

Good point.

So here comes a crazy idea: completely remove the msys part, provide Win32-OpenSSH or even ask users to download it themselves (since 1809, Windows 10 bundles it).

That might be an option if we did not support Windows Vista, still.

But I give you this: it is a tempting idea. And I have thought about it a lot. But for that, we definitely need to work a lot harder on BusyBox-w32. We do not have any current build of mingw-w64-busybox, for starters, let alone any proper CI testing that runs Git's test suite.

dscho added a commit to dscho/build-extra that referenced this issue Jun 30, 2021
In msys2/MSYS2-packages@325eedfc14 (a rather
monster-type of a huge change), the build-time dependency `tcl` was
upgraded to a full-scale runtime dependency of all SQLite packages,
including `libsqlite`.

This is most likely a bug (which was easy to miss, given the amount of
changes accumulated in that big patch), as `tcl` is still marked as a
runtime dependency of the `tcl-sqlite` package (which is still correct).

But `tcl` is not required to use `libsqlite`, which `openssh` does, and
therefore we now have this useless baggage we have to shlep around.

Let's exclude it manually.

Noticed in
git-for-windows/git#1439 (comment)

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Member Author

dscho commented Jul 2, 2021

I believe that I addressed the tcl, diffutils and git-remote-http.exe parts.

@dscho
Member Author

dscho commented Sep 19, 2022

After a looong hiatus, I upgraded to the latest BusyBox-w32 tag, integrated a CI build, and the Pipeline to build the mingw-w64-busybox package is currently running.

@ur4t

ur4t commented Jan 19, 2023

So here comes a crazy idea: completely remove the msys part, provide Win32-OpenSSH or even ask users to download it themselves (since 1809, Windows 10 bundles it).

That might be an option if we did not support Windows Vista, still.

As announced in https://github.com/git-for-windows/git/releases/tag/v2.36.1.windows.1, we can make it true now.

Git for Windows will also stop supporting Windows Vista soon after Git for Windows 2.36.0 is released. Around the beginning of 2023, Git for Windows will drop support for Windows 7 and for Windows 8, following Cygwin's and MSYS2's lead (Git for Windows relies on MSYS2 for components such as Bash and Perl).

@dscho
Member Author

dscho commented Jan 19, 2023

So here comes a crazy idea: completely remove the msys part, provide Win32-OpenSSH or even ask users to download it themselves (since 1809, Windows 10 bundles it).

That might be an option if we did not support Windows Vista, still.

As announced in https://github.com/git-for-windows/git/releases/tag/v2.36.1.windows.1, we can make it true now.

I am not so sure about that.

@dscho
Member Author

dscho commented Oct 18, 2024

It would seem that I never have enough time to push this through, and there are still substantial hurdles:

  • When running tests via BusyBox, some tests seem to be slower than with MSYS2's Bash. My recollection is unfortunately vague by this point, but it seemed that there was too much work going on when fork()ing subshells.
  • I have doubts about the way BusyBox-w32 is maintained, e.g. their refusal to use CI builds i.e. removing a simple and straightforward way to raise confidence in the correctness of the code.
  • Even if the above issues were resolved, there is still the issue that BusyBox is more limited in functionality than MSYS2's Bash and coreutils, and therefore Git for Windows will always have to provide the latter to users. As a result, I will never be able to focus as much on improving BusyBox as I would like to.

Nevertheless, I think it would be valuable to enhance the current state by:

  • shipping mingw64\bin\ash.exe instead of mingw64\bin\busybox.exe (same executable, just different name)
  • telling Git explicitly (instead of running busybox --help) that it is supposed to use BusyBox. This could be accomplished e.g. by
    • forcing ash to be used instead of sh in the BusyBox-enabled MinGit
    • introducing a core.shell or some such
  • optionally let regular Git for Windows users opt into using BusyBox to run the Git commands that are still implemented as Unix shell scripts rather than in portable C, and aliases, and hooks, and whatever Git functionality that (ab-)uses the Unix shell to parse & execute a command-line.
  • work on running the CI tests using BusyBox
  • work on improving BusyBox' performance by identifying performance bottlenecks and resolving them (or working around them).
  • consider other projects than BusyBox (such as this one).
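
As a rough illustration of the `busybox --help` style of detection mentioned in the list above: the BusyBox multi-call binary prints a banner starting with "BusyBox" when asked for help, so a script can probe for it. This is only a sketch under that assumption and does not claim to mirror how Git performs the check; `is_busybox` is a hypothetical name.

```sh
# Hypothetical probe: succeed if the given command identifies itself as BusyBox.
# The banner may go to stdout or stderr, so both streams are inspected.
is_busybox () {
	"$@" --help 2>&1 | grep -q 'BusyBox'
}

# Example: pick a label for the shell in use.
# if is_busybox busybox; then echo "using BusyBox"; fi
```

An explicit setting, as proposed in the list, would avoid spawning an extra process for every such probe, which matters on Windows where process creation is comparatively slow.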

In short: There is a lot to do, and the more hands help, the quicker it will get done.

@avih

avih commented Jan 23, 2025

This is mostly an old discussion, but I wanted to share maybe some new info about the issues raised here.

As background, I use both (pure) cygwin and busybox-w32 as daily-driver in interactive environments, regardless of git, and I do contribute to bb-w32 occasionally.

As for git, I have used the MinGit busybox package, but with my own bb-w32 build, for about 2 years and haven't noticed issues, though that's only an anecdote.

The goal is to hopefully make it clearer what still stands in the way and what not when trying to use bb-w32 instead of msys2 and its utilities - to the best of my knowledge. This is not an attempt to convince anyone that busybox-w32 is the way to go.

I'm also speaking exclusively about bb-w32 and msys2/cygwin, and not comparing it to alternative solutions or shells (which may indeed be better).

First, perl obviously needs another solution outside of busybox as long as it's required.

I also agree that bb-w32 CI would be good to have.

As for bb-w32 issues raised above, hopefully in order of appearance:

  1. xargs issue: there was a bug in xargs where it could produce too long list of args (which was cropped by the OS), fixed recently.
  2. basename with backslashes: this is still the case today, because AFAIK the backslash support is mainly when executing commands, but not in specific utilities (and IIRC also not in sh globs). But it could be reasonably replaced with a tiny sh function IMO. Might also be worth examining where the backslashes come from, because in bb-w32 \ is converted to / in $PWD and most env vars (of course, it's also possible to invoke git from cmd.exe with backslashes in arguments, and bb-w32 doesn't modify arguments - like cygwin, but unlike msys2).
  3. sed -e 's/-/ /' hangs: worth reporting if this is reproducible, but it didn't hang for me with this expression (as noted, this is my own bb-w32 build which is reasonably close to upstream - not gfw-bb-w32).
  4. unicode: supported in upstream for about 2 years now (with a utf8 manifest - get the u version), including interactively. Also supported natively in gfw-bb-w32 (not manifest, but also not interactively).
  5. openssh is win10: shouldn't be an issue, because so is gfw.
  6. mintty or good terminal emulation: I think win10 console has very decent terminal emulation. I'd guess that it should cover git's needs. bb-w32 also has reasonably useful emulation for output from its own sh and utilities (as fallback if the console doesn't support VT mode), but not for output from external utilities - like git.
  7. bb-w32 not supporting ^Z: this has two use cases:
    • As EOF for terminal KB input (e.g. read or cat > foo, etc): ^Z works.
    • For interrupting the foreground process and maybe moving it to the BG - unsupported AFAIK (but foo & does work).
  8. tcl and gitk/git-gui: I believe these are already pure mingw in gfw, and if they're not - they can be. I use my own tcl/tk mingw build, and it runs gitk fine AFAICT - which I use a lot. It runs git-gui too but I don't really use it to be able to identify issues, but it seems normal.
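
The "tiny sh function" idea from item 2 could look roughly like the sketch below. It is not taken from Git or BusyBox-w32: `win_basename` is a hypothetical name, and the function only strips everything up to the last `/` or `\`, without the trailing-separator and suffix handling of the real basename utility.

```sh
# Hypothetical replacement for basename that accepts both "/" and "\".
# Uses only POSIX parameter expansion, so no extra process is spawned.
win_basename () {
	# Remove the longest prefix that ends in a slash or a backslash.
	printf '%s\n' "${1##*[\\/]}"
}
```

With the mixed path from the trace earlier in this thread, `win_basename 'C:\gfw\mingw64/libexec/git-core\git-submodule'` prints `git-submodule` rather than `git-core\git-submodule`.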

Another subject which I think was not mentioned is long file names: I believe gfw-bb-w32 does increase the possible path lengths at least in some cases, IIRC to 4K or some such. Upstream bb-w32 doesn't have it. I believe it uses PATH_MAX or some such with some file buffers, which is ~ 260 (if it's a long argument then it remains long, but IIRC code which reads a path via win32 api only uses PATH_MAX buffer).

As for being slower than bash/msys2-utils, it's definitely possible, but it also highly depends on the use case. Generally speaking, from my own experience with interactive cygwin and bb-w32 daily, I feel that bb-w32 scripts/utils perform similarly to or better than cygwin/msys2. Scripting at least is generally faster than bash. Subshell performance might differ (I didn't compare those), but any script which somewhat cares about performance should try reasonably to avoid them unless necessary - same goes on BSD and even on linux (which is known for its blazing fast exec/fork performance, and subshells still matter a lot in scripts, but orders of magnitude less than on windows).
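
To make the point about avoiding extra processes concrete, here is a small sketch with two hypothetical helpers that produce the same result for typical file names. The first runs a pipeline with an external-style utility for each call; the second relies only on POSIX parameter expansion.

```sh
# Hypothetical helpers, for illustration only.

# Pipeline version: every call sets up a pipeline and runs sed.
strip_ext_slow () {
	printf '%s\n' "$1" | sed 's/\.[^.]*$//'
}

# Pure-shell version: no pipeline, no utility invocation.
strip_ext_fast () {
	printf '%s\n' "${1%.*}"
}
```

Called in a loop over many files, the second form saves the per-iteration process setup, which is where emulated fork tends to cost the most. Note that capturing either function's output with `$(...)` still creates a subshell, so the biggest gain comes from assigning `${name%.*}` to a variable directly.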

If any specific (sh) script is known to perform slower than desired, I don't mind having a look if someone points it out.
