gettext(windows): always use UTF-8 #217
/submit |
Submitted as [email protected] |
This branch is now known as |
This patch series was integrated into pu via git@39559f5. |
This patch series was integrated into pu via git@0adb53a. |
This patch series was integrated into pu via git@8087c10. |
This patch series was integrated into pu via git@d784b2f. |
This patch series was integrated into pu via git@52c0c27. |
This patch series was integrated into pu via git@003c186. |
This patch series was integrated into pu via git@5d18a85. |
ff37a26 to 43f1fd6
On native Windows, Git exclusively uses UTF-8 for console output (both with MinTTY and native Win32 Console). Gettext uses `setlocale()` to determine the output encoding for translated text; however, MSVCRT's `setlocale()` does not support UTF-8. As a result, translated text is encoded in the system encoding (as per `GetACP()`), and non-ASCII chars are mangled in console output.

Side note: There is actually a code page for UTF-8: 65001. In practice, though, it does not work as expected, at least on Windows 7, so we cannot use it in Git. Besides, if we overrode the code page, any process spawned from Git would inherit that code page (as opposed to the code page configured for the current user), which would quite possibly break e.g. diff or merge helpers. So we really cannot override the code page.

In `init_gettext_charset()`, Git calls gettext's `bind_textdomain_codeset()` with the character set obtained via `locale_charset()`. Let's override that latter function to force the encoding to UTF-8 on native Windows.

In Git for Windows' SDK, there is a `libcharset.h`, and we therefore define `HAVE_LIBCHARSET_H` in the MINGW-specific section of `config.mak.uname`. Consequently, we need to add the override before that conditionally-compiled code block.

Rather than simply defining `locale_charset()` to return the string `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the `ab/no-kwset` patch series, for example, needs to have a way to prevent Git from expecting UTF-8-encoded input.

Signed-off-by: Karsten Blees <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
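To illustrate the idea of forcing UTF-8 while still honoring `LC_ALL=C`, here is a minimal sketch. It is not the code from this patch: the function names `charset_from_locale_vars()` and `sketch_locale_charset()` are hypothetical, and the rule that only the `C` and `POSIX` locales opt out of UTF-8 is an assumption made for illustration. The precedence `LC_ALL`, then `LC_CTYPE`, then `LANG` follows the usual POSIX convention.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's actual implementation: decide which
 * character set to hand to bind_textdomain_codeset(), given the values
 * of the usual locale variables (any of which may be NULL or empty).
 */
const char *charset_from_locale_vars(const char *lc_all,
				     const char *lc_ctype,
				     const char *lang)
{
	const char *env = lc_all;

	/* Usual precedence: LC_ALL, then LC_CTYPE, then LANG. */
	if (!env || !*env)
		env = lc_ctype;
	if (!env || !*env)
		env = lang;

	/*
	 * Assumption: an explicit C/POSIX locale means the caller does
	 * not want UTF-8 to be assumed, so pass it through unchanged.
	 */
	if (env && (!strcmp(env, "C") || !strcmp(env, "POSIX")))
		return env;

	/* Everything else, including "no locale set", is forced to UTF-8. */
	return "UTF-8";
}

/* Hypothetical stand-in for an overridden locale_charset(). */
const char *sketch_locale_charset(void)
{
	return charset_from_locale_vars(getenv("LC_ALL"),
					getenv("LC_CTYPE"),
					getenv("LANG"));
}
```

Keeping the decision in a function that takes the variable values as parameters makes the rule easy to exercise without touching the process environment; the `getenv()` wrapper is what a caller such as `init_gettext_charset()` would use in this sketch.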
43f1fd6 to 2d2253f
/submit |
Submitted as [email protected] |
This patch series was integrated into pu via git@5adbd93. |
This already made it into v2.23.0, as gitster@090d1e8. |
The main issue we work around here is that Windows does not have a UTF-8 "code page".
Side note: there is actually a code page for UTF-8: 65001 (see https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers). However, when experimenting with it, we ran into a multitude of issues in the Git for Windows project, ranging from various problems with Windows' default console to miscounted file writes. While these issues may have been mitigated in recent Windows 10 versions, older ones (in particular, Windows 7) still seem to have most of them, and Git for Windows specifically still supports even Windows Vista. So from a practical point of view, there is no UTF-8 code page.
Changes since v1:
The `LC_ALL=C` method used by `ab/no-kwset` to prevent Git from assuming UTF-8-encoded input is now supported.