-
Notifications
You must be signed in to change notification settings - Fork 140
Let's use the Win32 API more precisely #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, we would have obtained the user name encoded in whatever the current code page is. Note: the "user name" here does not denote the full name but instead the short logon name. Signed-off-by: Johannes Schindelin <[email protected]>
3160c9f
to
aae4449
Compare
Many Win32 API functions actually exist in two variants: one with the `A` suffix that takes ANSI parameters (`char *` or `const char *`) and one with the `W` suffix that takes Unicode parameters (`wchar_t *` or `const wchar_t *`). The ANSI variant assumes that the strings are encoded according to whatever is the current locale. This is not what Git wants to use on Windows: we assume that `char *` variables point to strings encoded in UTF-8. There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale. Further, it is actually highly encouraged to use the Unicode versions instead of the ANSI versions, so let's do precisely that. Note: when calling the Win32 API functions _without_ any suffix, it depends whether the `UNICODE` constant is defined before the relevant headers are #include'd. Without that constant, the ANSI variants are used. Let's be explicit and avoid that ambiguity. Signed-off-by: Johannes Schindelin <[email protected]>
/submit |
Submitted as [email protected] |
@@ -1407,7 +1407,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen | |||
do_unset_environment_variables(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Eric Sunshine wrote (reply to this):
On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
<[email protected]> wrote:
> Many Win32 API functions actually exist in two variants: one with
> the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> or `const wchar_t *`).
>
> The ANSI variant assumes that the strings are encoded according to
> whatever is the current locale. This is not what Git wants to use on
> Windows: we assume that `char *` variables point to strings encoded in
> UTF-8.
>
> There is a pseudo UTF-8 locale on Windows, but it does not work
> as one might expect.
What does "does not work as one might expect" mean? The reader is left
hanging, not knowing why or how the UTF-8 locale on Windows is
undesirable.
> In addition, if we overrode the user's locale, that
> would modify the behavior of programs spawned by Git (such as editors,
> difftools, etc), therefore we cannot use that pseudo locale.
>
> Further, it is actually highly encouraged to use the Unicode versions
> instead of the ANSI versions, so let's do precisely that.
>
> Note: when calling the Win32 API functions _without_ any suffix, it
> depends whether the `UNICODE` constant is defined before the relevant
> headers are #include'd. Without that constant, the ANSI variants are
> used. Let's be explicit and avoid that ambiguity.
>
> Signed-off-by: Johannes Schindelin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Eric,
On Thu, 27 Jun 2019, Eric Sunshine wrote:
> On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> <[email protected]> wrote:
> > Many Win32 API functions actually exist in two variants: one with
> > the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> > and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> > or `const wchar_t *`).
> >
> > The ANSI variant assumes that the strings are encoded according to
> > whatever is the current locale. This is not what Git wants to use on
> > Windows: we assume that `char *` variables point to strings encoded in
> > UTF-8.
> >
> > There is a pseudo UTF-8 locale on Windows, but it does not work
> > as one might expect.
>
> What does "does not work as one might expect" mean? The reader is left
> hanging, not knowing why or how the UTF-8 locale on Windows is
> undesirable.
Should I really bore the reader with half-details? At least one of those
behaviors was reported as an unbootable Windows 7, but some others could
not reproduce.
I really did not want to write a novel here that is 1) largely irrelevant
and 2) would take way too long to write because I forgot most of the
details and would have to look them up again.
Hopefully we can do without the full story here, as even one report that
it does not work as expected is enough to make this an unfeasible option
for us, and that's that as far as I am concerned.
Ciao,
Dscho
>
> > In addition, if we overrode the user's locale, that
> > would modify the behavior of programs spawned by Git (such as editors,
> > difftools, etc), therefore we cannot use that pseudo locale.
> >
> > Further, it is actually highly encouraged to use the Unicode versions
> > instead of the ANSI versions, so let's do precisely that.
> >
> > Note: when calling the Win32 API functions _without_ any suffix, it
> > depends whether the `UNICODE` constant is defined before the relevant
> > headers are #include'd. Without that constant, the ANSI variants are
> > used. Let's be explicit and avoid that ambiguity.
> >
> > Signed-off-by: Johannes Schindelin <[email protected]>
>
This branch is now known as |
This patch series was integrated into pu via git@3f0b1d7. |
This patch series was integrated into pu via git@10d7f1f. |
This patch series was integrated into pu via git@4893d96. |
This patch series was integrated into pu via git@1524073. |
This patch series was integrated into pu via git@c412472. |
This patch series was integrated into pu via git@355ba94. |
This patch series was integrated into pu via git@a51bff8. |
This patch series was integrated into next via git@f528daf. |
This patch series was integrated into pu via git@d8ba2de. |
This patch series was integrated into pu via git@463742d. |
This patch series was integrated into pu via git@dc94250. |
This patch series was integrated into pu via git@0a2ff7c. |
This patch series was integrated into master via git@0a2ff7c. |
Closed via 0a2ff7c. |
For many Win32 functions, there actually exist two variants: one that takes
const char *
("ANSI", meaning the current code page) andwchar_t *
("Unicode", i.e. UTF-16, at least for all practical matters).These functions have "A" and "W" suffixes, respectively, e.g.
GetFileAttributesW()
. The symbols without this suffix are#define
d to the*W()
versions if the constantUNICODE
is defined before including the Windows headers, and to*A()
otherwise.Let's not rely on this constant, but explicitly say what we want: we want the Unicode versions, as they seem to be used by the ANSI flavor anyway.