working-tree-encoding=UTF-16 checks out UTF-16BE #1995

Closed
1 task done
alegrigoriev opened this issue Dec 22, 2018 · 3 comments

@alegrigoriev

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
    64 bit 2.19.1.windows.1
$ git --version --build-options

git version 2.19.1.windows.1
cpu: x86_64
built from commit: 11a3092e18f2201acd53e45aaa006f1601b6c02a
sizeof-long: 4
sizeof-size_t: 8
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
    Windows 10.1803.17134 x64
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.17134.472]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
    Default
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: Nano
Custom Editor Path:
Path Option: Cmd
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
Enable Builtin Rebase: Disabled
Enable Builtin Stash: Disabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Don't think so.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Bash

Edit your .gitattributes file to assign the "working-tree-encoding=UTF-16" attribute to some existing text file, then do a forced checkout of that file. Inspect the checked-out file in a binary editor (for example, open it as binary in Visual Studio).
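For anyone without a binary editor at hand, the steps above can be sketched in the shell. This is only a rough sketch: `bom_of` and `repro` are hypothetical helper names, `sample.txt` is a made-up file, and the check looks at the first two bytes only, so it cannot tell UTF-16 apart from other encodings that happen to start the same way.

```shell
# Report which UTF-16 byte order mark (if any) a file starts with.
# Only the first two bytes are inspected.
bom_of() {
  case "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" in
    fffe) echo UTF-16LE ;;
    feff) echo UTF-16BE ;;
    *)    echo none ;;
  esac
}

# Hypothetical reproduction in a scratch repository (needs git on PATH).
# Runs in a subshell so the caller's working directory is left alone.
repro() (
  set -e
  cd "$(mktemp -d)"
  git init -q .
  printf 'hello\n' > sample.txt
  git add sample.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add sample'
  echo 'sample.txt text working-tree-encoding=UTF-16' > .gitattributes
  rm sample.txt
  git checkout -- sample.txt   # rewrite the file with the attribute applied
  bom_of sample.txt
)
```

Calling `repro` prints whichever byte order the local Git build writes for plain UTF-16, which is the value this report is about.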
  • What did you expect to occur after running these commands?

The file should be written as UTF-16LE with BOM.

  • What actually happened instead?

The file is written as UTF-16BE with BOM. This makes the "working-tree-encoding" attribute pretty much useless here, although it could be very valuable for supporting UTF-16/UCS-2 files under Windows.

Not all tools under Windows understand UTF-16BE even with BOM. MSVC CRT doesn't. Visual Studio doesn't recognize those files as text (perhaps because it's using MSVC CRT to open them).

More information: This seems to be a general problem caused by the libiconv developers' decision to always produce UTF-16BE+BOM for UTF-16, without taking BYTE_ORDER into account. The iconv supplied with the Git for Windows package exhibits the same behavior. Existing precompiled builds of iconv/libiconv/libgettext for Windows (supplied by Michele Locati at https://mlocati.github.io/articles/gettext-iconv-windows.html) also exhibit the same behavior.

Nevertheless, the iconv installed with CentOS 7.4 produces UTF-16LE+BOM, and Git 2.20 built from source on that system does so as well. This means there may be a patch that forces libiconv to the desired behavior of producing UTF-16LE on little-endian machines.
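A quick way to compare iconv builds is to convert a single character and look at the resulting bytes. The helper name `utf16_bytes` below is made up for illustration. Which byte order a given build picks for the plain "UTF-16" name varies by implementation, so that first call has no single expected result.

```shell
# Hex dump of the letter "A" after conversion to the requested encoding
# by whatever iconv is first on PATH.
utf16_bytes() {
  printf 'A' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

utf16_bytes UTF-16     # fffe4100 would mean LE with BOM, feff0041 BE with BOM
utf16_bytes UTF-16LE   # explicit byte order: 4100, no BOM
utf16_bytes UTF-16BE   # explicit byte order: 0041, no BOM
```

Naming the byte order explicitly removes the ambiguity, but it also drops the BOM.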

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

Not specific to a repository

@PhilipOakley

There is some recent (2 Nov 2018 ->) discussion on the main Git list regarding BOMs and the like. https://public-inbox.org/git/CADN+U_PUfnYWb-wW6drRANv-ZaYBEk3gWHc7oJtxohA5Vc3NEg@mail.gmail.com/

Have a look through the thread and see if it matches up with the problem as you are seeing it.

@dscho
Member

dscho commented Feb 26, 2019

Is this still an issue with the latest snapshot?

@dscho closed this as completed Jan 1, 2020

@Mike4Online

To add a UTF-16LE with BOM encoded text file via Git for Windows:

  1. Ensure a current version of Git for Windows is installed.
  2. In .gitattributes, the entry matching the UTF-16LE with BOM file needs to specify text working-tree-encoding=UTF-16LE-BOM. Ideally, the eol attribute would also be specified.
  3. The file's entry in .gitattributes needs to be in place before the file is added.
  4. The file needs to be added via a Git Bash shell (which can access the iconv.exe text conversion utility), or perhaps via a Git GUI.

If the file was committed prior to having the working-tree-encoding set properly, or was committed via a CMD shell, then Git's internal encoding of the file will be incorrect, leading to encoding errors which appear when you next clone the repository. The git add command supports a --renormalize option that can remedy this.
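As a rough sketch of the steps above: the helper names `attr_line` and `renormalize_utf16` are hypothetical, the `*.rc` pattern is only an example, and `eol=crlf` is an assumed choice for the eol attribute rather than something stated in this thread. The working-tree files are assumed to already be UTF-16LE with BOM.

```shell
# Build a .gitattributes line for a path pattern.
# eol=crlf is an assumption; pick the eol value that suits the files.
attr_line() {
  printf '%s text working-tree-encoding=UTF-16LE-BOM eol=crlf\n' "$1"
}

# Run inside an existing repository: record the attribute first, then
# re-stage tracked files so their stored content goes through the new setting.
renormalize_utf16() {
  attr_line "$1" >> .gitattributes &&
  git add .gitattributes &&
  git add --renormalize . &&
  git status --short
}

# Example with a hypothetical pattern:
#   renormalize_utf16 '*.rc'
```

Files that show up as modified in the status output after this are the ones whose stored content changed, and they still need to be committed.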
