Built-in echo behavior varies with different unicode notations of same character #1986

Closed
1 task done
am11 opened this issue Dec 16, 2018 · 4 comments

Comments

@am11

am11 commented Dec 16, 2018

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
    v2.16.1(2)
$ git --version --build-options

git version 2.16.1.windows.2
cpu: x86_64
built from commit: e78e3c8ee9c219723d60aa1bccd8348c2269b9ba
sizeof-long: 4
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
:: get OS version
C:\> ver

Microsoft Windows [Version 10.0.18298.1000]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
$ cat /etc/install-options.txt
Editor Option: Nano
Path Option: Cmd
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

I am using the latest Insider build of Windows 10 (last updated 15.12.2018).

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Bash

  • What commands did you run to trigger this issue?
echo -e "\U2744"
echo -e "\U00002744"
echo -e "\xE2\x9D\x84\xEF\xB8\x8F"
echo -e "\0342\0235\0204\0357\0270\0217"

/bin/echo -e "\xE2\x9D\x84\xEF\xB8\x8F"
/bin/echo -e "\0342\0235\0204\0357\0270\0217"

Exception note: since /bin/echo and built-in (GNU) echo are different, /bin/echo.exe doesn't understand \U notations and echoes them literally even with escape interpretation -e (e.g. try /bin/echo -e "\U2744" then echo -e "\U2744").

  • What did you expect to occur after running these commands?

All of the commands should print the full-width Unicode 1.1 snowflake character (U+2744): ❄️

  • What actually happened instead?

Only the \U notations display the full character; the \x and \0 notations are displayed partially:

image

It seems like the built-in echo falls back on the behavior of /bin/echo.exe in case of hex \x and octal \0 notations.

Moreover, in ~/.bash_profile, I am using \U notation (which in my understanding /bin/echo.exe doesn't interpret at all), yet:

export PS1=$'\e[1;94m\\t\e[0m:\e[1;36m\U2744\e[0m \w \$ '

# note the use of \U2744, the rest of markers are colors and special notations:
# http://tldp.org/HOWTO/Bash-Prompt-HOWTO/bash-prompt-escape-sequences.html

results in rendering of partial snowflake:

image

It seems like there is a third printer implementation that does interpret the \U notation but does not adjust the width of the character the way the built-in echo -e '\U2744' does.

@dscho
Member

dscho commented Dec 17, 2018

What happens if you pipe this into less?

I am not saying that this is a bug in Git for Windows, because it is not; it is a bug in some component we bundle. I could imagine that the bug might be in ncurses, mintty, bash and/or echo. To route this bug to the appropriate bug tracker, we'll need to find out which component is at fault.
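
One way to narrow this down is to compare the raw bytes each command emits: if two commands produce identical bytes but render differently, the difference presumably lies in the terminal or font rather than in echo. A minimal sketch, assuming a POSIX shell with od and tr available; the to_hex helper name is made up for illustration, and printf with octal escapes stands in for the echo commands from the report:

```shell
# Hypothetical helper: print stdin as one uppercase hex string.
to_hex() {
    od -An -tx1 | tr -d ' \n' | tr 'abcdef' 'ABCDEF'
    echo
}

# The six bytes used in the original report (octal escapes are portable).
printf '\342\235\204\357\270\217' | to_hex   # E29D84EFB88F

# The three bytes of the plain snowflake.
printf '\342\235\204' | to_hex               # E29D84
```

In Git Bash, the same helper could be fed from echo -e "\U2744" or /bin/echo -e "\xE2\x9D\x84" to compare their output byte for byte.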

@rimrul
Member

rimrul commented Dec 17, 2018

I can kinda reproduce this on 2.20.1. I get this on Windows 8.1 and similar rendering on Windows 7:

gfw1986

What confused me a little though: How did you get from 2744 to E29D84EFB88F? A little digging [1][2] showed me that the little snowflake should be UTF-8 encoded as E29D84.

The resulting echo -e "\xE2\x9D\x84" and /bin/echo -e "\xE2\x9D\x84" both show a proper snowflake.

Piping to less does not change the rendering for me. My console font is set to Lucida Console.

gfw1986-2

Manually setting the bash prompt with export PS1=$'\e[1;94m\\t\e[0m:\e[1;36m\U2744\e[0m \w \$ ' produces a full snowflake for me, though. Configuring it in ~/.bash_profile has the same effect.

[1] https://stackoverflow.com/questions/602912/how-do-you-echo-a-4-digit-unicode-character-in-bash
[2] https://www.fileformat.info/info/unicode/char/2744/index.htm
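
For reference, the E29D84 encoding can be derived by hand with the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx), which covers code points U+0800 through U+FFFF. A small sketch in POSIX shell arithmetic; utf8_3byte is a hypothetical helper name, and it does no range checking:

```shell
# Hypothetical helper: encode one code point in U+0800..U+FFFF
# as three UTF-8 bytes and print them in hex.
utf8_3byte() {
    cp=$1
    b1=$(( 0xE0 | (cp >> 12) ))          # top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))  # middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))         # low 6 bits
    printf '%02X %02X %02X\n' "$b1" "$b2" "$b3"
}

utf8_3byte $((0x2744))   # E2 9D 84
```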

@am11
Author

am11 commented Dec 17, 2018

Piping to less didn't change the rendering for me either. My font is also Lucida Console, 9pt.

While echo -e "\xE2\x9D\x84" renders the full snowflake 100% of the time, /bin/echo -e "\xE2\x9D\x84" only intermittently gets it right:
image


What confused me a little though: How did you get from 2744 to E29D84EFB88F?

Correct, I just realized that I was using the https://r12a.github.io/app-conversion/ tool in Firefox on Windows 10 and had copied the emoji character from https://emojipedia.org/snowflake/. Pasting it into the converter's textarea results in a colorful image, ❄️: the text character followed by the variation selector-16 code point (U+FE0F). Without the selector it looks like ❄ (U+2744).
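
To double-check that reading, the six bytes can be split into two three-byte UTF-8 sequences and decoded back to code points. A rough sketch in POSIX shell arithmetic; decode_3byte is a hypothetical helper that assumes well-formed three-byte input and does no validation:

```shell
# Hypothetical helper: decode one three-byte UTF-8 sequence, given as
# three hex byte values, back into its code point.
decode_3byte() {
    printf 'U+%04X\n' $(( ((0x$1 & 0x0F) << 12) | ((0x$2 & 0x3F) << 6) | (0x$3 & 0x3F) ))
}

decode_3byte E2 9D 84   # U+2744
decode_3byte EF B8 8F   # U+FE0F
```

So E29D84EFB88F is the snowflake followed by variation selector-16, not a longer encoding of U+2744 itself.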

@dscho
Member

dscho commented Feb 26, 2019

Okay, I think I am starting to understand what is going on here. The \u notation definitely refers to Unicode characters, and is therefore aware of the width of Unicode characters. The \x and \0 notations refer not to Unicode characters but to bytes, and are totally unaware of encodings (and therefore of the fact that Unicode characters can have different widths).
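
To make the byte-versus-character distinction concrete: the plain snowflake is one code point but three bytes in UTF-8, and the emoji form pasted earlier in this thread is two code points but six bytes. A small sketch, assuming a POSIX shell; count_bytes is a hypothetical helper:

```shell
# Hypothetical helper: count the bytes arriving on stdin.
count_bytes() {
    wc -c | tr -d ' '
}

printf '\342\235\204' | count_bytes               # 3 (U+2744)
printf '\342\235\204\357\270\217' | count_bytes   # 6 (U+2744 U+FE0F)
```

Neither count says anything about how many terminal cells the glyph occupies; that width has to come from Unicode-aware code.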

However, for some reason the PS1 handling seems to be unaware of Unicode widths, and it would probably need to be patched into Bash's source code, something that is safely outside my duties as Git for Windows maintainer.

But maybe you want to give it a try, @am11?

In the meantime, you can use the fact that a simple space has the same width as half the snowflake: use

export PS1=$'\e[1;94m\\t\e[0m:\e[1;36m\u2744 \e[0m \w \$ '

instead (i.e. append a space after \u2744).

@dscho dscho closed this as completed Feb 26, 2019

3 participants