String.ToLower uses Turkish casing rules with en-US-POSIX #4894
Comments
In both cases, what do you see if you output:
? The lower-casing being done in the False case is Turkish; I'm guessing that your environment in the two cases is causing you to have different cultures set up. You can see the Turkish casing with code like:
using System;
using System.Globalization;
class Program
{
    static void Main(string[] args)
    {
        char I = 'I';
        Console.WriteLine("0x{0:X2}", (int)I);
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        Console.WriteLine("0x{0:X2}", (int)char.ToLower(I));
        CultureInfo.CurrentCulture = new CultureInfo("en-US");
        Console.WriteLine("0x{0:X2}", (int)char.ToLower(I));
    }
}
which outputs:
That's interesting. I didn't do anything special to set up the environment.
Even though the culture over ssh is en-US-POSIX, the "i" shouldn't be the Turkish "i".
@stephentoub we are seeing the same issue in @aspnet testing. On OSX, somehow the culture is set to "en-US-POSIX" and this causes
@natemcmaster, what the culture gets set to is based on the environment variables you have set in your environment, e.g. LC_ALL, LANG, LC_COLLATE, etc. That's by design. Check your environment variables (e.g. However, that en-US-POSIX is using Turkish casing rules is a bug. @ellismg, before I look into it more deeply, any idea why we're setting m_needsTurkishCasing to true for en-US-POSIX? e.g. on Ubuntu this:
using System;
using System.Reflection;
using System.Globalization;
class Program
{
    static void Main()
    {
        var cultures = new[] {
            new CultureInfo("en-US"),
            new CultureInfo("fr-FR"),
            new CultureInfo("tr-TR"),
            new CultureInfo("blah"),
            new CultureInfo("blah-BLAH"),
            new CultureInfo("blah-BLAH-BLAH"),
            new CultureInfo("en-US-blah"),
            new CultureInfo("en-US-POSIX"),
            new CultureInfo("zz-ZZ-POSIX")
        };
        var f = typeof(TextInfo).GetTypeInfo().GetDeclaredField("m_needsTurkishCasing");
        foreach (var c in cultures)
        {
            Console.WriteLine($"Culture: {c}\tTurkish: {f.GetValue(c.TextInfo)}");
        }
    }
}
outputs this:
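For anyone who wants to check a culture without reflecting on the private m_needsTurkishCasing field, here is a minimal sketch that probes the public casing behavior instead. The class and method names (TurkishCasingProbe, UsesTurkishCasing) are hypothetical and not part of the framework; the probe only reports what TextInfo.ToLower does for 'I' on the machine it runs on.

```csharp
using System;
using System.Globalization;

static class TurkishCasingProbe
{
    // Hypothetical helper (not a framework API): returns true when the
    // culture lower-cases 'I' to the dotless 'ı' (U+0131).
    public static bool UsesTurkishCasing(CultureInfo culture)
    {
        return culture.TextInfo.ToLower('I') == '\u0131';
    }

    static void Main()
    {
        // CurrentCulture is derived from the environment (LANG, LC_ALL, ...),
        // so it is the one most likely to surprise you on a build agent.
        var cultures = new[] { CultureInfo.InvariantCulture, CultureInfo.CurrentCulture };
        foreach (var c in cultures)
        {
            Console.WriteLine($"Culture: '{c.Name}'\tTurkish casing: {UsesTurkishCasing(c)}");
        }
    }
}
```

The invariant culture should always report False; any other result depends on the culture data installed on the machine.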
@stephentoub yup, you were right. Our locale is set incorrectly on build agents.
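Since the culture is picked up from the environment, a quick way to confirm what a build agent is actually running with is to print the relevant variables next to the culture the runtime chose. This is only a sketch; LocaleDump and Describe are hypothetical names, and the list of variables is limited to the ones mentioned in this thread.

```csharp
using System;
using System.Globalization;

static class LocaleDump
{
    // Hypothetical helper: formats one environment variable for display,
    // showing "<unset>" when the variable is not defined.
    public static string Describe(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (value == null)
        {
            value = "<unset>";
        }
        return name + "=" + value;
    }

    static void Main()
    {
        // Variables named in the discussion above.
        foreach (var name in new[] { "LC_ALL", "LANG", "LC_COLLATE" })
        {
            Console.WriteLine(Describe(name));
        }
        Console.WriteLine("CurrentCulture=" + CultureInfo.CurrentCulture.Name);
    }
}
```

Running this both over ssh and in a local terminal should show whether the two sessions see different locale settings.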
I have a hunch. The source does the following:
private bool NeedsTurkishCasing(string localeName)
{
    Contract.Assert(localeName != null);
    return CultureInfo.GetCultureInfo(localeName).CompareInfo.Compare("i", "I", CompareOptions.IgnoreCase) != 0;
}
ICU tailors the en-US-POSIX locale and assigns different primary weights to 'i' and 'I'. You can see this in the ICU Collation Demo by looking at the raw collation elements. Different primary weights mean they are different letters, not the same letter with a difference in casing (which is a tertiary weight). I think that we should update the check to actually compare using some of the Turkish characters instead of doing it the way we currently do. I will fix this for RC2.
Ensure that en-US-POSIX does not get Turkish casing behavior.
Previously, we were using a comparison between "i" and "I" (while ignoring case) to figure out if we needed to do Turkish casing (on the assumption that locales which compared i and I as not equal when ignoring case were doing Turkish casing). ICU tailors the en-US-POSIX locale and assigns different primary weights to 'i' and 'I'. You can see this in the ICU Collation Demo by looking at the raw collation elements. Different primary weights mean they are different letters, not the same letter with a difference in casing (which is a tertiary weight). This changes the check to compare using an actual Turkish i when doing our detection, so it does not get confused by these cases. Fixes #2531
Add regression test for dotnet/coreclr#2531
Ensure that en-US-POSIX does not get Turkish casing behavior.
Hi Guys,
Here is the repro code:
Then you will see this result:
origin string:#IF
lower string:#ıf
"#ıf == #if" == False
But if you run it directly on the Mac, you will see the correct output:
origin string:#IF
lower string:#if
"#if == #if" == true
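Independent of the runtime bug, code that lower-cases identifiers or keywords such as "#IF" usually does not want culture-sensitive casing at all. A minimal sketch of a culture-independent alternative follows; DirectiveMatcher and its methods are hypothetical names used for illustration, while ToLowerInvariant and StringComparison.OrdinalIgnoreCase are standard framework APIs.

```csharp
using System;

static class DirectiveMatcher
{
    // Lower-cases using invariant rules, so the result does not depend on
    // LANG/LC_ALL or on CultureInfo.CurrentCulture.
    public static string NormalizeDirective(string text)
    {
        return text.ToLowerInvariant();
    }

    // Compares without allocating a lower-cased copy and without
    // consulting the current culture.
    public static bool IsIfDirective(string text)
    {
        return string.Equals(text, "#if", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        var original = "#IF";
        Console.WriteLine("origin string:" + original);
        Console.WriteLine("lower string:" + NormalizeDirective(original));
        Console.WriteLine("is #if: " + IsIfDirective(original));
    }
}
```

With this approach the comparison no longer depends on how the session's locale is configured.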