Codec lookup failing under turkish locale #46138
When switching to a Turkish locale, the codecs registry fails on a codec This happens when the codec name contains an uppercase 'I'. What Replacing the tolower() call with this made the lookup work:

int my_tolower(char c)
{
    if ('A' <= c && c <= 'Z')
        c += 32;
    return c;
}

PS: If the Turkish locale is not supported, this here will enable it to a) sudo cp /usr/share/i18n/SUPPORTED /var/lib/locales/supported.d/local |
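To see why an uppercase 'I' is the trigger, it can help to simulate Turkish casing rules without depending on an installed locale. The sketch below is illustrative only: `turkish_lower` and `ascii_lower` are hypothetical helpers, not CPython functions, and the Unicode mapping merely stands in for what a locale-aware C tolower() does on single bytes under tr_TR.

```python
# Hypothetical helpers for illustration; not part of CPython.

# In Turkish, uppercase 'I' lowercases to dotless 'ı' (U+0131),
# and dotted 'İ' (U+0130) lowercases to plain 'i'.
_TURKISH_SPECIAL = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkish_lower(text):
    """Lowercase text the way a Turkish-aware routine would (simulated)."""
    return text.translate(_TURKISH_SPECIAL).lower()


def ascii_lower(text):
    """Lowercase only A-Z, mirroring the my_tolower() replacement above."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text
    )


name = "ISO8859-9"
print(turkish_lower(name) == "iso8859-9")  # False: the result starts with 'ı'
print(ascii_lower(name) == "iso8859-9")    # True
```

A registry that stores codec names in plain ASCII lowercase can only match the second result, which is why an ASCII-only lowering step avoids the failed lookup.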
I can confirm this on SVN trunk on a Mandriva system. |
There is more to this bug than appears. I'm guessing that the name

See this example:

#!/usr/bin/python2.5
import locale
print 'TR', locale.normalize('tr')
print locale.setlocale(locale.LC_ALL, ('tr_TR', 'ISO8859-9'))
# first issue, not quite the same coming out, as came in
# and this fails

First, the value returned from getlocale is ('tr_TR', 'so8859-9'), not |
The C library's tolower() and toupper() are used in a handful of source

Modules/_sre.c: return ((ch) < 256 ? (unsigned int)tolower((ch)) : ch); |
Even if we don't fix all uses of (?to)(lower|upper) in the source tree, Perhaps also the str type should grow ascii_lower() and ascii_upper() |
I agree that it's a bit unfortunate that the 8-bit string APIs in Python .lower() and .upper() for 8-bit strings were always locale dependent and In Python 3k the problem will probably go away, since .lower() and Perhaps we should just convert a few of the cases you found to using |
Marc-Andre: How should we proceed with this bug? Discuss on python-dev |
Sean: I'd suggest to discuss this on python-dev. Note that even if we do use Unicode for the cases in question, the |
Does anyone know if this was discussed on python-dev? I've tried searching the archives and didn't find anything, but that's not to say it isn't there. |
There is also a locale normalization function in unicodeobject.c: normalize_encoding(). This function uses "if (ISUPPER(*e)) *l++ = TOLOWER(*e++);", which relies on Python's locale-independent implementation of ctype. Maybe we should use ISUPPER / TOLOWER in codecs.c as well. Anyway, a function should be fixed, but I don't know which one :-) |
We've included this patch in Gentoo for about two years now. Can we get some discussion going on doing something like this? |
Looking at this again, I think we should change the codec registry C code to use Py_TOLOWER() and the encoding search function code to use the .translate() approach that Antoine suggested. |
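As a rough illustration of the .translate() idea for the encoding search function, here is a minimal sketch. It is an assumption about the general shape of such a fix, not the code that was committed: `lower_codec_name` is a hypothetical name, and the table covers only the 26 ASCII letters so that no locale setting can influence the result.

```python
import string

# Translation table that lowercases only the 26 ASCII letters.
_ASCII_LOWER_TABLE = str.maketrans(
    string.ascii_uppercase, string.ascii_lowercase
)


def lower_codec_name(name):
    """Hypothetical helper: lowercase a codec name using ASCII rules only."""
    return name.translate(_ASCII_LOWER_TABLE)


print(lower_codec_name("ISO8859-9"))  # iso8859-9
```

Because the mapping is a fixed table rather than a call into the C library, characters outside A-Z pass through unchanged regardless of LC_CTYPE.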
The decimal module has been fixed in Python 2.7, 3.2 and 3.3 for the Turkish locale: issue bpo-11830. |
New changeset 92d02de91cc9 by Antoine Pitrou in branch '3.2':
New changeset a77a4df54b95 by Antoine Pitrou in branch '3.2':
New changeset fe0caf8c48d2 by Antoine Pitrou in branch 'default': |
New changeset 739958134fe5 by Antoine Pitrou in branch '2.7': |
Finally fixed in 2.7, 3.2, 3.3! |
The Fedora bot fails because here ... locale.setlocale(locale.LC_CTYPE, loc)
loc = ('tr_TR', 'ISO8859-9'), and apparently setlocale can only
handle "tr_TR", but not "tr_TR.ISO8859-9": 144 if (locale) { |
Stefan Krah wrote:
Perhaps this is a bug in Fedora's setlocale that can't handle the Turkish 'I' |
Perhaps indeed. Maybe you should try to report it. |
Yes, it's a bug. This works:

#include <stdio.h>
#include <locale.h>

int
main(void)
{
    char *s;

    printf("%s\n", setlocale(LC_CTYPE, "tr_TR.ISO8859-9"));
    printf("%s\n", setlocale(LC_CTYPE, NULL));
    s = setlocale(LC_CTYPE, "tr_TR.ISO8859-9");
    printf("%s\n", s ? s : "null");
    return 0;
}

But when I change the first setlocale call to "tr_TR", the result of |
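The same set, query, set-again sequence can be probed from Python, which may be convenient on a buildbot without a compiler at hand. This is only a sketch: `probe_setlocale` is a hypothetical helper, and it records None wherever setlocale() refuses a name instead of printing a null pointer.

```python
import locale


def probe_setlocale(first, second, category=locale.LC_CTYPE):
    """Hypothetical helper: set `first`, query, then set `second`.

    Returns the three results in order, with None for a rejected name.
    The previous locale is restored before returning.
    """
    saved = locale.setlocale(category)
    steps = []
    try:
        # Passing None as the name queries the current setting.
        for name in (first, None, second):
            try:
                steps.append(locale.setlocale(category, name))
            except locale.Error:
                steps.append(None)
    finally:
        locale.setlocale(category, saved)
    return steps


# On an affected system the last entry would be expected to be None
# when the first call uses the short name.
print(probe_setlocale("tr_TR", "tr_TR.ISO8859-9"))
```

The output depends entirely on which locales are installed and on the libc version, so no expected result is shown here.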
I'm seeing this test failure in Gentoo, as well. |
Fedora bug report: |
Unrelated to the Fedora issue: The test is currently skipped on the

diff -r 0b52b6f1bfab Lib/test/test_locale.py
--- a/Lib/test/test_locale.py Tue Aug 02 10:16:45 2011 +0200
+++ b/Lib/test/test_locale.py Tue Aug 02 11:37:39 2011 +0200
@@ -399,7 +399,7 @@
         oldlocale = locale.setlocale(locale.LC_CTYPE)
         self.addCleanup(locale.setlocale, locale.LC_CTYPE, oldlocale)
         try:
-            locale.setlocale(locale.LC_CTYPE, 'tr_TR')
+            locale.setlocale(locale.LC_CTYPE, 'tr_TR.UTF-8')
         except locale.Error:
             # Unsupported locale on this system
             self.skipTest('test needs Turkish locale') |
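Since the accepted spelling of the Turkish locale differs between systems, one possible variation on the patch above is to try several names before skipping. The following is a sketch under assumptions: the candidate list, `first_usable_locale`, and the test class are all hypothetical and are not what was committed to test_locale.py.

```python
import locale
import unittest

# Candidate spellings are assumptions; available names vary by system.
TURKISH_CANDIDATES = ("tr_TR", "tr_TR.UTF-8", "tr_TR.ISO8859-9")


def first_usable_locale(candidates, category=locale.LC_CTYPE):
    """Return the first name setlocale() accepts, or None.

    The previous locale is restored before returning.
    """
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue
            return name
        return None
    finally:
        locale.setlocale(category, saved)


class TurkishLocaleRoundTrip(unittest.TestCase):
    def test_getsetlocale(self):
        name = first_usable_locale(TURKISH_CANDIDATES)
        if name is None:
            self.skipTest('test needs Turkish locale')
        oldlocale = locale.setlocale(locale.LC_CTYPE)
        self.addCleanup(locale.setlocale, locale.LC_CTYPE, oldlocale)
        locale.setlocale(locale.LC_CTYPE, name)
        # Round trip: feed the reported locale back into setlocale().
        loc = locale.getlocale(locale.LC_CTYPE)
        locale.setlocale(locale.LC_CTYPE, loc)
```

Trying several names only decides whether the test can run at all; it does not hide a libc that rejects the value returned by getlocale().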
As I wrote on python-dev, this test also fails on Debian lenny, which has So, indeed the test should be skipped on a multitude of platforms. |
On Tue, 02 Aug 2011 12:12:37 +0200, Stefan Krah wrote:
This is true for my Gentoo buildbots. Once we've figured out the When I run the C test program I get null as the final output of that This is with glibc-2.13-r2 (the r2 is Gentoo's mod number). As someone pointed out on python-dev, if this isn't fixable then it should be an expected failure, not a skip. One question: is there any platform on which the Turkish locale is installed where this test actually works? |
[Re-opening to fix the skips]

Yes, the test works on: Ubuntu Lucid (libc-2.11.1), OpenSUSE (libc-2.11.1), FreeBSD-8.2

Failure: Fedora 14 (libc-2.13), Debian lenny (libc-2.7), Gentoo (libc-2.13-r2)

So perhaps this test should be marked as expected failure on Linux |
The Python bug is fixed, the problem is apparently some libcs have the
Well, it works here (Mageia). |
https://bugzilla.redhat.com/show_bug.cgi?id=726536 claims that the I suspect the only way of running the test case reliably is whitelisting |
New changeset a55ffb6c1993 by Stefan Krah in branch '3.2':
New changeset 4244e4348362 by Stefan Krah in branch 'default':
New changeset 0b8917fc6db5 by Stefan Krah in branch '2.7': |
I've upgraded the Fedora buildbot to Fedora-16. The specific glibc So the test will now fail again on all systems that a) have the bug |