str should support a few other string en/decodings #1771
Comments
As far as I understand, Haskell uses UTF-8 only and en/decodes during I/O.
That's pretty much what I have in mind.
Somewhat related is #1557. Haskell uses only '\n' internally, and converts during I/O.
I care about operating system APIs because I have to, but don't want to think about 'locale' and would rather always output UTF-8 for The
Yeah, the str library probably needs a few common encoding converters wired in, along with the assumption of a "full" iconv or libicu-like "convert to anything" layer further out (say in libstd). I think you have a good list here. Un-tagging as [rfc] since this is not really a language change, just some library work. Totally legitimate library work, mind you! Needs doing. All the wchar_t / Unicode APIs in Windows need such decoding. (And at some point we will, indeed, need to think about locales. They're real. But I'm willing to leave that to a later cycle of stdlib design, or at least another bug.)
I landed UTF-16 helpers in 47e7a05. I believe that's "what uv needs on windows", in the sense that UTF-16 is the interpretation of the I don't think the latin-1-and-supersets stuff is really worth trying to support, or at least not in libcore. I'd prefer not to require libcore to model codepages or other locale artifacts. It's a huge task, we'll probably delegate most of it to libICU, and lots of software is never localized. For at least those reasons I'd prefer we leave locale stuff to libstd. I do expect to modify all the Windows API calls we use in libcore to call the "W" variants with UTF-16 input, not the "A" variants with (broken) UTF-8 as we're doing today. Closing this.
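For readers who want to see what the UTF-16 round trip for the "W" variants looks like, here is a minimal sketch. It assumes a present-day Rust standard library (`str::encode_utf16`, `String::from_utf16`) rather than the helpers landed in 47e7a05, and the function names `to_wide_nul` and `from_wide` are hypothetical, chosen only for illustration.

```rust
// Sketch only: uses current standard-library methods, not the helpers
// from 47e7a05. The function names here are hypothetical.

/// Encode a UTF-8 string as UTF-16 code units followed by a trailing NUL,
/// which is the shape Windows "W" functions expect for string arguments.
fn to_wide_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Decode UTF-16 code units back into a String, stopping at the first NUL
/// if one is present. Returns None if the input has an unpaired surrogate.
fn from_wide(units: &[u16]) -> Option<String> {
    let end = units.iter().position(|&u| u == 0).unwrap_or(units.len());
    String::from_utf16(&units[..end]).ok()
}
```

For example, `to_wide_nul("hi")` yields `[0x68, 0x69, 0]`, and passing that buffer to `from_wide` gives back `Some("hi")`. Returning `None` on an unpaired surrogate is one possible design choice; a lossy decode that substitutes U+FFFD would be another.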
I've been thinking a little bit about how best to include support for the various other string encodings that we'll need to avoid the pitfalls of things like native filesystem paths and to read more common text encodings. It seems like in addition to the UTF-8 and ASCII which we already handle, we'll likely want simple `to` and `from` methods for the following, perhaps under the umbrella of a `string_encoding` interface:
These are likely to be only a small can of worms, so long as we don't make any plans to automatically recognize them or agonize about working with them internally...
What am I missing?
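To make the "to and from methods under one interface" idea concrete, here is a rough sketch in present-day Rust. Everything in it is an assumption for illustration: the trait name `StringEncoding`, the method names `to_bytes` and `from_bytes`, and the choice of Latin-1 as the example encoding (picked because it comes up in the comments). It is not an API from the Rust libraries.

```rust
/// Hypothetical interface; the name and signatures are assumptions made
/// for illustration, not an existing library API.
trait StringEncoding {
    /// Encode a UTF-8 string into this encoding.
    /// Returns None if some character cannot be represented.
    fn to_bytes(&self, s: &str) -> Option<Vec<u8>>;

    /// Decode bytes in this encoding into a UTF-8 String.
    /// Returns None if the bytes are not valid in this encoding.
    fn from_bytes(&self, bytes: &[u8]) -> Option<String>;
}

/// ISO-8859-1 (Latin-1): each byte maps to the code point of the same value.
struct Latin1;

impl StringEncoding for Latin1 {
    fn to_bytes(&self, s: &str) -> Option<Vec<u8>> {
        // Only code points U+0000..=U+00FF fit in a single Latin-1 byte.
        s.chars()
            .map(|c| {
                let n = c as u32;
                if n <= 0xFF { Some(n as u8) } else { None }
            })
            .collect()
    }

    fn from_bytes(&self, bytes: &[u8]) -> Option<String> {
        // Every byte value is a valid Latin-1 character, so this never fails.
        Some(bytes.iter().map(|&b| char::from(b)).collect())
    }
}
```

With this sketch, `Latin1.to_bytes("café")` gives `Some(vec![0x63, 0x61, 0x66, 0xE9])`, while a string containing a character outside Latin-1 gives `None`. Conversion happens only at the boundary, and strings stay UTF-8 internally, in line with the "don't agonize about working with them internally" point above.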