RFC: Rename `char` to make it clearer that it is a Unicode codepoint/scalar value #12730
Comments
(I'm personally against calling it "rune" since that word feels like a glyph/grapheme rather than a codepoint to me... but there's precedent in Go and in BSD for that name.) |
Also note that UCS-4 was historically not the same as UTF-32. Wikipedia says they are now identical, but the Unicode FAQ seems to suggest otherwise. |
One argument against renaming |
@Kimundi OTOH it looks like nobody (from the above list) is using |
I really don't like |
|
Don't like rune, hate usv for its meaninglessness. +1 for char.
|
This is a minor wart that I'm not inclined to change. Why does char not allow surrogate code points? |
@brson: Surrogate code points aren't Unicode scalar values. They're just an implementation detail of UTF-16. The UTF-8 standard explicitly forbids encoding them too. |
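The point about surrogates can be checked against today's standard library. The following is a minimal sketch, assuming current stable Rust; `is_scalar_value` is a hypothetical helper name introduced only for illustration. It relies on `char::from_u32`, which returns `None` for any `u32` that is not a Unicode scalar value.

```rust
/// Hypothetical helper: true if `cp` is a Unicode scalar value, i.e. a code
/// point in 0..=0x10FFFF that is outside the surrogate range 0xD800..=0xDFFF.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    assert!(is_scalar_value(0x41)); // 'A'
    assert!(is_scalar_value(0x1F600)); // a code point outside the BMP
    assert!(!is_scalar_value(0xD800)); // first surrogate code point
    assert!(!is_scalar_value(0xDFFF)); // last surrogate code point
    assert!(is_scalar_value(0xE000)); // first code point after the surrogates
    assert!(!is_scalar_value(0x110000)); // beyond the Unicode range
    println!("all checks passed");
}
```

In other words, a surrogate can never be stored in a `char`, which is the distinction between "code point" and "scalar value" being discussed here.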
This is mostly a reaction to #12730. If we are going to keep calling them `char`, at least make it clear that they aren't characters but codepoints/scalar values.
I think that various names on the table here are inadequate for different reasons:
That leaves In Go it is exactly an alias for So, proposal:
|
Several people, including core team members, don't like rune. I don't like it either.
|
@liigo Could you explain why you don’t like "rune" and why "character" being ambiguous is not a problem, as you see it? |
Someone has already answered your questions; see the comments above.
|
To be pedantic Also, this should be an RFC in rust-lang/rfcs, now that we have that process. Closing. (If someone else doesn't step up to write it up, I'm happy to do it... eventually.) |
The name char cost me many days because I assumed it was a plain C++ char. Anyone who, like me, goes by the name will make the same mistake. Now I read a type's description carefully before using it. My point is that char is a confusing name. If the core team doesn't like the names above, invent a new one, but at least not char. It would save someone's time. It's an old topic, but it hurt me, which is why I'm adding my comment. |
c8 is a good name; at least it would force us to understand what c8 is, like i8, u8, i32, etc. |
I agree with this. Giving the commonly used name `char` a different meaning causes misunderstanding and makes it much harder to get into Rust. A language that aims to be usable should perhaps respect the general knowledge that most other languages have already established for such a noun. |
It was the least bad option among everything considered, and it's highly unlikely that it would change at this point with the language stable. Since it's a Unicode scalar value (not just any code point), there's always a 1:1 mapping between strings and C has Coming from this in C and C++, I struggle to see how the naming of On that note, how do I unsubscribe from all threads in a repository? ... |
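For readers arriving with the C or C++ meaning of `char` in mind, the difference can be made concrete. This is a rough sketch, assuming current stable Rust; `round_trip` is a hypothetical helper name, and the interpretation of the "1:1 mapping" as a lossless round trip between a string and its sequence of `char` values is an assumption about what the comment above intends.

```rust
use std::mem::size_of;

/// Hypothetical helper: split a string into its `char` values (Unicode
/// scalar values) and rebuild a `String` from them.
fn round_trip(s: &str) -> String {
    let scalars: Vec<char> = s.chars().collect();
    scalars.into_iter().collect()
}

fn main() {
    // Unlike the one-byte `char` of C and C++, Rust's `char` is four bytes wide.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<u8>(), 1);

    // U+00E9 (e with acute accent) takes two bytes in UTF-8 but is one `char`.
    let s = "h\u{e9}llo";
    assert_eq!(s.len(), 6); // length in bytes (UTF-8 code units)
    assert_eq!(s.chars().count(), 5); // length in scalar values

    // Going through `char` and back loses nothing.
    assert_eq!(round_trip(s), s);
    println!("all checks passed");
}
```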
Our `char` type is a Unicode scalar value (codepoint excluding the surrogate range), which can lead to confusion because (a) it differs from other languages and (b) it doesn't directly encourage good Unicode hygiene ("Oh, a character? That's what the user sees").

Possible names include `codepoint`, `ucs4`, or `rune` like Go.

Other languages' names for a Unicode scalar value/what `char` means:

- `Char` is a codepoint (although surrogates are allowed)
- `dchar` (`char` is a "UTF-8 code unit" and `wchar` is a "UTF-16 code unit" (i.e. aliases for `u8` and `u16`?): http://dlang.org/type.html)
- `rune`
- `char` is a 16-bit integer (i.e. UTF-16 code unit)
- `char` is (normally) a byte, i.e. a UTF-8 code unit.

(Other languages like Python don't have a type for a single character and don't have a type called `char`, and so aren't meaningful for this comparison.)

(This issue brought to you by reddit.)
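The three notions compared above (UTF-8 code units, UTF-16 code units, and scalar values) give different counts for the same text, and none of them necessarily matches "what the user sees". A small sketch, assuming current stable Rust; `counts` is a hypothetical helper introduced only for this illustration.

```rust
/// Hypothetical helper returning, in order:
/// (UTF-8 code units, UTF-16 code units, Unicode scalar values).
fn counts(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // Plain ASCII: all three counts agree.
    assert_eq!(counts("a"), (1, 1, 1));

    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: the user sees one
    // character, but it is two scalar values (two `char`s).
    assert_eq!(counts("e\u{301}"), (3, 2, 2));

    // U+1F600: one scalar value, four bytes in UTF-8, and a surrogate pair
    // (two code units) in UTF-16.
    assert_eq!(counts("\u{1F600}"), (4, 2, 1));
    println!("all checks passed");
}
```

The second case is the "hygiene" concern in a nutshell: a `char` is a scalar value, not a user-perceived character.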