Commit 24c6065

Merge pull request #1226 from bobahop/patch-3
[char] about.md
2 parents a3f9741 + 53a3223 commit 24c6065

1 file changed: +40 -1 lines changed

concepts/char/about.md

Lines changed: 40 additions & 1 deletion
@@ -1,3 +1,42 @@
# About

Rust provides the [`char`][char type] type to represent a single character. A `char` literal is written within single quotes, like `'a'`.
Each `char` is four bytes in size and represents a single [Unicode Scalar Value][unicode scalar].

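As a minimal sketch of the two claims above, the following checks the in-memory size of a `char` and shows that only Unicode Scalar Values convert to a `char`. The helper names `char_size` and `scalar` are hypothetical and exist only for this illustration; `std::mem::size_of` and `char::from_u32` are standard library functions.

```rust
use std::mem::size_of;

/// Size in bytes of a `char` value in memory (hypothetical helper).
fn char_size() -> usize {
    size_of::<char>()
}

/// Converts a number to a `char` if it is a Unicode Scalar Value
/// (hypothetical helper wrapping `char::from_u32`).
fn scalar(code: u32) -> Option<char> {
    char::from_u32(code)
}

fn main() {
    // Every `char` occupies four bytes, whichever character it holds.
    println!("{}", char_size()); // prints 4

    // 0x61 is a scalar value, so the conversion succeeds.
    println!("{:?}", scalar(0x61)); // prints Some('a')

    // 0xD800 is a surrogate code point, not a scalar value.
    println!("{:?}", scalar(0xD800)); // prints None
}
```
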
However, a [`char`][character type] is not always what we think of as a letter. Some languages, such as Hindi, use [diacritics][diacritics], which are marks that modify the character they are attached to. Although a combining diacritic is a separate `char` in Rust, it is the character together with its diacritic that we commonly think of as a letter.

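One way to see the gap between `char`s and letters is to compare two encodings of the same visible letter. This is a sketch using only the standard library; the helper `char_count` is a hypothetical name, and the contrast with the precomposed code point `\u{fc}` is an illustration added here rather than something the text above covers.

```rust
/// Counts the `char`s (Unicode Scalar Values) in `s` (hypothetical helper).
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // The letter as a single precomposed code point.
    let precomposed = "\u{fc}";
    // The same letter as "u" followed by a combining diaeresis.
    let decomposed = "u\u{308}";

    println!("{}", char_count(precomposed)); // prints 1
    println!("{}", char_count(decomposed)); // prints 2

    // Both render as one letter, but the strings hold different chars.
    println!("{}", precomposed == decomposed); // prints false
}
```
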
The term for a character together with its diacritics is [grapheme cluster][grapheme cluster]. External crates such as [unicode-segmentation][unicode-segmentation] can be used to process grapheme clusters.

Example

```rust
pub fn main() {
    let text = "u\u{308}"; // a "u" followed by a combining diaeresis: "ü"
    let text_vec: Vec<char> = text.chars().collect(); // this gets the chars in "ü"
    println!("{:?}", text_vec.len()); // this prints the number of chars in "ü"
    println!("{:?}", text_vec[0]); // this prints the first char in "ü"
    println!("{:?}", text_vec[1]); // this prints the second char in "ü"
}
```

prints

```text
2
'u'
'\u{308}'
```

`'\u{308}'` is another way of writing a `char` literal. `\u` introduces a Unicode escape, and the hexadecimal digits inside the braces, here `308`, are the unique Unicode number of that character or diacritic.

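The relationship between a `char` and its escape can be sketched with two small helpers. The names `code_point` and `as_escape` are hypothetical; the cast `as u32` and the method `char::escape_unicode` come from the standard library.

```rust
/// Returns the Unicode number of `c` (hypothetical helper).
fn code_point(c: char) -> u32 {
    c as u32
}

/// Renders `c` in `\u{...}` escape form (hypothetical helper).
fn as_escape(c: char) -> String {
    c.escape_unicode().to_string()
}

fn main() {
    let diaeresis = '\u{308}';
    println!("{:#x}", code_point(diaeresis)); // prints 0x308

    // Any char can be written in escape form, including plain letters.
    println!("{}", as_escape('u')); // prints \u{75}

    // The escape and the ordinary literal denote the same char.
    println!("{}", 'a' == '\u{61}'); // prints true
}
```
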
[char type]: https://doc.rust-lang.org/std/primitive.char.html
[unicode]: http://www.unicode.org/glossary/#unicode
[character type]: https://doc.rust-lang.org/book/ch03-02-data-types.html#the-character-type
[unicode scalar]: http://www.unicode.org/glossary/#unicode_scalar_value
[diacritics]: http://www.unicode.org/glossary/#diacritic
[grapheme cluster]: https://doc.rust-lang.org/book/ch08-02-strings.html#bytes-and-scalar-values-and-grapheme-clusters-oh-my
[unicode-segmentation]: https://crates.io/crates/unicode-segmentation
