| 1 | 1 | # About |
| 2 | 2 | |
| 3 | -TODO |
| 3 | +Rust implements the [char][char type] type to represent a single character. A `char` literal is placed within single quotes, like `'a'`. |
| 4 | +Each `char` is four bytes in size and represents a single [Unicode Scalar Value][unicode scalar]. |
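
Both facts can be checked with the standard library. The following is a minimal sketch: `char_size` and `is_scalar_value` are hypothetical helper names chosen for illustration, while `std::mem::size_of` and `char::from_u32` are standard library functions.

```rust
use std::mem::size_of;

/// Size of a `char` in bytes (hypothetical helper for illustration).
fn char_size() -> usize {
    size_of::<char>()
}

/// Returns true if `n` is a Unicode Scalar Value, i.e. it can be stored in a `char`
/// (hypothetical helper for illustration).
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}

fn main() {
    println!("{}", char_size()); // 4
    println!("{}", is_scalar_value(0x61)); // true: the letter 'a'
    println!("{}", is_scalar_value(0xD800)); // false: a surrogate code point
    println!("{}", is_scalar_value(0x110000)); // false: beyond U+10FFFF
}
```

Surrogate code points (U+D800 to U+DFFF) are valid Unicode code points but not scalar values, which is why `char::from_u32` rejects them.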
| 5 | + |
| 6 | +However, a [`char`][character type] is not always what we think of as a letter. Some languages, e.g. Hindi, use [diacritics][diacritics], |
| 7 | +special symbols that modify the character they are attached to. Although Rust stores a diacritic as a separate `char`, it is the diacritic together with |
| 8 | +the character it modifies that we commonly think of as a letter. |
| 9 | + |
| 10 | +A character together with its diacritics is called a [grapheme cluster][grapheme cluster]. External crates such as |
| 11 | +[unicode-segmentation][unicode-segmentation] can be used to process grapheme clusters. |
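
To see why grapheme clusters need separate handling, it may help to compare what the standard library can count on its own. This sketch uses only `str::len` and `str::chars`; `byte_and_char_counts` is a hypothetical helper name, and the Hindi word is the one used in the Rust Book section linked above, written with escapes so each `char` is visible.

```rust
/// Returns (byte count, char count) for a string slice.
/// Hypothetical helper for illustration.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // The Hindi word "namaste": four consonants plus two dependent marks.
    let word = "\u{928}\u{92E}\u{938}\u{94D}\u{924}\u{947}";
    let (bytes, chars) = byte_and_char_counts(word);
    println!("{}: {} bytes, {} chars", word, bytes, chars);
}
```

Neither number matches the four letters a reader would count, since two of the six `char`s are marks attached to the consonant before them. Counting those letters is the job of a segmentation crate.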
| 12 | + |
| 13 | +Example |
| 14 | + |
| 15 | +```rust |
| 16 | +pub fn main() { |
| 17 | +    let text = "u\u{308}"; // "ü" written as a "u" followed by a combining diaeresis |
| 18 | +    let text_vec: Vec<char> = text.chars().collect(); // collect the chars that make up "ü" |
| 19 | +    println!("{}", text_vec.len()); // print the number of chars in "ü" |
| 20 | +    println!("{:?}", text_vec[0]); // print the first char: the base letter |
| 21 | +    println!("{:?}", text_vec[1]); // print the second char: the diacritic |
| 22 | +} |
| 23 | +``` |
| 24 | + |
| 25 | +prints |
| 26 | + |
| 27 | +```text |
| 28 | +2 |
| 29 | +'u' |
| 30 | +'\u{308}' |
| 31 | +``` |
| 32 | + |
| 33 | +`'\u{308}'` is another way of writing a `char` literal. `\u` introduces a Unicode escape, and `{308}` holds the hexadecimal code point of that `char`, |
| 34 | +here U+0308, the combining diaeresis. |
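
The link between the escape and the underlying number can be sketched as follows. `code_point` is a hypothetical helper name; the `as u32` cast and `char::from_u32` are standard Rust. Note that "ü" also exists as a single precomposed `char`, U+00FC, which is a different value from the two-`char` sequence shown above.

```rust
/// Returns the Unicode code point of a `char` as a number.
/// Hypothetical helper for illustration.
fn code_point(c: char) -> u32 {
    c as u32
}

fn main() {
    let combining = '\u{308}'; // combining diaeresis, attaches to the previous char
    let precomposed = '\u{fc}'; // "ü" stored as one char
    println!("{:#x}", code_point(combining)); // 0x308
    println!("{:#x}", code_point(precomposed)); // 0xfc
    println!("{}", char::from_u32(0x308) == Some(combining)); // true
}
```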
| 35 | + |
| 36 | +[char type]: https://doc.rust-lang.org/std/primitive.char.html |
| 37 | +[unicode]: http://www.unicode.org/glossary/#unicode |
| 38 | +[character type]: https://doc.rust-lang.org/book/ch03-02-data-types.html#the-character-type |
| 39 | +[unicode scalar]: http://www.unicode.org/glossary/#unicode_scalar_value |
| 40 | +[diacritics]: http://www.unicode.org/glossary/#diacritic |
| 41 | +[grapheme cluster]: https://doc.rust-lang.org/book/ch08-02-strings.html#bytes-and-scalar-values-and-grapheme-clusters-oh-my |
| 42 | +[unicode-segmentation]: https://crates.io/crates/unicode-segmentation |