Commit 6ef8392

auto merge of #17453 : steveklabnik/rust/gh17340, r=alexcrichton
/cc @huonw
2 parents b2e5773 + f358407 commit 6ef8392

1 file changed: +120 -22 lines changed

src/doc/guide-strings.md

@@ -96,12 +96,11 @@ need, and it can make your lifetimes more complex.
 
 ## Generic functions
 
-To write a function that's generic over types of strings, use [the `Str`
-trait](http://doc.rust-lang.org/std/str/trait.Str.html):
+To write a function that's generic over types of strings, use `&str`.
 
 ```{rust}
-fn some_string_length<T: Str>(x: T) -> uint {
-    x.as_slice().len()
+fn some_string_length(x: &str) -> uint {
+    x.len()
 }
 
 fn main() {
@@ -111,15 +110,12 @@ fn main() {
 
     let s = "Hello, world".to_string();
 
-    println!("{}", some_string_length(s));
+    println!("{}", some_string_length(s.as_slice()));
 }
 ```
 
 Both of these lines will print `12`.
 
-The only method that the `Str` trait has is `as_slice()`, which gives you
-access to a `&str` value from the underlying string.
-
 ## Comparisons
 
 To compare a String to a constant string, prefer `as_slice()`...
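
Reviewer note on the `some_string_length` change above: the sketch below restates the `&str`-taking version in current Rust, not in the 2014 dialect this commit uses. It assumes `usize` in place of `uint` and `as_str()` in place of `as_slice()`; both substitutions are mine and are not part of the patch. It is only meant to illustrate why a plain `&str` parameter serves both string literals and `String` values.

```rust
// Current-Rust rendering of the function from the patch.
// `usize` stands in for the pre-1.0 `uint` (an assumption of this sketch).
fn some_string_length(x: &str) -> usize {
    x.len() // length in bytes, not in visible characters
}

fn main() {
    let literal = "Hello, world";
    let owned = String::from("Hello, world");

    // A string literal is already a `&str`.
    println!("{}", some_string_length(literal));
    // `&owned` is a `&String`; deref coercion converts it to `&str`.
    println!("{}", some_string_length(&owned));
    // Explicit conversion, the closest current analogue of `s.as_slice()`.
    println!("{}", some_string_length(owned.as_str()));
}
```

All three calls print `12`, matching the "Both of these lines will print `12`" line in the patched guide.
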
@@ -161,25 +157,93 @@ indexing is basically never what you want to do. The reason is that each
 character can be a variable number of bytes. This means that you have to iterate
 through the characters anyway, which is a O(n) operation.
 
-To iterate over a string, use the `graphemes()` method on `&str`:
+There's 3 basic levels of unicode (and its encodings):
+
+- code units, the underlying data type used to store everything
+- code points/unicode scalar values (char)
+- graphemes (visible characters)
+
+Rust provides iterators for each of these situations:
+
+- `.bytes()` will iterate over the underlying bytes
+- `.chars()` will iterate over the code points
+- `.graphemes()` will iterate over each grapheme
+
+Usually, the `graphemes()` method on `&str` is what you want:
 
 ```{rust}
-let s = "αἰθήρ";
+let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
 
 for l in s.graphemes(true) {
     println!("{}", l);
 }
 ```
 
+This prints:
+
+```{notrust,ignore}
+u͔
+n͈̰̎
+i̙̮͚̦
+c͚̉
+o̼̩̰͗
+d͔̆̓ͥ
+é
+```
+
 Note that `l` has the type `&str` here, since a single grapheme can consist of
 multiple codepoints, so a `char` wouldn't be appropriate.
 
-This will print out each character in turn, as you'd expect: first "α", then
-"ἰ", etc. You can see that this is different than just the individual bytes.
-Here's a version that prints out each byte:
+This will print out each visible character in turn, as you'd expect: first "u͔", then
+"n͈̰̎", etc. If you wanted each individual codepoint of each grapheme, you can use `.chars()`:
 
 ```{rust}
-let s = "αἰθήρ";
+let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
+
+for l in s.chars() {
+    println!("{}", l);
+}
+```
+
+This prints:
+
+```{notrust,ignore}
+u
+͔
+n
+̎
+͈
+̰
+i
+̙
+̮
+͚
+̦
+c
+̉
+͚
+o
+͗
+̼
+̩
+̰
+d
+̆
+̓
+ͥ
+͔
+e
+́
+```
+
+You can see how some of them are combining characters, and therefore the output
+looks a bit odd.
+
+If you want the individual byte representation of each codepoint, you can use
+`.bytes()`:
+
+```{rust}
+let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
 
 for l in s.bytes() {
     println!("{}", l);
@@ -189,16 +253,50 @@ for l in s.bytes() {
 This will print:
 
 ```{notrust,ignore}
-206
-177
-225
-188
+117
+205
+148
+110
+204
+142
+205
+136
+204
 176
-206
-184
-206
+105
+204
+153
+204
 174
-207
+205
+154
+204
+166
+99
+204
+137
+205
+154
+111
+205
+151
+204
+188
+204
+169
+204
+176
+100
+204
+134
+205
+131
+205
+165
+205
+148
+101
+204
 129
 ```
 
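Reviewer note on the three levels described in this patch: the sketch below compares two of them, bytes and code points, using only the current standard library. The helper `unit_counts` is hypothetical and is not part of the patch. The sample is the final grapheme of the guide's string, spelled with escapes as `e` followed by U+0301 (combining acute accent). In current Rust, `graphemes()` is no longer a standard-library method; the external `unicode-segmentation` crate provides it, so the grapheme level is left out here to keep the example dependency-free.

```rust
// Hypothetical helper: returns (number of UTF-8 bytes, number of code points).
fn unit_counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

fn main() {
    // One visible character made of two code points: 'e' + U+0301.
    let s = "e\u{301}";

    let (bytes, chars) = unit_counts(s);
    println!("bytes: {}, chars: {}", bytes, chars); // bytes: 3, chars: 2

    // Code units (UTF-8 bytes): 101, 204, 129
    for b in s.bytes() {
        println!("{}", b);
    }

    // Code points: U+0065, U+0301
    for c in s.chars() {
        println!("U+{:04X}", c as u32);
    }
}
```

The three bytes printed here are the same `101`, `204`, `129` that close the byte listing in the patch.
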
