@@ -96,12 +96,11 @@ need, and it can make your lifetimes more complex.
96
96
97
97
## Generic functions
98
98
99
- To write a function that's generic over types of strings, use [the `Str`
100
- trait](http://doc.rust-lang.org/std/str/trait.Str.html):
99
+ To write a function that accepts any type of string, take a `&str` argument.
101
100
102
101
``` {rust}
103
- fn some_string_length<T: Str>(x: T) -> uint {
104
- x.as_slice().len()
102
+ fn some_string_length(x: &str) -> uint {
103
+ x.len()
105
104
}
106
105
107
106
fn main() {
@@ -111,15 +110,12 @@ fn main() {
111
110
112
111
let s = "Hello, world".to_string();
113
112
114
- println!("{}", some_string_length(s));
113
+ println!("{}", some_string_length(s.as_slice()));
115
114
}
116
115
```
117
116
118
117
Both of these lines will print `12`.
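The code in this diff predates Rust 1.0: `uint` has since become `usize`, the `Str` trait is gone, and `String::as_str()` replaces `as_slice()`. The sketch below assumes a modern toolchain and shows the `&str` version from the new text, plus an `AsRef<str>` bound as one way (our choice, not something this commit uses) to keep the old "accept any string type by value" flavour. The name `some_string_length_generic` is hypothetical.

```rust
// Sketch assuming Rust 1.0+; `usize` replaces the pre-1.0 `uint`.
// Note: `len()` counts bytes, which equals the character count only for ASCII.
fn some_string_length(x: &str) -> usize {
    x.len()
}

// Hypothetical alternative: accept anything viewable as `&str`,
// including an owned `String` passed by value.
fn some_string_length_generic<T: AsRef<str>>(x: T) -> usize {
    x.as_ref().len()
}

fn main() {
    let s = "Hello, world".to_string();

    println!("{}", some_string_length("Hello, world")); // a literal is already &str
    println!("{}", some_string_length(&s));             // &String coerces to &str
    println!("{}", some_string_length(s.as_str()));     // explicit conversion

    println!("{}", some_string_length_generic("Hello, world"));
    println!("{}", some_string_length_generic(s));      // moves the String in
}
```

Every call prints `12`. Taking `&str` is usually enough, because deref coercion lets callers pass `&String` directly.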
119
118
120
- The only method that the `Str` trait has is `as_slice()`, which gives you
121
- access to a `&str` value from the underlying string.
122
-
123
119
## Comparisons
124
120
125
121
To compare a String to a constant string, prefer `as_slice()` ...
@@ -161,25 +157,93 @@ indexing is basically never what you want to do. The reason is that each
161
157
character can be a variable number of bytes. This means that you have to iterate
162
158
through the characters anyway, which is an O(n) operation.
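To make the cost concrete: in Rust 1.0 and later, `s[n]` on a `str` does not compile, so finding the n-th character means walking the string. The sketch below uses `char_indices()` to find the byte offset where the n-th `char` starts; `byte_offset_of_char` is a hypothetical helper, and the string is the guide's original Greek example written with escapes.

```rust
// Hypothetical helper: byte offset at which the n-th char of `s` starts.
// It must scan from the start, because each char takes 1 to 4 bytes in
// UTF-8, so this is O(n) rather than O(1).
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    // "αἰθήρ": α is 2 bytes, ἰ is 3 bytes, θ starts at byte 5.
    let s = "\u{3b1}\u{1f30}\u{3b8}\u{3ae}\u{3c1}";

    // `s[1]` would not compile: `str` cannot be indexed by an integer.
    println!("{:?}", byte_offset_of_char(s, 1)); // Some(2)
    println!("{:?}", byte_offset_of_char(s, 2)); // Some(5)
}
```

Byte-range slicing such as `&s[0..2]` is allowed, but it panics if a range boundary falls inside a character, which is another reason to go through `char_indices()`.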
163
159
164
- To iterate over a string, use the `graphemes()` method on `&str`:
160
+ There are three basic levels of Unicode (and its encodings):
161
+
162
+ - code units, the fixed-size units an encoding stores text in (bytes, for UTF-8)
163
+ - code points/Unicode scalar values (`char`)
164
+ - graphemes (visible characters)
165
+
166
+ Rust provides iterators for each of these situations:
167
+
168
+ - `.bytes()` will iterate over the underlying bytes
169
+ - `.chars()` will iterate over the code points
170
+ - `.graphemes()` will iterate over each grapheme
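As a quick check of the first two levels, this sketch counts bytes and `char`s for two spellings of "é": the precomposed U+00E9, and `e` followed by the combining acute accent U+0301. Both display as a single grapheme. In today's standard library `&str` no longer has `graphemes()`; the `unicode-segmentation` crate's `UnicodeSegmentation` trait provides it, so that level is left out of this std-only sketch.

```rust
fn main() {
    // Two ways to write "é": one code point, or "e" + a combining accent.
    let precomposed = "\u{e9}";
    let decomposed = "e\u{301}";

    // Code units: the UTF-8 bytes.
    println!("{}", precomposed.bytes().count()); // 2
    println!("{}", decomposed.bytes().count());  // 3

    // Code points: `char`s (Unicode scalar values).
    println!("{}", precomposed.chars().count()); // 1
    println!("{}", decomposed.chars().count());  // 2
}
```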
171
+
172
+ Usually, the `graphemes()` method on `&str` is what you want:
165
173
166
174
``` {rust}
167
- let s = "αἰθήρ";
175
+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
168
176
169
177
for l in s.graphemes(true) {
170
178
println!("{}", l);
171
179
}
172
180
```
173
181
182
+ This prints:
183
+
184
+ ``` {notrust,ignore}
185
+ u͔
186
+ n͈̰̎
187
+ i̙̮͚̦
188
+ c͚̉
189
+ o̼̩̰͗
190
+ d͔̆̓ͥ
191
+ é
192
+ ```
193
+
174
194
Note that `l` has the type `&str` here, since a single grapheme can consist of
175
195
multiple codepoints, so a `char` wouldn't be appropriate.
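A small way to see this: the first grapheme in the output above is `u` followed by the combining mark U+0354, so it does not fit in one `char`. The hypothetical helper `as_single_char` below (not part of the guide or the standard library) returns a `char` only when a string holds exactly one code point.

```rust
// Hypothetical helper: Some(c) if `s` is exactly one code point, else None.
fn as_single_char(s: &str) -> Option<char> {
    let mut chars = s.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        _ => None, // empty, or more than one code point
    }
}

fn main() {
    // One grapheme, but two code points: "u" + U+0354.
    let grapheme = "u\u{354}";
    println!("{:?}", as_single_char("u"));      // Some('u')
    println!("{:?}", as_single_char(grapheme)); // None
}
```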
176
196
177
- This will print out each character in turn, as you'd expect: first "α", then
178
- "ἰ", etc. You can see that this is different than just the individual bytes.
179
- Here's a version that prints out each byte:
197
+ This will print out each visible character in turn, as you'd expect: first "u͔", then
198
+ "n͈̰̎", etc. If you want each individual codepoint of each grapheme, use `.chars()`:
180
199
181
200
``` {rust}
182
- let s = "αἰθήρ";
201
+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
202
+
203
+ for l in s.chars() {
204
+ println!("{}", l);
205
+ }
206
+ ```
207
+
208
+ This prints:
209
+
210
+ ``` {notrust,ignore}
211
+ u
212
+ ͔
213
+ n
214
+ ̎
215
+ ͈
216
+ ̰
217
+ i
218
+ ̙
219
+ ̮
220
+ ͚
221
+ ̦
222
+ c
223
+ ̉
224
+ ͚
225
+ o
226
+ ͗
227
+ ̼
228
+ ̩
229
+ ̰
230
+ d
231
+ ̆
232
+ ̓
233
+ ͥ
234
+ ͔
235
+ e
236
+ ́
237
+ ```
238
+
239
+ Many of these are combining characters, which attach to the character before
240
+ them, so printed one per line the output looks a bit odd.
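One way to make a listing like that readable (our suggestion, not the guide's) is to print each `char` with `char::escape_unicode()`, which renders it as a `\u{hex}` escape so combining marks no longer attach to whatever precedes them. The input here is just the start of the example string, written with escapes.

```rust
fn main() {
    // "u" + U+0354, then "n" + U+030E: the opening code points of the example.
    let s = "u\u{354}n\u{30e}";

    for c in s.chars() {
        // escape_unicode() yields escape text such as \u{75} for 'u'.
        println!("{}", c.escape_unicode());
    }
    // Prints, one per line: \u{75} \u{354} \u{6e} \u{30e}
}
```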
241
+
242
+ If you want the UTF-8 bytes (code units) that encode the string, you can use
243
+ `.bytes()`:
244
+
245
+ ``` {rust}
246
+ let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
183
247
184
248
for l in s.bytes() {
185
249
println!("{}", l);
@@ -189,16 +253,50 @@ for l in s.bytes() {
189
253
This will print:
190
254
191
255
``` {notrust,ignore}
192
- 206
193
- 177
194
- 225
195
- 188
256
+ 117
257
+ 205
258
+ 148
259
+ 110
260
+ 204
261
+ 142
262
+ 205
263
+ 136
264
+ 204
196
265
176
197
- 206
198
- 184
199
- 206
266
+ 105
267
+ 204
268
+ 153
269
+ 204
200
270
174
201
- 207
271
+ 205
272
+ 154
273
+ 204
274
+ 166
275
+ 99
276
+ 204
277
+ 137
278
+ 205
279
+ 154
280
+ 111
281
+ 205
282
+ 151
283
+ 204
284
+ 188
285
+ 204
286
+ 169
287
+ 204
288
+ 176
289
+ 100
290
+ 204
291
+ 134
292
+ 205
293
+ 131
294
+ 205
295
+ 165
296
+ 205
297
+ 148
298
+ 101
299
+ 204
202
300
129
203
301
```
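The flat byte list above is hard to map back to the text. This sketch (modern Rust, std only; `bytes_per_char` is a hypothetical helper) groups the bytes by code point using `char_indices()` and `char::len_utf8()`. For the first grapheme it gives `[117]` for `u` and `[205, 148]` for U+0354, matching the first three numbers in the listing.

```rust
// Hypothetical helper: each code point paired with the UTF-8 bytes encoding it.
fn bytes_per_char(s: &str) -> Vec<(char, Vec<u8>)> {
    s.char_indices()
        .map(|(start, c)| {
            // The char occupies bytes start..start + c.len_utf8() of `s`.
            let end = start + c.len_utf8();
            (c, s.as_bytes()[start..end].to_vec())
        })
        .collect()
}

fn main() {
    // The first grapheme of the example: "u" + U+0354.
    for (c, bytes) in bytes_per_char("u\u{354}") {
        println!("{} {:?}", c.escape_unicode(), bytes);
    }
    // Prints:
    // \u{75} [117]
    // \u{354} [205, 148]
}
```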
204
302