
Commit 7988899

committed
resolve
1 parent 659a5a5 commit 7988899

1 file changed

+37
-21
lines changed

src/advanced_unsafety/invalid_values.md

Lines changed: 37 additions & 21 deletions
@@ -3,20 +3,37 @@
33
> _“If you tell the truth, you don't have to remember anything.”_
44
> _Mark Twain_
55
6-
Values of a particular type in Rust may never have an "invalid" bit pattern for that type. This is true even if that value is never read from afterwards.
6+
Values of a particular type in Rust may never have an "invalid" bit pattern for that type. This is true even if that value is never read from afterwards, or if that value simply exists behind an unread reference. From [the reference]:
77

8-
A lot of basic types _don't_ have any rules about invalid values. All bit patterns of the integer types (and arrays of the integer types) are valid.
8+
> "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation.
99
10-
But most other types have some concept of validity.
10+
11+
12+
A lot of basic types _don't_ have any rules about invalid values. For example, all bit patterns of the integer types (and arrays of the integer types) are valid. But most other types have some concept of validity.
1113

1214
## Types of invalid values
1315

16+
### Uninitialized memory
17+
18+
Values of _any_ type can be "uninitialized", which is considered instantly UB even for types like integers. We discuss this further in [the chapter on uninitialized memory][uninit-chapter]. For now this chapter will largely cover cases where a type may have an invalid _bit pattern_, rather than other cases where it may be invalid due to e.g. not having an initialized bit representation at all.
19+
1420
### Primitive types with invalid values
1521

16-
`bool`s that have bit patterns other than those for `true` and `false` are invalid. The same goes for `char`s representing byte patterns that are considered invalid in UTF-32.
22+
`bool`s that have bit patterns other than those for `true` and `false` are invalid. The same goes for `char`s representing byte patterns that are considered invalid in UTF-32 (anything that is either a surrogate character, or greater than `char::MAX`).
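In safe code, the usual way to respect these rules is to go through a checked conversion instead of reinterpreting raw bits. The sketch below is illustrative only: `bool_from_byte` and `char_from_bits` are hypothetical helper names, not part of the chapter, while `char::from_u32` is the standard library's checked constructor.

```rust
/// Convert a raw byte into a `bool`, rejecting every bit pattern
/// other than the two that are valid for `bool`.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other pattern would be an invalid `bool`
    }
}

/// Convert a raw `u32` into a `char`. `char::from_u32` returns `None`
/// for surrogates (0xD800..=0xDFFF) and for values above `char::MAX`.
fn char_from_bits(bits: u32) -> Option<char> {
    char::from_u32(bits)
}
```

Both helpers return `None` for exactly the patterns described above, so no invalid `bool` or `char` is ever produced.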
1723

1824

19-
`&T` and `&mut T` may not be null, nor may they be [unaligned] for values of type `T`. There are a lot of other reasons that a reference may not be valid, but these are the ones where the bit pattern is statically known to be invalid regardless of context.
25+
### Pointers with invalid values
26+
27+
`&T` and `&mut T` may not be null, nor may they be [unaligned] for values of type `T`.
28+
29+
`fn` pointers and the metadata part of `dyn Trait` may not be null either.
30+
31+
Most smart pointer types like `Box<T>` and `Rc<T>` are invalid when null. Library types may achieve the same behavior using the [`NonNull<T>`] pointer type.
32+
33+
It's also currently invalid for `Vec<T>` to have a null pointer for its buffer! `Vec<T>` uses [`NonNull<T>`] internally, and empty vectors use a pointer value equal to the alignment of `T`.
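A small sketch of how this can be observed from safe code. The helper names `empty_vec_ptr_is_valid` and `non_null_from_raw` are hypothetical; the sketch only checks that the empty vector's pointer is non-null and aligned, since the exact address is an implementation detail.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Returns true if the buffer pointer of an empty `Vec<T>` is
/// non-null and suitably aligned for `T`.
fn empty_vec_ptr_is_valid<T>() -> bool {
    let v: Vec<T> = Vec::new();
    let addr = v.as_ptr() as usize;
    addr != 0 && addr % align_of::<T>() == 0
}

/// Checked construction of a `NonNull<T>`: a null pointer is rejected
/// instead of being turned into an invalid value.
fn non_null_from_raw<T>(ptr: *mut T) -> Option<NonNull<T>> {
    NonNull::new(ptr)
}
```

Library types that store a `NonNull<T>` can use `NonNull::new` at their boundaries in the same way, so that a null pointer never reaches the field.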
34+
35+
There are a lot of other reasons that a pointer type may not be valid, but these are the ones where the bit pattern is statically known to be invalid regardless of context. We'll be covering these in more depth in other chapters (@@note: where?), but, for example, all of these pointers must not only be non-null, they must also point to an actual valid instance of that type at all times (except `Vec<T>`, which is allowed to refer to invalid-but-aligned-and-non-null memory when it is empty).
36+
2037

2138
### Enums with invalid values
2239

@@ -48,34 +65,28 @@ enum Stuff {
4865

4966
setting the discriminant bit to something that is not the discriminant of `Char` or `Number` is invalid. Similarly, setting the discriminant bit to that for `Char` but having the value be invalid for a `char` is also invalid.
5067

51-
### Smart pointers and NonNull
52-
53-
Most smart pointer types like `Box<T>` and `Rc<T>` are invalid when null. Library types may achieve the same behavior using the [`NonNull<T>`] pointer type.
54-
55-
It's also currently invalid for `Vec<T>` to have a null pointer for its buffer! `Vec<T>` uses [`NonNull<T>`] internally, and empty vectors use a pointer value equal to the alignment of `T`.
56-
57-
58-
59-
### `#[repr(Rust)]` isn't stable!
60-
61-
Note that Rust's default representation for types is not stable! What might be a valid bit pattern one day may become invalid later, unless you're only relying on things that are known to be invariant.
62-
6368
### `str`
6469

6570
The string slice type `str` does not actually have any validity constraints: Despite being only for UTF-8 encoded strings, it is valid for `str`s to be in any bit pattern, provided you do not call any methods on the string that are not about directly accessing the memory behind it.
6671

67-
Basically, the UTF-8 validity of `str` is an implicit safety requirement for most of its methods, however it is fine to _hold on to_ an `&str` that points to random bytes.
72+
Basically, the UTF-8 validity of `str` is an implicit safety requirement for most of its methods; however, it is fine to _hold on to_ an `&str` that points to random bytes. This is a difference between things being "insta-UB" and "UB on use": invalid value UB is typically "insta-UB" (it's UB even if you don't _do_ anything with the invalid value), but here you're allowed to do this as long as you don't use the data in certain ways.
6873

6974
This is something that can be relied on when doing things like manipulating or constructing `str`s byte-by-byte, where there may be intermediate invalid states.
7075

7176
Of course, reference types like `&str` must still satisfy all of the rules about reference validity (being non-null, etc).
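When `unsafe` is not needed, a safe alternative to holding a temporarily invalid `str` is to keep the intermediate state in a byte buffer and validate once at the end. The following is a minimal sketch of that alternative, not the technique described above; `build_string` is a hypothetical helper name.

```rust
/// Assemble bytes one at a time, then validate before treating them as text.
fn build_string(bytes: &[u8]) -> Result<String, std::string::FromUtf8Error> {
    let mut buf = Vec::with_capacity(bytes.len());
    for &b in bytes {
        // Intermediate states may be incomplete UTF-8; that is fine for a `Vec<u8>`.
        buf.push(b);
    }
    // Checked conversion: returns an error if the bytes are not valid UTF-8.
    String::from_utf8(buf)
}
```

The UTF-8 check happens exactly once, at the point where the bytes become a `String`, so later `str` methods can rely on it.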
7277

7378
### Invalid values for general library types
7479

75-
In general, types may have various invalid values based on their internal representation (which may not be stable!).
80+
In general, types may have various invalid values based on their internal representation (which may not be stable!).
81+
In addition to [`NonNull<T>`], the Rust standard library provides [`NonZeroUsize`] and a bunch of other similar `NonZero` integer types that work as its integer counterparts, and libraries may use these internally.
82+
83+
84+
Note that Rust's default representation for types is not stable! What might be a valid bit pattern one day may become invalid later, unless you're only relying on things that are known to be invariant. Converting a type to its bits, sending it over the network, and converting it back is extremely fragile, and will break if the two sides are on different platforms or even Rust versions.
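One hedged way to avoid depending on in-memory representation is to serialize through an explicitly defined byte format and re-validate on the way back in. The sketch below assumes a hypothetical identifier type stored as a `NonZeroU32` and a little-endian wire format; `encode_id` and `decode_id` are illustrative names.

```rust
use std::num::NonZeroU32;

/// Encode the id as four little-endian bytes, independent of the
/// platform's endianness or the compiler's layout choices.
fn encode_id(id: NonZeroU32) -> [u8; 4] {
    id.get().to_le_bytes()
}

/// Decode four little-endian bytes, rejecting the all-zero pattern,
/// which is not a valid `NonZeroU32`.
fn decode_id(bytes: [u8; 4]) -> Option<NonZeroU32> {
    NonZeroU32::new(u32::from_le_bytes(bytes))
}
```

Because decoding goes through `NonZeroU32::new`, untrusted input can never produce the invalid zero value.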
7685

7786
As a library user you may not assume anything about the representation of a library type unless it is explicitly documented as such, or if it has a public representation that is known to be stable (for example a public `#[repr(C)]` enum).
7887

88+
89+
7990
## When you might end up making an invalid value
8091

8192

@@ -99,7 +110,7 @@ typedef enum {
99110
where the value may take states like `Active | Focused | Visible`. These combined values, as well as the "no flags set" value `0`, are invalid in Rust. If this type is represented as an enum in Rust ([even if it is `#[repr(C)]`][reprc-enum]!), it will be UB to accept values of this type over FFI from C. Generally in such cases it is recommended to use an integer type instead, and represent the mask values as constants.
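The integer-plus-constants approach might look like the sketch below. The type name `WindowFlags`, the `u32` width, and the numeric values are assumptions made for illustration; in real bindings they must match the C header.

```rust
/// Integer-backed flag set. Every `u32` bit pattern is valid, so any
/// combination (including `0`) received from C is safe to hold.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WindowFlags(pub u32);

impl WindowFlags {
    // Hypothetical values; the real ones come from the C definition.
    pub const ACTIVE: WindowFlags = WindowFlags(1 << 0);
    pub const FOCUSED: WindowFlags = WindowFlags(1 << 1);
    pub const VISIBLE: WindowFlags = WindowFlags(1 << 2);

    /// True if every bit set in `other` is also set in `self`.
    pub fn contains(self, other: WindowFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for WindowFlags {
    type Output = WindowFlags;

    fn bitor(self, rhs: WindowFlags) -> WindowFlags {
        WindowFlags(self.0 | rhs.0)
    }
}
```

With this shape, `WindowFlags::ACTIVE | WindowFlags::VISIBLE` is an ordinary value rather than an out-of-range enum discriminant.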
100111

101112

102-
## Things you might see if you used invalid data
113+
## Signs an invalid value was involved
103114

104115
The compiler is allowed to assume that values are never invalid, and it may use invalid states to signal other things, or pack types into smaller spaces.
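This packing can be observed with `size_of`. In the sketch below, only the `Option<&T>` case is a documented guarantee; the sizes reported for `Option<bool>` and `Option<char>` reflect what current compilers tend to do and should be treated as an assumption. The function names are hypothetical.

```rust
use std::mem::size_of;

/// `Option<&T>` is guaranteed to be the same size as `&T`: the null
/// pattern, invalid for a reference, is reused to represent `None`.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// Sizes of `Option<bool>` and `Option<char>`. Compilers may store
/// `None` in one of the invalid bit patterns of the payload type, but
/// this particular layout is not promised.
fn niche_sizes() -> (usize, usize) {
    (size_of::<Option<bool>>(), size_of::<Option<char>>())
}
```

If an invalid `bool` or reference were smuggled into such a type, it could be read back as `None`, which is one way invalid values end up being interpreted as something else.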
105116

@@ -111,13 +122,18 @@ In such scenarios, invalid values may lead to values being interpreted as a diff
111122

112123
Furthermore, invalid values will break `match` statements, usually (but not necessarily) leading to an abort.
113124

125+
Debuggers also tend to behave strangely with invalid values, displaying incorrect values, or even having the value change from read to read.
126+
114127
This is not an exhaustive list: ultimately, having an invalid value is UB and it remains illegal even if there are no optimizations that will break.
115128

116129

117130

118131
[unaligned]: ../core_unsafety/dangling_and_unaligned_pointers.md
132+
[uninit-chapter]: ../undef_memory.md
119133
[`mem::transmute()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html
120134
[`mem::transmute_copy()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute_copy.html
121135
[`mem::zeroed()`]: https://doc.rust-lang.org/stable/std/mem/fn.zeroed.html
122136
[`NonNull<T>`]: https://doc.rust-lang.org/stable/std/ptr/struct.NonNull.html
123-
[reprc-enum]: https://doc.rust-lang.org/reference/type-layout.html#reprc-field-less-enums
137+
[`NonZeroUsize`]: https://doc.rust-lang.org/stable/std/num/struct.NonZeroUsize.html
138+
[reprc-enum]: https://doc.rust-lang.org/reference/type-layout.html#reprc-field-less-enums
139+
[the reference]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
