# Invalid values

> _“If you tell the truth, you don't have to remember anything.”_
> — _Mark Twain_

Values of a given type in Rust must never have a bit pattern that is "invalid" for that type. This holds even if the value is never read afterwards.

A lot of basic types _don't_ have any rules about invalid values. All bit patterns of the integer types (and arrays of the integer types) are valid.
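
Because every bit pattern is a valid integer, reinterpreting raw bytes as an integer needs no validity check. The standard library therefore exposes such conversions as safe functions. Below is a minimal sketch; the helper name `u32_from_le` is just for illustration:

```rust
/// Reinterpret four little-endian bytes as a `u32`.
/// There is nothing to validate: every 32-bit pattern is a valid `u32`.
fn u32_from_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```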

But most other types have some concept of validity.

## Types of invalid values

### Primitive types with invalid values

`bool`s that have bit patterns other than those for `true` and `false` are invalid. Likewise, a `char` must be a Unicode scalar value. Bit patterns above `0x10FFFF`, or in the surrogate range `0xD800` to `0xDFFF`, are invalid `char`s; these are exactly the values that are invalid in UTF-32.
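
By contrast, turning an arbitrary `u32` into a `char` has to be checked. The sketch below uses the standard `std::char::from_u32`, which rejects exactly the surrogate range and values above `0x10FFFF`. The wrapper name `char_from_bits` is hypothetical:

```rust
/// Convert raw bits to a `char`, rejecting the bit patterns that are invalid
/// for `char`: surrogates (0xD800..=0xDFFF) and anything above 0x10FFFF.
fn char_from_bits(bits: u32) -> Option<char> {
    std::char::from_u32(bits)
}
```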

`&T` and `&mut T` may not be null, nor may they be [unaligned] for values of type `T`. There are many other reasons a reference may be invalid, but these are the ones where the bit pattern is statically known to be invalid regardless of context.
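
These two context-free requirements can be checked on a raw pointer before it is ever turned into a reference. The helper below is hypothetical. Passing its check is necessary but *not* sufficient: the pointee must still be live and must hold a valid `T`.

```rust
use std::mem::align_of;

/// Checks only the bit-pattern requirements for turning `p` into a `&T`:
/// it must be non-null and aligned for `T`. This does NOT make
/// dereferencing `p` sound on its own.
fn has_valid_ref_bits<T>(p: *const T) -> bool {
    !p.is_null() && (p as usize) % align_of::<T>() == 0
}
```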

### Enums with invalid values

Any bit pattern not covered by a variant of an enum is also invalid. For example, with the following enum:

```rust
enum Colors {
    Red = 1,
    Orange = 2,
    Yellow = 3,
    Green = 4,
    Blue = 5,
    Indigo = 6,
    Violet = 7,
}
```

a bit pattern of `0` or `8` is invalid, assuming the enum is represented by its explicit discriminant integers. Producing such a value is undefined behavior.
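
One way to avoid ever producing such a value is to convert from the raw integer with an explicit check, instead of a cast or a `transmute`. The sketch below adds `#[repr(u8)]`, which is not in the original enum, so that the discriminant is a single byte:

```rust
use std::convert::TryFrom;

// `#[repr(u8)]` is an assumption added for this sketch: it makes each
// discriminant a single byte, so "bit pattern" and "u8 value" coincide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Colors {
    Red = 1,
    Orange = 2,
    Yellow = 3,
    Green = 4,
    Blue = 5,
    Indigo = 6,
    Violet = 7,
}

impl TryFrom<u8> for Colors {
    type Error = u8;

    // Only the seven listed discriminants become a `Colors`; 0, 8, and every
    // other byte are handed back as an error instead.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            1 => Ok(Colors::Red),
            2 => Ok(Colors::Orange),
            3 => Ok(Colors::Yellow),
            4 => Ok(Colors::Green),
            5 => Ok(Colors::Blue),
            6 => Ok(Colors::Indigo),
            7 => Ok(Colors::Violet),
            other => Err(other),
        }
    }
}
```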

Or in this enum:

```rust
enum Stuff {
    Char(char),
    Number(u32),
}
```

setting the discriminant to anything other than the discriminant of `Char` or `Number` is invalid. Similarly, setting the discriminant to that of `Char` while the payload is not a valid `char` is also invalid.
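
When a `Stuff` has to be rebuilt from raw parts, both the discriminant and the payload can be validated before the value exists. The `(tag, payload)` encoding here is a hypothetical wire format chosen for illustration. It is not Rust's actual in-memory layout of `Stuff`.

```rust
// The `Stuff` enum from above, with `Debug`/`PartialEq` added for testing.
#[derive(Debug, PartialEq)]
enum Stuff {
    Char(char),
    Number(u32),
}

// Hypothetical encoding: tag 0 means `Char`, tag 1 means `Number`.
// An unknown tag or a payload that is not a valid `char` yields `None`,
// so no invalid `Stuff` is ever constructed.
fn decode_stuff(tag: u8, payload: u32) -> Option<Stuff> {
    match tag {
        0 => std::char::from_u32(payload).map(Stuff::Char),
        1 => Some(Stuff::Number(payload)),
        _ => None,
    }
}
```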

### Smart pointers and NonNull

Most smart pointer types, like `Box<T>` and `Rc<T>`, are invalid when null. Library types may achieve the same behavior using the [`NonNull<T>`] pointer type.

It's also currently invalid for `Vec<T>` to have a null pointer for its buffer! `Vec<T>` uses [`NonNull<T>`] internally, and empty vectors use a pointer value equal to the alignment of `T`.
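
A library type can follow the same pattern. `RawBuf<T>` below is a hypothetical, heavily simplified handle in the spirit of `Vec<T>`, not its real implementation. It stores a `NonNull<T>`, rejects null up front, and uses `NonNull::dangling()` (non-null and well-aligned) for the empty case.

```rust
use std::ptr::NonNull;

// Hypothetical buffer handle: because `ptr` is a `NonNull<T>`, a null
// pointer is an invalid value for this field, and so for `RawBuf<T>`.
struct RawBuf<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> RawBuf<T> {
    // Empty buffer: a dangling but non-null, well-aligned pointer.
    fn empty() -> Self {
        RawBuf { ptr: NonNull::dangling(), cap: 0 }
    }

    // Wrap an existing allocation; null is rejected rather than stored.
    fn from_raw(ptr: *mut T, cap: usize) -> Option<Self> {
        NonNull::new(ptr).map(|ptr| RawBuf { ptr, cap })
    }
}
```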

### `#[repr(Rust)]` isn't stable!

Note that Rust's default representation for types is not stable! What is a valid bit pattern today may become invalid in a later compiler version, unless you rely only on properties that are documented as guaranteed.
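
As a rough illustration, `#[repr(C)]` pins field order and padding, while the default representation leaves both up to the compiler. The struct names below are made up. The sizes assume `u32` has 4-byte alignment, as it does on common targets.

```rust
use std::mem::size_of;

// C rules, in declaration order: `a` at offset 0, 3 bytes of padding,
// `b` at offset 4, `c` at offset 8, then 3 bytes of tail padding so the
// size is a multiple of the 4-byte alignment: 12 bytes in total.
#[repr(C)]
#[allow(dead_code)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields with the default `#[repr(Rust)]`: the compiler may reorder
// them (current compilers typically produce 8 bytes here), and that choice
// is not promised to stay the same across compiler versions.
#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn layout_sizes() -> (usize, usize) {
    (size_of::<CLayout>(), size_of::<RustLayout>())
}
```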

### Invalid values for general library types

In general, types may have various invalid values based on their internal representation (which may not be stable!).

As a library user, you may not assume anything about the representation of a library type unless it is explicitly documented, or unless the type has a public representation that is known to be stable (for example, a public `#[repr(C)]` enum).

## When you might end up making an invalid value

Invalid values tend to crop up when you reinterpret a chunk of memory as a value of a different type. This can happen when calling [`mem::transmute()`] or [`mem::transmute_copy()`], or when casting a reference to a region of memory into a reference of a different type. The value need not be on the stack to be considered invalid. If you gin up an `&bool` that points to a bit pattern that is not a valid `bool`, that is instantly UB, even if you never read through the reference.
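
If you do need to reinterpret a byte as a `bool`, one option is to check the bit pattern first and only then cast. Below is a minimal sketch with hypothetical helper names. It relies on `bool` being one byte, with `0` meaning `false` and `1` meaning `true`.

```rust
use std::mem;

// Check the byte before transmuting: only 0 and 1 are valid `bool`s.
fn bool_from_byte(b: u8) -> Option<bool> {
    if b <= 1 {
        // SAFETY: `bool` is one byte, and 0 and 1 are its only valid values.
        Some(unsafe { mem::transmute::<u8, bool>(b) })
    } else {
        None
    }
}

// The same check for a reference. Following the rule stated above, merely
// creating an `&bool` to a byte holding, say, 2 is treated as UB, so the
// check must happen before the cast.
fn bool_ref_from_byte(b: &u8) -> Option<&bool> {
    if *b <= 1 {
        // SAFETY: `bool` has the same size and alignment as `u8`, the byte was
        // checked above, and the shared borrow keeps it from changing while
        // the returned `&bool` is alive.
        Some(unsafe { &*(b as *const u8 as *const bool) })
    } else {
        None
    }
}
```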

They can also arise when receiving values over FFI. Either the function signature is wrong (e.g. declaring that an FFI function accepts a `bool` when the other side thinks it accepts a `u8`), or the two languages have different notions of validity.
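
For the `bool`/`u8` mismatch, one defensive approach is to declare the parameter with a type for which every incoming bit pattern is valid, then convert explicitly. The function below is hypothetical. It assumes the C side passes an `unsigned char` in which any nonzero value means "true".

```rust
// Hypothetical callback called from C. Declaring `flag` as `bool` would make
// any value other than 0 or 1 UB. Declaring it as `u8` makes every byte C can
// send a valid value, and the conversion to `bool` happens explicitly here.
pub extern "C" fn flag_is_set(flag: u8) -> bool {
    flag != 0
}
```

In a real library, the function would also need a stable exported symbol name (for example via the `no_mangle` attribute) and a matching C declaration.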

A subtle case of this comes up occasionally in FFI code, because Rust and C have different expectations about how enums are used.

In C, it is common to use enums to represent _bitmasks_, doing something like this:

```c
typedef enum {
    Active   = 0x01,
    Visible  = 0x02,
    Updating = 0x04,
    Focused  = 0x08
} NodeStatus;
```

where a value may take states like `Active | Focused | Visible`. These combined values, as well as the "no flags set" value `0`, are invalid in Rust. If this type is represented as an enum in Rust ([even if it is `#[repr(C)]`][reprc-enum]!), accepting values of this type from C over FFI is UB. In such cases it is generally recommended to use an integer type instead, and to represent the mask values as constants.
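
Here is a sketch of the integer-based approach. It assumes the C compiler represents `NodeStatus` as an `unsigned int`, which is common when no enumerator is negative but worth checking for your ABI. It also assumes each flag occupies its own bit (`Active = 0x01`, `Visible = 0x02`, `Updating = 0x04`, `Focused = 0x08`). The Rust names are illustrative.

```rust
use std::ops::BitOr;
use std::os::raw::c_uint;

// `#[repr(transparent)]` gives this wrapper the same ABI as `c_uint`, so
// every integer C can send (including combinations and 0) is a valid value.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeStatus(pub c_uint);

impl NodeStatus {
    pub const NONE: NodeStatus = NodeStatus(0);
    pub const ACTIVE: NodeStatus = NodeStatus(0x01);
    pub const VISIBLE: NodeStatus = NodeStatus(0x02);
    pub const UPDATING: NodeStatus = NodeStatus(0x04);
    pub const FOCUSED: NodeStatus = NodeStatus(0x08);

    // True if every flag set in `other` is also set in `self`.
    pub fn contains(self, other: NodeStatus) -> bool {
        (self.0 & other.0) == other.0
    }
}

// Combine flags with `|`, just as the C code would.
impl BitOr for NodeStatus {
    type Output = NodeStatus;
    fn bitor(self, rhs: NodeStatus) -> NodeStatus {
        NodeStatus(self.0 | rhs.0)
    }
}
```

Since the type is just an integer underneath, `ACTIVE | FOCUSED | VISIBLE` and the empty value `0` are ordinary, valid values rather than UB.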

## Things you might see if you used invalid data

The compiler is allowed to assume that values are never invalid. It may use the invalid bit patterns to signal other things, or to pack types into smaller spaces.

For example, the type `Option<Box<T>>` uses the fact that the pointer cannot be null to fit the entire type into the same space that `Box<T>` takes up, with the null pointer representing `None`.

This can go even further, with types like `Option<Option<Option<bool>>>` fitting into a single byte, up to and including the type with 254 `Option`s surrounding one `bool`. This general class of optimization is known as "niche optimization", and the invalid bit patterns it uses are called "niches".
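
These layouts can be observed with `std::mem::size_of`. The `Option<Box<T>>` and `Option<&T>` cases are documented guarantees. The nested `Option<bool>` result reflects what current compilers do rather than a promise.

```rust
use std::mem::size_of;

// Guaranteed: `None` is stored as the null pointer, so no extra space is used.
fn pointer_niche_holds() -> bool {
    size_of::<Option<Box<u32>>>() == size_of::<Box<u32>>()
        && size_of::<Option<&u32>>() == size_of::<&u32>()
}

// Not guaranteed, but true in practice: `bool` only uses 0 and 1, leaving
// 254 spare byte values that nested `Option`s can use for their `None`s.
fn nested_bool_size() -> usize {
    size_of::<Option<Option<Option<bool>>>>()
}
```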

In such scenarios, an invalid value may be interpreted as a different, valid value. For example, suppose `NodeStatus` from above were represented as a Rust enum and an "empty status" value of `0` were received from C. An `Option<NodeStatus>` containing it might then be interpreted as `None`.

Furthermore, invalid values can break `match` statements, usually (but not necessarily) leading to an abort.

This is not an exhaustive list. Ultimately, having an invalid value is UB, and it remains UB even if no optimization happens to break your code.

[unaligned]: ../core_unsafety/dangling_and_unaligned_pointers.md
[`mem::transmute()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html
[`mem::transmute_copy()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute_copy.html
[`NonNull<T>`]: https://doc.rust-lang.org/stable/std/ptr/struct.NonNull.html
[reprc-enum]: https://doc.rust-lang.org/reference/type-layout.html#reprc-field-less-enums