From ce2281134d3577c71378e83a06343f9b6678eeca Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 19 Feb 2023 14:08:24 -0800 Subject: [PATCH 01/19] Add section on uninitialized memory --- src/SUMMARY.md | 4 +- src/advanced_unsafety/invalid_values.md | 2 +- src/advanced_unsafety/undef_memory.md | 1 - src/advanced_unsafety/uninitialized.md | 128 ++++++++++++++++++++++++ 4 files changed, 131 insertions(+), 4 deletions(-) delete mode 100644 src/advanced_unsafety/undef_memory.md create mode 100644 src/advanced_unsafety/uninitialized.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 6b742d1..61acca3 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -4,17 +4,17 @@ - [Undefined behavior](./undefined_behavior.md) - [Core unsafety](./core_unsafety.md) - [Dangling and unaligned pointers](./core_unsafety/dangling_and_unaligned_pointers.md) + - [Invalid values](./core_unsafety/invalid_values.md) - [Data races](./core_unsafety/data_races.md) - [Intrinsics](./core_unsafety/intrinsics.md) - [ABI and FFI](./core_unsafety/abi_and_ffi.md) - [Platform features](./core_unsafety/platform_features.md) - [Inline assembly](./core_unsafety/inline_assembly.md) - [Advanced unsafety](./advanced_unsafety.md) - - [Invalid values](./core_unsafety/invalid_values.md) + - [Uninitialized memory](./advanced_unsafety/uninitialized.md) - [Pointer aliasing](./advanced_unsafety/pointer_aliasing.md) - [Immutable data](./advanced_unsafety/immutable_data.md) - [Atomic ordering](./advanced_unsafety/atomic_ordering.md) - - [Undef memory](./advanced_unsafety/undef_memory.md) - [Pinning](./advanced_unsafety/pinning.md) - [Variance](./advanced_unsafety/variance.md) - [Expert unsafety](./expert_unsafety.md) diff --git a/src/advanced_unsafety/invalid_values.md b/src/advanced_unsafety/invalid_values.md index b7b715f..c6bc0ff 100644 --- a/src/advanced_unsafety/invalid_values.md +++ b/src/advanced_unsafety/invalid_values.md @@ -129,7 +129,7 @@ This is not an exhaustive list: ultimately, having an invalid 
value is UB and it [unaligned]: ../core_unsafety/dangling_and_unaligned_pointers.md - [uninit-chapter]: ../undef_memory.md + [uninit-chapter]: ../advanced_unsafety/uninitialized.md [`mem::transmute()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html [`mem::transmute_copy()`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute_copy.html [`mem::zeroed()`]: https://doc.rust-lang.org/stable/std/mem/fn.zeroed.html diff --git a/src/advanced_unsafety/undef_memory.md b/src/advanced_unsafety/undef_memory.md deleted file mode 100644 index 926f35c..0000000 --- a/src/advanced_unsafety/undef_memory.md +++ /dev/null @@ -1 +0,0 @@ -# Undef memory diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md new file mode 100644 index 0000000..528f4aa --- /dev/null +++ b/src/advanced_unsafety/uninitialized.md @@ -0,0 +1,128 @@ +# Uninitialized memory + +> _"I'm Nobody! Who are you? Are you — Nobody — too?"_ +> +> — _Emily Dickinson_ + +While we have covered [invalid values], there's another thing that behaves a lot like invalid values, but has nothing to do with actual bit patterns: Uninitialized memory. + +An easy way to think about uninitialized memory is that there's an additional value (often called `undef` using LLVM's term for it) that does not map to any concrete bit pattern, but can be introduced in abstract in various ways, and makes _most_ values invalid. + +If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be "initialized" with no overhead and then written to in parts. + +## Sources of uninitialized memory + +### `mem::uninitialized()` and `MaybeUninit::assume_init()` + +[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. 
It's basically _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB. + +Use [`MaybeUninit`] instead. + +It is still possible to create uninitialized memory using [`MaybeUninit::assume_init()`] if you have not, in fact, assured that things are initialized. + +### Padding + +Padding bytes in structs and enums are often but not always uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are ginning up uninitialized `u8`s. + +Reading from padding [always produces uninitialized values][pad-glossary]. + + + +### Moved-from values + +The following code is UB: + +```rust +# use std::ptr; +let x = String::new(); // String is not Copy +let mut v = vec![]; +let ptr = &x as *const String; + +v.push(x); // move x into the vector + +unsafe { + // reads from moved-from memory + let ghost = ptr::read(ptr); +} +``` + +Any type of move will do this, even when you "move" the value into a different variable with stuff like `let y = x;`. + +Note that Rust does let you "partially move" out of fields of a struct, in such a case the whole struct is now no longer a valid value for its type, but you are still allowed to "use" the struct to look at other fields. When doing such things, make sure there are no pointers that still think the struct is whole and valid. + +#### Caveat: `ptr::drop_in_place()`, `ManuallyDrop::drop()`, and `ptr::read()` + +[`ptr::drop_in_place()`] and [`ManuallyDrop::drop()`] are interesting: they call all the destructor[^1] on a value (or a pointed-to value in the case of the former). From a safety point of view they are identical; they are just different APIs for dealing with manually calling drop glue. 
+ +[`ManuallyDrop::drop()`] makes the following claim: + +> Other than changes made by the destructor itself, the memory is left unchanged, and so as far as the compiler is concerned still holds a bit-pattern which is valid for the type T. + +In other words, Rust does _not_ consider these operations to do the same invalidation as a regular "move from" operation, even though they have a similar feel. + +There is an [open issue][ugc-394] about whether `Drop::drop()` is itself allowed to produce uninitialized or invalid memory, so it may not be possible to rely on this in a generic context. + +[`ptr::read()`] similarly claims that it leaves the source memory untouched, which means that it is still a valid value. Of course, [`ptr::read()`] on a pointer pointing to uninitialized memory will still create an uninitialized value. + + +For all of these APIs, actually _using_ the dropped or read-from memory may still be fraught depending on the invariants of the value; it's quite easy to cause a double-free by materializing an owned value from the original data after it has already been read-from or dropped. + +However, they do not produce uninitialized memory. + + +### Freshly allocated memory + +Freshly allocated memory (e.g. the yet-unused bytes in [`Vec::with_capacity()`] or just the result of [`Allocator::allocate()`]) is usually uninitialized. You can use APIs like [`Allocator::zeroed()`] if you wish to avoid this, though you can still end up making [invalid values] the same way you can with [`mem::zeroed()`]. + +Generally after allocating memory one should make sure that the only part of that memory being read from is known to have been written to. This can be tricky in situations around complex data structures like probing hashtables where you have a buffer which only has some segments initialized, determined by complex conditions. 
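As a rough illustration of the "only read what has been written" rule for freshly allocated memory, the sketch below fills the unused capacity of a `Vec<u8>` through `MaybeUninit` and only then extends the length. The helper name `fill_spare` is hypothetical and exists only for this example; `Vec::spare_capacity_mut()`, `MaybeUninit::write()`, and `Vec::set_len()` are the standard library APIs it leans on.

```rust
use std::mem::MaybeUninit;

/// Appends `count` copies of `byte` to `v` by writing into its spare capacity.
/// `fill_spare` is a hypothetical helper used only for illustration.
fn fill_spare(v: &mut Vec<u8>, count: usize, byte: u8) {
    // Make sure at least `count` unused slots exist past the current length.
    v.reserve(count);

    // The unused capacity is exposed as `MaybeUninit<u8>`, so no byte is ever
    // treated as a plain `u8` before it has been written.
    let spare: &mut [MaybeUninit<u8>] = v.spare_capacity_mut();
    for slot in &mut spare[..count] {
        slot.write(byte);
    }

    // SAFETY: the first `count` spare elements were initialized in the loop
    // above, and `reserve` guarantees the capacity covers `len + count`.
    unsafe { v.set_len(v.len() + count) };
}
```

Calling `fill_spare(&mut v, 3, 7)` on a vector holding `[1]` should leave it holding `[1, 7, 7, 7]`. Extending the length before the writes, or by more than was written, would expose uninitialized `u8`s.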
+ +## When you might end up making an uninitialized value + +Some of the APIs and methods above create uninitialized memory in a pretty straightforward way — don't call [`MaybeUninit::assume_init()`] if things are not actually initialized! + +When writing tricky data structures you may end up mistakenly assuming uninitialized memory is initialized. For example imagine building a probing hashmap, backed with allocated memory: only inhabited buckets will be initialized, and if your logic for determining which buckets are inhabited is broken, your code may risk producing uninitialized values. + +A subtle case is when you *write* to uninitialized memory the wrong way. The following code uses a write to a `*mut String` that is pointing to uninitialized memory, and exhibits undefined behavior: + +```rust,no_run +# use std::mem::MaybeUninit; +let mut val: MaybeUninit<String> = MaybeUninit::uninit(); +let ptr: *mut String = val.as_mut_ptr(); +unsafe { + // UB! + *ptr = String::from("hello world"); +} +``` + +This is UB because writing to raw pointers, under the hood, still calls destructors on the old value, the same way a write to an `&mut T` does. This is usually quite convenient, but here the old value is uninitialized, and calling a destructor on it is undefined. + +APIs like [`ptr::write()`] and [`MaybeUninit::write()`] exist to sidestep this problem. Logically, a write to a raw pointer is functionally the same as a [`ptr::read()`] of the pointer (with the read-value being dropped) followed by a [`ptr::write()`] with the new value. + +## Signs an uninitialized value was involved + +This is largely similar to the situation for [invalid values]: The compiler is allowed to assume memory is never uninitialized, and since uninitialized memory is a kind of invalid value, all of the failure modes of [invalid values] are possible. + +Often when reading from uninitialized memory you'll see reads to the same, unchanged, memory producing different values. 
+ +This is not an exhaustive list: ultimately, having an uninitialized value is UB and it remains illegal even if there are no optimizations that will break. + + + [invalid values]: ../core_unsafety/invalid_values.md + [`mem::uninitialized()`]: https://doc.rust-lang.org/stable/std/mem/fn.uninitialized.html + [`mem::zeroed()`]: https://doc.rust-lang.org/stable/std/mem/fn.zeroed.html + [`MaybeUninit`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html + [`MaybeUninit::assume_init()`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init + [`MaybeUninit::write()`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.write + [pad-glossary]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding + [`ptr::drop_in_place()`]: https://doc.rust-lang.org/stable/std/ptr/fn.drop_in_place.html + [`ManuallyDrop::drop()`]: https://doc.rust-lang.org/stable/std/mem/struct.ManuallyDrop.html#method.drop + [`ptr::read()`]: https://doc.rust-lang.org/stable/std/ptr/fn.read.html + [`ptr::write()`]: https://doc.rust-lang.org/stable/std/ptr/fn.write.html + [ugc-394]: https://github.com/rust-lang/unsafe-code-guidelines/issues/394 + [`Vec::with_capacity()`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.with_capacity + [`Allocator::allocate()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#tymethod.allocate + [`Allocator::zeroed()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#method.allocate_zeroed + + + [^1]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. 
Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. + \ No newline at end of file From f7884586b1a7607b3fec9e5823185ea9f6a84a8b Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sat, 4 Mar 2023 13:48:24 -0800 Subject: [PATCH 02/19] address --- src/advanced_unsafety/uninitialized.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 528f4aa..fc9ccc6 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -4,25 +4,25 @@ > > — _Emily Dickinson_ -While we have covered [invalid values], there's another thing that behaves a lot like invalid values, but has nothing to do with actual bit patterns: Uninitialized memory. +While we have covered [invalid values], there's another thing that is a kind of invalid value, but has nothing to do with actual bit patterns: Uninitialized memory. -An easy way to think about uninitialized memory is that there's an additional value (often called `undef` using LLVM's term for it) that does not map to any concrete bit pattern, but can be introduced in abstract in various ways, and makes _most_ values invalid. +An easy way to think about uninitialized memory is that there's an additional value (often called `undef` using LLVM's term for it) that does not map to any concrete bit pattern, but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. -If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be "initialized" with no overhead and then written to in parts. +If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. 
## Sources of uninitialized memory ### `mem::uninitialized()` and `MaybeUninit::assume_init()` -[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. It's basically _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB. +[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. It's basically _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB, since uninitialized memory is a type of invalid value, and it's unsound to produce invalid values. Use [`MaybeUninit`] instead. -It is still possible to create uninitialized memory using [`MaybeUninit::assume_init()`] if you have not, in fact, assured that things are initialized. +It is still possible to create uninitialized values using [`MaybeUninit::assume_init()`] if you have not, in fact, assured that things are initialized. ### Padding -Padding bytes in structs and enums are often but not always uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are ginning up uninitialized `u8`s. +Padding bytes in structs and enums are often but not always uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. Reading from padding [always produces uninitialized values][pad-glossary]. @@ -48,11 +48,11 @@ unsafe { Any type of move will do this, even when you "move" the value into a different variable with stuff like `let y = x;`. 
-Note that Rust does let you "partially move" out of fields of a struct, in such a case the whole struct is now no longer a valid value for its type, but you are still allowed to "use" the struct to look at other fields. When doing such things, make sure there are no pointers that still think the struct is whole and valid. +Note that Rust does let you "partially move" out of fields of a struct, in such a case the whole struct is now no longer a valid value for its type, but you are still allowed to "use" the struct to look at other fields, and the value as a whole is no longer usable. When doing such things, make sure there are no pointers that still think the struct is whole and valid. #### Caveat: `ptr::drop_in_place()`, `ManuallyDrop::drop()`, and `ptr::read()` -[`ptr::drop_in_place()`] and [`ManuallyDrop::drop()`] are interesting: they call all the destructor[^1] on a value (or a pointed-to value in the case of the former). From a safety point of view they are identical; they are just different APIs for dealing with manually calling drop glue. +[`ptr::drop_in_place()`] and [`ManuallyDrop::drop()`] are interesting: they both call the destructor[^1] on a value (or a pointed-to value in the case of `drop_in_place`). From the perspective of safety they are identical; they are just different APIs for dealing with manually calling destructors. [`ManuallyDrop::drop()`] makes the following claim: @@ -72,7 +72,7 @@ However, they do not produce uninitialized memory. ### Freshly allocated memory -Freshly allocated memory (e.g. the yet-unused bytes in [`Vec::with_capacity()`] or just the result of [`Allocator::allocate()`]) is usually uninitialized. You can use APIs like [`Allocator::zeroed()`] if you wish to avoid this, though you can still end up making [invalid values] the same way you can with [`mem::zeroed()`]. +Freshly allocated memory (e.g. the yet-unused bytes in [`Vec::with_capacity()`] or just the result of [`Allocator::allocate()`]) is usually uninitialized. 
You can use APIs like [`Allocator::allocate_zeroed()`] if you wish to avoid this, though you can still end up making [invalid values] the same way you can with [`mem::zeroed()`]. Generally after allocating memory one should make sure that the only part of that memory being read from is known to have been written to. This can be tricky in situations around complex data structures like probing hashtables where you have a buffer which only has some segments initialized, determined by complex conditions. @@ -121,7 +121,7 @@ This is not an exhaustive list: ultimately, having an uninitialized value is UB [ugc-394]: https://github.com/rust-lang/unsafe-code-guidelines/issues/394 [`Vec::with_capacity()`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.with_capacity [`Allocator::allocate()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#tymethod.allocate - [`Allocator::zeroed()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#method.allocate_zeroed + [`Allocator::allocate_zeroed()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#method.allocate_zeroed [^1]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. 
From 11497502d4b5daa72e217218a405659a445753cc Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sat, 4 Mar 2023 14:18:23 -0800 Subject: [PATCH 03/19] stricter on padding --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index fc9ccc6..d066fff 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -22,7 +22,7 @@ It is still possible to create uninitialized values using [`MaybeUninit::assume_ ### Padding -Padding bytes in structs and enums are often but not always uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. +Padding bytes in structs and enums are uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. Reading from padding [always produces uninitialized values][pad-glossary]. From 4e2a7f0f01a85083f2d01bb208ca8a0969bf038e Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sat, 4 Mar 2023 18:32:48 -0800 Subject: [PATCH 04/19] don't say undef --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index d066fff..cdded88 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -6,7 +6,7 @@ While we have covered [invalid values], there's another thing that is a kind of invalid value, but has nothing to do with actual bit patterns: Uninitialized memory. 
-An easy way to think about uninitialized memory is that there's an additional value (often called `undef` using LLVM's term for it) that does not map to any concrete bit pattern, but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. +An easy way to think about uninitialized memory is that there's an additional value that does not map to any concrete bit pattern, but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. From 7b2dc1a42380f4f3a2986424c2eb0ba586fbd6f6 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sat, 4 Mar 2023 18:35:25 -0800 Subject: [PATCH 05/19] padding --- src/advanced_unsafety/uninitialized.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index cdded88..3b0b735 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -22,12 +22,13 @@ It is still possible to create uninitialized values using [`MaybeUninit::assume_ ### Padding +The authors are not yet sure of the specifics here, see [UGC #395][ugc-395]. For now, treat the below as advice on how to never make a mistake wrt padding, but there may be some leeway in the actual semantics. + Padding bytes in structs and enums are uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. Reading from padding [always produces uninitialized values][pad-glossary]. 
- ### Moved-from values The following code is UB: @@ -122,6 +123,7 @@ This is not an exhaustive list: ultimately, having an uninitialized value is UB [`Vec::with_capacity()`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.with_capacity [`Allocator::allocate()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#tymethod.allocate [`Allocator::allocate_zeroed()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#method.allocate_zeroed + [ugc-395]: https://github.com/rust-lang/unsafe-code-guidelines/issues/395 [^1]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. From 9285e858d7c488bb1728976997e1af8d4906fec4 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 13:40:01 -0800 Subject: [PATCH 06/19] improve padding and moving section --- src/advanced_unsafety/uninitialized.md | 51 ++++++++++++++++++-------- 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 3b0b735..2e21ed3 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -10,6 +10,16 @@ An easy way to think about uninitialized memory is that there's an additional va If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. 
+## Safely working with uninitialized memory + +The basic rule of thumb is: never refer to uninitialized values as anything other than a raw pointer or wrapped in `MaybeUninit`. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. + +If you need to write to an uninitialized buffer in memory, treat it as `&mut [MaybeUninit]`. If you need to piecewise initialize a struct, use `MaybeUninit`. + + +As with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized values. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`, but when auditing unsafe code there may be cases where a reference to uninitialized values is actually safe as long as no uninitialized value is read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. + + ## Sources of uninitialized memory ### `mem::uninitialized()` and `MaybeUninit::assume_init()` @@ -22,14 +32,23 @@ It is still possible to create uninitialized values using [`MaybeUninit::assume_ ### Padding -The authors are not yet sure of the specifics here, see [UGC #395][ugc-395]. For now, treat the below as advice on how to never make a mistake wrt padding, but there may be some leeway in the actual semantics. +Padding bytes in structs and enums are [usually but not always uninitialized][pad-glossary]. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. + +The "usually but not always" caveat can be usefully framed as "padding bytes are uninitialized unless proven otherwise". 
Padding is a property of types, not memory, and these bytes are set to being uninitialized whenever a type is created or copied/moved around, but they can be written to by getting a reference to the memory behind the type[^1], and will be preserved at that spot in memory as long as the type isn't overwritten as a whole. -Padding bytes in structs and enums are uninitialized. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::<Struct>()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. +For example, treating an initialized byte buffer as an `&Struct` and then later reading the padding bytes will give initialized values. However, treating an initialized byte buffer as an `&mut Struct` and then writing a new `Struct` to it will lead to those bytes becoming uninitialized since the `Struct` copy will "copy" the uninitialized padding bytes. Similarly, using `mem::transmute()` (or `mem::zeroed()`) to transmute a byte buffer to a `Struct` will have the padding be uninitialized, because a typed copy of the `Struct` is occurring. -Reading from padding [always produces uninitialized values][pad-glossary]. +See the discussion in [UGC #395][ugc-395] for more examples. -### Moved-from values + +### Freshly allocated memory + +Freshly allocated memory (e.g. the yet-unused bytes in [`Vec::with_capacity()`] or just the result of [`Allocator::allocate()`]) is usually uninitialized. You can use APIs like [`Allocator::allocate_zeroed()`] if you wish to avoid this, though you can still end up making [invalid values] the same way you can with [`mem::zeroed()`]. + +Generally after allocating memory one should make sure that the only part of that memory being read from is known to have been written to. 
This can be tricky in situations around complex data structures like probing hashtables where you have a buffer which only has some segments initialized, determined by complex conditions. + +### Not exactly uninitialized: Moved-from values The following code is UB: @@ -42,24 +61,28 @@ let ptr = &x as *const String; v.push(x); // move x into the vector unsafe { - // reads from moved-from memory + // dangling pointer reads from moved-from memory let ghost = ptr::read(ptr); } ``` Any type of move will do this, even when you "move" the value into a different variable with stuff like `let y = x;`. +This isn't _quite_ uninitialized: it's just that using after a move is straight up UB in Rust. In particular, unlike most pointers to uninitialized values, this dangling pointer is unsound to *write* to as well. + +Working with dangling pointers can often lead to similar problems as working with uninitialized values. + Note that Rust does let you "partially move" out of fields of a struct, in such a case the whole struct is now no longer a valid value for its type, but you are still allowed to "use" the struct to look at other fields, and the value as a whole is no longer usable. When doing such things, make sure there are no pointers that still think the struct is whole and valid. #### Caveat: `ptr::drop_in_place()`, `ManuallyDrop::drop()`, and `ptr::read()` -[`ptr::drop_in_place()`] and [`ManuallyDrop::drop()`] are interesting: they both call the destructor[^1] on a value (or a pointed-to value in the case of `drop_in_place`). From the perspective of safety they are identical; they are just different APIs for dealing with manually calling destructors. +[`ptr::drop_in_place()`] and [`ManuallyDrop::drop()`] are interesting: they both call the destructor[^2] on a value (or a pointed-to value in the case of `drop_in_place`). From the perspective of safety they are identical; they are just different APIs for dealing with manually calling destructors. 
[`ManuallyDrop::drop()`] makes the following claim: > Other than changes made by the destructor itself, the memory is left unchanged, and so as far as the compiler is concerned still holds a bit-pattern which is valid for the type T. -In other words, Rust does _not_ consider these operations to do the same invalidation as a regular "move from" operation, even though they have a similar feel. +In other words, Rust does _not_ consider these operations to do the same invalidation as a regular "move from" operation, even though they may have a similar feel. They do not create dangling pointers, and they do not themselves overwrite the memory with an uninitialized value. There is an [open issue][ugc-394] about whether `Drop::drop()` is itself allowed to produce uninitialized or invalid memory, so it may not be possible to rely on this in a generic context. @@ -70,13 +93,6 @@ For all of these APIs, actually _using_ the dropped or read-from memory may stil However, they do not produce uninitialized memory. - -### Freshly allocated memory - -Freshly allocated memory (e.g. the yet-unused bytes in [`Vec::with_capacity()`] or just the result of [`Allocator::allocate()`]) is usually uninitialized. You can use APIs like [`Allocator::allocate_zeroed()`] if you wish to avoid this, though you can still end up making [invalid values] the same way you can with [`mem::zeroed()`]. - -Generally after allocating memory one should make sure that the only part of that memory being read from is known to have been written to. This can be tricky in situations around complex data structures like probing hashtables where you have a buffer which only has some segments initialized, determined by complex conditions. - ## When you might end up making an uninitialized value Some of the APIs and methods above create uninitialized memory in a pretty straightforward way — don't call [`MaybeUninit::assume_init()`] if things are not actually initialized! 
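To make the "only call `assume_init()` once things are initialized" rule concrete, here is a minimal sketch of the intended order of operations: construct a `MaybeUninit`, write to it with `MaybeUninit::write()` (which does not drop the old, uninitialized contents), and only then call `assume_init()`. The function names `make_greeting` and `make_labels` are hypothetical and chosen only for illustration.

```rust
use std::mem::MaybeUninit;

/// Initializes a single `String` in place. Hypothetical helper for illustration.
fn make_greeting() -> String {
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit();
    // `write` stores the value without running a destructor on the old,
    // uninitialized contents.
    slot.write(String::from("hello world"));
    // SAFETY: `slot` was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}

/// Initializes an array one element at a time. Hypothetical helper for illustration.
fn make_labels() -> [String; 3] {
    // An array of `MaybeUninit<String>` is allowed to hold uninitialized elements.
    let mut slots: [MaybeUninit<String>; 3] =
        std::array::from_fn(|_| MaybeUninit::uninit());
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write(format!("label-{i}"));
    }
    // SAFETY: every element was written in the loop above.
    slots.map(|slot| unsafe { slot.assume_init() })
}
```

Moving either `assume_init()` call ahead of the corresponding `write()` would produce an uninitialized `String`, which is exactly the mistake described above.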
@@ -108,6 +124,7 @@ Often when reading from uninitialized memory you'll see reads to the same, uncha This is not an exhaustive list: ultimately, having an uninitialized value is UB and it remains illegal even if there are no optimizations that will break. + [invalid values]: ../core_unsafety/invalid_values.md [`mem::uninitialized()`]: https://doc.rust-lang.org/stable/std/mem/fn.uninitialized.html [`mem::zeroed()`]: https://doc.rust-lang.org/stable/std/mem/fn.zeroed.html @@ -124,7 +141,9 @@ This is not an exhaustive list: ultimately, having an uninitialized value is UB [`Allocator::allocate()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#tymethod.allocate [`Allocator::allocate_zeroed()`]: https://doc.rust-lang.org/stable/std/alloc/trait.Allocator.html#method.allocate_zeroed [ugc-395]: https://github.com/rust-lang/unsafe-code-guidelines/issues/395 + [UGC #77]: https://github.com/rust-lang/unsafe-code-guidelines/issues/77 + [UGC #346]: https://github.com/rust-lang/unsafe-code-guidelines/issues/346 - - [^1]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. + [^1]: Be sure to use `&[MaybeUninit]` if treating a type with uninitialized padding as manipulatable memory! + [^2]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. 
Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. \ No newline at end of file From 2d83e2f293d9c8b3186b1b3331a94844d081450a Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 13:49:15 -0800 Subject: [PATCH 07/19] mention unions --- src/advanced_unsafety/uninitialized.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 2e21ed3..d0dcaab 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -12,7 +12,7 @@ If you explicitly wish to work with uninitialized and partially-initialized type ## Safely working with uninitialized memory -The basic rule of thumb is: never refer to uninitialized values as anything other than a raw pointer or wrapped in `MaybeUninit`. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. +The basic rule of thumb is: never refer to uninitialized values as anything other than a raw pointer or wrapped in [`MaybeUninit`]. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. If you need to write to an uninitialized buffer in memory, treat it as `&mut [MaybeUninit]`. If you need to piecewise initialize a struct, use `MaybeUninit`. @@ -41,6 +41,13 @@ For example, treating an initialized byte buffer as an `&Struct` and then later See the discussion in [UGC #395][ugc395] for more examples. +### Unions + +Reading a union type as the wrong variant can lead to reading uninitialized memory, for example if the union was initialized to a smaller variant, or if the padding of the two variants doesn't overlap perfectly. 
+ +Rust does not have strict aliasing like C and C++: type punning with a union can be safe as long as the punning does not cause invalid or uninitialized values to show up on the other side. + +[`MaybeUninit`] is actually just a union between `T` and `()` under the hood: the rules for correct usage of `MaybeUninit` are the same as the rules for correct usage of a union. ### Freshly allocated memory From 8cc28a27c9d1ebb11f6d896ce9f0d10ad42089f2 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 14:27:20 -0800 Subject: [PATCH 08/19] expand section on what is safe and unsafe about them --- src/advanced_unsafety/uninitialized.md | 31 ++++++++++++++++++++------ 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index d0dcaab..7149359 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -6,25 +6,39 @@ While we have covered [invalid values], there's another thing that is a kind of invalid value, but has nothing to do with actual bit patterns: Uninitialized memory. -An easy way to think about uninitialized memory is that there's an additional value that does not map to any concrete bit pattern, but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. - -If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. ## Safely working with uninitialized memory -The basic rule of thumb is: never refer to uninitialized values as anything other than a raw pointer or wrapped in [`MaybeUninit`]. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. 
+The basic rule of thumb is: never refer to uninitialized memory with anything other than a raw pointer something wrapped in [`MaybeUninit`]. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. + +A good model for uninitialized memory is that there's an additional value that does not map to any concrete bit pattern (think of it as "byte value #257"), but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. + +Any attempt to read this byte as a `u8` will be UB, and the presence of this byte in non-padding locations is considered UB for most types. The exceptions to this all fall out of treating it as a property of the byte: + + - Zero-sized types do not care about initialized-ness, since they do not have bytes + - Unions do not care about initialized-ness if they have a variant that does not care about initialized-ness + - [`MaybeUninit`] does not care about initializedness since it is internally a union of `T` and a zero-sized type. + - `[MaybeUninit; N]` [does not care about initializedness][arr-maybeuninit] since it doesn't have any bytes that care about initializedness + + +Fundamentally, initializedness is a property of memory, but whether or not initializedness matters is a property of the *type*. For types that care about initializedness, typed operations working with uninitialized memory are typically UB, and having a value that contains uninitialized memory is immediately UB. + +[`ptr::copy`] is explicitly an *untyped* copy, and thus it will copy all bytes, including padding, and including initialized-ness, to the destination, regardless of the type `T`. + +Most other operations copying a type (for example, `*ptr` and `mem::transmute_copy`) will be typed, and will thus ignore padding and be UB if ever fed uninitialized memory in non-padding positions. 
This also applies to `let x = y` and `mem::transmute`, however in those cases if the source data were uninitialized that would already have been UB. + -If you need to write to an uninitialized buffer in memory, treat it as `&mut [MaybeUninit]`. If you need to piecewise initialize a struct, use `MaybeUninit`. +If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. It's also useful to e.g. refer to an uninitialized buffer with things like `&mut [MaybeUninit]`. -Similarly with invalid values, there's are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized values. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`, but when auditing unsafe code there may be causes where a reference to uninitialized values is actually safe as long as no uninitialized value is read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. +Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized memory. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`, but when auditing unsafe code there may be causes where a reference to uninitialized values is actually safe as long as no uninitialized value is read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. 
## Sources of uninitialized memory ### `mem::uninitialized()` and `MaybeUninit::assume_init()` -[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. It's basically _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB, since uninitialized memory is a type of invalid value, and it's unsound to produce invalid values. +[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. It's almost _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB, since uninitialized memory is a type of invalid value for almost all types, and it's unsound to produce invalid values. Use [`MaybeUninit`] instead. @@ -38,6 +52,7 @@ The "usually but not always" caveat can be usefully framed as "padding bytes are For example, treating an initialized byte buffer as an `&Struct` and then later reading the padding bytes will give initialized values. However, treating an initialized byte buffer as an `&mut Struct` and then writing a new `Struct` to it will lead to those bytes becoming uninitialized since the `Struct` copy will "copy" the uninitialized padding bytes. Similarly, using `mem::transmute()` (or `mem::zeroed()`) to transmute a byte buffer to a `Struct` will have the padding be uninitialized, because a typed copy of the `Struct` is occurring. +Because [`ptr::copy`] is an untyped copy, it can be used to copy over explicitly-initialized padding. See the discussion in [UGC #395][ugc395] for more examples. 
@@ -150,6 +165,8 @@ This is not an exhaustive list: ultimately, having an uninitialized value is UB [ugc-395]: https://github.com/rust-lang/unsafe-code-guidelines/issues/395 [UGC #77]: https://github.com/rust-lang/unsafe-code-guidelines/issues/77 [UGC #346]: https://github.com/rust-lang/unsafe-code-guidelines/issues/346 + [arr-maybeuninit]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#initializing-an-array-element-by-element + [`ptr::copy`]: https://doc.rust-lang.org/stable/std/ptr/fn.copy.html [^1]: Be sure to use `&[MaybeUninit]` if treating a type with uninitialized padding as manipulatable memory! [^2]: The "destructor" is different from the `Drop` trait. Calling the destructor is the process of calling a type's `Drop::drop` impl if it exists, and then calling the destructor for all of its fields (also known as "drop glue"). I.e. it's not _just_ `Drop`, but rather the entire _destruction_, of which the destructor is one part. Types that do not implement `Drop` may still have contentful destructors if their transitive fields do. From 638c6445207c28876691ed7cc022cdd8022812da Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 23:29:24 +0000 Subject: [PATCH 09/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: Alice Ryhl --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 7149359..d52a1d1 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -9,7 +9,7 @@ While we have covered [invalid values], there's another thing that is a kind of ## Safely working with uninitialized memory -The basic rule of thumb is: never refer to uninitialized memory with anything other than a raw pointer something wrapped in [`MaybeUninit`]. 
Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. +The basic rule of thumb is: never refer to uninitialized memory with anything other than a raw pointer or something wrapped in [`MaybeUninit`]. Having a stack value or temporary that is uninitialized and has a type that is not `MaybeUninit` (or an array of `MaybeUninit`s) is always undefined behavior. A good model for uninitialized memory is that there's an additional value that does not map to any concrete bit pattern (think of it as "byte value #257"), but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. From 204db04ac08d1d531d93966f08ce8aece0ed6bb4 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 23:29:32 +0000 Subject: [PATCH 10/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: Alice Ryhl --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index d52a1d1..2fb8806 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -13,7 +13,7 @@ The basic rule of thumb is: never refer to uninitialized memory with anything ot A good model for uninitialized memory is that there's an additional value that does not map to any concrete bit pattern (think of it as "byte value #257"), but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. -Any attempt to read this byte as a `u8` will be UB, and the presence of this byte in non-padding locations is considered UB for most types. The exceptions to this all fall out of treating it as a property of the byte: +Any attempt to read uninitialized bytes as an integer will be UB, and the presence of this byte in non-padding locations is considered UB for most types. 
The exceptions to this all fall out of treating it as a property of the byte: - Zero-sized types do not care about initialized-ness, since they do not have bytes - Unions do not care about initialized-ness if they have a variant that does not care about initialized-ness From c180af1c669546e7f40f9b1ec362b1143088812b Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 23:30:10 +0000 Subject: [PATCH 11/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: Alice Ryhl --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 2fb8806..3ff2dc9 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -60,7 +60,7 @@ See the discussion in [UGC #395][ugc395] for more examples. Reading a union type as the wrong variant can lead to reading uninitialized memory, for example if the union was initialized to a smaller variant, or if the padding of the two variants doesn't overlap perfectly. -Rust does not have strict aliasing like C and C++: type punning with a union can be safe as long as the punning does not cause invalid or uninitialized values to show up on the other side. +Rust does not have strict aliasing like C and C++: type punning with a union is safe as long as the corresponding transmute is safe. [`MaybeUninit`] is actually just a union between `T` and `()` under the hood: the rules for correct usage of `MaybeUninit` are the same as the rules for correct usage of a union. 
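
A minimal sketch of the `[MaybeUninit; N]` case mentioned above, assuming a hypothetical helper named `squares` (not part of the chapter): the array starts fully uninitialized, every slot is written, and only then is each slot asserted to be initialized.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds `[0, 1, 4, 9]` slot by slot.
fn squares() -> [u32; 4] {
    // An array of `MaybeUninit<u32>` has no bytes that care about
    // initializedness, so it may be left uninitialized.
    let mut slots: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];

    for (i, slot) in slots.iter_mut().enumerate() {
        // `write` stores a value without reading or dropping the old contents.
        slot.write((i * i) as u32);
    }

    // SAFETY: every slot was written in the loop above.
    slots.map(|slot| unsafe { slot.assume_init() })
}
```

If the loop skipped even one slot, the final `assume_init()` on that slot would be UB, in the same way that reading the wrong field of a union would be.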
From 1e1d00386da7d54e89e5b66dd84f2e4de10dfd83 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Mon, 6 Mar 2023 00:26:27 +0000 Subject: [PATCH 12/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: David Koloski --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 3ff2dc9..bcd577d 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -31,7 +31,7 @@ Most other operations copying a type (for example, `*ptr` and `mem::transmute_co If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. It's also useful to e.g. refer to an uninitialized buffer with things like `&mut [MaybeUninit]`. -Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized memory. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`, but when auditing unsafe code there may be causes where a reference to uninitialized values is actually safe as long as no uninitialized value is read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. +Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized memory. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`. When auditing unsafe code, there may be cases where a references to uninitialized values are actually safe as long as no uninitialized values are read out of it. 
In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. ## Sources of uninitialized memory From 5f4a2ee911a9010115b981968cfe6aa443af3977 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Mon, 6 Mar 2023 00:26:45 +0000 Subject: [PATCH 13/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: Alice Ryhl --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index bcd577d..4b0ad27 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -21,7 +21,7 @@ Any attempt to read uninitialized bytes as an integer will be UB, and the presen - `[MaybeUninit; N]` [does not care about initializedness][arr-maybeuninit] since it doesn't have any bytes that care about initializedness -Fundamentally, initializedness is a property of memory, but whether or not initializedness matters is a property of the *type*. For types that care about initializedness, typed operations working with uninitialized memory are typically UB, and having a value that contains uninitialized memory is immediately UB. +Fundamentally, initializedness is a property of memory, but whether or not initializedness matters is a property of the access (in particular, of the *type* used by the access). For types that care about initializedness, typed operations working with uninitialized memory are typically UB, and having a value that contains uninitialized memory is immediately UB. [`ptr::copy`] is explicitly an *untyped* copy, and thus it will copy all bytes, including padding, and including initialized-ness, to the destination, regardless of the type `T`. 
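
As a rough illustration of the untyped copy described above, the sketch below copies a struct that has padding into a `MaybeUninit` destination. The `Padded` type and the `duplicate` function are made up for this example and are not taken from the chapter.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical type: under `repr(C)`, `tag` is followed by 3 padding bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

fn duplicate(src: &Padded) -> Padded {
    let mut dst = MaybeUninit::<Padded>::uninit();
    unsafe {
        // SAFETY: `src` is valid for reading one `Padded`, `dst` is valid for
        // writing one, and both are properly aligned. The copy is untyped, so
        // padding bytes travel along with whatever initializedness they have.
        ptr::copy(src as *const Padded, dst.as_mut_ptr(), 1);
        // SAFETY: both fields were copied from a valid `Padded`. Padding may
        // still be uninitialized, which `Padded` does not care about.
        dst.assume_init()
    }
}
```

Note that the final `assume_init()` is itself a typed move of `Padded`, so the returned value makes no promise about its padding bytes.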
From 03b8268867b8599b2bba8630e87b2ec1e2a6930b Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Mon, 6 Mar 2023 00:28:09 +0000 Subject: [PATCH 14/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: David Koloski --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 4b0ad27..81bed20 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -50,7 +50,7 @@ Padding bytes in structs and enums are [usually but not always uninitialized][pa The "usually but not always" caveat can be usefully framed as "padding bytes are uninitialized unless proven otherwise". Padding is a property of types, not memory, and these bytes are set to being uninitialized whenever a type is created or copied/moved around, but they can be written to by getting a reference to the memory behind the type[^1], and will be preserved at that spot in memory as long as the type isn't overwritten as a whole. -For example, treating an initialized byte buffer as an `&Struct` and then later reading the padding bytes will give initialized values. However, treating an initialized byte buffer as an `&mut Struct` and then writing a new `Struct` to it will lead to those bytes becoming uninitialized since the `Struct` copy will "copy" the uninitialized padding bytes. Similarly, using `mem::transmute()` (or `mem::zeroed()`) to transmute a byte buffer to a `Struct` will have the padding be uninitialized, because a typed copy of the `Struct` is occurring. +For example, treating an initialized byte buffer as an `&Struct` and then later reading the padding bytes will give initialized values. However, treating an initialized byte buffer as an `&mut Struct` and then writing a new `Struct` to it will lead to those bytes becoming uninitialized since the `Struct` copy will "copy" the uninitialized padding bytes. 
Similarly, using `mem::transmute()` (or `mem::zeroed()`) to transmute a byte buffer to a `Struct` will uninitialize the padding because it performs a typed copy of the `Struct`. Because [`ptr::copy`] is an untyped copy, it can be used to copy over explicitly-initialized padding. From 22c1947a14c7ec73a21836d6f583ed02a309ee90 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Mon, 6 Mar 2023 00:28:21 +0000 Subject: [PATCH 15/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: David Koloski --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 81bed20..4fd2b2a 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -38,7 +38,7 @@ Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) abo ### `mem::uninitialized()` and `MaybeUninit::assume_init()` -[`mem::uninitialized()`] is a deprecated API that has a very tempting shape, it lets you do things like `let x = mem::uninitialized()` for cases when you want to construct the value in bits. It's almost _always_ UB to use, since it immediately sets `x` to uninitialized memory, which is UB, since uninitialized memory is a type of invalid value for almost all types, and it's unsound to produce invalid values. +[`mem::uninitialized()`] is a deprecated API that has a very tempting shape: it lets you do things like `let x = mem::uninitialized()` when you want to construct an uninitialized value. It's _almost always_ UB to use since it immediately sets `x` to uninitialized memory, which is UB because uninitialized memory is an invalid value for almost all types and it's unsound to produce invalid values. Use [`MaybeUninit`] instead. 
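
One possible shape of the `MaybeUninit` replacement for `mem::uninitialized()` is sketched below. The `Config` struct and `build_config` function are hypothetical; the pattern of writing fields through `addr_of_mut!` follows the standard library documentation for `MaybeUninit`.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical struct used only for this illustration.
struct Config {
    name: String,
    retries: u32,
}

fn build_config() -> Config {
    let mut uninit = MaybeUninit::<Config>::uninit();
    let p = uninit.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` yields raw field pointers without ever creating a
        // reference to the not-yet-initialized `Config`.
        // `write` does not drop the old (uninitialized) contents.
        addr_of_mut!((*p).name).write(String::from("demo"));
        addr_of_mut!((*p).retries).write(3);
        // SAFETY: every field has been written above.
        uninit.assume_init()
    }
}
```

Plain assignment such as `(*p).name = ...` would be wrong here, because it would first drop the old, uninitialized `String`.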
From 5d4399b66888688b61eb73311a43a180be660958 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Mon, 6 Mar 2023 00:28:29 +0000 Subject: [PATCH 16/19] Update src/advanced_unsafety/uninitialized.md Co-authored-by: David Koloski --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 4fd2b2a..8b4173b 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -58,7 +58,7 @@ See the discussion in [UGC #395][ugc395] for more examples. ### Unions -Reading a union type as the wrong variant can lead to reading uninitialized memory, for example if the union was initialized to a smaller variant, or if the padding of the two variants doesn't overlap perfectly. +Reading a union type as the wrong variant can lead to reading uninitialized memory, for example if the union was initialized to a smaller variant, or if the padding of the two variants don't overlap perfectly. Rust does not have strict aliasing like C and C++: type punning with a union is safe as long as the corresponding transmute is safe. 
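
A small sketch of union-based type punning where the corresponding transmute is fine. The `Bits` union and `float_bits` function are invented for illustration; the standard library already offers `f32::to_bits` for this exact job.

```rust
// Hypothetical union: both fields are 4 bytes with no padding.
#[repr(C)]
union Bits {
    float: f32,
    int: u32,
}

fn float_bits(x: f32) -> u32 {
    let bits = Bits { float: x };
    // SAFETY: `f32` and `u32` have the same size, neither has padding, and
    // every initialized 4-byte pattern is a valid `u32`, so reading the other
    // field is as sound as `mem::transmute::<f32, u32>(x)`.
    unsafe { bits.int }
}
```

Had the union instead been initialized through a smaller field (say a `u8`), reading `int` would touch bytes that were never written, which is the uninitialized-read hazard described in the Unions section.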
From 916e2f7d5ffe18d0c6976f6fc0f9982105d0b002 Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 16:33:22 -0800 Subject: [PATCH 17/19] be precise --- src/advanced_unsafety/uninitialized.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 8b4173b..f302673 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -13,10 +13,10 @@ The basic rule of thumb is: never refer to uninitialized memory with anything ot A good model for uninitialized memory is that there's an additional value that does not map to any concrete bit pattern (think of it as "byte value #257"), but can be introduced in the abstract machine in various ways, and makes _most_ values invalid. -Any attempt to read uninitialized bytes as an integer will be UB, and the presence of this byte in non-padding locations is considered UB for most types. The exceptions to this all fall out of treating it as a property of the byte: +Any attempt to read uninitialized bytes as a "type that cares about initializedness" will be UB, and the presence of this byte in non-padding locations is considered UB for most types. Most types care about initialized-ness; and the list of types that doesn't derives from treating initializedness as a property of the byte: - - Zero-sized types do not care about initialized-ness, since they do not have bytes - - Unions do not care about initialized-ness if they have a variant that does not care about initialized-ness + - Zero-sized types do not care about initializedness, since they do not have bytes + - Unions do not care about initializedness if they have a variant that does not care about initialized-ness - [`MaybeUninit`] does not care about initializedness since it is internally a union of `T` and a zero-sized type. 
- `[MaybeUninit; N]` [does not care about initializedness][arr-maybeuninit] since it doesn't have any bytes that care about initializedness @@ -31,7 +31,7 @@ Most other operations copying a type (for example, `*ptr` and `mem::transmute_co If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. It's also useful to e.g. refer to an uninitialized buffer with things like `&mut [MaybeUninit]`. -Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized memory. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`. When auditing unsafe code, there may be cases where a references to uninitialized values are actually safe as long as no uninitialized values are read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. +Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) about whether it is UB to have _references_ to uninitialized memory. When writing unsafe code we recommend you avoid creating such references, choosing to always use `MaybeUninit`. When auditing unsafe code, there may be cases where references to uninitialized values are actually safe as long as no uninitialized values are read out of it. In particular, [UGC #346] indicates that it is extremely unlikely that having `&mut` references to uninitialized values will be immediately UB. 
## Sources of uninitialized memory From 69fd9f7f90074de0f44e41b3d49b387330fd788d Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 16:42:48 -0800 Subject: [PATCH 18/19] +alice's comments --- src/advanced_unsafety/uninitialized.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index f302673..93fe907 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -25,7 +25,7 @@ Fundamentally, initializedness is a property of memory, but whether or not initi [`ptr::copy`] is explicitly an *untyped* copy, and thus it will copy all bytes, including padding, and including initialized-ness, to the destination, regardless of the type `T`. -Most other operations copying a type (for example, `*ptr` and `mem::transmute_copy`) will be typed, and will thus ignore padding and be UB if ever fed uninitialized memory in non-padding positions. This also applies to `let x = y` and `mem::transmute`, however in those cases if the source data were uninitialized that would already have been UB. +Most other operations copying a type (for example, `*ptr` and `mem::transmute_copy`) will be typed, and will thus ignore padding and be UB if ever fed uninitialized memory in non-padding positions (assuming the type involved cares about initializedness). This also applies to `let x = y` and `mem::transmute`, however in those cases if the source data were uninitialized that would already have been UB. If you explicitly wish to work with uninitialized and partially-initialized types, [`MaybeUninit`] is a useful abstraction since it can be constructed with no overhead and then written to in parts. It's also useful to e.g. refer to an uninitialized buffer with things like `&mut [MaybeUninit]`. @@ -42,13 +42,15 @@ Similarly with invalid values, there are open issues ([UGC #77], [UGC #346]) abo Use [`MaybeUninit`] instead. 
-It is still possible to create uninitialized values using [`MaybeUninit::assume_init()`] if you have not, in fact, assured that things are initialized. +It is still possible to create uninitialized values using [`MaybeUninit::uninit()`] with [`MaybeUninit::assume_init()`] if you have not, in fact, assured that things are initialized. + +`mem::uninitialized()` is exactly equivalent to `MaybeUninit::uninit().assume_init()`, but it is deprecated since `MaybeUninit` actually provides the flexibility needed to deal with uninitialized memory safely. ### Padding Padding bytes in structs and enums are [usually but not always uninitialized][pad-glossary]. This means that treating a struct as a bag of bytes (by, say, treating `&Struct` as `&[u8; size_of::()]` and reading from there) is UB even if you don't write invalid values to those bytes, since you are accessing uninitialized `u8`s. -The "usually but not always" caveat can be usefully framed as "padding bytes are uninitialized unless proven otherwise". Padding is a property of types, not memory, and these bytes are set to being uninitialized whenever a type is created or copied/moved around, but they can be written to by getting a reference to the memory behind the type[^1], and will be preserved at that spot in memory as long as the type isn't overwritten as a whole. +The "usually but not always" caveat can be usefully framed as "padding bytes are uninitialized unless proven otherwise". Padding is a property of the _access_ (i.e., the _type_), not memory, and these bytes are set to being uninitialized whenever a type is created or copied/moved around, but they can be written to by getting a reference to the memory behind the type[^1], and will be preserved at that spot in memory as long as the type isn't overwritten as a whole. For example, treating an initialized byte buffer as an `&Struct` and then later reading the padding bytes will give initialized values. 
However, treating an initialized byte buffer as an `&mut Struct` and then writing a new `Struct` to it will lead to those bytes becoming uninitialized since the `Struct` copy will "copy" the uninitialized padding bytes. Similarly, using `mem::transmute()` (or `mem::zeroed()`) to transmute a byte buffer to a `Struct` will uninitialize the padding because it performs a typed copy of the `Struct`. @@ -115,6 +117,8 @@ For all of these APIs, actually _using_ the dropped or read-from memory may stil However, they do not produce uninitialized memory. +Still, it is convenient when writing unsafe code to operate as if these functions produce uninitialized memory on the original source location. + ## When you might end up making an uninitialized value Some of the APIs and methods above create uninitialized memory in a pretty straightforward way — don't call [`MaybeUninit::assume_init()`] if things are not actually initialized! @@ -152,6 +156,7 @@ This is not an exhaustive list: ultimately, having an uninitialized value is UB [`mem::zeroed()`]: https://doc.rust-lang.org/stable/std/mem/fn.zeroed.html [`MaybeUninit`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html [`MaybeUninit::assume_init()`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init + [`MaybeUninit::uninit()`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.uninit [`MaybeUninit::write()`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.write [pad-glossary]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding [`ptr::drop_in_place()`]: https://doc.rust-lang.org/stable/std/ptr/fn.drop_in_place.html From 02ea208fa78fbc271c4387d71f918bd2503335ce Mon Sep 17 00:00:00 2001 From: Manish Goregaokar Date: Sun, 5 Mar 2023 16:51:49 -0800 Subject: [PATCH 19/19] mention freeing --- src/advanced_unsafety/uninitialized.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/src/advanced_unsafety/uninitialized.md b/src/advanced_unsafety/uninitialized.md index 93fe907..ad2b0ea 100644 --- a/src/advanced_unsafety/uninitialized.md +++ b/src/advanced_unsafety/uninitialized.md @@ -92,7 +92,7 @@ unsafe { Any type of move will do this, even when you "move" the value into a different variable with stuff like `let y = x;`. -This isn't _quite_ uninitialized: it's just that using after a move is straight up UB in Rust. In particular, unlike most pointers to uninitialized values, this dangling pointer is unsound to *write* to as well. +This isn't _quite_ uninitialized: it's just that using after a move is straight up UB in Rust, much like reading from freed memory. In particular, unlike most pointers to uninitialized values, this dangling pointer is unsound to *write* to as well. Working with dangling pointers can often lead to similar problems as working with uninitialized values.
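
As a rough illustration of the move case described above, the sketch below takes a raw pointer to a value, moves the value, and then deliberately avoids touching the old pointer. The names `len_after_move`, `stale`, and `fresh` are made up for this example and do not come from the chapter; the only point is that a pointer taken before the move has to be treated as dangling, while a pointer taken from the new binding is fine to use.

```rust
/// Reads a `String`'s length through a raw pointer, taking care to use a
/// pointer obtained *after* the move rather than one obtained before it.
/// (Hypothetical helper, for illustration only.)
fn len_after_move() -> usize {
    let x = String::from("hello");

    // A raw pointer to `x`'s storage. Creating it is safe.
    let stale: *const String = &x;

    // Moving `x` into `y`: from here on, `stale` must be treated as dangling.
    // Both reading from it and writing to it would be UB.
    let y = x;
    let _ = stale; // intentionally never dereferenced

    // A pointer taken from the new binding points at a live, initialized value.
    let fresh: *const String = &y;

    // SAFETY: `fresh` was derived from `y`, which is alive and initialized
    // for the whole duration of this read.
    unsafe { (&*fresh).len() }
}
```

Calling `len_after_move()` only ever reads through `fresh`, so it stays within defined behavior. Swapping `fresh` for `stale` in the final line would compile just the same, which is exactly why this kind of bug is easy to write.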