Commit 5ae76b5

auto merge of #5428 : thestinger/rust/tutorial, r=catamorphism
My goal is to explain the underlying concepts first (destructors and then ownership) with plenty of step-by-step examples, so that thoroughly explaining the semantics of mutability, boxes, moves, etc. is a breeze. I'm trying to avoid the comparisons with C++ that were done before, because this should be approachable for people coming from any language. C++ programmers already know these concepts so they aren't the audience that needs to be catered to. Comparisons with other languages can be done in separate documents (like [this one](https://github.com/mozilla/rust/wiki/Rust-for-CXX-programmers)). This still needs examples for ownership (inherited mutability), owned boxes and managed boxes.
2 parents e188894 + 9967dc8 commit 5ae76b5

1 file changed: +160 -130 lines changed

doc/tutorial.md

@@ -853,170 +853,184 @@ as in this example that unpacks the first value from a tuple and returns it.
853853
fn first((value, _): (int, float)) -> int { value }
854854
~~~
855855

856-
# Boxes and pointers
857-
858-
Many modern languages have a so-called "uniform representation" for
859-
aggregate types like structs and enums, so as to represent these types
860-
as pointers to heap memory by default. In contrast, Rust, like C and
861-
C++, represents such types directly. Another way to say this is that
862-
aggregate data in Rust are *unboxed*. This means that if you `let x =
863-
Point { x: 1f, y: 1f };`, you are creating a struct on the stack. If you
864-
then copy it into a data structure, you copy the entire struct, not
865-
just a pointer.
866-
867-
For small structs like `Point`, this is usually more efficient than
868-
allocating memory and indirecting through a pointer. But for big structs, or
869-
those with mutable fields, it can be useful to have a single copy on
870-
the stack or on the heap, and refer to that through a pointer.
871-
872-
Whenever memory is allocated on the heap, the program needs a strategy to
873-
dispose of the memory when no longer needed. Most languages, such as Java or
874-
Python, use *garbage collection* for this, a strategy in which the program
875-
periodically searches for allocations that are no longer reachable in order
876-
to dispose of them. Other languages, such as C, use *manual memory
877-
management*, which relies on the programmer to specify when memory should be
878-
reclaimed.
879-
880-
Rust is in a different position. It differs from the garbage-collected
881-
environments in that allows the programmer to choose the disposal
882-
strategy on an object-by-object basis. Not only does this have benefits for
883-
performance, but we will later see that this model has benefits for
884-
concurrency as well, by making it possible for the Rust compiler to detect
885-
data races at compile time. Rust also differs from the manually managed
886-
languages in that it is *safe*—it uses a [pointer lifetime
887-
analysis][borrow] to ensure that manual memory management cannot cause memory
888-
errors at runtime.
856+
# Destructors
889857

890-
[borrow]: tutorial-borrowed-ptr.html
858+
C-style resource management requires the programmer to match every allocation
859+
with a free, which means manually tracking the responsibility for cleaning up
860+
(the owner). Correctness is left to the programmer, and it's easy to get wrong.
891861

892-
The cornerstone of Rust's memory management is the concept of a *smart
893-
pointer*—a pointer type that indicates the lifetime of the object it points
894-
to. This solution is familiar to C++ programmers; Rust differs from C++,
895-
however, in that a small set of smart pointers are built into the language.
896-
The safe pointer types are `@T`, for *managed* boxes allocated on the *local
897-
heap*, `~T`, for *uniquely-owned* boxes allocated on the *exchange
898-
heap*, and `&T`, for *borrowed* pointers, which may point to any memory, and
899-
whose lifetimes are governed by the call stack.
862+
The following code demonstrates manual memory management, in order to contrast
863+
it with Rust's resource management. Rust enforces safety, so the `unsafe`
864+
keyword is used to explicitly wrap the unsafe code. The keyword is a promise to
865+
the compiler that unsafety does not leak outside of the unsafe block, and is
866+
used to create safe concepts on top of low-level code.
900867

901-
All pointer types can be dereferenced with the `*` unary operator.
868+
~~~~
869+
use core::libc::funcs::c95::stdlib::{calloc, free};
870+
use core::libc::types::os::arch::c95::size_t;
902871
903-
> ***Note***: You may also hear managed boxes referred to as 'shared
904-
> boxes' or 'shared pointers', and owned boxes as 'unique boxes/pointers'.
905-
> Borrowed pointers are sometimes called 'region pointers'. The preferred
906-
> terminology is what we present here.
872+
fn main() {
873+
unsafe {
874+
let a = calloc(1, int::bytes as size_t);
907875
908-
## Managed boxes
876+
let d;
909877
910-
Managed boxes are pointers to heap-allocated, garbage-collected
911-
memory. Applying the unary `@` operator to an expression creates a
912-
managed box. The resulting box contains the result of the
913-
expression. Copying a managed box, as happens during assignment, only
914-
copies a pointer, never the contents of the box.
878+
{
879+
let b = calloc(1, int::bytes as size_t);
880+
881+
let c = calloc(1, int::bytes as size_t);
882+
d = c; // move ownership to d
883+
884+
free(b);
885+
}
915886
887+
free(d);
888+
free(a);
889+
}
890+
}
916891
~~~~
917-
let x: @int = @10; // New box
918-
let y = x; // Copy of a pointer to the same box
919892

920-
// x and y both refer to the same allocation. When both go out of scope
921-
// then the allocation will be freed.
893+
Rust uses destructors to handle the release of resources like memory
894+
allocations, files and sockets. An object will only be destroyed when there is
895+
no longer any way to access it, which prevents dynamic failures from an attempt
896+
to use a freed resource. When a task fails, the stack unwinds and the
897+
destructors of all objects owned by that task are called.
898+
899+
The unsafe code from above can be contained behind a safe API that prevents
900+
memory leaks or use-after-free:
901+
922902
~~~~
903+
use core::libc::funcs::c95::stdlib::{calloc, free};
904+
use core::libc::types::common::c95::c_void;
905+
use core::libc::types::os::arch::c95::size_t;
923906
924-
A _managed_ type is either of the form `@T` for some type `T`, or any
925-
type that contains managed boxes or other managed types.
907+
struct Blob { priv ptr: *c_void }
926908
927-
~~~
928-
// A linked list node
929-
struct Node {
930-
next: MaybeNode,
931-
prev: MaybeNode,
932-
payload: int
909+
impl Blob {
910+
static fn new() -> Blob {
911+
unsafe { Blob{ptr: calloc(1, int::bytes as size_t)} }
912+
}
933913
}
934914
935-
enum MaybeNode {
936-
SomeNode(@mut Node),
937-
NoNode
915+
impl Drop for Blob {
916+
fn finalize(&self) {
917+
unsafe { free(self.ptr); }
918+
}
938919
}
939920
940-
let node1 = @mut Node { next: NoNode, prev: NoNode, payload: 1 };
941-
let node2 = @mut Node { next: NoNode, prev: NoNode, payload: 2 };
942-
let node3 = @mut Node { next: NoNode, prev: NoNode, payload: 3 };
921+
fn main() {
922+
let a = Blob::new();
943923
944-
// Link the three list nodes together
945-
node1.next = SomeNode(node2);
946-
node2.prev = SomeNode(node1);
947-
node2.next = SomeNode(node3);
948-
node3.prev = SomeNode(node2);
949-
~~~
924+
let d;
950925
951-
Managed boxes never cross task boundaries. This has several benefits for
952-
performance:
926+
{
927+
let b = Blob::new();
953928
954-
* The Rust garbage collector does not need to stop multiple threads in order
955-
to collect garbage.
929+
let c = Blob::new();
930+
d = c; // move ownership to d
956931
957-
* You can separate your application into "real-time" tasks that do not use
958-
the garbage collector and "non-real-time" tasks that do, and the real-time
959-
tasks will not be interrupted by the non-real-time tasks.
932+
// b is destroyed here
933+
}
960934
961-
C++ programmers will recognize `@T` as similar to `std::shared_ptr<T>`.
935+
// d is destroyed here
936+
// a is destroyed here
937+
}
938+
~~~~
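
As a rough present-day counterpart to the `Blob` example, the sketch below records the order in which destructors run. It assumes modern Rust syntax, where the destructor method is `Drop::drop` rather than the `finalize` shown in this commit. The `Noisy` type and the `drop_order` function are hypothetical names introduced only for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: pushes its name onto a shared log when destroyed.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Mirrors the a/b/c/d layout of the example and returns the names
/// of the values in the order they were destroyed.
#[allow(unused)]
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { name: "a", log: Rc::clone(&log) };
        let d;
        {
            let b = Noisy { name: "b", log: Rc::clone(&log) };
            let c = Noisy { name: "c", log: Rc::clone(&log) };
            d = c; // move ownership to d; nothing is destroyed by the move
            // b is destroyed at the end of this block
        }
        // d (holding the value created as "c") is destroyed first, then a
    }
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

The value created as `c` is logged only once, when its new owner `d` goes out of scope. The move transferred responsibility for the destructor without running it.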
962939

963-
> ***Note:*** Currently, the Rust compiler generates code to reclaim
964-
> managed boxes through reference counting and a cycle collector, but
965-
> we will switch to a tracing garbage collector eventually.
940+
This pattern is common enough that Rust includes dynamically allocated memory
941+
as first-class types (`~` and `@`). Non-memory resources like files are cleaned
942+
up with custom destructors.
966943

967-
## Owned boxes
944+
~~~~
945+
fn main() {
946+
let a = ~0;
968947
969-
In contrast with managed boxes, owned boxes have a single owning
970-
memory slot and thus two owned boxes may not refer to the same
971-
memory. All owned boxes across all tasks are allocated on a single
972-
_exchange heap_, where their uniquely-owned nature allows tasks to
973-
exchange them efficiently.
948+
let d;
974949
975-
Because owned boxes are uniquely owned, copying them requires allocating
976-
a new owned box and duplicating the contents.
977-
Instead, owned boxes are _moved_ by default, transferring ownership,
978-
and deinitializing the previously owning variable.
979-
Any attempt to access a variable after the value has been moved out
980-
will result in a compile error.
950+
{
951+
let b = ~0;
981952
982-
~~~~
983-
let x = ~10;
984-
// Move x to y, deinitializing x
985-
let y = x;
986-
~~~~
953+
let c = ~0;
954+
d = c; // move ownership to d
987955
988-
If you really want to copy an owned box you must say so explicitly.
956+
// b is destroyed here
957+
}
989958
959+
// d is destroyed here
960+
// a is destroyed here
961+
}
990962
~~~~
991-
let x = ~10;
992-
let y = copy x;
993963

994-
let z = *x + *y;
995-
fail_unless!(z == 20);
996-
~~~~
964+
# Ownership
965+
966+
Rust formalizes the concept of object ownership to delegate management of an
967+
object's lifetime to either a variable or a task-local garbage collector. An
968+
object's owner is responsible for managing the lifetime of the object by
969+
calling the destructor, and the owner determines whether the object is mutable.
970+
971+
Ownership is recursive, so mutability is inherited recursively and a destructor
972+
destroys the contained tree of owned objects. Variables are top-level owners
973+
and destroy the contained object when they go out of scope. A box managed by
974+
the garbage collector starts a new ownership tree, and the destructor is called
975+
when it is collected.
976+
977+
If an object doesn't contain garbage-collected boxes, it consists of a single
978+
ownership tree and is given the `Owned` trait which allows it to be sent
979+
between tasks.
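
The sketch below illustrates inherited mutability and sending a single ownership tree to another thread. It is written in present-day Rust and rests on several assumptions: `Box<T>` stands in for `~T`, threads stand in for tasks, and the `Send` trait appears to fill the role this commit gives to `Owned`. `Engine`, `Car`, and `rev_in_other_thread` are hypothetical names.

```rust
use std::thread;

// Hypothetical types: a Car owns its Engine through an owned box,
// so the whole value is one ownership tree.
struct Engine {
    rpm: u32,
}

struct Car {
    engine: Box<Engine>,
}

/// Moves the car into another thread, mutates it there, and returns the result.
fn rev_in_other_thread(car: Car) -> u32 {
    // Car contains no shared (reference-counted or garbage-collected) boxes,
    // so it can be moved into the spawned thread.
    let handle = thread::spawn(move || {
        let mut car = car; // the new owner decides that the tree is mutable
        car.engine.rpm += 1000; // mutability reaches through the box
        car.engine.rpm
    });
    handle.join().unwrap()
}

fn main() {
    let car = Car { engine: Box::new(Engine { rpm: 800 }) };
    // car.engine.rpm = 0; // would not compile: `car` is immutable, and so is everything it owns
    println!("{}", rev_in_other_thread(car));
}
```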
980+
981+
# Boxes
982+
983+
Many modern languages represent values as pointers to heap memory by
984+
default. In contrast, Rust, like C and C++, represents such types directly.
985+
Another way to say this is that aggregate data in Rust are *unboxed*. This
986+
means that if you `let x = Point { x: 1f, y: 1f };`, you are creating a struct
987+
on the stack. If you then copy it into a data structure, you copy the entire
988+
struct, not just a pointer.
989+
990+
For small structs like `Point`, this is usually more efficient than allocating
991+
memory and indirecting through a pointer. But for big structs, or mutable
992+
state, it can be useful to have a single copy on the stack or on the heap, and
993+
refer to that through a pointer.
994+
995+
## Owned boxes
996+
997+
An owned box (`~`) is a uniquely owned allocation on the heap. An owned box
998+
inherits the mutability and lifetime of the owner as it would if there was no
999+
box. The purpose of an owned box is to add a layer of indirection in order to
1000+
create recursive data structures or cheaply pass around an object larger than a
1001+
pointer.
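
A recursive data structure is the classic case where the extra indirection is required. The sketch below uses present-day Rust, assuming `Box<T>` as the modern spelling of `~T`. The `List` type and `sum` function are hypothetical names chosen for illustration.

```rust
// Without the Box, List would contain itself directly and have no finite size.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

/// Adds up every element by walking the chain of owned boxes.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))));
    println!("{}", sum(&list));
    // Dropping `list` frees every node: each box owns the rest of the list.
}
```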
9971002

998-
When they do not contain any managed boxes, owned boxes can be sent
999-
to other tasks. The sending task will give up ownership of the box
1000-
and won't be able to access it afterwards. The receiving task will
1001-
become the sole owner of the box. This prevents *data races*—errors
1002-
that could otherwise result from multiple tasks working on the same
1003-
data without synchronization.
1003+
## Managed boxes
1004+
1005+
A managed box (`@`) is a heap allocation with the lifetime managed by a
1006+
task-local garbage collector. It will be destroyed at some point after there
1007+
are no references left to the box, no later than the end of the task. Managed
1008+
boxes lack an owner, so they start a new ownership tree and don't inherit
1009+
mutability. They do own the contained object, and mutability is defined by the
1010+
type of the shared box (`@` or `@mut`). An object containing a managed box is
1011+
not `Owned`, and can't be sent between tasks.
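
The `@` syntax is not available in present-day Rust, so the following is only an approximation. It assumes `Rc<T>` as the nearest standard-library stand-in (reference-counted rather than traced by a garbage collector) and `Rc<RefCell<T>>` as a rough analogue of `@mut T`. The function name `shared_counter` is hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns the value seen through the first handle and the number of handles.
fn shared_counter() -> (i32, usize) {
    let x = Rc::new(RefCell::new(10)); // roughly `@mut 10`
    let y = Rc::clone(&x); // copies the pointer, never the contents

    *y.borrow_mut() += 1; // a mutation through one handle...
    let seen_through_x = *x.borrow(); // ...is visible through the other

    // Like a managed box, an Rc cannot be sent to another thread:
    // the compiler rejects moving `x` or `y` into `std::thread::spawn`.
    (seen_through_x, Rc::strong_count(&x))
}

fn main() {
    let (value, handles) = shared_counter();
    println!("value = {}, handles = {}", value, handles);
}
```

Here mutability comes from the box type (`RefCell` inside the `Rc`), not from the `let` binding, which parallels the `@` versus `@mut` distinction described above.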
1012+
1013+
# Move semantics
10041014

1005-
When an owned pointer goes out of scope or is overwritten, the object
1006-
it points to is immediately freed. Effective use of owned boxes can
1007-
therefore be an efficient alternative to garbage collection.
1015+
Rust uses a shallow copy for parameter passing, assignment and returning values
1016+
from functions. A shallow copy is considered a move of ownership if the
1017+
ownership tree of the copied value includes an owned box or a type with a
1018+
custom destructor. After a value has been moved, it can no longer be used from
1019+
the source location and will not be destroyed there.
10081020

1009-
C++ programmers will recognize `~T` as similar to `std::unique_ptr<T>`
1010-
(or `std::auto_ptr<T>` in C++03 and below).
1021+
~~~~
1022+
let x = ~5;
1023+
let y = x.clone(); // y is a newly allocated box
1024+
let z = x; // no new memory allocated, x can no longer be used
1025+
~~~~
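
The same distinction can be checked in present-day Rust, again assuming `Box<T>` in place of `~T`. The sketch below compares heap addresses to show that a move reuses the allocation while `clone` creates a new one. `move_and_clone` is a hypothetical name.

```rust
/// Returns (move kept the same allocation, clone shares the allocation).
fn move_and_clone() -> (bool, bool) {
    let x = Box::new(5);
    let x_addr = &*x as *const i32; // remember where x's contents live

    let y = x.clone(); // y is a newly allocated box
    let z = x; // no new memory allocated, x can no longer be used
    // println!("{}", x); // would not compile: use of moved value `x`

    let z_addr = &*z as *const i32;
    let y_addr = &*y as *const i32;
    (z_addr == x_addr, y_addr == x_addr)
}

fn main() {
    let (moved_in_place, clone_shares) = move_and_clone();
    println!("moved in place: {}, clone shares memory: {}", moved_in_place, clone_shares);
}
```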
10111026

1012-
## Borrowed pointers
1027+
# Borrowed pointers
10131028

1014-
Rust borrowed pointers are a general purpose reference/pointer type,
1015-
similar to the C++ reference type, but guaranteed to point to valid
1016-
memory. In contrast with owned pointers, where the holder of an owned
1017-
pointer is the owner of the pointed-to memory, borrowed pointers never
1018-
imply ownership. Pointers may be borrowed from any type, in which case
1019-
the pointer is guaranteed not to outlive the value it points to.
1029+
Rust's borrowed pointers are a general purpose reference type. In contrast with
1030+
owned pointers, where the holder of an owned pointer is the owner of the
1031+
pointed-to memory, borrowed pointers never imply ownership. A pointer can be
1032+
borrowed to any object, and the compiler verifies that it cannot outlive the
1033+
lifetime of the object.
10201034

10211035
As an example, consider a simple struct type, `Point`:
10221036

@@ -1099,7 +1113,23 @@ For a more in-depth explanation of borrowed pointers, read the
10991113
11001114
[borrowtut]: tutorial-borrowed-ptr.html
11011115
1102-
## Dereferencing pointers
1116+
## Freezing
1117+
1118+
Borrowing an immutable pointer to an object freezes it and prevents mutation.
1119+
`Owned` objects have freezing enforced statically at compile-time. Mutable
1120+
managed boxes handle freezing dynamically when any of their contents are
1121+
borrowed, and the task will fail if an attempt to modify them is made while
1122+
they are frozen.
1123+
1124+
~~~~
1125+
let mut x = 5;
1126+
{
1127+
let y = &x; // x is now frozen, it cannot be modified
1128+
}
1129+
// x is now unfrozen again
1130+
~~~~
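
Both kinds of freezing can be sketched in present-day Rust. The static case maps directly onto the borrow checker. For the dynamic case, `RefCell` is assumed here as a stand-in for a mutable managed box: its `borrow_mut` panics when the contents are already borrowed, and `try_borrow_mut` reports the same condition as an error. `static_freeze` and `dynamic_freeze` are hypothetical names.

```rust
use std::cell::RefCell;

/// Freezing enforced at compile time.
fn static_freeze() -> i32 {
    let mut x = 5;
    {
        let y = &x; // x is frozen while y is in use
        // x += 1; // would not compile here: x is still borrowed by y
        assert_eq!(*y, 5);
    }
    x += 1; // x is unfrozen again
    x
}

/// Freezing checked at run time; returns true if the mutation was refused.
fn dynamic_freeze() -> bool {
    let cell = RefCell::new(5);
    let frozen = cell.borrow(); // contents are frozen while this borrow lives
    let refused = cell.try_borrow_mut().is_err(); // borrow_mut() would panic here
    drop(frozen); // unfreeze
    refused
}

fn main() {
    println!("{} {}", static_freeze(), dynamic_freeze());
}
```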
1131+
1132+
# Dereferencing pointers
11031133
11041134
Rust uses the unary star operator (`*`) to access the contents of a
11051135
box or pointer, similarly to C.
