From 72507c4174f108b33748bf2608afa90b74d98ef1 Mon Sep 17 00:00:00 2001 From: Dylan MacKenzie Date: Sat, 12 Sep 2020 19:20:37 -0700 Subject: [PATCH 1/4] Initial draft of MIR dataflow framework docs --- src/SUMMARY.md | 1 + src/mir/dataflow.md | 165 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 166 insertions(+) create mode 100644 src/mir/dataflow.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index d1560c960..25ea76e78 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -110,6 +110,7 @@ - [Variance](./variance.md) - [Opaque Types](./opaque-types-type-alias-impl-trait.md) - [Pattern and Exhaustiveness Checking](./pat-exhaustive-checking.md) +- [MIR dataflow](./mir/dataflow.md) - [The borrow checker](./borrow_check.md) - [Tracking moves and initialization](./borrow_check/moves_and_initialization.md) - [Move paths](./borrow_check/moves_and_initialization/move_paths.md) diff --git a/src/mir/dataflow.md b/src/mir/dataflow.md new file mode 100644 index 000000000..abe04e376 --- /dev/null +++ b/src/mir/dataflow.md @@ -0,0 +1,165 @@ +# Dataflow Analysis + +If you work on the MIR, you will frequently come across various flavors of +[dataflow analysis][wiki]. For example, `rustc` uses dataflow to find +uninitialized variables, determine what variables are live across a generator +`yield` statement, and compute which `Place`s are borrowed at a given point in +the control-flow graph. Dataflow analysis is a fundamental concept in modern +compilers, and knowledge of the subject will be helpful to prospective +contributors. + +However, this documentation is not a general introduction to dataflow analysis. +It is merely a description of the framework used to define these analyses in +`rustc`. It assumes that the reader is familiar with some basic terminology, +such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar +with these terms, or if you want a quick refresher, [*Static Program Analysis*] +by Anders Møller and Michael I. 
Schwartzbach is an excellent, freely available +textbook. For those who prefer audiovisual learning, the Goethe University +Frankfurt has published a series of short [youtube lectures][goethe] in English +that are very approachable. + +## Defining a Dataflow Analysis + +The interface for dataflow analyses is split into three traits. The first is +[`AnalysisDomain`], which must be implemented by *all* analyses. In addition to +the type of the dataflow state, this trait defines the initial value of that +state at entry to each block, as well as the direction of the analysis, either +forward or backward. The domain of your dataflow analysis must be a [lattice][] +(strictly speaking a join-semilattice) with a well-behaved `join` operator. See +documentation for the [`lattice`] module, as well as the [`JoinSemiLattice`] +trait, for more information. + +You must then provide *either* a direct implementation of the [`Analysis`] trait +*or* an implementation of the proxy trait [`GenKillAnalysis`]. The latter is for +so-called ["gen-kill" problems], which have a simple class of transfer function +that can be applied very efficiently. Analyses whose domain is not a `BitSet` +of some index type, or whose transfer functions cannot be expressed through +"gen" and "kill" operations, must implement `Analysis` directly, and will run +slower as a result. All implementers of `GenKillAnalysis` also implement +`Analysis` automatically via a default `impl`. + + +```text + AnalysisDomain + ^ + | | = has as a supertrait + | . = provides a default impl for + | + Analysis + ^ ^ + | . + | . + | . + GenKillAnalysis + +``` + +### Transfer Functions and Effects + +The dataflow framework in `rustc` allows each statement inside a basic block as +well as the terminator to define its own transfer function. For brevity, these +individual transfer functions are known as "effects". 
Each effect is applied +successively in dataflow order, and together they define the transfer function +for the entire basic block. It's also possible to define an effect for +particular outgoing edges of some terminators (e.g. +[`apply_call_return_effect`] for the `success` edge of a `Call` +terminator). Collectively, these are known as per-edge effects. + +The only meaningful difference (besides the "apply" prefix) between the methods +of the `GenKillAnalysis` trait and the `Analysis` trait is that an `Analysis` +has direct, mutable access to the dataflow state, whereas a `GenKillAnalysis` +only sees an implementer of the `GenKill` trait, which only allows the `gen` +and `kill` operations for mutation. + +Observant readers of the documentation for these traits may notice that there +are actually *two* possible effects for each statement and terminator, the +"before" effect and the unprefixed (or "primary") effect. The "before" effects +are applied immediately before the unprefixed effect **regardless of whether +the analysis is backward or forward**. The vast majority of analyses should use +only the unprefixed effects: Having multiple effects for each statement makes +it difficult for consumers to know where they should be looking. However, the +"before" variants can be useful in some scenarios, such as when the effect of +the right-hand side of an assignment statement must be considered separately +from the left-hand side. + +### Convergence + +TODO + +## Inspecting the Results of a Dataflow Analysis + +Once you have constructed an analysis, you must pass it to an [`Engine`], which +is responsible for finding the steady-state solution to your dataflow problem. +You should use the [`into_engine`] method defined on the `Analysis` trait for +this, since it will use the more efficient `Engine::new_gen_kill` constructor +when possible. 
+ +Calling `iterate_to_fixpoint` on your `Engine` will return a `Results`, which +contains the dataflow state at fixpoint upon entry of each block. Once you have +a `Results`, you can inspect the dataflow state at fixpoint at any point in +the CFG. If you only need the state at a few locations (e.g., each `Drop` +terminator), use a [`ResultsCursor`]. If you need the state at *every* location, +a [`ResultsVisitor`] will be more efficient. + +```text + Analysis + | + | into_engine(…) + | + Engine + | + | iterate_to_fixpoint() + | + Results + / \ + into_results_cursor(…) / \ visit_with(…) + / \ + ResultsCursor ResultsVisitor +``` + +For example, the following code uses a [`ResultsVisitor`]... + + +```rust,ignore +// Assuming `MyVisitor` implements `ResultsVisitor`... +let my_visitor = MyVisitor::new(); + +// inspect the fixpoint state for every location within every block in RPO. +let results = MyAnalysis() + .into_engine(tcx, body, def_id) + .iterate_to_fixpoint() + .visit_with(body, traversal::reverse_postorder(body), &mut my_visitor); +``` + +whereas this code uses [`ResultsCursor`]: + +```rust,ignore +let mut results = MyAnalysis() + .into_engine(tcx, body, def_id) + .iterate_to_fixpoint() + .into_results_cursor(body); + +// Inspect the fixpoint state immediately before each `Drop` terminator. +for (bb, block) in body.basic_blocks().iter_enumerated() { + if let TerminatorKind::Drop { .. 
} = block.terminator().kind { + results.seek_before_primary_effect(body.terminator_loc(bb)); + let state = results.get(); + println!("state before drop: {:#?}", state); + } +} +``` + +["gen-kill" problems]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems +[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/ +[`AnalysisDomain`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.AnalysisDomain.html +[`Analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html +[`GenKillAnalysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.GenKillAnalysis.html +[`JoinSemiLattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/trait.JoinSemiLattice.html +[`ResultsCursor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.ResultsCursor.html +[`ResultsVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.ResultsVisitor.html +[`apply_call_return_effect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#tymethod.apply_call_return_effect +[`into_engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#method.into_engine +[`lattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/index.html +[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2 +[lattice]: https://en.wikipedia.org/wiki/Lattice_(order) +[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles From 1c31d593bae405ccd3b9aaabed945f87ffd23a27 Mon Sep 17 00:00:00 2001 From: Dylan MacKenzie Date: Sun, 13 Sep 2020 12:22:57 -0700 Subject: [PATCH 2/4] Add missing link to `Engine` --- src/mir/dataflow.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mir/dataflow.md b/src/mir/dataflow.md index abe04e376..e83c4ebb3 100644 --- a/src/mir/dataflow.md +++ 
b/src/mir/dataflow.md @@ -153,6 +153,7 @@ for (bb, block) in body.basic_blocks().iter_enumerated() { [*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/ [`AnalysisDomain`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.AnalysisDomain.html [`Analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html +[`Engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.Engine.html [`GenKillAnalysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.GenKillAnalysis.html [`JoinSemiLattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/trait.JoinSemiLattice.html [`ResultsCursor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.ResultsCursor.html From fdf0b3bd8cf9125e372223dd389bb5c87821e031 Mon Sep 17 00:00:00 2001 From: Dylan MacKenzie Date: Sun, 13 Sep 2020 12:24:46 -0700 Subject: [PATCH 3/4] Line edits --- src/mir/dataflow.md | 53 +++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 24 deletions(-) diff --git a/src/mir/dataflow.md b/src/mir/dataflow.md index e83c4ebb3..9f93921b4 100644 --- a/src/mir/dataflow.md +++ b/src/mir/dataflow.md @@ -1,22 +1,22 @@ # Dataflow Analysis If you work on the MIR, you will frequently come across various flavors of -[dataflow analysis][wiki]. For example, `rustc` uses dataflow to find -uninitialized variables, determine what variables are live across a generator -`yield` statement, and compute which `Place`s are borrowed at a given point in -the control-flow graph. Dataflow analysis is a fundamental concept in modern +[dataflow analysis][wiki]. `rustc` uses dataflow to find uninitialized +variables, determine what variables are live across a generator `yield` +statement, and compute which `Place`s are borrowed at a given point in the +control-flow graph. 
Dataflow analysis is a fundamental concept in modern compilers, and knowledge of the subject will be helpful to prospective contributors. However, this documentation is not a general introduction to dataflow analysis. It is merely a description of the framework used to define these analyses in -`rustc`. It assumes that the reader is familiar with some basic terminology, -such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar -with these terms, or if you want a quick refresher, [*Static Program Analysis*] -by Anders Møller and Michael I. Schwartzbach is an excellent, freely available -textbook. For those who prefer audiovisual learning, the Goethe University -Frankfurt has published a series of short [youtube lectures][goethe] in English -that are very approachable. +`rustc`. It assumes that the reader is familiar with the core ideas as well as +some basic terminology, such as "transfer function", "fixpoint" and "lattice". +If you're unfamiliar with these terms, or if you want a quick refresher, +[*Static Program Analysis*] by Anders Møller and Michael I. Schwartzbach is an +excellent, freely available textbook. For those who prefer audiovisual +learning, the Goethe University Frankfurt has published a series of short +[lectures on YouTube][goethe] in English that are very approachable. ## Defining a Dataflow Analysis @@ -56,14 +56,14 @@ slower as a result. All implementers of `GenKillAnalysis` also implement ### Transfer Functions and Effects -The dataflow framework in `rustc` allows each statement inside a basic block as -well as the terminator to define its own transfer function. For brevity, these +The dataflow framework in `rustc` allows each statement (and terminator) inside +a basic block to define its own transfer function. For brevity, these individual transfer functions are known as "effects". Each effect is applied successively in dataflow order, and together they define the transfer function for the entire basic block. 
It's also possible to define an effect for particular outgoing edges of some terminators (e.g. [`apply_call_return_effect`] for the `success` edge of a `Call` -terminator). Collectively, these are known as per-edge effects. +terminator). Collectively, these are referred to as "per-edge effects". The only meaningful difference (besides the "apply" prefix) between the methods of the `GenKillAnalysis` trait and the `Analysis` trait is that an `Analysis` @@ -71,16 +71,21 @@ has direct, mutable access to the dataflow state, whereas a `GenKillAnalysis` only sees an implementer of the `GenKill` trait, which only allows the `gen` and `kill` operations for mutation. -Observant readers of the documentation for these traits may notice that there -are actually *two* possible effects for each statement and terminator, the -"before" effect and the unprefixed (or "primary") effect. The "before" effects -are applied immediately before the unprefixed effect **regardless of whether -the analysis is backward or forward**. The vast majority of analyses should use -only the unprefixed effects: Having multiple effects for each statement makes -it difficult for consumers to know where they should be looking. However, the -"before" variants can be useful in some scenarios, such as when the effect of -the right-hand side of an assignment statement must be considered separately -from the left-hand side. +### "Before" Effects + +Observant readers of the documentation may notice that there are actually *two* +possible effects for each statement and terminator, the "before" effect and the +unprefixed (or "primary") effect. The "before" effects are applied immediately +before the unprefixed effect **regardless of the direction of the analysis**. +In other words, a backward analysis will apply the "before" effect and then the +"primary" effect when computing the transfer function for a basic block, +just like a forward analysis. 
+ +The vast majority of analyses should use only the unprefixed effects: Having +multiple effects for each statement makes it difficult for consumers to know +where they should be looking. However, the "before" variants can be useful in +some scenarios, such as when the effect of the right-hand side of an assignment +statement must be considered separately from the left-hand side. ### Convergence From 065431856a6d4d273fae5f9ef314494d720eb4be Mon Sep 17 00:00:00 2001 From: Dylan MacKenzie Date: Sun, 13 Sep 2020 12:30:29 -0700 Subject: [PATCH 4/4] Update `Visitor` and `Cursor` examples --- src/mir/dataflow.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mir/dataflow.md b/src/mir/dataflow.md index 9f93921b4..5b10afec1 100644 --- a/src/mir/dataflow.md +++ b/src/mir/dataflow.md @@ -127,19 +127,19 @@ For example, the following code uses a [`ResultsVisitor`]... ```rust,ignore // Assuming `MyVisitor` implements `ResultsVisitor`... -let my_visitor = MyVisitor::new(); +let mut my_visitor = MyVisitor::new(); // inspect the fixpoint state for every location within every block in RPO. -let results = MyAnalysis() +let results = MyAnalysis::new() .into_engine(tcx, body, def_id) .iterate_to_fixpoint() - .visit_with(body, traversal::reverse_postorder(body), &mut my_visitor); + .visit_in_rpo_with(body, &mut my_visitor); ``` whereas this code uses [`ResultsCursor`]: ```rust,ignore -let mut results = MyAnalysis() +let mut results = MyAnalysis::new() .into_engine(tcx, body, def_id) .iterate_to_fixpoint() .into_results_cursor(body);
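Illustrative aside on the patched `dataflow.md`: the "gen"/"kill" transfer functions and the fixpoint iteration it describes can be sketched outside the compiler. The code below is a minimal, self-contained toy, not `rustc`'s `Engine` or `GenKillAnalysis`. The names `Block`, `gen_set`, `kill_set`, `transfer`, `iterate_to_fixpoint`, and `example_cfg` are all hypothetical. It assumes a forward analysis whose domain is a set of at most 64 facts stored in a `u64`, with set union as the join and the empty set as the initial state of every block.

```rust
use std::collections::VecDeque;

// Hypothetical toy types for illustration only; these are NOT rustc APIs.

/// One basic block, summarized by the facts it generates and kills
/// (bit `i` of each mask stands for fact `i`) and by its successor indices.
pub struct Block {
    pub gen_set: u64,
    pub kill_set: u64,
    pub succs: Vec<usize>,
}

/// Block transfer function of a gen-kill problem:
/// remove the killed facts, then add the generated ones.
pub fn transfer(block: &Block, entry: u64) -> u64 {
    (entry & !block.kill_set) | block.gen_set
}

/// Forward worklist iteration. Returns the state on entry to each block at
/// fixpoint. The join is set union (bitwise OR) and every block starts empty.
/// Termination follows because states only grow and there are finitely many bits.
pub fn iterate_to_fixpoint(blocks: &[Block]) -> Vec<u64> {
    let mut entry = vec![0u64; blocks.len()];
    let mut worklist: VecDeque<usize> = (0..blocks.len()).collect();

    while let Some(bb) = worklist.pop_front() {
        let exit = transfer(&blocks[bb], entry[bb]);
        for &succ in &blocks[bb].succs {
            let joined = entry[succ] | exit;
            // Revisit a successor only if its entry state actually changed.
            if joined != entry[succ] {
                entry[succ] = joined;
                if !worklist.contains(&succ) {
                    worklist.push_back(succ);
                }
            }
        }
    }
    entry
}

/// A small CFG with a loop: bb0 -> bb1, bb1 -> {bb2, bb3}, bb2 -> bb1.
pub fn example_cfg() -> Vec<Block> {
    vec![
        Block { gen_set: 0b01, kill_set: 0b00, succs: vec![1] },    // bb0: gen fact 0
        Block { gen_set: 0b00, kill_set: 0b00, succs: vec![2, 3] }, // bb1: loop header
        Block { gen_set: 0b10, kill_set: 0b01, succs: vec![1] },    // bb2: kill 0, gen 1
        Block { gen_set: 0b00, kill_set: 0b00, succs: vec![] },     // bb3: exit
    ]
}
```

Calling `iterate_to_fixpoint(&example_cfg())` yields an empty entry state for `bb0` and both facts on entry to `bb1`, `bb2`, and `bb3`: fact 0 arrives from `bb0`, and fact 1 arrives over the back edge from `bb2`. Because each block is summarized by two bit masks, applying its transfer function costs a couple of bitwise operations.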