| 1 | +# Finalizers and Weak References |
| 2 | + |
| 3 | +Some VMs support **finalizers**. In simple terms, finalizers are clean-up operations associated |
| 4 | +with an object, and are executed when the object is dead. |
| 5 | + |
| 6 | +Some VMs support **weak references**. If an object cannot be reached from roots following only |
| 7 | +strong references, the object will be considered dead. Weak references to dead objects will be |
| 8 | +cleared, and associated clean-up operations will be executed. Some VMs also support more complex |
| 9 | +weak data structures, such as weak hash tables, where keys, values, or both, can be weak references. |
| 10 | + |
| 11 | +The concrete semantics of finalizers and weak references vary from VM to VM, but MMTk provides a |
| 12 | +low-level API that allows VM bindings to implement their own flavours of finalizers and weak |
| 13 | +references on top of it. |
| 14 | + |
| 15 | +**A note for Java programmers**: In Java, the term "weak reference" often refers to instances of |
| 16 | +`java.lang.ref.Reference` (including the concrete classes `SoftReference`, `WeakReference`, |
| 17 | +`PhantomReference` and the hidden `FinalizerReference` class used by some JVM implementations to |
| 18 | +implement finalizers). Instances of `Reference` are proper Java heap objects, but each instance has |
| 19 | +a field that contains a pointer to the referent, and the field can be cleared when the referent |
| 20 | +dies. In this article, we use the term "weak reference" to refer to the pointer inside that field. |
| 21 | +In other words, a Java `Reference` instance has a field that holds a weak reference to the referent. |
| 22 | + |
| 23 | +## Overview |
| 24 | + |
| 25 | +During each GC, after the transitive closure is computed, MMTk calls `Scanning::process_weak_refs` |
| 26 | +which is implemented by the VM binding. Inside this function, the VM binding can do several things. |
| 27 | + |
| 28 | +- **Query reachability**: The VM binding can query whether any given object has been reached in |
| 29 | +  the transitive closure. |
| 30 | +  - **Query forwarded address**: If an object is already reached, the VM binding can further |
| 31 | +    query the new address of the object. This is needed to support copying GC. |
| 32 | +  - **Retain object**: If an object is not reached, the VM binding can optionally request to |
| 33 | +    retain (i.e. "resurrect") the object. It will keep that object *and all descendants* |
| 34 | +    alive. |
| 35 | +- **Request another invocation**: The VM binding can request `Scanning::process_weak_refs` to be |
| 36 | +  *called again* after computing the transitive closure that includes *retained objects and their |
| 37 | +  descendants*. This helps handle multiple levels of weak reference strength. |
| 38 | + |
| 39 | +Concretely, |
| 40 | + |
| 41 | +- `ObjectReference::is_reachable()` queries reachability, |
| 42 | +- `ObjectReference::get_forwarded_object()` queries the forwarded address, |
| 43 | +- the `tracer_context` argument passed to `Scanning::process_weak_refs` can retain objects, and |
| 44 | +- returning `true` from `Scanning::process_weak_refs` causes it to be called again after the |
| 45 | +  retained objects and their descendants have been traced. |
| 46 | + |
| 47 | +The `Scanning::process_weak_refs` function also gives the VM binding a chance to perform other |
| 48 | +operations, including (but not limited to) |
| 49 | + |
| 50 | +- **Do clean-up operations**: The VM binding can perform clean-up operations, or queue them to be |
| 51 | +  executed after GC. |
| 52 | +- **Update fields**: The VM binding can update fields that contain weak references. |
| 53 | +  - **Forward the field**: It can write the forwarded address of the referent if moved by a |
| 54 | +    copying GC. |
| 55 | +  - **Clear the field**: It can clear the field if the referent is unreachable. |
| 56 | + |
| 57 | +Using those primitive operations, the VM binding can support different flavours of finalizers and/or |
| 58 | +weak references. We will discuss different use cases in the following sections. |
| 59 | + |
| 60 | +## Supporting finalizers |
| 61 | + |
| 62 | +Different VMs define "finalizer" differently, but they all involve performing operations when an |
| 63 | +object is dead. The general way to handle finalizers is to visit all **finalizable objects** (i.e. |
| 64 | +objects that have associated finalization operations), check whether they are dead and, if so, do |
| 65 | +something about them. |
| 66 | + |
| 67 | +### Identifying finalizable objects |
| 68 | + |
| 69 | +Some VMs determine whether an object is finalizable by its type. In Java, for example, an object is |
| 70 | +finalizable if its `finalize()` method is overridden. We can register instances of such types when |
| 71 | +they are constructed. |
| 72 | + |
| 73 | +Some VMs can attach finalizing operations to an object after it is created. The VM can maintain a |
| 74 | +list of objects with attached finalizers, or maintain a (weak) hash map that maps finalizable |
| 75 | +objects to their associated finalizers. |
| 76 | + |
| 77 | +### When to run finalizers? |
| 78 | + |
| 79 | +Depending on the semantics, finalizers can be executed during GC or during mutator time after GC. |
| 80 | + |
| 81 | +The VM binding can run finalizers in `Scanning::process_weak_refs` after finding a finalizable |
| 82 | +object dead. But beware that MMTk is usually run with multiple GC workers. The VM binding can |
| 83 | +parallelise the operations by creating work packets. The `Scanning::process_weak_refs` function is |
| 84 | +executed in the `VMRefClosure` stage, so the created work packets shall be added to the same bucket. |
| 85 | + |
| 86 | +If the finalizers should be executed after GC, the VM binding should enqueue them to VM-specific |
| 87 | +queues so that they can be picked up after GC. |
| 88 | + |
| 89 | +### Reading the body of dead objects |
| 90 | + |
| 91 | +In some VMs, finalizers can read the fields in dead objects. Such fields usually include |
| 92 | +information needed for cleaning up resources held by the object, such as file descriptors and |
| 93 | +pointers to memory or objects not managed by GC. |
| 94 | + |
| 95 | +`Scanning::process_weak_refs` is executed in the `VMRefClosure` stage, which happens after the |
| 96 | +strong transitive closure (including all objects reachable from roots following only strong |
| 97 | +references) has been computed, but before any object has been released (which happens in the |
| 98 | +`Release` stage). This means the body of all objects, live or dead, can still be accessed during |
| 99 | +this stage. |
| 100 | + |
| 101 | +Therefore, if the VM needs to execute finalizers during GC, the VM binding can execute them in |
| 102 | +`process_weak_refs`, or create work packets in the `VMRefClosure` stage. |
| 103 | + |
| 104 | +However, if the VM needs to execute finalizers after GC, there will be a problem because the object |
| 105 | +will be reclaimed, and memory of the object will be overwritten by other objects. In this case, the |
| 106 | +VM will need to "resurrect" the dead object. |
| 107 | + |
| 108 | +### Resurrecting dead objects |
| 109 | + |
| 110 | +Some VMs, particularly the Java VM, execute finalizers during mutator time. The dead finalizable |
| 111 | +objects must be brought back to life so that they can still be accessed after the GC. |
| 112 | + |
| 113 | +`Scanning::process_weak_refs` has a parameter `tracer_context: impl ObjectTracerContext<VM>`. |
| 114 | +This parameter provides the necessary mechanism to retain (i.e. "resurrect") objects and make them |
| 115 | +(and their descendants) live through the current GC. The typical use pattern is: |
| 116 | + |
| 117 | +```rust |
| 118 | +impl<VM: VMBinding> Scanning<VM> for VMScanning { |
| 119 | +    fn process_weak_refs( |
| 120 | +        worker: &mut GCWorker<VM>, |
| 121 | +        tracer_context: impl ObjectTracerContext<VM>, |
| 122 | +    ) -> bool { |
| 123 | +        let finalizable_objects = ...; |
| 124 | +        let mut new_finalizable_objects = vec![]; |
| 125 | + |
| 126 | +        tracer_context.with_tracer(worker, |tracer| { |
| 127 | +            for object in finalizable_objects { |
| 128 | +                if object.is_reachable() { |
| 129 | +                    // Object is still alive, and may be moved if it's copying GC. |
| 130 | +                    let new_object = object.get_forwarded_object().unwrap_or(object); |
| 131 | +                    new_finalizable_objects.push(new_object); |
| 132 | +                } else { |
| 133 | +                    // Object is dead. Retain it. |
| 134 | +                    let new_object = tracer.trace_object(object); |
| 135 | +                    enqueue_finalizable_object_to_be_executed_later(new_object); |
| 136 | +                } |
| 137 | +            } |
| 138 | +        }); |
| 139 | + |
| 140 | +        // more code ... |
| 141 | +    } |
| 142 | +} |
| 143 | +``` |
| 144 | + |
| 145 | +The `tracer` parameter of the closure is an `ObjectTracer`. It provides the `trace_object` method |
| 146 | +which retains an object and returns the forwarded address. |
| 147 | + |
| 148 | +`tracer_context.with_tracer` creates a temporary `ObjectTracer` instance which the VM binding can |
| 149 | +use within the given closure. Objects retained by `trace_object` in the closure are enqueued. |
| 150 | +After the closure returns, `with_tracer` will create reasonably-sized work packets for tracing the |
| 151 | +retained objects and their descendants. Therefore, the VM binding is encouraged to use one |
| 152 | +`with_tracer` invocation to retain as many objects as needed. Do not call `with_tracer` too often, |
| 153 | +or it will create too many small work packets, which hurts performance. |
| 154 | + |
| 155 | +Keep in mind that **`ObjectTracerContext` implements `Clone`**. If the VM has too many finalizable |
| 156 | +objects, it is advisable to split the list of finalizable objects into smaller chunks. Create one |
| 157 | +work packet for each chunk, and give each work packet a clone of `tracer_context` so that multiple |
| 158 | +work packets can process finalizable objects in parallel. |
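The chunking pattern above can be sketched without MMTk at all. The following is a minimal, self-contained model, not MMTk code: `MockTracerContext`, its `with_tracer` method, and `retain_in_chunks` are hypothetical names invented for this illustration, plain `usize` values stand in for object references, and scoped threads stand in for work packets. It only shows the shape of the idea: split the list, hand each chunk its own clone of the context, and batch all retained objects of a chunk into a single `with_tracer` call.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a cloneable tracer context (not MMTk's `ObjectTracerContext`).
/// All clones share the same list of retained objects.
#[derive(Clone)]
pub struct MockTracerContext {
    retained: Arc<Mutex<Vec<usize>>>,
}

impl MockTracerContext {
    pub fn new() -> Self {
        MockTracerContext { retained: Arc::new(Mutex::new(Vec::new())) }
    }

    /// Runs `f` with a temporary batch, then publishes the whole batch at once.
    /// This mimics batching many retained objects into one `with_tracer` invocation.
    pub fn with_tracer<F: FnOnce(&mut Vec<usize>)>(&self, f: F) {
        let mut batch = Vec::new();
        f(&mut batch);
        self.retained.lock().unwrap().extend(batch);
    }

    /// Returns every retained object, sorted so that the result is deterministic.
    pub fn retained(&self) -> Vec<usize> {
        let mut all = self.retained.lock().unwrap().clone();
        all.sort_unstable();
        all
    }
}

/// Splits `objects` into chunks and processes each chunk in its own scoped thread
/// (a stand-in for a work packet), giving each one a clone of the context.
/// Returns the number of "packets" created. `chunk_size` must be non-zero.
pub fn retain_in_chunks(
    objects: &[usize],
    chunk_size: usize,
    context: &MockTracerContext,
) -> usize {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut packets = 0;
    thread::scope(|scope| {
        for chunk in objects.chunks(chunk_size) {
            let ctx = context.clone();
            packets += 1;
            scope.spawn(move || {
                // One `with_tracer` call per chunk, not one per object.
                ctx.with_tracer(|batch| batch.extend_from_slice(chunk));
            });
        }
    });
    packets
}
```

In a real binding the chunk size is a tuning knob: chunks that are too small recreate the "too many small work packets" problem, while chunks that are too large limit parallelism.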
| 159 | + |
| 160 | + |
| 161 | +## Supporting weak references |
| 162 | + |
| 163 | +The general way to handle weak references is to iterate, after computing the transitive closure, |
| 164 | +through all fields that contain weak references to objects. For each field, |
| 165 | + |
| 166 | +- if the referent is already reached, write the new address of the object to the field (or do |
| 167 | + nothing if the object is not moved); |
| 168 | +- otherwise, clear the field, writing `null`, `nil`, or whatever represents a cleared weak |
| 169 | + reference to the field. |
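The two cases above can be sketched as a small, self-contained model. This is an illustration under stated assumptions, not MMTk code: `ObjRef`, `ClosureResult`, `WeakField` and `process_weak_fields` are hypothetical names, and `ClosureResult` merely imitates the reachability and forwarding queries that a binding would make through `ObjectReference::is_reachable()` and `ObjectReference::get_forwarded_object()`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object reference (not MMTk's `ObjectReference`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjRef(pub usize);

/// Hypothetical record of what the transitive closure found.
/// An object is reached if it has an entry; the value is its new address if it was moved.
pub struct ClosureResult {
    reached: HashMap<ObjRef, Option<ObjRef>>,
}

impl ClosureResult {
    pub fn new() -> Self {
        ClosureResult { reached: HashMap::new() }
    }

    pub fn mark_reached(&mut self, object: ObjRef, forwarded_to: Option<ObjRef>) {
        self.reached.insert(object, forwarded_to);
    }

    pub fn is_reachable(&self, object: ObjRef) -> bool {
        self.reached.contains_key(&object)
    }

    /// Returns the new address if the object was moved, `None` otherwise.
    pub fn get_forwarded_object(&self, object: ObjRef) -> Option<ObjRef> {
        self.reached.get(&object).copied().flatten()
    }
}

/// A field holding a weak reference; `None` represents a cleared weak reference.
pub type WeakField = Option<ObjRef>;

/// Forwards fields whose referents were reached and clears the others.
/// Returns how many fields were cleared.
pub fn process_weak_fields(fields: &mut [WeakField], closure: &ClosureResult) -> usize {
    let mut cleared = 0;
    for field in fields.iter_mut() {
        if let Some(referent) = *field {
            if closure.is_reachable(referent) {
                // Reached: write the new address, or keep the old one if not moved.
                *field = Some(closure.get_forwarded_object(referent).unwrap_or(referent));
            } else {
                // Unreachable: clear the weak reference.
                *field = None;
                cleared += 1;
            }
        }
    }
    cleared
}
```

Fields that were already cleared before the GC are skipped and are not counted again.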
| 170 | + |
| 171 | +### Identifying weak references |
| 172 | + |
| 173 | +Weak references in global slots, including fields of global data structures as well as keys and/or |
| 174 | +values in global weak tables, are relatively straightforward. We just need to enumerate them in |
| 175 | +`Scanning::process_weak_refs`. |
| 176 | + |
| 177 | +There are also fields in heap objects that hold weak references to other heap objects. There |
| 178 | +are two basic ways to identify them. |
| 179 | + |
| 180 | +- **Register on creation**: We may record objects that contain such fields in a global list when |
| 181 | + such objects are created. In `Scanning::process_weak_refs`, we just need to iterate through |
| 182 | + this list, process the fields, and remove dead objects from the list. |
| 183 | +- **Discover objects during tracing**: While computing the transitive closure, we scan objects and |
| 184 | + discover objects that contain weak reference fields. We enqueue such objects into a list, and |
| 185 | + iterate through the list in `Scanning::process_weak_refs` after transitive closure. The list |
| 186 | + needs to be reconstructed in each GC. |
| 187 | + |
| 188 | +Both methods work, but each has its advantages and disadvantages. Registering on creation does not |
| 189 | +need to reconstruct the list in every GC, while discovering during tracing can avoid visiting dead |
| 190 | +objects. Depending on the nature of your VM, one method may be easier to implement than the other, |
| 191 | +especially if your VM's existing GC has already implemented weak reference processing in some way. |
| 192 | + |
| 193 | +### Associated clean-up operations |
| 194 | + |
| 195 | +Some languages and VMs allow certain clean-up operations to be associated with weak references, |
| 196 | +which will be executed after the weak reference is cleared. |
| 197 | + |
| 198 | +Such clean-up operations can be supported similarly to finalizers. While we enumerate weak |
| 199 | +references in `Scanning::process_weak_refs`, we clear weak references to unreachable objects. |
| 200 | +Depending on the semantics, such as whether the clean-up operation can access the body of the |
| 201 | +unreachable referent, we may choose to execute the clean-up operation immediately, or enqueue it |
| 202 | +to be executed after GC, and may even resurrect the unreachable referent if we need to. |
| 203 | + |
| 204 | +### Soft references |
| 205 | + |
| 206 | +Java has a special kind of weak reference: `SoftReference`. The API allows the GC to choose whether |
| 207 | +to retain or clear references to softly reachable objects. When using MMTk, there are two ways to |
| 208 | +implement it. |
| 209 | + |
| 210 | +The easiest way is **treating `SoftReference` as strong references in non-emergency GCs, and |
| 211 | +treating them as weak references in emergency GCs**. During non-emergency GC, we let |
| 212 | +`Scanning::scan_objects` scan the weak reference field inside a `SoftReference` instance as if it |
| 213 | +were an ordinary strong reference field. In this way, the (strong) transitive closure after the |
| 214 | +`Closure` stage will also include softly reachable objects, and they will be retained. During |
| 215 | +emergency GC, however, skip this field in `Scanning::scan_objects`, and clear `SoftReference` just |
| 216 | +like `WeakReference` in `Scanning::process_weak_refs`. In this way, softly reachable objects will |
| 217 | +be dead if not subject to finalization. |
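The emergency-dependent policy described above boils down to one decision per referent field. The sketch below is illustrative only: `RefKind`, `scan_referent_as_strong` and `process_as_weak` are hypothetical names, and how a binding learns whether the current GC is an emergency GC is left out.

```rust
/// Hypothetical classification of a referent field (not an MMTk type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RefKind {
    Strong,
    Soft,
    Weak,
}

/// Decides whether object scanning should report the field as an ordinary strong edge.
/// Soft referents are traced as strong edges except during an emergency GC.
pub fn scan_referent_as_strong(kind: RefKind, emergency_gc: bool) -> bool {
    match kind {
        RefKind::Strong => true,
        RefKind::Soft => !emergency_gc,
        RefKind::Weak => false,
    }
}

/// Decides whether the field has to be forwarded or cleared during weak reference
/// processing. It is the complement of the decision above.
pub fn process_as_weak(kind: RefKind, emergency_gc: bool) -> bool {
    !scan_referent_as_strong(kind, emergency_gc)
}
```

Keeping both decisions in one place makes it harder for scanning and weak reference processing to disagree about a `SoftReference` within the same GC.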
| 218 | + |
| 219 | +The other way is **retaining `SoftReference` after the strong closure**. This involves supporting |
| 220 | +multiple levels of reference strengths, which will be introduced in the next section. |
| 221 | + |
| 222 | +### Multiple levels of reference strength |
| 223 | + |
| 224 | +Some VMs support multiple levels of weak reference strengths. Java, for example, has |
| 225 | +`SoftReference`, `WeakReference`, `FinalizerReference` (internal) and `PhantomReference`, in the |
| 226 | +order of decreasing strength. |
| 227 | + |
| 228 | +This can be supported by running `Scanning::process_weak_refs` multiple times. If |
| 229 | +`process_weak_refs` returns `true`, it will be called again after all pending work packets in the |
| 230 | +`VMRefClosure` stage have been executed. That includes all work packets that compute the transitive |
| 231 | +closure from objects retained (i.e. "resurrected") during `process_weak_refs`. This allows the VM |
| 232 | +binding to expand the transitive closure multiple times, each retaining objects at different levels |
| 233 | +of reachability. |
| 234 | + |
| 235 | +Taking Java as an example, we may run `process_weak_refs` four times. |
| 236 | + |
| 237 | +1. Visit all `SoftReference`. |
| 238 | +   - If the referent is reachable, then |
| 239 | +     - forward the referent field. |
| 240 | +   - If the referent is unreachable, choose one of the following: |
| 241 | +     - Retain the referent and update the referent field. |
| 242 | +     - Clear the referent field, remove the `SoftReference` from the list of soft references, |
| 243 | +       and optionally enqueue it to the associated `ReferenceQueue` if it has one. |
| 244 | +   - (This step may expand the transitive closure if any referents are retained.) |
| 245 | +2. Visit all `WeakReference`. |
| 246 | +   - If the referent is reachable, then |
| 247 | +     - forward the referent field. |
| 248 | +   - If the referent is unreachable, then |
| 249 | +     - clear the referent field, remove the `WeakReference` from the list of weak references, |
| 250 | +       and optionally enqueue it to the associated `ReferenceQueue` if it has one. |
| 251 | +   - (This step cannot expand the transitive closure.) |
| 252 | +3. Visit the list of finalizable objects (may be implemented as `FinalizerReference` by some JVMs). |
| 253 | +   - If the finalizable object is reachable, then |
| 254 | +     - forward the reference to it since it may have been moved. |
| 255 | +   - If the finalizable object is unreachable, then |
| 256 | +     - retain it, remove it from the list of finalizable objects, and enqueue it for finalization. |
| 257 | +   - (This step may expand the transitive closure if any finalizable objects are retained.) |
| 258 | +4. Visit all `PhantomReference`. |
| 259 | +   - If the referent is reachable, then |
| 260 | +     - forward the referent field. (Note: `PhantomReference#get()` always returns `null`, but |
| 261 | +       the actual referent field shall hold a valid reference to the referent.) |
| 262 | +   - If the referent is unreachable, then |
| 263 | +     - clear the referent field, remove the `PhantomReference` from the list of phantom |
| 264 | +       references, and optionally enqueue it to the associated `ReferenceQueue` if it has one. |
| 265 | +   - (This step cannot expand the transitive closure.) |
| 266 | + |
| 267 | +As an optimization, Step 1 can be eliminated by merging it with the strong closure in non-emergency |
| 268 | +GC, or with `WeakReference` processing in emergency GC, as we described in the previous section. |
| 269 | +Step 2 can be merged with Step 3 since Step 2 never expands the transitive closure. Therefore, we |
| 270 | +only need to run `process_weak_refs` twice: |
| 271 | + |
| 272 | +1. Handle `WeakReference` (and also `SoftReference` in emergency GC), and then handle finalizable |
| 273 | + objects. |
| 274 | +2. Handle `PhantomReference`. |
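One way to organise the two passes is a small state machine that remembers which pass comes next and returns `true` while another pass is needed. The sketch below is a self-contained model rather than MMTk code: `Phase`, `WeakRefProcessor` and `run_weak_ref_stage` are hypothetical names, the driver loop merely imitates MMTk calling `process_weak_refs` again after a `true` return, and the actual reference processing is replaced by log entries.

```rust
/// Hypothetical phases of weak reference processing within one GC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    WeakAndFinalizable,
    Phantom,
}

/// Hypothetical binding-side state that survives between invocations.
pub struct WeakRefProcessor {
    phase: Phase,
    /// Records which passes ran, for illustration only.
    pub log: Vec<&'static str>,
}

impl WeakRefProcessor {
    pub fn new() -> Self {
        WeakRefProcessor { phase: Phase::WeakAndFinalizable, log: Vec::new() }
    }

    /// Mimics the contract described above: returning `true` requests another
    /// invocation once the retained objects and their descendants have been traced.
    pub fn process_weak_refs(&mut self) -> bool {
        match self.phase {
            Phase::WeakAndFinalizable => {
                // Pass 1: weak references (plus soft ones in emergency GC),
                // then finalizable objects, which may be retained.
                self.log.push("weak, soft (emergency only), finalizable");
                self.phase = Phase::Phantom;
                true
            }
            Phase::Phantom => {
                // Pass 2: phantom references see the closure expanded by pass 1.
                self.log.push("phantom");
                // Reset so that the next GC starts from the first pass again.
                self.phase = Phase::WeakAndFinalizable;
                false
            }
        }
    }
}

/// Hypothetical driver standing in for the GC scheduler.
/// Returns how many times `process_weak_refs` was called.
pub fn run_weak_ref_stage(processor: &mut WeakRefProcessor) -> usize {
    let mut calls = 1;
    while processor.process_weak_refs() {
        // In a real GC, the work packets that expand the closure would run here.
        calls += 1;
    }
    calls
}
```

Resetting the phase at the end of the last pass matters because the same state is reused by the next GC.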
| 275 | + |
| 276 | +### Ephemerons |
| 277 | + |
| 278 | +TODO |
| 279 | + |
| 280 | + |
| 281 | +<!-- |
| 282 | +vim: tw=100 ts=4 sw=4 sts=4 et |
| 283 | +--> |