Skip to content

Commit 3a312a9

Browse files
authored
docs: Expand MemoryPool docs with related structs (#16289)
* doc: related structs to MemoryPool * fix CI * review * review: explain RAII in SharedRegistration
1 parent f35416e commit 3a312a9

File tree

1 file changed

+63
-2
lines changed
  • datafusion/execution/src/memory_pool

1 file changed

+63
-2
lines changed

datafusion/execution/src/memory_pool/mod.rs

Lines changed: 63 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -57,8 +57,8 @@ pub use pool::*;
5757
/// `GroupByHashExec`. It does NOT track and limit memory used internally by
5858
/// other operators such as `DataSourceExec` or the `RecordBatch`es that flow
5959
/// between operators. Furthermore, operators should not reserve memory for the
60-
/// batches they produce. Instead, if a parent operator needs to hold batches
61-
/// from its children in memory for an extended period, it is the parent
60+
/// batches they produce. Instead, if a consumer operator needs to hold batches
61+
/// from its producers in memory for an extended period, it is the consumer
6262
/// operator's responsibility to reserve the necessary memory for those batches.
6363
///
6464
/// In order to avoid allocating memory until the OS or the container system
@@ -98,6 +98,67 @@ pub use pool::*;
9898
/// operator will spill the intermediate buffers to disk, and release memory
9999
/// from the memory pool, and continue to retry memory reservation.
100100
///
101+
/// # Related Structs
102+
///
103+
/// To better understand memory management in DataFusion, here are the key structs
104+
/// and their relationships:
105+
///
106+
/// - [`MemoryConsumer`]: A named allocation traced by a particular operator. If an
107+
/// execution is parallelized, and there are multiple partitions of the same
108+
/// operator, each partition will have a separate `MemoryConsumer`.
109+
/// - `SharedRegistration`: A registration of a `MemoryConsumer` with a `MemoryPool`.
110+
/// `SharedRegistration` and `MemoryPool` have a many-to-one relationship. `MemoryPool`
111+
/// implementation can decide how to allocate memory based on the registered consumers.
112+
/// (e.g. `FairSpillPool` will try to share available memory evenly among all registered
113+
/// consumers)
114+
/// - [`MemoryReservation`]: Each `MemoryConsumer`/operator can have multiple
115+
/// `MemoryReservation`s for different internal data structures. The relationship
116+
/// between `MemoryConsumer` and `MemoryReservation` is one-to-many. This design
117+
/// enables cleaner operator implementations:
118+
/// - Different `MemoryReservation`s can be used for different purposes
119+
/// - `MemoryReservation` follows RAII principles - to release a reservation,
120+
/// simply drop the `MemoryReservation` object. When all `MemoryReservation`s
121+
/// for a `SharedRegistration` are dropped, the `SharedRegistration` is dropped
122+
/// when its reference count reaches zero, automatically unregistering the
123+
/// `MemoryConsumer` from the `MemoryPool`.
124+
///
125+
/// ## Relationship Diagram
126+
///
127+
/// ```text
128+
/// ┌──────────────────┐ ┌──────────────────┐
129+
/// │MemoryReservation │ │MemoryReservation │
130+
/// └───┬──────────────┘ └──────────────────┘ ......
131+
/// │belongs to │
132+
/// │ ┌───────────────────────┘ │ │
133+
/// │ │ │ │
134+
/// ▼ ▼ ▼ ▼
135+
/// ┌────────────────────────┐ ┌────────────────────────┐
136+
/// │ SharedRegistration │ │ SharedRegistration │
137+
/// │ ┌────────────────┐ │ │ ┌────────────────┐ │
138+
/// │ │ │ │ │ │ │ │
139+
/// │ │ MemoryConsumer │ │ │ │ MemoryConsumer │ │
140+
/// │ │ │ │ │ │ │ │
141+
/// │ └────────────────┘ │ │ └────────────────┘ │
142+
/// └────────────┬───────────┘ └────────────┬───────────┘
143+
/// │ │
144+
/// │ register│into
145+
/// │ │
146+
/// └─────────────┐ ┌──────────────┘
147+
/// │ │
148+
/// ▼ ▼
149+
/// ╔═══════════════════════════════════════════════════╗
150+
/// ║ ║
151+
/// ║ MemoryPool ║
152+
/// ║ ║
153+
/// ╚═══════════════════════════════════════════════════╝
154+
/// ```
155+
///
156+
/// For example, there are two parallel partitions of an operator X: each partition
157+
/// corresponds to a `MemoryConsumer` in the above diagram. Inside each partition of
158+
/// operator X, there are typically several `MemoryReservation`s - one for each
159+
/// internal data structure that needs memory tracking (e.g., 1 reservation for the hash
160+
/// table, and 1 reservation for buffered input, etc.).
161+
///
101162
/// # Implementing `MemoryPool`
102163
///
103164
/// You can implement a custom allocation policy by implementing the

0 commit comments

Comments
 (0)