|
2 | 2 |
|
3 | 3 | **Understanding how ETL components work together to replicate data from Postgres** |
4 | 4 |
|
5 | | -ETL's architecture centers around four core abstractions that work together to provide reliable, high-performance data replication: `Pipeline`, `Destination`, `SchemaStore`, and `StateStore`. This document explains how these components interact and coordinate data flow from Postgres logical replication to target systems. |
| 5 | +ETL's architecture centers around five core abstractions that work together to provide reliable, high-performance data replication: `Pipeline`, `Destination`, `SchemaStore`, `StateStore`, and `CleanupStore`. This document explains how these components interact and coordinate data flow from Postgres logical replication to target systems. |
6 | 6 |
|
7 | 7 | A diagram of the overall architecture is shown below: |
8 | 8 |
|
@@ -36,6 +36,10 @@ flowchart LR |
36 | 36 | subgraph SchemaStore[Schema Store] |
37 | 37 | D2["Memory<br>Postgres"] |
38 | 38 | end |
| 39 | +
|
| 40 | + subgraph CleanupStore[Cleanup Store] |
| 41 | + D3["Memory<br>Postgres"] |
| 42 | + end |
39 | 43 | end |
40 | 44 |
|
41 | 45 | A --> ApplyWorker |
@@ -160,6 +164,24 @@ Like `SchemaStore`, `StateStore` uses cache-first reads with `load_*` methods fo |
160 | 164 |
|
161 | 165 | The store tracks both replication progress through `TableReplicationPhase` and source-to-destination table name mappings. |
162 | 166 |
|
| 167 | +### CleanupStore |
| 168 | + |
| 169 | +The `CleanupStore` trait provides atomic, table-scoped maintenance operations that affect both schema and state storage. The pipeline calls these primitives when tables are removed from a publication. |
| 170 | + |
| 171 | +```rust |
| 172 | +pub trait CleanupStore { |
| 173 | + /// Deletes all stored state for `table_id` for the current pipeline. |
| 174 | + /// |
| 175 | + /// Removes replication state (including history), table schemas, and |
| 176 | + /// table mappings. This must NOT drop or modify the actual destination table. |
| 177 | + /// |
| 178 | + /// Intended for use when a table is removed from the publication. |
| 179 | + fn cleanup_table_state(&self, table_id: TableId) -> impl Future<Output = EtlResult<()>> + Send; |
| 180 | +} |
| 181 | +``` |
| 182 | + |
| 183 | +Implementations must ensure the operation is consistent across in-memory caches and persistent storage, and must be idempotent. Cleanup only removes ETL-maintained metadata and state; it never touches destination tables. |
| 184 | + |
163 | 185 | ## Data Flow Architecture |
164 | 186 |
|
165 | 187 | ### Worker Coordination |
|
0 commit comments