
Commit 55850de

document scd2 row id uniqueness characteristics
1 parent cd155f3 commit 55850de


1 file changed: +13 additions, -0 deletions


docs/website/docs/general-usage/incremental-loading.md

@@ -251,6 +251,19 @@ executed. You can achieve the same in the decorator `@dlt.source(root_key=True)`
### `scd2` strategy

`dlt` can create [Slowly Changing Dimension Type 2](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) (SCD2) destination tables for dimension tables that change in the source. The resource is expected to provide a full extract of the source table on each run. A row hash is stored in `_dlt_id` and used as a surrogate key to identify source records that have been inserted, updated, or deleted. By default, a `NULL` value marks an active record, but a configurable high timestamp (e.g. `9999-12-31 00:00:00.000000`) can be used instead.
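To illustrate how a row hash can serve as a surrogate key, here is a minimal, self-contained sketch. It assumes an MD5 digest over the row serialized as JSON with sorted keys; the helpers `row_hash` and `diff_extracts` are hypothetical, and `dlt`'s actual hashing scheme may differ. Comparing the keys of the currently active records with a full extract is enough to tell which records must be inserted and which must be retired.

```py
import hashlib
import json


def row_hash(row: dict) -> str:
    # Hypothetical surrogate key: MD5 of the row serialized as JSON with sorted keys.
    # Illustration only; dlt's real hashing may differ.
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def diff_extracts(active_keys: set, extract: list) -> tuple:
    """Compare keys of active destination records with a full source extract.

    Returns (keys_to_insert, keys_to_retire).
    """
    incoming = {row_hash(row) for row in extract}
    keys_to_insert = incoming - active_keys  # new rows and new versions of changed rows
    keys_to_retire = active_keys - incoming  # deleted rows and old versions of changed rows
    return keys_to_insert, keys_to_retire


# First run: every row is new, so all hashes become active keys.
run_1 = [{"customer_key": 1, "city": "Berlin"}, {"customer_key": 2, "city": "Paris"}]
active = {row_hash(row) for row in run_1}

# Second run: customer 1 moved, customer 2 is unchanged.
run_2 = [{"customer_key": 1, "city": "Munich"}, {"customer_key": 2, "city": "Paris"}]
to_insert, to_retire = diff_extracts(active, run_2)
print(len(to_insert), len(to_retire))  # 1 1: the update is one retire plus one insert
```

Because the hash covers the whole row, a change in any column yields a new key, so an update appears as a retired version plus a new version rather than an in-place modification.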

:::note
The `unique` hint for `_dlt_id` in the root table is set to `false` when using `scd2`. This differs from the [default behavior](./destination-tables.md#child-and-parent-tables). The reason is that the surrogate key stored in `_dlt_id` can contain duplicates after an _insert-delete-reinsert_ pattern:

1. a record with surrogate key X is inserted in a load at `t1`
2. the record with surrogate key X is deleted in a later load at `t2`
3. a record with identical content, and therefore the same surrogate key X, is reinserted in an even later load at `t3`

After this pattern, the `scd2` table in the destination has two records for surrogate key X: one with validity window `[t1, t2]` and one with validity window `[t3, NULL]`. Both records carry the same surrogate key, so `_dlt_id` contains a duplicate value.

Note that:
- the composite key `(_dlt_id, _dlt_valid_from)` is unique
- `_dlt_id` remains unique in child tables; `scd2` does not change this
:::
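The _insert-delete-reinsert_ pattern can be reproduced with a small in-memory sketch. It again assumes an MD5 row hash over sorted JSON as the surrogate key (which may not match `dlt`'s implementation), models the `scd2` table as a list of dicts, and assumes `_dlt_valid_to` as the name of the validity-end column, with `None` standing in for the default `NULL` active-record marker. The helper `scd2_load` is hypothetical and ignores child tables.

```py
import hashlib
import json
from datetime import datetime


def row_hash(row: dict) -> str:
    # Hypothetical surrogate key (MD5 of sorted JSON); dlt's real scheme may differ.
    return hashlib.md5(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def scd2_load(table: list, extract: list, load_time: datetime) -> None:
    """Apply one full extract to an in-memory SCD2 table (hypothetical helper).

    Active records have _dlt_valid_to = None, standing in for the default NULL.
    """
    incoming = {row_hash(row): row for row in extract}
    active = {rec["_dlt_id"]: rec for rec in table if rec["_dlt_valid_to"] is None}
    # Close the validity window of active records missing from the extract.
    for key, rec in active.items():
        if key not in incoming:
            rec["_dlt_valid_to"] = load_time
    # Open a new validity window for incoming rows that have no active record.
    for key, row in incoming.items():
        if key not in active:
            table.append(
                {**row, "_dlt_id": key, "_dlt_valid_from": load_time, "_dlt_valid_to": None}
            )


t1, t2, t3 = datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)
record_x = {"customer_key": 1, "city": "Berlin"}
record_y = {"customer_key": 2, "city": "Paris"}

table: list = []
scd2_load(table, [record_x, record_y], t1)  # X inserted
scd2_load(table, [record_y], t2)            # X deleted from the source
scd2_load(table, [record_x, record_y], t3)  # X reinserted with identical content

ids = [rec["_dlt_id"] for rec in table]
composite = [(rec["_dlt_id"], rec["_dlt_valid_from"]) for rec in table]
print(len(ids), len(set(ids)))              # 3 2: _dlt_id contains a duplicate
print(len(composite), len(set(composite)))  # 3 3: (_dlt_id, _dlt_valid_from) is unique
```

The two versions of X differ only in their validity windows. In this sketch a key gets at most one new version per load, which is why adding `_dlt_valid_from` to `_dlt_id` restores uniqueness.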
#### Example: `scd2` merge strategy
```py
@dlt.resource(
