Skip to content

scd2 merge strategy does not reinsert records #1683

@jorritsandbrink

Description

@jorritsandbrink

dlt version

0.5.2

Describe the problem

When using the scd2 merge strategy, records do not get reinserted after retirement:

  • load 1: source snapshot contains record x ➜ record x is inserted in destination
  • load 2: source snapshot does not contain record x ➜ record x is retired in destination ("valid to" field is filled with timestamp)
  • load 3: source snapshot contains record x ➜ nothing happens (this is the bug)

Issue only occurs if record x is exactly the same. If the natural key is the same, but one or more attributes is different, the record gets inserted as expected.

Expected behavior

Record x gets reinserted in load 3. Resultant destination table has two records for record x with different validity windows.

Steps to reproduce

import dlt

@dlt.resource(
    table_name="my_table", write_disposition={"disposition": "merge", "strategy": "scd2"}
)
def r(data):
    yield data

pipe = dlt.pipeline(destination="postgres", dataset_name="scd2_reinsert_bug")

# load 1 — initial load
dim_snap = [
    {"nk": 1, "c1": "foo", "c2": "foo"},
    {"nk": 2, "c1": "bar", "c2": "bar"},
]
pipe.run(r(dim_snap))
# result: inserted 2 records (table size: 2 records)

# load 2 — delete natural key 1
dim_snap = [
    {"nk": 2, "c1": "bar", "c2": "bar"},
]
pipe.run(r(dim_snap))
# result: retired record for natural key 1 (table size: 2 records)

# load 3 — reinsert natural key 1
dim_snap = [
    {"nk": 1, "c1": "foo", "c2": "foo"},
    {"nk": 2, "c1": "bar", "c2": "bar"},
]
pipe.run(r(dim_snap))
# result: nothing happened (table size: 2 records)
# desired result: reinsert record for natural key 1 (table size: 3 records)

Operating system

Windows

Runtime environment

Local

Python version

3.8

dlt data source

No response

dlt destination

Postgres

Other deployment details

No response

Additional information

No response

Metadata

Metadata

Labels

bugSomething isn't workingcommunityIssue from dlt Slack community

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions