Closed
Labels
good first issue (Good for newcomers)
Description
TLDR
We use staging datasets for transactional safety with the merge write_disposition as well as some variants of the replace write_disposition. By default, the staging dataset is a second dataset called "<dataset_name>_staging". Users can change this name, which can lead to a setup where the final and staging datasets have the same name. We should prevent this, or at least print a big fat warning if users try to do it, because the setup commands that should only truncate the staging dataset will then truncate data in the final dataset.
ToDo
- Learn about the staging dataset: https://dlthub.com/docs/dlt-ecosystem/staging#staging-dataset
- Add a new method to the `WithStagingDataset` class: `def create_dataset_names(self, schema: Schema, config: DestinationClientDwhConfiguration) -> Tuple[str, str]:`, which creates the regular and the staging dataset names for a given schema and config. This method should also raise an exception if both names are the same. See the point below to find the places where these normalized names are created.
- Use this new method to create the normalized regular and staging dataset names for all destinations (including the filesystem destination). You can find all destination implementations under dlt/destinations/impl, or just search for all the places where `normalize_staging_dataset_name()` is used.
- Write tests that demonstrate that this exception is raised if both datasets end up having the same name after normalization.
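The check described in the list above could look roughly like the following standalone sketch. It does not import dlt and is not the proposed implementation: `normalize`, `DatasetNameCollisionError`, and the simplified function signature are hypothetical stand-ins, and the `%s`-style staging layout is an assumption about how the staging name is configured. The real method would take `schema` and `config` and use the destination's naming convention.

```python
import re
from typing import Tuple


class DatasetNameCollisionError(ValueError):
    """Hypothetical exception for this sketch; not an existing dlt class."""


def normalize(name: str) -> str:
    # Stand-in for a destination naming convention (assumption):
    # lowercase, then replace anything outside [a-z0-9_] with "_".
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


def create_dataset_names(
    dataset_name: str, staging_layout: str = "%s_staging"
) -> Tuple[str, str]:
    # Build the staging name from the layout; a layout without "%s"
    # is treated as a fixed staging dataset name (assumption).
    if "%s" in staging_layout:
        raw_staging = staging_layout % dataset_name
    else:
        raw_staging = staging_layout

    final = normalize(dataset_name)
    staging = normalize(raw_staging)

    # Compare AFTER normalization: two different raw names can collapse
    # into the same normalized name.
    if final == staging:
        raise DatasetNameCollisionError(
            f"Final and staging dataset both resolve to '{final}'. "
            "Truncating the staging dataset would truncate the final dataset."
        )
    return final, staging
```

Comparing the names only after normalization matters here: in this sketch, a dataset named `my-data` with a fixed staging name `my_data` looks safe before normalization but collides afterwards, which is the case the tests in the last point should cover.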
Status
Done