Skip to content

Conversation

@rudolfix
Copy link
Collaborator

@rudolfix rudolfix commented Jul 2, 2024

Description

  1. we prevent changing naming convention when there are any tables that are already created in the destination
  2. we allow to change naming convention when all such tables are blocked and make sure the right names (pre change) are used when dropping tables
  3. bugfix: tables were not dropped for staging dataset so refresh was not working with merge
  4. no print linter rule added and some prints removed

@rudolfix rudolfix self-assigned this Jul 2, 2024
@netlify
Copy link

netlify bot commented Jul 2, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit f6dd9c1
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6686fcb5d2f9c10008c223c1
😎 Deploy Preview https://deploy-preview-1536--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@rudolfix rudolfix requested a review from sh-rp July 3, 2024 08:45
@rudolfix rudolfix added the sprint Marks group of tasks with core team focus at this moment label Jul 3, 2024
@rudolfix rudolfix force-pushed the feat/improves-refresh-mode branch from ab98a81 to b6fc942 Compare July 4, 2024 11:58
@@ -0,0 +1,25 @@
"""Defines env variables that `dlt` uses independently of its configuration system"""

DLT_PROJECT_DIR = "DLT_PROJECT_DIR"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these in the docs somewhere? I think people had been asking about how change the datadir etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for table in self.data_tables(include_incomplete=True):
# TODO: when lineage is fully implemented we should use source identifiers
# not `table` which was already normalized
norm_table = utils.normalize_table_identifiers(table, to_naming)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense here? Maybe I have not thought it through properly, but we don't have the original identifiers in the schema for the data tables do we? Imho there is no way to really discover wether a new naming scheme is compatible with an old one except for on the internal tables. My intuitive solution for this would have been to allow naming changes and solve the problem by explaining in the docs that you'll need to manually update your destination table names and columns if you add a new convention that ends up loading to a different table or column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. next step in this naming convention thing is to store source identifiers in the schema (ie. attached to every table and column names). in that case whenever naming convention changes we are able to restore source identifiers and normalize them again. I do exactly this with dlt tables where we know the source identifiers (hardcoded in utils).

if we have source identifiers we are able to ensure that tables that have data didn't change any names when naming convention changes. currently I block changing naming conventions if there any tables with data. this is to prevent a catastrophic situation when someone changes naming by accident and tables are broken.

if you do not care of above, there's a config setting to enable that. I'm still writing the doc here:
https://dlthub.com/devel/general-usage/naming-convention#avoid-identifier-collisions

@rudolfix rudolfix merged commit 60c7327 into devel Jul 5, 2024
@rudolfix rudolfix deleted the feat/improves-refresh-mode branch July 5, 2024 06:08
@rudolfix rudolfix removed the sprint Marks group of tasks with core team focus at this moment label Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants