improves collision detection when naming convention changes #1536
Conversation
ab98a81 to b6fc942
@@ -0,0 +1,25 @@
"""Defines env variables that `dlt` uses independently of its configuration system"""

DLT_PROJECT_DIR = "DLT_PROJECT_DIR"
Are these in the docs somewhere? I think people have been asking how to change the data dir etc.
let's just link this
https://deploy-preview-1536--dlt-hub-docs.netlify.app/docs/api_reference/common/known_env
from our docs
for table in self.data_tables(include_incomplete=True):
    # TODO: when lineage is fully implemented we should use source identifiers
    # not `table` which was already normalized
    norm_table = utils.normalize_table_identifiers(table, to_naming)
Does this make sense here? Maybe I have not thought it through properly, but we don't have the original identifiers in the schema for the data tables, do we? IMHO there is no way to really discover whether a new naming scheme is compatible with an old one, except on the internal tables. My intuitive solution would have been to allow naming changes and explain in the docs that you'll need to manually update your destination table and column names if you add a new convention that ends up loading to a different table or column.
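To illustrate the concern, here is a minimal sketch of why already-normalized identifiers cannot be safely re-normalized. Both `to_snake_case` and `keep_as_is` are hypothetical stand-ins for naming conventions, not `dlt` functions, and the column names are made up.

```python
import re


def to_snake_case(identifier: str) -> str:
    """Hypothetical lossy convention: lower-case words joined by underscores."""
    # Put an underscore between a lower-case letter or digit and an upper-case letter.
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", identifier)
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", with_breaks)
    return cleaned.strip("_").lower()


def keep_as_is(identifier: str) -> str:
    """Hypothetical case-preserving convention."""
    return identifier


# Made-up source identifiers as they might appear in the data.
source_columns = ["UserName", "user_name", "User Name"]

# What ends up stored in the schema under the first convention.
stored = [to_snake_case(column) for column in source_columns]

# Re-normalizing the stored names with the new convention cannot bring
# the source identifiers back: the case and the space are already gone.
renormalized = [keep_as_is(column) for column in stored]

print(stored)
print(renormalized == source_columns)
```

All three source names collapse to the same stored name, so the schema alone cannot tell which destination column a new convention would have produced.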
You are right. The next step in this naming convention work is to store source identifiers in the schema (i.e. attached to every table and column name). In that case, whenever the naming convention changes, we are able to restore the source identifiers and normalize them again. I do exactly this with dlt tables, where we know the source identifiers (hardcoded in utils).
If we have source identifiers, we are able to ensure that tables that have data didn't change any names when the naming convention changes. Currently I block changing naming conventions if there are any tables with data. This is to prevent a catastrophic situation where someone changes the naming by accident and tables are broken.
If you do not care about the above, there's a config setting to enable that. I'm still writing the doc here:
https://dlthub.com/devel/general-usage/naming-convention#avoid-identifier-collisions
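The check described above could look roughly like the sketch below, assuming source identifiers were already stored next to each table. Everything here is hypothetical and only illustrates the idea: `changed_identifiers`, `ensure_naming_change_is_safe`, `NamingChangeError` and the `allow_identifier_change` flag are made-up names, not the `dlt` implementation or its config setting.

```python
from typing import Callable, Dict, List, Set

Naming = Callable[[str], str]


class NamingChangeError(Exception):
    """Raised when a naming change would rename identifiers of tables that hold data."""


def changed_identifiers(
    source_tables: Dict[str, List[str]],
    tables_with_data: Set[str],
    old_naming: Naming,
    new_naming: Naming,
) -> List[str]:
    """List source identifiers whose normalized form differs between conventions."""
    changed: List[str] = []
    for table, columns in source_tables.items():
        if table not in tables_with_data:
            continue  # tables without data can be renamed freely
        if old_naming(table) != new_naming(table):
            changed.append(table)
        for column in columns:
            if old_naming(column) != new_naming(column):
                changed.append(f"{table}.{column}")
    return changed


def ensure_naming_change_is_safe(
    source_tables: Dict[str, List[str]],
    tables_with_data: Set[str],
    old_naming: Naming,
    new_naming: Naming,
    allow_identifier_change: bool = False,  # hypothetical opt-out flag
) -> List[str]:
    """Block the change unless nothing is renamed or the caller opted out."""
    changed = changed_identifiers(source_tables, tables_with_data, old_naming, new_naming)
    if changed and not allow_identifier_change:
        raise NamingChangeError(f"naming change would rename: {', '.join(changed)}")
    return changed


# Made-up schema: source table name -> source column names.
tables = {"Events": ["UserId", "ts"], "Staging": ["RawValue"]}
has_data = {"Events"}

try:
    ensure_naming_change_is_safe(tables, has_data, str.lower, lambda name: name)
    blocked = False
except NamingChangeError as error:
    blocked = True
    print(error)
```

Only `Events` is checked because `Staging` holds no data, and passing `allow_identifier_change=True` returns the list of renamed identifiers instead of raising.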
Description
- `refresh` was not working with `merge`
- `print` linter rule added and some prints removed