Feat(experimental): DBT project conversion #4495
Conversation
@click.pass_obj
@error_handler
@cli_analytics
def dbt_convert(
Should we instead extend the `init` command like we do for dlt generation?
# extract {{ var() }} references used in all jinja macro dependencies to check for any variables specific
# to a migrated DBT package and resolve them accordingly
# vars are added into __sqlmesh_vars__ in the Python env so that the native SQLMesh var() function can resolve them
if migrated_dbt_project_name:
Can this be encapsulated into its own function?
@@ -491,6 +491,18 @@ def _merge_filter_validator(

return v.transform(d.replace_merge_table_aliases)

@field_validator("batch_concurrency", mode="before")
Why is this needed? There's already a validator for this field
)


class DbtConversionConsole(TerminalConsole):
Does this need to inherit `TerminalConsole`?
yield prev, curr


class JinjaGenerator:
Just curious: any reason to have this class? It doesn't look like the methods benefit from the shared `self` instance in any way. Should these just be top-level functions?
This PR contains an initial implementation of a command that can take a DBT project, read it into memory, and write the result out as a native SQLMesh project.
To invoke it, use the new `dbt_convert` command.
The way this works is that the project is first loaded into memory using our existing `DbtLoader`. The resulting models and macros are extracted from the context and their Jinja is parsed into an AST (using the Jinja library). A bunch of AST transforms are run to replace DBT-isms with SQLMesh native concepts as much as possible:
- `{{ ref() }}` and `{{ source() }}` calls are replaced with the actual model names they reference (where possible)
- `{% is_incremental() %}` blocks are removed

Jinja has no way of turning a Jinja AST back into a Jinja template string (since its goal is to generate Python code, not more Jinja), so I wrote a `JinjaGenerator` class to go from AST back to `str`.
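To make that concrete, here is a minimal, self-contained sketch of the two steps just described: rewriting a `{{ ref() }}` call in the parsed Jinja AST and turning the AST back into template text. This is not the PR's actual code; `resolve_ref`, its hard-coded mapping, and the toy `generate` function are made up for illustration, and the real `JinjaGenerator` covers many more node types.

```python
# Minimal sketch, not the PR's implementation.
from jinja2 import Environment, nodes
from jinja2.visitor import NodeTransformer


def resolve_ref(name: str) -> str:
    # hypothetical lookup: dbt ref name -> fully-qualified SQLMesh model name
    return {"orders": "analytics.orders"}.get(name, name)


class RefToModelName(NodeTransformer):
    def visit_Call(self, node: nodes.Call) -> nodes.Node:
        # `{{ ref('orders') }}` parses to Call(node=Name('ref'), args=[Const('orders')])
        if (
            isinstance(node.node, nodes.Name)
            and node.node.name == "ref"
            and node.args
            and isinstance(node.args[0], nodes.Const)
        ):
            # replace the whole call with literal template text so no Jinja remains
            return nodes.TemplateData(resolve_ref(node.args[0].value))
        return self.generic_visit(node)


def generate(node: nodes.Node) -> str:
    # toy "Jinja AST -> template string" generator covering only the node
    # types used in this example
    if isinstance(node, nodes.Template):
        return "".join(generate(child) for child in node.body)
    if isinstance(node, nodes.Output):
        parts = []
        for child in node.nodes:
            if isinstance(child, nodes.TemplateData):
                parts.append(child.data)  # literal text, no {{ }} needed
            else:
                parts.append("{{ " + generate(child) + " }}")
        return "".join(parts)
    if isinstance(node, nodes.Const):
        return repr(node.value)
    if isinstance(node, nodes.Name):
        return node.name
    if isinstance(node, nodes.Call):
        args = ", ".join(generate(arg) for arg in node.args)
        return f"{generate(node.node)}({args})"
    raise NotImplementedError(type(node).__name__)


ast = Environment().parse("select * from {{ ref('orders') }}")
ast = RefToModelName().visit(ast)
print(generate(ast))  # select * from analytics.orders
```

Because the `ref()` call is replaced with literal template text, no Jinja remains in this toy example and the result can be written out as a plain SQL model.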
If, after applying all the Jinja AST transforms, there is still some Jinja left in the result, then it is written to the SQLMesh model file as a Jinja model surrounded with `JINJA_QUERY_BEGIN;` / `JINJA_END;` blocks. If there is no Jinja left after applying the transforms, it is written directly as a native SQL model with no Jinja wrapping.
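As a rough illustration of that decision (a hypothetical helper, not the PR's code; checking for leftover Jinja with a string search is a crude stand-in for inspecting the transformed AST):

```python
def write_model_query(query: str) -> str:
    # If any Jinja syntax survived the AST transforms, emit a Jinja model
    # wrapped in JINJA_QUERY_BEGIN; / JINJA_END; otherwise emit plain SQL.
    if "{{" in query or "{%" in query:
        return f"JINJA_QUERY_BEGIN;\n{query}\nJINJA_END;"
    return query
```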
Macros that call into DBT packages are also handled. The dependency tree is migrated and put in the target folder under `macros/__dbt_packages__` so that the macro hierarchy is still available when the macros are called on the SQLMesh side.
A migrated project isn't truly native until all the dbt-isms have been removed. If the native loader detects it is loading a migrated project, it injects DBT shims into the Jinja context to make the migrated macros still work. The bulk of the DBT shim code is re-used from our existing DBT loader.
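Conceptually, the shim injection amounts to registering DBT-style callables in the Jinja environment used for rendering. The sketch below only illustrates that idea; the shim names and resolvers are hypothetical, and the actual implementation re-uses the existing DBT loader's shim code.

```python
from jinja2 import Environment


def dbt_ref_shim(name: str) -> str:
    # made-up resolver: map a dbt-style ref to the migrated SQLMesh model name
    return name if "." in name else f"analytics.{name}"


def dbt_var_shim(name: str, default=None):
    # made-up resolver: look the variable up in the migrated project's vars
    return {"start_date": "2020-01-01"}.get(name, default)


env = Environment()
env.globals.update({"ref": dbt_ref_shim, "var": dbt_var_shim})
print(env.from_string("select * from {{ ref('orders') }}").render())
# -> select * from analytics.orders
```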
Known limitations:
- Not every connection type is handled yet; each one needs a `from_sqlmesh()` implementation in the relevant dbt `TargetConfig` class
- The Jinja AST does not record whether a `{%-` or `{%` block was used. This can lead to less than ideal formatting once the AST transforms are run and the resulting AST is turned back into a Jinja string
- `{{ source() }}` calls on the DBT side that used dynamic inputs and were aliased in the DBT config are not correctly migrated