Feat(experimental): DBT project conversion #4495

Open · wants to merge 1 commit into main
Conversation

@erindru (Collaborator) commented May 22, 2025

This PR contains an initial implementation of a command that takes a DBT project, reads it into memory, and writes the result out as a native SQLMesh project.

To invoke it, use:

$ sqlmesh dbt convert -i <input_path> -o <output_path>

The way this works is that the project is first loaded into memory using our existing DbtLoader.

  • This means that the existing SQLMesh mappings from DBT model types -> SQLMesh model types are utilized
  • It also means that our existing DBT shims in the Jinja context can be utilized

The resulting models and macros are extracted from the context and their Jinja is parsed into an AST (using the Jinja library). A series of AST transforms is then run to replace DBT-isms with SQLMesh-native concepts as much as possible:

  • {{ ref() }} and {{ source() }} calls are replaced with the actual model names they reference (where possible)
  • {% is_incremental() %} blocks are removed
  • etc.
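As an illustrative sketch of the first step (this is not the PR's actual transform code), jinja2 can parse a model's template into an AST, and `ref()` calls can then be located via `find_all`; a real transform would go on to rewrite those `Call` nodes into resolved model names:

```python
from jinja2 import Environment, nodes

# Parse a DBT-style model body into a Jinja AST.
env = Environment()
ast = env.parse("select * from {{ ref('orders') }} join {{ source('raw', 'events') }}")

# Collect the literal names passed to ref() calls; source() calls
# could be collected the same way by matching on the callable's name.
refs = [
    call.args[0].value
    for call in ast.find_all(nodes.Call)
    if isinstance(call.node, nodes.Name) and call.node.name == "ref"
]
# refs == ["orders"]
```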

Jinja has no way of turning a Jinja AST back into a Jinja template string (since its goal is to generate Python code, not more Jinja), so I wrote a JinjaGenerator class to go from AST back to str.
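A minimal sketch of the idea behind such a generator (not the PR's JinjaGenerator, which handles far more node types: statements, filters, loops, whitespace control) is to recursively walk the AST and emit template source for each node:

```python
from jinja2 import Environment, nodes

def expr(n: nodes.Node) -> str:
    # Render an expression node back to Jinja expression syntax.
    if isinstance(n, nodes.Name):
        return n.name
    if isinstance(n, nodes.Const):
        return repr(n.value)
    if isinstance(n, nodes.Call):
        return f"{expr(n.node)}({', '.join(expr(a) for a in n.args)})"
    raise NotImplementedError(type(n).__name__)

def generate(n: nodes.Node) -> str:
    # Render top-level template structure back to template source.
    if isinstance(n, nodes.Template):
        return "".join(generate(child) for child in n.body)
    if isinstance(n, nodes.Output):
        parts = []
        for child in n.nodes:
            if isinstance(child, nodes.TemplateData):
                parts.append(child.data)  # literal text between tags
            else:
                parts.append("{{ " + expr(child) + " }}")
        return "".join(parts)
    raise NotImplementedError(type(n).__name__)

src = "select * from {{ ref('orders') }}"
roundtripped = generate(Environment().parse(src))
# roundtripped == src for this simple input
```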

If, after applying all the Jinja AST transforms, there is still some Jinja left in the result, then it is written to the SQLMesh model file as a Jinja model surrounded with JINJA_QUERY_BEGIN; JINJA_END; blocks.

If there is no Jinja left after applying the transforms, it is written directly as a native SQL model with no Jinja wrapping.
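For illustration, a converted model with residual Jinja might look roughly like the following (the model name and macro are hypothetical; the JINJA_QUERY_BEGIN / JINJA_END delimiters are SQLMesh's Jinja model syntax):

```sql
MODEL (
  name my_project.orders,
  kind FULL
);

JINJA_QUERY_BEGIN;
SELECT *
FROM {{ some_remaining_macro() }}
JINJA_END;
```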

Macros that call into DBT packages are also handled. The dependency tree is migrated and put in the target folder under macros/__dbt_packages__ so that the macro hierarchy is still available when the macros are called on the SQLMesh side.
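Assuming a dependency on a package such as dbt_utils (a hypothetical example), the migrated output layout would look roughly like:

```
output_project/
  macros/
    __dbt_packages__/
      dbt_utils/    # migrated package macros keep their original hierarchy
        ...
```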

A migrated project isn't truly native until all the dbt-isms have been removed. If the native loader detects it is loading a migrated project, it injects DBT shims into the Jinja context to make the migrated macros still work. The bulk of the DBT shim code is re-used from our existing DBT loader.
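The shim mechanism can be sketched as follows (the names here are illustrative, not SQLMesh's actual API): DBT-style callables are registered in the Jinja globals so migrated macros that still call `ref()` or `var()` keep working:

```python
from jinja2 import Environment

def make_shimmed_env(model_names: dict) -> Environment:
    # Hypothetical shim injection: map DBT-style calls onto native lookups.
    env = Environment()
    env.globals["ref"] = lambda name: model_names[name]
    env.globals["var"] = lambda name, default=None: {"start_date": "2024-01-01"}.get(name, default)
    return env

env = make_shimmed_env({"orders": "db.orders"})
rendered = env.from_string("select * from {{ ref('orders') }}").render()
# rendered == "select * from db.orders"
```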

Known limitations:

  • The source DBT project must be loadable by the SQLMesh DBT loader. It doesn't need to be runnable, just loadable.
  • Currently only works for BigQuery and DuckDB, as these were the initial focus. Other DB types can be added by implementing from_sqlmesh() in the relevant dbt TargetConfig class.
  • Jinja handles whitespace stripping at parse time, so the AST has no idea if, e.g., a {%- or {% block was used. This can lead to less-than-ideal formatting once the AST transforms are run and the resulting AST is turned back into a Jinja string.
  • Only the Jinja constructs that were present in the test projects are handled by the Jinja generator; some more esoteric AST nodes are not yet handled.
  • Audits are not generalized by our DBT loader, so each model currently gets its own version as an inline audit.
  • {{ source() }} calls on the DBT side that used dynamic inputs and were aliased in the DBT config are not correctly migrated
  • pre/post hooks / statements are not currently handled

@erindru force-pushed the erin/dbt-convert branch from 0f4ddc6 to 738b2c7 on May 22, 2025 01:39
@erindru marked this pull request as ready for review on May 22, 2025 02:46
@click.pass_obj
@error_handler
@cli_analytics
def dbt_convert(

Member:

Should we instead extend the init command like we do for dlt generation?

# extract {{ var() }} references used in all jinja macro dependencies to check for any variables specific
# to a migrated DBT package and resolve them accordingly
# vars are added into __sqlmesh_vars__ in the Python env so that the native SQLMesh var() function can resolve them
if migrated_dbt_project_name:

Member:

Can this be encapsulated into its own function?

@@ -491,6 +491,18 @@ def _merge_filter_validator(

return v.transform(d.replace_merge_table_aliases)

@field_validator("batch_concurrency", mode="before")

Member:

Why is this needed? There's already a validator for this field

)


class DbtConversionConsole(TerminalConsole):

Member:

Does this need to inherit TerminalConsole?

yield prev, curr


class JinjaGenerator:

Member:

Just curious: any reason to have this class? It doesn't look like the methods benefit from the shared self instance in any way. Should these just be top-level functions?
