Run independent dbt selects for a @dbt_assets declaration with many models #23665

ssillaots-boku · 2024-08-15T11:10:11Z

I'm wondering whether there is a way to break up the dbt run statement for a @dbt_assets definition that selects many models.
Let me show an example.

@dbt_assets(manifest=manifest,
            select='some_model another_model last_model')
def daily_models(...):
    yield from dbt.cli(...)

In Dagster, this creates a run whose dbt command looks like dbt run --select some_model another_model last_model: a single run statement for all the models.
Now let's say the first two models run successfully but the last one fails. The run will then retry, and all the models will be run again. That's fine if @dbt_assets doesn't select many assets. But let's say you have 20+ models lined up and each one takes 3 minutes to run, so about 60 minutes altogether. If the last model fails for some reason, we go again for another 60 minutes. In the end it gets costly as well (on Redshift's side).
I'm trying to find out whether there's a way to break up the single dbt run statement into three separate dbt run statements, so that when a model fails, the next Dagster run picks up where it failed.
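
As a rough illustration of the cost described above, here is a minimal sketch. It assumes the models run strictly one after another and each takes the same time; the function name `retry_cost_minutes` and its parameters are hypothetical and only do the arithmetic.

```python
def retry_cost_minutes(n_models, minutes_per_model, n_succeeded, resume):
    """Minutes spent on the retry after one model has failed.

    Assumption: models run sequentially and take equal time.
    resume=False re-runs every model; resume=True re-runs only the
    models that had not succeeded yet.
    """
    if not 0 <= n_succeeded < n_models:
        raise ValueError("n_succeeded must be in [0, n_models)")
    remaining = n_models if not resume else n_models - n_succeeded
    return remaining * minutes_per_model


# 20 models at 3 minutes each, where only the last one failed.
full_retry = retry_cost_minutes(20, 3, n_succeeded=19, resume=False)
resumed_retry = retry_cost_minutes(20, 3, n_succeeded=19, resume=True)
print(full_retry, resumed_retry)  # 60 3
```

Under these assumptions, retrying the whole selection costs another 60 minutes, while resuming from the failed model costs 3.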

Answered by maximearmstrong


ssillaots-boku · 2024-08-26T05:30:30Z

Author

Up! Any ideas? 🤔

0 replies

maximearmstrong · 2024-08-26T20:57:04Z

Maintainer

Hi @ssillaots-boku - When using dbt assets, the run launched by Dagster uses a single dbt CLI invocation with the --select flag. This means that a single Dagster run selecting multiple dbt assets will always be invoked with a command like dbt run --select some_model another_model last_model.
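
To make the shape of that invocation concrete, here is a minimal sketch in plain Python. The helper `dbt_run_command` is hypothetical: it only builds an argument list, is not part of Dagster or dbt, and does not claim to mirror how Dagster assembles the command internally.

```python
import shlex


def dbt_run_command(models):
    """Build the argument list for one `dbt run` over the given models."""
    return ["dbt", "run", "--select", *models]


models = ["some_model", "another_model", "last_model"]

# One Dagster run selecting all three assets: a single invocation.
combined = dbt_run_command(models)

# What the question asks for instead: one invocation per model.
per_model = [dbt_run_command([m]) for m in models]

print(shlex.join(combined))
for command in per_model:
    print(shlex.join(command))
```

The first line printed is the combined command from the question; the next three are the separate commands, each of which would need its own Dagster run.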

To break your run statement in three, you'll need to create your own asset selections and jobs to launch individual Dagster runs. You can use schedules and sensors to orchestrate everything:

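from dagster import (
    DagsterRunStatus,
    RunRequest,
    ScheduleDefinition,
    define_asset_job,
    run_status_sensor,
)
from dagster_dbt import build_dbt_asset_selection, dbt_assets

# `manifest` is the dbt manifest for the project, defined elsewhere
@dbt_assets(manifest=manifest)
def my_dbt_assets(...):
    yield from dbt.cli(...)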

# Create the asset selections and jobs
some_model_job = define_asset_job(
    name="some_model_job",
    selection=build_dbt_asset_selection(
        [my_dbt_assets],
        dbt_select="some_model"
    ),
)

another_model_job = define_asset_job(
    name="another_model_job",
    selection=build_dbt_asset_selection(
        [my_dbt_assets],
        dbt_select="another_model"
    ),
)

last_model_job = define_asset_job(
    name="last_model_job",
    selection=build_dbt_asset_selection(
        [my_dbt_assets],
        dbt_select="last_model"
    ),
)

# Create a daily schedule for `some_model`, the most upstream model
some_model_schedule = ScheduleDefinition(
    name="some_model_schedule",
    cron_schedule="@daily",
    job=some_model_job,
)

# Create a sensor for `another_model`
# If `some_model_job` succeeds, a run for `another_model_job` is launched
@run_status_sensor(
    run_status=DagsterRunStatus.SUCCESS,
    request_job=another_model_job,
    monitored_jobs=[some_model_job],
)
def another_model_sensor(context):
    yield RunRequest(...)
 
 
# Create a sensor for `last_model`
# If `another_model_job` succeeds, a run for `last_model_job` is launched
@run_status_sensor(
    run_status=DagsterRunStatus.SUCCESS,
    request_job=last_model_job,
    monitored_jobs=[another_model_job],
)
def last_model_sensor(context):
    yield RunRequest(...)

Note that in this pattern, you will launch N Dagster runs (one job per model), where N is the total number of dbt models that you want to select.
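
For more than three models, the wiring above follows a regular shape: the most upstream job gets a schedule, and every other job gets a sensor that monitors the job before it. The sketch below plans that wiring as plain data, assuming the models form a simple linear chain. `JobPlan` and `plan_chain` are hypothetical names for illustration, not Dagster APIs; the resulting plan could drive a loop that creates the actual jobs and sensors.

```python
from typing import List, NamedTuple, Optional


class JobPlan(NamedTuple):
    job_name: str
    trigger: str                  # "schedule" or "sensor"
    monitored_job: Optional[str]  # job whose success triggers this one


def plan_chain(models: List[str]) -> List[JobPlan]:
    """Plan one job per model, chained in the given (upstream-first) order."""
    plans: List[JobPlan] = []
    previous: Optional[str] = None
    for model in models:
        job_name = f"{model}_job"
        if previous is None:
            # The most upstream model is started by a schedule.
            plans.append(JobPlan(job_name, "schedule", None))
        else:
            # Every other model waits for the previous job to succeed.
            plans.append(JobPlan(job_name, "sensor", previous))
        previous = job_name
    return plans


plans = plan_chain(["some_model", "another_model", "last_model"])
for plan in plans:
    print(plan)
```

For N models this yields N jobs, one schedule, and N - 1 sensors, which is the overhead the note above refers to.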

If one specific model fails more often and you want to isolate it while reducing the number of Dagster runs that are launched, you can update your asset selections to group your dbt models:

some_model_and_upstream_selection = build_dbt_asset_selection(
    [my_dbt_assets],
    dbt_select="some_model"
).upstream()

isolated_model_selection = build_dbt_asset_selection(
    [my_dbt_assets],
    dbt_select="isolated_model"
)

another_model_and_downstream_selection = build_dbt_asset_selection(
    [my_dbt_assets],
    dbt_select="another_model"
).downstream()
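
The grouping idea can be sketched in plain Python, assuming the models form a linear chain listed upstream-first. `group_around` and the model names `stg_a`, `stg_b`, `flaky_model`, `mart_x`, and `mart_y` are hypothetical; each resulting group is joined into a space-separated string, which is how dbt expresses a union of selected models.

```python
from typing import List


def group_around(models: List[str], isolated: str) -> List[List[str]]:
    """Split an upstream-first chain into [before, [isolated], after].

    Empty groups are dropped, so isolating the first or last model
    yields two groups instead of three.
    """
    index = models.index(isolated)  # raises ValueError if not present
    groups = [models[:index], [isolated], models[index + 1:]]
    return [group for group in groups if group]


chain = ["stg_a", "stg_b", "flaky_model", "mart_x", "mart_y"]
groups = group_around(chain, "flaky_model")

# One selection string per group, e.g. for a `dbt_select` argument.
selections = [" ".join(group) for group in groups]
print(selections)
# ['stg_a stg_b', 'flaky_model', 'mart_x mart_y']
```

In this sketch, five models are covered by three Dagster runs instead of five, and a failure of `flaky_model` only requires re-running it and the downstream group.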

1 reply

ssillaots-boku Aug 27, 2024
Author

Gotcha! Thanks for the response and comprehensive example!
