Hi all,
I'll be short, this is the requirement that I have:
- check for any runs that have been stuck in the STARTING state for more than 10 minutes (not due to the code, because it does run on some days, but on others it gets stuck)
- cancel and rerun them
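The detection rule above can be sketched in isolation. This is a minimal sketch, assuming the time at which a run entered STARTING is available as epoch seconds; `is_stuck` and its parameters are hypothetical names used for illustration and are not part of Dagster.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stuck(
    start_ts: Optional[float],
    now: datetime,
    threshold: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if a run that entered STARTING at start_ts (epoch seconds)
    has been waiting longer than threshold."""
    if start_ts is None:
        return False  # no start event recorded, nothing to compare against
    if now.tzinfo is None:
        # A naive datetime's .timestamp() is interpreted as local time,
        # which would skew the comparison on hosts that are not in UTC.
        raise ValueError("now must be timezone-aware")
    return now.timestamp() - start_ts > threshold.total_seconds()


# Hypothetical fixed clock so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_stuck((now - timedelta(minutes=3)).timestamp(), now)
stale = is_stuck((now - timedelta(minutes=11)).timestamp(), now)
```

Here `fresh` is `False` and `stale` is `True`. Keeping the comparison in a small pure function makes the threshold easy to test without a running Dagster instance.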
Dagster version is 1.7.9.
I am fairly new to Dagster, but I can get around. I made a schedule that runs every 10 minutes and triggers a job that does this. Most of it works: it detects the stalled run, cancels it, and issues a new run. BUT the new run never seems to start; it stays stuck in the STARTING phase, and the "rerun" job crashes with the error:
AttributeError: 'DagsterRun' object has no attribute 'dagster_run'
I tried a lot of variations and none of them work. This is the latest version, the "closest" to a solution, so any and all suggestions would be more than appreciated:
from datetime import datetime, timedelta, timezone

from dagster import DagsterEventType, DagsterRunStatus, RunsFilter, op


@op
def rerun_stuck_jobs(context):
    # Use an aware UTC datetime: .timestamp() on a naive utcnow() value is
    # interpreted as local time, which skews the comparison off-UTC hosts.
    now = datetime.now(timezone.utc)
    threshold_time = now - timedelta(minutes=10)
    # Get all runs that are currently in STARTING status
    filters = RunsFilter(
        statuses=[DagsterRunStatus.STARTING]
    )
    starting_runs = context.instance.get_runs(filters)
    for run in starting_runs:
        run_id = run.run_id
        job_name = run.job_name
        # Skip this monitoring job's own runs
        if job_name == 'rerun_stuck_jobs':
            continue
        # Get events to find the PIPELINE_STARTING timestamp
        event_records = list(context.instance.all_logs(run_id))
        starting_event_time = None
        for event in event_records:
            # Plain log entries carry no dagster_event, so guard against None
            if (
                event.dagster_event is not None
                and event.dagster_event.event_type_value == DagsterEventType.PIPELINE_STARTING
            ):
                starting_event_time = event.timestamp
                break
        if starting_event_time and starting_event_time < threshold_time.timestamp():
            minutes_stuck = round((now.timestamp() - starting_event_time) / 60, 2)
            stuck_message = (
                f"Run {run_id} from job '{job_name}' has been stuck in STARTING for "
                f"{minutes_stuck} minutes. Cancelling and restarting."
            )
            context.log.info(stuck_message)
            context.instance.report_run_canceled(run)
            if not run.job_code_origin:
                # this is okay, it never writes this log
                context.log.warning(f"Run {run.run_id} cannot be restarted: no job_code_origin.")
            new_tags = dict(run.tags)
            # Create a new run that mirrors the cancelled one
            new_run = context.instance.create_run(
                job_name=run.job_name,
                run_id=None,
                run_config=run.run_config,
                tags=new_tags,
                root_run_id=run.root_run_id or run.run_id,
                parent_run_id=run.run_id,
                status=None,
                step_keys_to_execute=None,
                execution_plan_snapshot=None,
                job_snapshot=None,
                parent_job_snapshot=None,
                asset_selection=None,
                asset_check_selection=None,
                resolved_op_selection=None,
                op_selection=None,
                remote_job_origin=None,
                job_code_origin=run.job_code_origin,
                asset_graph=None,
            )
            context.log.info(f"Launching new run: {new_run.run_id}")
            context.instance.run_launcher.launch_run(new_run)  # failing in here
        else:
            context.log.info(
                f"Run {run_id} from job '{job_name}' is not stuck."
            )
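A note on the traceback: the message suggests that `launch_run` reads a `dagster_run` attribute from its argument, so it seems to expect a wrapper object around the run rather than the run itself. In Dagster that wrapper appears to be `LaunchRunContext`, which carries the run together with a workspace, but treat that as an assumption to check against 1.7.9. The sketch below only reproduces the shape of the failure with plain-Python stand-ins; `FakeRun`, `FakeLaunchContext`, `FakeLauncher`, and `try_launch` are hypothetical names and not Dagster classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRun:
    """Hypothetical stand-in for a run record (not a Dagster class)."""
    run_id: str


@dataclass
class FakeLaunchContext:
    """Hypothetical wrapper carrying the run plus a workspace handle."""
    dagster_run: FakeRun
    workspace: Optional[object] = None


class FakeLauncher:
    """Hypothetical launcher that reads context.dagster_run, as the traceback implies."""

    def launch_run(self, context) -> str:
        run = context.dagster_run  # AttributeError if given a bare run
        return f"launched {run.run_id}"


def try_launch(launcher: FakeLauncher, arg) -> str:
    """Call launch_run and turn an AttributeError into a readable string."""
    try:
        return launcher.launch_run(arg)
    except AttributeError as exc:
        return f"error: {exc}"


launcher = FakeLauncher()
bare = try_launch(launcher, FakeRun("abc"))  # run passed directly
wrapped = try_launch(launcher, FakeLaunchContext(FakeRun("abc")))  # run inside a context
```

Passing the bare run yields an error string that mentions the missing `dagster_run` attribute, while the wrapped call succeeds. If that reading is right, launching through the instance by run ID and workspace, rather than calling the run launcher directly, is probably the direction to look.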
Thanks in advance! 🙏