Description
Describe the current behavior
#18466 introduced the prefect.run-count field to flow and task run state change events. Matching on its value is extremely useful for limiting the number of times an automation may execute once a run has reached the "Running" state. However, there are situations before a flow run reaches "Running" (while run-count is still 0) where it would be beneficial to retry the run a specific number of times, such as when the image cannot be pulled due to a network blip or a transient websockets issue. The within field of an event trigger can be used to limit the number of retries in this case, but that approach is unintuitive and error-prone: you must reason about how long a flow run takes to spin up, and you cannot guarantee that the run will be attempted a set number of times.
Describe the proposed behavior
Add submission_count as a field to the FlowRun class to track the number of times a run was submitted, separately from the run count. Expose the field in flow run state change events to enable retrying failures on startup a specific number of times.
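A minimal sketch of the intended semantics, not Prefect's actual implementation. `FlowRunSketch`, `submit`, `start_running`, `event_labels`, and `should_retry` are hypothetical names used only for illustration; the sketch assumes run_count is incremented when a run enters "Running" and that the proposed submission_count would be incremented on every submission.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlowRunSketch:
    """Hypothetical stand-in for FlowRun; not Prefect's real model."""

    run_count: int = 0         # assumed: incremented on entering "Running"
    submission_count: int = 0  # proposed field: incremented on each submission

    def submit(self) -> None:
        self.submission_count += 1

    def start_running(self) -> None:
        self.run_count += 1

    def event_labels(self) -> dict[str, str]:
        # Label values are strings, mirroring how a trigger's "match" block
        # compares against string values.
        return {
            "prefect.run-count": str(self.run_count),
            "prefect.submission-count": str(self.submission_count),
        }


def should_retry(labels: dict[str, str], allowed: set[str]) -> bool:
    """Mimic a trigger that matches only the listed submission counts."""
    return labels.get("prefect.submission-count") in allowed


# A run that crashes on every submission before ever reaching "Running".
run = FlowRunSketch()
decisions = []
for _ in range(3):
    run.submit()
    decisions.append(should_retry(run.event_labels(), {"1", "2"}))
```

In this sketch the run is retried after its first and second submissions but not after the third, while run_count stays at 0 throughout, which is why run-count alone cannot bound retries of startup failures.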
Example Use
A sample Automation trigger to retry a flow run that Crashed before it reached "Running":
{
  "type": "event",
  "match": {
    "prefect.submission-count": [
      "1",
      "2"
    ],
    "prefect.resource.id": "prefect.flow-run.*"
  },
  "match_related": {
    "prefect.resource.id": [
      "prefect.deployment.*"
    ]
  },
  "after": [],
  "expect": [
    "prefect.flow-run.Crashed"
  ],
  "for_each": [
    "prefect.resource.id"
  ],
  "posture": "Reactive",
  "threshold": 1,
  "within": 0
}
Additional context
No response