Description
We often need to be notified if a task is not finished by a certain deadline. This may sound very similar to the existing SLA concept, but unfortunately the SLA implementation is not useful for such cases, for a few reasons:
- sla_miss_callback only fires after the task finishes. That means if the task never finishes in the first place because it is blocked, or if it is still running, sla_miss_callback is not fired.
- SLA is defined as a timedelta relative to the execution_date. But we may have deadlines specified in various time zones that are difficult to define as a simple timedelta relative to the execution_date. Some DAGs are triggered externally, meaning they don't have a fixed schedule or fixed start time, so dag.following_schedule(), and thus SLA, does not work.
- sla_miss_callback is an attribute of the DAG. But users often need task-level notification if deadlines are missed.
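To illustrate the second point, here is a minimal sketch of why a wall-clock deadline in a named time zone does not map to one fixed timedelta. It uses only the standard library (datetime and zoneinfo) rather than Airflow or pendulum, and the helper names wall_clock_deadline and offset_from_execution_date are hypothetical, chosen only for this example. It assumes a daily run whose execution_date is midnight UTC and a deadline of 05:00 America/New_York on the following calendar day.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def wall_clock_deadline(execution_date: datetime) -> datetime:
    """Return 05:00 New York time on the calendar day after execution_date."""
    next_day = execution_date.date() + timedelta(days=1)
    return datetime.combine(next_day, time(5, 0), tzinfo=NEW_YORK)


def offset_from_execution_date(execution_date: datetime) -> timedelta:
    """Return the timedelta an SLA would need to hit the same deadline."""
    # Subtracting aware datetimes compares them as absolute instants.
    return wall_clock_deadline(execution_date) - execution_date


# Same wall-clock deadline, one run during daylight saving time and one outside it.
summer_offset = offset_from_execution_date(datetime(2021, 9, 3, tzinfo=timezone.utc))
winter_offset = offset_from_execution_date(datetime(2021, 12, 3, tzinfo=timezone.utc))

print(summer_offset)  # 1 day, 9:00:00  (33 hours, New York is UTC-4)
print(winter_offset)  # 1 day, 10:00:00 (34 hours, New York is UTC-5)
```

The required offset changes by an hour across the daylight saving boundary, so no single timedelta expresses "05:00 New York time" for every run.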
Given all these shortcomings of SLA, I'm proposing to create a new task-level concept called deadline and its corresponding deadline_miss_callback, where deadline is defined as a Jinja-templated str that can be converted to a pendulum.DateTime object, and deadline_miss_callback is a callable to be called if the task is not finished by the given deadline.
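A rough sketch of the intended semantics follows. It is not an implementation against Airflow's API. TaskSpec, render_deadline, check_deadline, the deadline_tz field and the deadline_date context key are all hypothetical names made up for this example. To keep it self-contained, str.format stands in for Jinja rendering and the standard library's datetime and zoneinfo stand in for pendulum. The time zone is passed separately here purely as a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional
from zoneinfo import ZoneInfo


@dataclass
class TaskSpec:
    """Hypothetical stand-in for a task carrying the proposed attributes."""
    task_id: str
    deadline: str      # template string, rendered once per DagRun
    deadline_tz: str   # assumption: time zone name kept apart from the template
    deadline_miss_callback: Callable[[str, datetime], None]


def render_deadline(task: TaskSpec, context: dict) -> datetime:
    """Render the template and parse it into a timezone-aware datetime."""
    rendered = task.deadline.format(**context)  # stand-in for Jinja rendering
    naive = datetime.fromisoformat(rendered)
    return naive.replace(tzinfo=ZoneInfo(task.deadline_tz))


def check_deadline(task: TaskSpec, context: dict,
                   finished_at: Optional[datetime], now: datetime) -> bool:
    """Call the callback once the deadline has passed and the task did not
    finish in time, whether or not it has finished since. Returns True if called."""
    deadline = render_deadline(task, context)
    missed = now >= deadline and (finished_at is None or finished_at > deadline)
    if missed:
        task.deadline_miss_callback(task.task_id, deadline)
    return missed


missed_log = []
task = TaskSpec(
    task_id="generate_model",
    deadline="{deadline_date}T05:00:00",
    deadline_tz="America/New_York",
    deadline_miss_callback=lambda task_id, deadline: missed_log.append((task_id, deadline)),
)
context = {"deadline_date": "2021-09-04"}
ny = ZoneInfo("America/New_York")

# The task is still unfinished (finished_at=None) at both check times.
before = check_deadline(task, context, None, datetime(2021, 9, 4, 4, 59, tzinfo=ny))
after = check_deadline(task, context, None, datetime(2021, 9, 4, 5, 1, tzinfo=ny))
print(before, after)  # False True
```

The key difference from sla_miss_callback is that the check depends on the clock reaching the deadline, not on the task finishing, so a blocked or still-running task triggers the callback too.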
The alternative is to revamp SLA to become a timezone-aware datetime object rather than a timedelta, and to make sure sla_miss_callback is called at the deadline rather than after the task finishes. These two changes may make the SLA concept very different from what it currently is.
Use case/motivation
For example, given the simple DAG shown below, we need to know at 20210904 05:00 America/New_York if the generate_model task of the 20210903 DagRun is not yet finished. So if either download_file or generate_model takes too long, causing generate_model not to be finished by this deadline, the users should be notified.
download_file >> generate_model
Related issues
I see a few attempts to revamp/improve SLA. There's some overlap, but none of them does exactly what's needed here.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct