Alert if a task misses deadline #18031

@yuqian90

Description

We often need to be notified if a task is not finished by a certain deadline. This may sound very similar to the existing SLA concept, but the current SLA implementation is not useful for such cases, for a few reasons:

  1. sla_miss_callback only fires after the task finishes. If the task never finishes because it is blocked, or if it is still running, sla_miss_callback is not fired.
  2. SLA is defined as a timedelta relative to the execution_date. But deadlines may be specified in various timezones, which makes them difficult to express as a simple timedelta relative to the execution_date. Some DAGs are also triggered externally, so they have no fixed schedule or fixed start time, and dag.following_schedule(), and therefore SLA, does not work for them.
  3. sla_miss_callback is an attribute of the DAG. But users often need task-level notification when deadlines are missed.
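To make point 2 concrete, here is a minimal sketch of how a fixed timedelta drifts against a wall-clock deadline when daylight saving time changes. It assumes a run whose execution_date is midnight UTC and a hypothetical 33-hour SLA chosen so that the summer deadline lands on 05:00 America/New_York. The names `SLA` and `local_deadline_from_sla` are illustrative and not part of Airflow.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# Hypothetical SLA: 33 hours after a midnight-UTC execution_date.
SLA = timedelta(hours=33)


def local_deadline_from_sla(execution_date: datetime, sla: timedelta) -> datetime:
    """Return the New York wall-clock time that a timedelta-based SLA lands on."""
    return (execution_date + sla).astimezone(NEW_YORK)


summer = local_deadline_from_sla(datetime(2021, 9, 3, tzinfo=timezone.utc), SLA)
winter = local_deadline_from_sla(datetime(2021, 12, 3, tzinfo=timezone.utc), SLA)

print(summer.strftime("%Y-%m-%d %H:%M %Z"))  # 2021-09-04 05:00 EDT
print(winter.strftime("%Y-%m-%d %H:%M %Z"))  # 2021-12-04 04:00 EST
```

The same timedelta gives 05:00 local time in September but 04:00 in December, so a "05:00 New York" deadline cannot be written as one constant offset from the execution_date.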

Given all these shortcomings of SLA, I'm proposing to create a new task-level concept called deadline and its corresponding deadline_miss_callback, where deadline is defined as a Jinja-templated str that can be converted to a pendulum.DateTime object, and deadline_miss_callback is a callable that is called if the task is not finished by the given deadline.
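A rough sketch of the intended semantics, not an implementation: the template is rendered, parsed into a timezone-aware datetime, and the callback fires at the deadline if the task is still unfinished. To keep it runnable on the standard library alone, `string.Template` stands in for Jinja and `datetime.fromisoformat` stands in for pendulum parsing. `TaskState`, `resolve_deadline`, `check_deadline`, and the `day_after_run` context key are all hypothetical names, not existing Airflow APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from string import Template
from typing import Callable


@dataclass
class TaskState:
    """Hypothetical stand-in for a task instance; not an Airflow class."""
    task_id: str
    finished: bool = False


def resolve_deadline(deadline_template: str, context: dict) -> datetime:
    """Render the template and parse the result as a timezone-aware datetime."""
    rendered = Template(deadline_template).substitute(context)
    deadline = datetime.fromisoformat(rendered)
    if deadline.tzinfo is None:
        raise ValueError(f"deadline must be timezone-aware: {rendered!r}")
    return deadline


def check_deadline(
    task: TaskState,
    deadline: datetime,
    now: datetime,
    deadline_miss_callback: Callable[[TaskState, datetime], None],
) -> bool:
    """Fire the callback if the task is unfinished at or after the deadline.

    Unlike sla_miss_callback as described above, this does not wait for the
    task to finish. Returns True when the callback was fired.
    """
    if not task.finished and now >= deadline:
        deadline_miss_callback(task, deadline)
        return True
    return False


# Usage: generate_model is still unfinished when the deadline arrives.
missed = []
task = TaskState("generate_model")
deadline = resolve_deadline(
    "${day_after_run}T05:00:00-04:00", {"day_after_run": "2021-09-04"}
)
now = datetime(2021, 9, 4, 9, 0, tzinfo=timezone.utc)  # 05:00 at UTC-04:00
check_deadline(task, deadline, now, lambda t, d: missed.append(t.task_id))
print(missed)  # ['generate_model']
```

In a real scheduler the check would have to run periodically, or be scheduled for the deadline itself, and fire at most once per task instance. That bookkeeping is left out here.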

The alternative is to revamp SLA to become a timezone-aware datetime object rather than a timedelta, and to make sure sla_miss_callback is called at the deadline rather than after the task finishes. These two changes may make the SLA concept very different from what it currently is.

Use case/motivation

For example, given the simple DAG shown below, we need to know at 20210904 05:00 America/New_York if the generate_model task of the 20210903 DagRun is not yet finished. So if either download_file or generate_model takes too long, causing generate_model to miss this deadline, the users should be notified.

download_file >> generate_model
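For reference, the deadline in this example can be computed from the run date and a wall-clock time in the target timezone. This is only a sketch: `deadline_for_run` and its parameters are hypothetical, and it uses the standard-library zoneinfo module rather than pendulum.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def deadline_for_run(
    run_date: date, local_time: time, tz_name: str, days_after: int = 1
) -> datetime:
    """Return the wall-clock deadline `days_after` days after the run date."""
    deadline_day = run_date + timedelta(days=days_after)
    return datetime.combine(deadline_day, local_time, tzinfo=ZoneInfo(tz_name))


# Deadline for generate_model in the 20210903 DagRun.
deadline = deadline_for_run(date(2021, 9, 3), time(5, 0), "America/New_York")
print(deadline.isoformat())                           # 2021-09-04T05:00:00-04:00
print(deadline.astimezone(timezone.utc).isoformat())  # 2021-09-04T09:00:00+00:00
```

Because the deadline is anchored to local wall-clock time, it stays at 05:00 New York time on both sides of a daylight saving change, while the UTC instant moves.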

Related issues

I see a few attempts to revamp/improve SLA. There's some overlap, but none of them does exactly what's needed here.

#12008
#16389
#8545

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
