Replies: 3 comments 1 reply
-
Why do you think that implementing your suggestion as the default behavior is unlikely to occur? Even if we manually change the task's state to
-
I don't think this is the suggestion though? The suggestion is exactly to rerun these tasks when the upstream task is cleared. The point here is that there is a difference in how "upstream_failed" tasks react when the upstream task that triggered that state leaves the "failed" state. I think everyone agrees that the "upstream_failed" state only makes sense if there actually is an upstream task that has failed (it is essentially a "no status" state within a context, not a state on its own). Current behavior: when the failed upstream task is cleared, nothing happens to the downstream "upstream_failed" tasks.
If I am reading this correctly, the only suggestion is to change the "nothing happens" case to also mark downstream tasks as "no_status", allowing the DAG to properly progress once the failed task is no longer blocking traversal of the graph.
-
@OfSixes Yes, you are correct. The current behavior of "upstream_failed" is arguably a bug. At best it is handled inconsistently and is a source of production issues. To work around this, I am currently using a system outside Airflow to detect and repair this inconsistent state. However, this should be fixed in Airflow itself.
Yes, this is the short story, and the reason I raised this as issue #31510 and argue it is a bug. @OfSixes If you think this is still an issue, you can post your own issue with your deployment information and link to this discussion or the original issue. Alternatively, @hussein-awala, if you are still following this thread, can we get some eyes on this? I realize the issue looks at a glance like a feature request, but I argue this is not the case: the current behavior is inconsistent and can be significantly improved.
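To illustrate the kind of external workaround mentioned above, here is a minimal sketch of a consistency check. The function name and the dict-based state model are hypothetical, not Airflow APIs; it only shows the rule being discussed: a task in "upstream_failed" is stale if none of its upstream tasks is actually failed.

```python
# Hypothetical external consistency check (illustrative only, not an Airflow API).
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"

def find_stale_upstream_failed(states, upstream_map):
    """Return task ids stuck in 'upstream_failed' although none of their
    upstream tasks is still in the 'failed' state.

    states: dict mapping task_id -> current state string
    upstream_map: dict mapping task_id -> list of upstream task_ids
    """
    stale = []
    for task_id, state in states.items():
        if state != UPSTREAM_FAILED:
            continue
        upstreams = upstream_map.get(task_id, [])
        # 'upstream_failed' is only consistent if some upstream actually failed
        if not any(states.get(u) == FAILED for u in upstreams):
            stale.append(task_id)
    return stale
```

In a real deployment, the states would come from Airflow's metadata database or REST API rather than in-memory dicts; the point is only the invariant being checked.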
-
Description
When a failed task is deleted or cleared, any downstream tasks should be cleared from the "upstream_failed" state. This should not require users to manually select the "downstream" and "recursive" options, which are more invasive and will clear downstream tasks regardless of their state. In general, if a task is in the "upstream failed" state, it should actually have upstream tasks that are in the failed state.
Use case/motivation
When a user clears or deletes a failed task, downstream tasks can be left in the "upstream_failed" state. This happens any time a failed task is deleted via the Browse > Task Instances search feature. It also happens in the graph view if a failed task is cleared without the "downstream" and "recursive" options checked.
The issue with the current behavior is that if a previously failed task succeeds after being cleared, any downstream tasks are indefinitely orphaned in an inconsistent state: they report "upstream_failed", but in fact no longer have any upstream tasks that have failed.
This behavior is also inconsistent with what happens when you manually mark a task as success in the graph view: In that scenario, all downstream tasks that were previously in the "upstream_failed" state are immediately cleared and will attempt to run as expected.
Finally, the current behavior limits the usability of clearing tasks via the task search feature and makes it prone to user error.
The behavior that occurs when tasks are manually marked as success makes more sense because it prevents tasks from being orphaned. Clearing a task and having it succeed should behave the same as manually marking a task as success: namely, downstream tasks that are specifically in the "upstream_failed" state should be cleared.
Related issues
No response
Are you willing to submit a PR?