Resolve race condition re creation of APDR records #58919

@dstandish

When two tasks run at the same time and both contribute to the same target dag run and partition key, each of them can create its own APDR record, so two records exist where there should be one. This needs to be fixed, possibly with a different design. More notes on options are in the comments below.

How to repro

Check out branch poc-asset-partitions.

Make 2 identical dags with a schedule such as */5 * * * *, each updating a different asset. Make another dag that listens to those assets using a partitioned asset timetable.

Enable all the dags.

The two producer dags will run at basically the same time. Two APDR records will get created instead of one, and as a result, the downstream dag won't schedule.

More explanation

  • Right now, as soon as one asset fires an event that maps to target key X, we create an APDR record for target key X.
  • Once that record is created, the scheduler considers the APDR record for target key X in each loop.
  • When the second asset fires an event that maps to target key X, it should see the existing APDR record and therefore only create the PAKL record that maps to that APDR record.
  • But if both events fire at exactly the same time, then due to this bug both see that there is no APDR record and both create one. As a result, even though a run should be created, it isn't.
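
To make the interleaving concrete, here is a minimal sketch of the check-then-insert race and of one possible mitigation (a unique constraint plus an idempotent insert). It is not the Airflow implementation: the `apdr` and `pakl` tables, their columns, and all function names are hypothetical stand-ins, and it uses the standard-library `sqlite3` module in place of the real metadata database. Whether a unique constraint fits the actual APDR model is an open design question.

```python
import sqlite3


def make_db(unique: bool) -> sqlite3.Connection:
    """Create hypothetical apdr/pakl tables; optionally enforce one APDR per key."""
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (target_dag_id, partition_key)" if unique else ""
    conn.execute(
        "CREATE TABLE apdr ("
        "id INTEGER PRIMARY KEY, "
        "target_dag_id TEXT NOT NULL, "
        "partition_key TEXT NOT NULL"
        f"{constraint})"
    )
    conn.execute(
        "CREATE TABLE pakl ("
        "id INTEGER PRIMARY KEY, apdr_id INTEGER NOT NULL, asset TEXT NOT NULL)"
    )
    return conn


def find_apdr(conn, dag_id, key):
    """Return the id of the oldest APDR row for (dag_id, key), or None."""
    row = conn.execute(
        "SELECT id FROM apdr WHERE target_dag_id = ? AND partition_key = ? "
        "ORDER BY id LIMIT 1",
        (dag_id, key),
    ).fetchone()
    return row[0] if row else None


def racy_interleaving(conn, dag_id, key):
    """Simulate two simultaneous events: both check before either inserts."""
    seen_a = find_apdr(conn, dag_id, key)  # task A sees no APDR
    seen_b = find_apdr(conn, dag_id, key)  # task B also sees no APDR
    for asset, apdr_id in (("asset_a", seen_a), ("asset_b", seen_b)):
        if apdr_id is None:
            apdr_id = conn.execute(
                "INSERT INTO apdr (target_dag_id, partition_key) VALUES (?, ?)",
                (dag_id, key),
            ).lastrowid
        conn.execute(
            "INSERT INTO pakl (apdr_id, asset) VALUES (?, ?)", (apdr_id, asset)
        )


def record_event(conn, dag_id, key, asset):
    """Idempotent variant: rely on the unique constraint, then read back the row."""
    # SQLite syntax; on PostgreSQL the analogous statement is
    # INSERT ... ON CONFLICT DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO apdr (target_dag_id, partition_key) VALUES (?, ?)",
        (dag_id, key),
    )
    apdr_id = find_apdr(conn, dag_id, key)
    conn.execute("INSERT INTO pakl (apdr_id, asset) VALUES (?, ?)", (apdr_id, asset))


def count_apdr(conn) -> int:
    """Number of APDR rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM apdr").fetchone()[0]


def distinct_pakl_targets(conn) -> int:
    """Number of distinct APDR rows that PAKL rows point at."""
    return conn.execute("SELECT COUNT(DISTINCT apdr_id) FROM pakl").fetchone()[0]


if __name__ == "__main__":
    racy = make_db(unique=False)
    racy_interleaving(racy, "consumer_dag", "X")
    print("racy APDR rows:", count_apdr(racy))

    safe = make_db(unique=True)
    record_event(safe, "consumer_dag", "X", "asset_a")
    record_event(safe, "consumer_dag", "X", "asset_b")
    print("constrained APDR rows:", count_apdr(safe))
```

In the racy variant each PAKL row points at a different APDR row, so neither APDR record ever has both assets satisfied, which matches the symptom described above. With the constraint in place, whichever insert loses the race becomes a no-op and both PAKL rows attach to the same APDR record. Other options, such as row-level locking or having only the scheduler create APDR records, would avoid the duplicate in different ways.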

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.

Labels

area:Scheduler, kind:bug, kind:meta
