Description
Body
When two tasks run at the same time and both contribute to the creation of the same dag run and partition key, each can create its own APDR record, so two records exist where there should be one. This needs to be fixed, possibly with a different design. More notes on options are in the comments below.
How to repro
Check out branch poc-asset-partitions.
Make 2 identical dags with the same schedule, e.g. */5 * * * *, each updating a different asset. Make another dag that listens to those assets using the partitioned asset timetable.
Enable all the dags.
The two producer dags will run at basically the same time. Two APDR records will get created instead of one, and as a result, the downstream dag won't schedule.
More explanation
right now, as soon as one asset fires which maps to target key X, we create APDR record for target key X
once that record is created, scheduler will consider APDR record for target key X in each loop
when the second asset fires an event which maps to target key X, it should see that there is an existing APDR record and therefore only create the PAKL record which maps to that APDR record
but if they fire at exactly the same time, currently due to this bug, both see that there is no APDR record and both create one, so even though a run should be created, it isn't
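The check-then-create race described above can be reproduced in isolation. The following is only a minimal sketch using SQLite from the Python standard library, not the Airflow implementation: the `apdr` and `pakl` table layouts, the column names, and the helpers (`make_db`, `find_apdr`, `racy_interleaving`, `get_or_create_apdr`, `safe_fire`) are all hypothetical stand-ins. It forces the bad interleaving deterministically, then shows one possible option: a unique constraint on (target dag, partition key) combined with an insert that ignores conflicts.

```python
import sqlite3


def make_db(unique: bool) -> sqlite3.Connection:
    """Create hypothetical apdr/pakl tables (assumed schema, not Airflow's)."""
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (target_dag_id, partition_key)" if unique else ""
    conn.execute(
        "CREATE TABLE apdr (id INTEGER PRIMARY KEY, "
        f"target_dag_id TEXT NOT NULL, partition_key TEXT NOT NULL{constraint})"
    )
    conn.execute(
        "CREATE TABLE pakl (id INTEGER PRIMARY KEY, "
        "apdr_id INTEGER NOT NULL, asset TEXT NOT NULL)"
    )
    return conn


def find_apdr(conn, dag_id, key):
    """Return the id of the APDR row for (dag_id, key), or None."""
    row = conn.execute(
        "SELECT id FROM apdr WHERE target_dag_id = ? AND partition_key = ?",
        (dag_id, key),
    ).fetchone()
    return row[0] if row else None


def apdr_count(conn) -> int:
    return conn.execute("SELECT COUNT(*) FROM apdr").fetchone()[0]


def racy_interleaving(conn, dag_id="consumer", key="X"):
    """Both producers check BEFORE either one inserts (the bad ordering)."""
    seen_a = find_apdr(conn, dag_id, key)  # None
    seen_b = find_apdr(conn, dag_id, key)  # also None
    for asset, apdr_id in (("asset_a", seen_a), ("asset_b", seen_b)):
        if apdr_id is None:
            apdr_id = conn.execute(
                "INSERT INTO apdr (target_dag_id, partition_key) VALUES (?, ?)",
                (dag_id, key),
            ).lastrowid
        conn.execute(
            "INSERT INTO pakl (apdr_id, asset) VALUES (?, ?)", (apdr_id, asset)
        )


def get_or_create_apdr(conn, dag_id, key):
    """Insert if absent, then read back; relies on the UNIQUE constraint."""
    conn.execute(
        "INSERT OR IGNORE INTO apdr (target_dag_id, partition_key) VALUES (?, ?)",
        (dag_id, key),
    )
    return find_apdr(conn, dag_id, key)


def safe_fire(conn, asset, dag_id="consumer", key="X"):
    """Record one asset event against the single APDR row for the key."""
    apdr_id = get_or_create_apdr(conn, dag_id, key)
    conn.execute("INSERT INTO pakl (apdr_id, asset) VALUES (?, ?)", (apdr_id, asset))
    return apdr_id


if __name__ == "__main__":
    racy = make_db(unique=False)
    racy_interleaving(racy)
    print("APDR rows without constraint:", apdr_count(racy))

    safe = make_db(unique=True)
    safe_fire(safe, "asset_a")
    safe_fire(safe, "asset_b")
    print("APDR rows with constraint:", apdr_count(safe))
```

Without the constraint, the forced interleaving leaves two APDR rows, each with a single PAKL row. With the constraint, both events attach to the same APDR row, and the racy code path fails with an integrity error instead of silently duplicating. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL offers `INSERT ... ON CONFLICT DO NOTHING` for the same purpose. Whether a constraint like this fits the eventual design is an open question for the options discussed in the comments.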
Committer
- I acknowledge that I am a maintainer/committer of the Apache Airflow project.