Over the last few weeks I’ve been debugging severe API server instability when loading the Airflow UI on Airflow 3.x. I’m running on Kubernetes using version

Problem
I have a DAG that fans out using mapped tasks. At peak:
Airflow executes all of this successfully when the UI is not being used. However, as soon as I load the UI and start clicking around, the following happens:
The system remains stable indefinitely as long as I do not load the UI. The issue only occurs with UI interaction.

Workaround (but not acceptable)
If I set: … the UI becomes stable again, and the API server stops crashing. But this throttles my DAG to only 2 parallel runs, and I’d like the system to handle 10 runs in parallel as intended. Limiting concurrency only to make the UI load seems unnecessary.

Question
Are there recommended settings to make the API server/UI more resilient with many active DAG runs?

What I’ve Tried (None of These Helped)
1. Increased PgBouncer pool sizes
pgbouncer:
  maxClientConn: 500
  metadataPoolSize: 50
  resultBackendPoolSize: 25
2. Disabled SQLAlchemy pooling
3. Changed health probes
4. Scaled API server significantly
5. Scaled scheduler significantly
6. Upgraded Airflow

I'm seeing this issue on the below version of Airflow:
Replies: 2 comments 2 replies
I am facing exactly the same issue. I'm running on AKS in Azure, with the Celery and Kubernetes executors.
Upgrading to Airflow 3.1.5 seems to have resolved this issue for me. The release notes mention fixing a few issues related to excessive DB queries, so that might be what solved it.
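For anyone deploying with the official Apache Airflow Helm chart, pinning the upgrade in `values.yaml` could look roughly like the sketch below. The chart is an assumption: the `pgbouncer` keys quoted in the original post suggest it, but this reply does not say how Airflow was installed. `airflowVersion` and `defaultAirflowTag` are the chart's values for the Airflow version and the default image tag. If you build a custom image, the tag would go under `images.airflow` instead.

```yaml
# Sketch only: assumes the official apache-airflow Helm chart
# and the stock apache/airflow image.
airflowVersion: "3.1.5"      # version the chart templates against
defaultAirflowTag: "3.1.5"   # image tag used when images.airflow.tag is unset
```

It is worth confirming that the chart release in use supports Airflow 3.x before applying this, and re-testing the UI with the original concurrency settings afterwards.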