Dynamically Mapped Tasks: DB performance issues #39680
Comments
This is pretty much fine, because Airflow tries to obtain a row-level lock by using the statement |
Correction: |
But it seems that this was added in #38914. I'm not sure that ignoring this error can be considered a solution. If it always results in an ERROR when trying to obtain a lock, why do we even need to attempt the query? |
Are you sure that it always returns an error? |
I'm not sure. But then there should be some kind of retry mechanism (I'm also not sure if we have one). Anyway, I just see a lot of errors in the DB, and my first guess is that something is not correct with the transaction level. |
This one is used in the mini scheduler mechanism, which might fail, and that is fine and by design; it is just an optimisation mechanism. |
As a measure, we might use
|
yeah, would be great to avoid DB internal error |
Feel free to fix it. Apache Airflow is an open source project, and everyone can contribute changes (fixes/features) back, especially if they know the expected outcome and benefits of the change. So I would recommend applying this patch on your side and checking that it still works as expected, without additional error logs in the DB backend and without side effects.
diff --git a/airflow/models/taskinstance.py b/airflow/models/taskinstance.py
index f154461a77..a08b9cd94a 100644
--- a/airflow/models/taskinstance.py
+++ b/airflow/models/taskinstance.py
@@ -3454,8 +3454,12 @@ class TaskInstance(Base, LoggingMixin):
run_id=ti.run_id,
),
session=session,
- nowait=True,
- ).one()
+ skip_locked=True,
+ ).one_or_none()
+ if not dag_run:
+ # Need to log something?
+ session.rollback()
+ return
task = ti.task
if TYPE_CHECKING: |
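For readers following the patch above, here is a toy sketch of why swapping `nowait=True` for `skip_locked=True` makes the database error go away. It does not use Airflow, SQLAlchemy, or Postgres: a `threading.Lock` stands in for the row-level lock on a `dag_run` row, and `ToyRow`, `RowLockedError`, `select_nowait`, `select_skip_locked`, and `mini_scheduler_step` are all hypothetical names made up for this illustration. The assumption is only that a NOWAIT lock attempt fails with an error when the row is already locked, while SKIP LOCKED returns no row instead.

```python
import threading
from typing import Optional


class RowLockedError(Exception):
    """Stands in for the DB error raised by a failed NOWAIT lock attempt."""


class ToyRow:
    """Hypothetical stand-in for one dag_run row guarded by a row-level lock."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self._lock = threading.Lock()

    def try_lock(self) -> bool:
        # Non-blocking acquire: returns False at once if the lock is already held.
        return self._lock.acquire(blocking=False)

    def unlock(self) -> None:
        self._lock.release()


def select_nowait(row: ToyRow) -> ToyRow:
    """Models the NOWAIT behaviour: a locked row is reported as an error."""
    if not row.try_lock():
        raise RowLockedError(f"could not obtain lock on row {row.run_id!r}")
    return row


def select_skip_locked(row: ToyRow) -> Optional[ToyRow]:
    """Models the SKIP LOCKED behaviour: a locked row is left out of the result."""
    return row if row.try_lock() else None


def mini_scheduler_step(row: ToyRow) -> str:
    """Mirrors the control flow of the patch: no row means give up quietly."""
    dag_run = select_skip_locked(row)
    if dag_run is None:
        # Someone else holds the row; skipping is fine for an optimisation.
        return "skipped"
    try:
        return "scheduled"
    finally:
        dag_run.unlock()
```

With the row already locked, `select_nowait` raises while `mini_scheduler_step` simply returns `"skipped"`, which is the behaviour the `one_or_none()` plus early `return` in the patch aims for. Whether skipping should also be logged is left open, as in the patch comment.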
thanks @Taragolis, I will try with my first PR) |
@VladimirYushkevich thanks for the PR, I am seeing the same errors since updating to |
I tested the image from PR on our environment and this error disappeared. |
@Taragolis, do I need to ask for the review explicitly? I believe the PR is ready from my side, but maybe I'm missing something. |
I've added a couple of additional reviewers who might be more familiar with the mini scheduler. But it might take some time: all reviews are done by people in their free time, and some reviews require more time than others. You could also ask around in |
Does #39745 solve this issue, or are there additional tasks? |
It does, we can close it. |
Apache Airflow version
2.9.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We are running Airflow on Kubernetes (GCP) with a Postgres database (Cloud SQL). We are using pgbouncer as a DB connection pool. We have a single DAG in a separate Airflow worker pool that runs every hour and creates 1000+ Dynamically Mapped Tasks. As mentioned in #35267 (comment), upgrading to 2.9.1 helped to eliminate long-running transactions. However, it introduced another issue that we did not encounter in the previous version: could not obtain lock on row in relation "dag_run" errors:
What you think should happen instead?
No response
How to reproduce
Create dag with following tasks:
Operating System
Debian GNU/Linux 12 (bookworm)
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.6.2
apache-airflow-providers-common-io==1.3.1
apache-airflow-providers-common-sql==1.12.0
apache-airflow-providers-datadog==3.5.1
apache-airflow-providers-fab==1.0.4
apache-airflow-providers-ftp==3.8.0
apache-airflow-providers-google==10.17.0
apache-airflow-providers-http==4.10.1
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-postgres==5.10.2
apache-airflow-providers-slack==8.6.2
apache-airflow-providers-smtp==1.6.1
apache-airflow-providers-sqlite==3.7.1
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct