This issue was moved to a discussion.
Massive slowdown of DAG File processor due to JSON schema upgrade in Airflow's core. #28059
Closed
Apache Airflow version
2.4.3
What happened
We recently updated our dev environment from Airflow 2.2.5 (Python 3.9) to 2.4.3 (Python 3.10). For our workloads, we use a DAG Factory that parses JSON files and converts them into DAGs. In Airflow 2.2.5, the DAG Factory needed approximately 30-140 seconds to generate 100 DAGs. In Airflow 2.4.3, the same DAGs took considerably longer to load (from 3x to over 5x in some tests).
We investigated by running the DAG file processor directly under scalene (a Python profiler) and discovered the following:

A huge percentage of CPU time was spent on JSON validation at line 91 of models/params. Our Factory does generate quite a few params per DAG, so it makes sense that validating all of them takes some time. However, upgrading Airflow shouldn't result in such a big, flat increase in parsing time, so we suspected jsonschema was the cause.
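To isolate the validation cost from the rest of Airflow, a micro-benchmark along the following lines could be run once under each jsonschema version. This is only a sketch: the helper names `build_cases` and `time_validation` and the three sample schemas are made up for illustration, and the script calls `jsonschema.validate` directly rather than going through Airflow's param code.

```python
import time

try:
    import jsonschema  # third-party; the installed version is what we want to compare
except ImportError:  # keep the helpers usable even when jsonschema is absent
    jsonschema = None


def build_cases(n):
    """Return n (value, schema) pairs cycling through string, integer and object params."""
    templates = [
        ("some text", {"type": "string"}),
        (42, {"type": "integer"}),
        ({"inner": {"key": "value"}}, {"type": "object"}),
    ]
    return [templates[i % len(templates)] for i in range(n)]


def time_validation(cases, repeats=100):
    """Return seconds spent validating every case `repeats` times, or None without jsonschema."""
    if jsonschema is None:
        return None
    start = time.perf_counter()
    for _ in range(repeats):
        for value, schema in cases:
            jsonschema.validate(value, schema)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_validation(build_cases(30))
    print("jsonschema not installed" if elapsed is None else f"{elapsed:.3f} s")
```

Running the same script in both images and comparing the printed times should show whether the library alone accounts for the difference.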
To verify that the JSON validation was the reason for the increase, we checked Airflow's dependencies and found that the official image for 2.2.5 uses jsonschema 3.2.0, whereas the image for 2.4.3 uses jsonschema 4.17.3.
As a final test, we uninstalled jsonschema 4.17.3 from our image and replaced it with 3.2.0. The DAG Factory immediately ran as expected, taking approximately 30 seconds to load 100 DAGs when the cluster was under little load, or about 100-140 seconds when the cluster was under heavy load.
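A quick way to confirm which jsonschema version an image actually ships is to query the installed package metadata. The sketch below uses only the standard library; the helper names `installed_version` and `major_version` are illustrative and not part of Airflow.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '4.17.3'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    found = installed_version("jsonschema")
    if found is None:
        print("jsonschema is not installed")
    else:
        print(f"jsonschema {found} (major version {major_version(found)})")
```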
Example logs:
Version 2.2.5:
{{processor.py:176}} INFO - Processing /opt/airflow/dags/{other_folders}/{file_name}.py took 125.556 seconds
Version 2.4.3:
{{processor.py:249}} WARNING - Killing DAGFileProcessorProcess (PID=5943)
This occurred constantly with a timeout setting of 300 seconds.
What you think should happen instead
Airflow should require the same time to parse 100 DAGs in both versions.
How to reproduce
Create a DAG with many params (ideally more than 20-30; the more the better), using mainly string, integer, and nested dict types. Measure how long it takes to load in Airflow 2.2.5, then in Airflow 2.4.3. There should be a noticeable difference in loading times (at least 3x).
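A reproduction DAG file might look roughly like the sketch below. It is not our actual DAG Factory: the `dag_id`, the param names, and the helper `make_param_specs` are invented for illustration, and the Airflow import is guarded so the param generation can be inspected without Airflow installed.

```python
from datetime import datetime


def make_param_specs(n):
    """Return n (name, default, json_type) tuples mixing strings, integers and nested dicts."""
    kinds = [
        ("string", "value"),
        ("integer", 1),
        ("object", {"outer": {"inner": "value"}}),
    ]
    specs = []
    for i in range(n):
        json_type, default = kinds[i % len(kinds)]
        specs.append((f"param_{i}", default, json_type))
    return specs


try:
    from airflow import DAG
    from airflow.models.param import Param
except ImportError:  # allows running the helper outside an Airflow environment
    DAG = None

if DAG is not None:
    # Each Param carries a JSON schema ("type"), so it is validated when resolved.
    params = {
        name: Param(default, type=json_type)
        for name, default, json_type in make_param_specs(30)
    }
    dag = DAG(
        dag_id="many_params_repro",  # hypothetical name
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
        params=params,
    )
```

Placing this file in the DAGs folder under each Airflow version and comparing the reported processing times should expose the difference; generating many such DAGs in one file amplifies it.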
Operating System
Debian GNU Linux 11
Versions of Apache Airflow Providers
No relevant providers used
Deployment
Other Docker-based deployment
Deployment details
We use an AKS cluster in combination with a customised Docker image stemming from the official full docker image (not slim).
Anything else
This problem may be especially noticeable for us because of the way we build DAGs (many params), but it should affect any DAG generation that uses params.
Are you willing to submit PR?
Code of Conduct