Airflow fact vs fiction, Part 1
Debunking myths about the Airflow user experience

Introduction
Apache Airflow has come a long way since its start in 2014 as an internal tool at Airbnb. What began as a tool used mainly to orchestrate ETL pipelines feeding analytics dashboards has evolved into the world’s orchestration tool of choice, used to power complex use cases like machine learning and LLM workflows, infrastructure management, and everything needed for companies to deliver critical data products. With every new version, including the recent major 3.0 release, the project has addressed long-standing pain points, added key features like dynamic task mapping, event-driven scheduling, and DAG versioning, and embraced a community-driven roadmap that reflects how data pipelines are built today—not five or even ten years ago. In addition, managed Airflow services like Astro have continued to evolve and make it easier than ever to rely on Airflow at scale.
But despite this evolution, certain narratives about Airflow have continued to linger. You’ve probably seen them in Stack Overflow threads, Reddit debates, or blog posts written as if Airflow 1.x were still the norm. Some critiques were valid at the time. Others were never quite accurate to begin with. And many simply haven’t kept pace with the project’s momentum.
So in this series of posts, we’re setting the record straight. We’ve compiled some of the most common statements we hear about Airflow—both good and bad—and evaluated them through the lens of the modern project. What’s still true? What’s outdated? What’s misunderstood? And what’s just flat-out fiction?
In this first post, we’ll focus on Airflow’s user experience. In the next two, we’ll dive into Airflow’s performance and architecture, and Airflow use cases, including ML and AI. Let’s dig in.
Statement: Writing DAGs is difficult and un-Pythonic
Verdict: Fiction
One of the most persistent criticisms leveled against Airflow is that authoring pipelines is an unintuitive and un-Pythonic experience. This statement originated in the Airflow 1.x days, when the only way to define DAGs was with traditional operator classes. The result was boilerplate-heavy code that often felt unnatural to those used to writing Python, and a steep learning curve for those more comfortable working in SQL or other languages.
But this statement is years out of date. The TaskFlow API was introduced in Airflow 2.0 in 2020, providing a more Pythonic way of defining DAGs and tasks with decorators. It allows you to quickly turn any Python function into an Airflow task, and significantly reduces the typical amount of boilerplate code by simplifying the definition of dependencies and passing data between tasks.
And, as is often the case with Airflow, users keep full flexibility: traditional operators remain a valid way to define DAGs for those who prefer them, and you can mix and match the Pythonic decorators with traditional operators in the same DAG (see the sketch after the example below).
For example, turning three Python functions into an ETL pipeline is as simple as using a couple of @task decorators:
import logging
from typing import Dict

import requests

from airflow.sdk import dag, task

API = "https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=IBM&apikey=demo"


@dag
def stock_taskflow():
    @task(task_id="extract", retries=2)
    def extract_stock_price() -> Dict:
        response = requests.get(API, timeout=10)
        data = response.json()
        quote = data["Global Quote"]
        return {
            "symbol": quote["01. symbol"],
            "current_price": float(quote["05. price"]),
            "previous_close": float(quote["08. previous close"]),
            "change": float(quote["09. change"]),
            "change_percent": quote["10. change percent"].replace("%", ""),
            "high": float(quote["03. high"]),
            "low": float(quote["04. low"]),
            "volume": int(quote["06. volume"]),
        }

    @task(multiple_outputs=True)
    def process_stock_data(stock_data: Dict) -> Dict:
        logging.info(f"Processing stock data: {stock_data}")
        return {
            "symbol": stock_data["symbol"],
            "current_price": stock_data["current_price"],
            "change": stock_data["change"],
            "change_percent": float(stock_data["change_percent"]),
            "high": stock_data["high"],
            "low": stock_data["low"],
            "volume": stock_data["volume"],
        }

    @task
    def store_stock_data(data: Dict):
        logging.info(
            f"Store: {data['symbol']} at ${data['current_price']:.2f} "
            f"with change ${data['change']:.2f} ({data['change_percent']:.2f}%) "
            f"Volume: {data['volume']:,}"
        )

    store_stock_data(process_stock_data(extract_stock_price()))


stock_taskflow()
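And here is a minimal sketch of mixing the two styles in one DAG. The BashOperator import path below assumes the standard provider package that ships with Airflow 3; in Airflow 2 the operator lives at airflow.operators.bash.

from airflow.sdk import dag, task
# Import path varies by version; in Airflow 2 use `from airflow.operators.bash import BashOperator`
from airflow.providers.standard.operators.bash import BashOperator


@dag
def mixed_style_pipeline():
    # TaskFlow-decorated task
    @task
    def extract() -> str:
        return "some extracted value"

    # Traditional operator-based task
    notify = BashOperator(
        task_id="notify",
        bash_command="echo 'extraction finished'",
    )

    # Dependencies are set the same way across both styles
    extract() >> notify


mixed_style_pipeline()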
More recently, Airflow 3.0 introduced assets, the next evolution of datasets from Airflow 2.x. Assets allow you to write DAGs around the data objects they create (like tables or files) rather than the tasks that create them. With the asset-oriented approach, a DAG is created by defining the desired asset in a function decorated with @asset. The benefit is DAGs that are built around the movement of your data, making them more efficient to write and easier to understand.
from airflow.sdk import asset


@asset(schedule="@daily")
def raw_quotes():
    """
    Extracts a random set of quotes.
    """
    import requests

    r = requests.get("https://zenquotes.io/api/quotes/random")
    quotes = r.json()
    return quotes
These features help make DAG authoring in Airflow simpler and more Pythonic, but what if you don’t want to work in Python at all? Many Airflow users are more comfortable working with SQL or other languages. While this may have generated valid critiques about Airflow’s learning curve in the past, today there are options for creating pipelines in Airflow without manually writing a .py file. One notable tool is DAG Factory, an open source library for constructing Airflow DAGs from YAML configuration files.
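As a rough sketch of what that looks like in practice, a small loader script in your dags folder picks up the YAML configuration files and registers the DAGs they describe. The exact function name and arguments below are an assumption based on recent DAG Factory documentation and may differ between versions, so check the library’s docs.

# Loader script placed in the dags/ folder; assumes the dag-factory package is installed.
# The exact call may differ between DAG Factory versions (an assumption, not a guarantee).
from dagfactory import load_yaml_dags

# Scans for YAML config files and registers the DAGs they define with Airflow
load_yaml_dags(globals_dict=globals())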
Statement: Local development and testing is impractical
Verdict: Misleading
For years, the local development experience was a significant and valid pain point for Airflow users. Developers struggled to replicate production environments on their local machines, battled complex dependency management, and found it difficult to write and execute unit tests for their pipelines.
There is still some truth to this: Airflow can be difficult to run locally depending on your setup, and testing is a complex topic that often requires a nuanced approach tailored to each team’s needs.
However, it’s not as simple as “Airflow local dev is terrible,” which is why we’ve marked this statement as misleading. There are tools that make local development easier for many teams, like the free Astro CLI, which gets Airflow up and running locally in containers with only two commands. And for Astronomer customers, local development is even easier with the Astro environment manager, which provides an alternative way of storing secrets and connections that you can pull into your local environment, so teams no longer need to manually share connection information to test their pipelines locally.
For many teams, testing is the trickier part, and there is no one-size-fits-all answer for how best to test your DAGs. Typically, testing requires a combination of strategic local development that mirrors how DAGs will behave in production, unit tests, DAG validation tests (or other guardrails your team puts in place), and more. But here too, there are ways to make this easier. For local testing, the dag.test() method (Airflow 2.5 and later) allows you to run all tasks in a DAG within a single serialized Python process, without running the Airflow scheduler, which means you can iterate faster and use IDE debugging tools when developing DAGs. And when deploying to production, a robust CI/CD process, like the one you can build with Astro, helps a lot.
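For example, here is a minimal sketch of running a DAG with dag.test() straight from your terminal or IDE. Imports are shown in the Airflow 3 style used throughout this post; in Airflow 2.5+ the decorators live in airflow.decorators instead.

from airflow.sdk import dag, task


@dag
def my_etl():
    @task
    def extract() -> int:
        return 42

    @task
    def load(value: int):
        print(f"loaded {value}")

    load(extract())


dag_object = my_etl()

if __name__ == "__main__":
    # Runs every task in a single local Python process, no scheduler required,
    # so you can set breakpoints and step through tasks in your IDE
    dag_object.test()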
Statement: Airflow does not support dynamic pipelines
Verdict: Fiction
Airflow's origins are rooted in time-based, scheduled batch processing. Since Airflow pipelines are written in Python, you could always add some dynamism to your tasks using loops or similar constructs, or even generate DAGs themselves dynamically with a script. But these methods were hacky, lacked visibility, and frequently caused performance issues. For a long time, there was no first-class feature in Airflow to support dynamic workflows.
But Airflow’s capabilities for dynamism have evolved significantly, culminating in a highly flexible platform that supports multiple layers of dynamic behavior. Dynamic task mapping, introduced in Airflow 2.3, was the biggest leap forward here. Using the .expand() method, a single task definition can be "mapped" over the output of an upstream task, generating a variable number of parallel task instances at run time. For example, a task could list files in a directory, and a downstream mapped task would then run in parallel for each file found (sketched after the snippet below). This allows the DAG's structure to adapt to the data it is processing, while maintaining the full history of every DAG run. The following code snippet shows a simple example:
from airflow.sdk import task


@task
def add(x: int, y: int):
    return x + y


added_values = add.partial(y=10).expand(x=[1, 2, 3])
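And here is a minimal sketch of the file-listing pattern described above, with hard-coded file names standing in for a real listing step:

from airflow.sdk import dag, task


@dag
def process_new_files():
    @task
    def list_files() -> list[str]:
        # In a real pipeline this might list objects in S3 or a local directory
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process_file(path: str):
        print(f"processing {path}")

    # One mapped task instance is created per file at run time
    process_file.expand(path=list_files())


process_new_files()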

Granted, even with dynamic task mapping available, some teams will still need to generate full DAGs dynamically. This has gotten better too: while it can still create performance issues at very high scale, improvements in the stability and scalability of Airflow’s architecture (stay tuned for Part 2 of this series for more on this!) have made this much more tenable for large teams.
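For teams that do need this, the common pattern is a factory function that registers one DAG per configuration entry. Here is a minimal sketch; the source names are placeholders for whatever drives your generation logic.

from airflow.sdk import dag, task


def create_ingest_dag(source: str):
    @dag(dag_id=f"ingest_{source}", schedule="@daily")
    def ingest():
        @task
        def pull():
            print(f"pulling data from {source}")

        pull()

    return ingest()


# One DAG is generated per source; the list here is purely illustrative
for source in ["orders", "customers", "payments"]:
    globals()[f"ingest_{source}"] = create_ingest_dag(source)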
In addition to dynamic pipeline code, you can schedule DAGs more dynamically using assets or event-driven scheduling in Airflow 3 (or datasets in Airflow 2), so that pipelines run only when the data they depend on is ready, not on a predefined schedule. Airflow 3 also removed the unique constraint on the logical date, meaning you can trigger multiple, concurrent, parameterized runs of the same DAG, making certain dynamic inference execution use cases easier.
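As a minimal sketch of asset-based scheduling in Airflow 3 (the asset name here is illustrative), a producer DAG updates an asset and a consumer DAG runs whenever that asset is updated rather than on a clock:

from airflow.sdk import Asset, dag, task

# Illustrative asset; in practice this would represent a real table or file
orders_data = Asset("orders_data")


@dag(schedule="@hourly")
def produce_orders():
    @task(outlets=[orders_data])
    def extract_orders():
        print("writing fresh orders data")

    extract_orders()


@dag(schedule=[orders_data])  # runs whenever orders_data is updated, not on a time schedule
def consume_orders():
    @task
    def transform_orders():
        print("transforming the latest orders data")

    transform_orders()


produce_orders()
consume_orders()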
There is much more to come in Airflow for all of these features, especially event-driven scheduling, where watchers for other systems (such as Apache Kafka, Google Pub/Sub, Amazon S3) are being created as we write this. But even today, Airflow’s capabilities for dynamic pipelines are significantly greater than they were even a couple of years ago.
Conclusion
Airflow today is not the same tool it was five years ago. The developer experience has changed dramatically: there are more options for DAG authoring, local development is easier with container-based tooling, and modern features like dynamic task mapping and event-driven scheduling have unlocked entirely new use cases.
Some of the narratives around Airflow’s developer experience, shaped by early versions or passed along by secondhand stories, no longer reflect the current reality. In this post, we’ve revisited some of the most common critiques of Airflow’s user experience, and evaluated them in the context of what the project is today:
| Statement | Verdict | Modern features |
| --- | --- | --- |
| Authoring DAGs is difficult and un-Pythonic | ❌ Fiction | TaskFlow API, Assets, DAG Factory |
| Local development and testing is impractical | ⚠️ Misleading | Astro CLI, CI/CD, Connections management on Astro |
| Airflow does not support dynamic pipelines | ❌ Fiction | Dynamic task mapping, Assets and event-driven scheduling |
This post focused only on common misconceptions related to Airflow’s user experience. In parts 2 and 3, we’ll cover how Airflow performs at scale and dig into advanced use cases including ML and AI. Stay tuned.