Debunking myths about Airflow’s use cases
Airflow fact vs fiction, Part 3

Introduction
Apache Airflow has evolved significantly from its origins at Airbnb in 2014, from mostly orchestrating ETL pipelines into the industry standard for complex data workflows, powering everything from machine learning, GenAI, and infrastructure management to mission-critical analytics (the exact topic of this post!). Yet outdated narratives and misconceptions persist, despite major advancements, including in the recent Airflow 3.0 release. In this series, we’re separating fact from fiction to clarify what’s true, what’s changed, and what’s simply misunderstood.
In Part 1 of this series, we looked at some of the most persistent myths about Airflow’s user experience, focused on DAG authoring, local development, and dynamic pipelines. In Part 2, we looked at Airflow’s architecture and performance, digging into scaling, scheduler reliability, and data processing. A consistent theme was that valid critiques of older versions of Airflow have stuck around, even though Airflow has evolved significantly.
Similarly, it’s still common to hear that Airflow is “just for batch ETL” or “isn’t built for AI workflows.” Others suggest it’s not ideal for modern, event-driven or real-time systems—or that it’s legacy tech altogether. These claims often overlook just how widely Airflow is used today, and how the project has continued to evolve to support an expanding range of orchestration needs across teams, industries, and use cases.
So in this post, we’ll explore where Airflow fits in the modern data stack: what it’s great at, where it still isn’t the best choice by itself, and how teams are using it today to power everything from ETL pipelines to complex GenAI workflows to managing infrastructure.
Let’s get into it.
Statement: Airflow only supports time-based schedules for pipelines
Verdict: Fiction
This statement has never really been true. It may come from the fact that Airflow was designed for batch workflows, which often run on a regular schedule, but even in early versions of Airflow, scheduling wasn’t limited to cron-like intervals. Features like sensors allowed pipelines to wait for external conditions (e.g. a file arriving in a bucket) before proceeding, so Airflow has long offered scheduling flexibility beyond simply running on a clock.
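As a rough illustration, here is a minimal sketch of a sensor-based wait using current syntax. It assumes the Amazon provider package is installed and a default AWS connection is configured; the bucket, key, and DAG name are placeholders.

```python
# A minimal sketch: a sensor waits for a file before downstream work runs.
# Assumes the Amazon provider is installed; bucket and key are placeholders.
from datetime import datetime

from airflow.sdk import dag, task
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor


@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def wait_then_process():
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="example-bucket",
        bucket_key="incoming/data.csv",
        deferrable=True,  # frees the worker slot while waiting
    )

    @task
    def process():
        ...  # downstream work runs only after the file has arrived

    wait_for_file >> process()


wait_then_process()
```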
That flexibility has expanded significantly in more recent versions. Airflow 2.4 introduced datasets as a way to express inter-DAG dependencies based on upstream data readiness, and Airflow 3 builds on that with assets, making it easier to express workflows that should run in response to upstream data changes rather than fixed schedules. For example, you might have an MLOps pipeline where one team produces clean data and another team consumes that data in their ML models. This can be easily implemented with assets or datasets.
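For instance, a producer/consumer pair might look like the following sketch, written with Airflow 3 syntax (in Airflow 2.4+, Dataset from airflow.datasets plays the same role). The asset URI and task bodies are illustrative placeholders.

```python
# A minimal sketch of asset-based scheduling: one DAG produces an asset,
# another runs whenever that asset is updated. Names are placeholders.
from datetime import datetime

from airflow.sdk import dag, task, Asset

clean_data = Asset("s3://example-bucket/clean/customer_data.parquet")


@dag(schedule="@daily", start_date=datetime(2025, 1, 1))
def produce_clean_data():
    @task(outlets=[clean_data])
    def clean():
        ...  # write the cleaned data; completing this task updates the asset

    clean()


@dag(schedule=[clean_data])  # runs whenever clean_data is updated upstream
def train_model():
    @task
    def train():
        ...  # train on the freshly cleaned data

    train()


produce_clean_data()
train_model()
```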
You can also combine asset and time schedules. This is helpful for use cases like analytics reporting, where you need your reports to be refreshed at least every day by 9am (time schedule), but also want to update them right away if new data arrives (asset schedule).
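A combined schedule can be expressed with a timetable like the sketch below. Class and argument names follow the Airflow 3 naming (in Airflow 2.9+, the equivalent is DatasetOrTimeSchedule with a datasets argument); check your version’s API reference, as this is a sketch rather than a drop-in snippet.

```python
# A minimal sketch of a combined time + asset schedule: run by 9am daily,
# and also as soon as the upstream asset is updated. Names are placeholders.
from datetime import datetime

from airflow.sdk import dag, task, Asset
from airflow.timetables.assets import AssetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

sales_data = Asset("s3://example-bucket/clean/sales.parquet")


@dag(
    start_date=datetime(2025, 1, 1),
    schedule=AssetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 9 * * *", timezone="UTC"),  # daily by 9am
        assets=[sales_data],  # and immediately when new sales data lands
    ),
)
def refresh_report():
    @task
    def rebuild_report():
        ...

    rebuild_report()


refresh_report()
```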
In addition, event-driven scheduling in Airflow 3 enables pipelines to be triggered directly by external events, removing the need for periodic polling altogether in many scenarios. For example, you can configure Airflow to start a DAG run as soon as a message arrives in an SQS queue or a Kafka topic. To see this in action, check out our video tutorial on building a personalized newsletter with Airflow 3 and AWS.
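The SQS case might look roughly like the sketch below, which attaches a watcher to an asset. It assumes the common messaging and Amazon provider packages are installed and an AWS connection is configured; the queue URL, asset name, and task logic are placeholders.

```python
# A minimal sketch of event-driven scheduling: an asset watcher ties a DAG
# to messages arriving in an external queue. Queue URL is a placeholder.
from airflow.sdk import dag, task, Asset, AssetWatcher
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger

trigger = MessageQueueTrigger(
    queue="https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
)

queue_asset = Asset(
    "example_queue_asset",
    watchers=[AssetWatcher(name="example_queue_watcher", trigger=trigger)],
)


@dag(schedule=[queue_asset])  # a new run starts as soon as a message arrives
def process_message():
    @task
    def handle():
        ...  # read the triggering event from the run context and process it

    handle()


process_message()
```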
All of this means that while Airflow is still best suited for orchestrating batch or micro-batch pipelines (see the next statement for more on this), you aren’t locked into rigid, time-based schedules. Whether it’s cron-style intervals, upstream data events, external triggers via API, or asset dependencies, Airflow offers a wide range of scheduling options that reflect the diverse orchestration needs of today’s data platforms.
Statement: Airflow is not a stream processing tool
Verdict: Fact
This one’s true—and it’s important to state it clearly. Airflow was never designed to be a streaming system, and even with performance improvements in recent versions, it remains fundamentally a batch workflow orchestrator.
Airflow’s architecture is built around the idea of discrete tasks and DAG runs: you define a workflow, optionally schedule it, and Airflow manages the execution lifecycle for that workflow. Each run is tracked as an independent unit of work. That’s a perfect fit for ETL pipelines, machine learning training jobs, infrastructure automation, and other bounded workloads, but it’s a poor fit for continuous, unbounded event processing with low-latency requirements. While there is a “continuous” schedule option, it is best suited to cases where you use sensors or deferrable operators to wait for highly irregular events.
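For reference, a continuous schedule looks roughly like the sketch below: a new run starts as soon as the previous one finishes, and max_active_runs must be 1. The DAG name and task logic are placeholders.

```python
# A minimal sketch of the "@continuous" schedule for highly irregular events.
from datetime import datetime

from airflow.sdk import dag, task


@dag(schedule="@continuous", start_date=datetime(2025, 1, 1), max_active_runs=1)
def watch_for_irregular_events():
    @task
    def wait_and_handle():
        ...  # in practice, a deferrable sensor would wait here for the next event

    wait_and_handle()


watch_for_irregular_events()
```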
If you need to run pipelines more frequently than every minute or have very low latency requirements, Astronomer recommends using Airflow along with tools designed specifically for that purpose, like Apache Kafka. Airflow still has a role in this use case, and many teams use Airflow to orchestrate the lifecycle around streaming pipelines, including:
- Provisioning and managing infrastructure for streaming jobs
- Coordinating streaming and batch workflows together
- Running periodic aggregations on streaming outputs
- Triggering downstream processing or ML jobs based on outputs from streaming systems
And with recent features like event-driven scheduling, Airflow is even better positioned to respond to streaming data availability - just not to process the stream itself (leave that to tools like Apache Flink or Kafka).
So while Airflow is a great orchestration layer in architectures that include streaming, it’s not and never has been a streaming engine. Recognizing that distinction is key to picking the right solution for the job.
Statement: Airflow isn’t the best choice for ML or AI workflows
Verdict: Fiction
It’s a common belief that Apache Airflow is only used for traditional ETL jobs, and that data teams looking to operationalize MLOps or GenAI workflows will reach for more niche tools built specifically for that purpose. While that’s certainly true for some teams, the general claim isn’t borne out by data from Airflow users.
According to the 2024 Airflow Survey, while 90% of respondents use Airflow for ETL/ELT pipelines, 23% also use it for MLOps and 9% for GenAI workflows, up 24% year over year. In other words, ML and AI workloads are one of the fastest-growing categories of Airflow use.
And it’s no accident that Airflow is proving useful in these contexts. Several key features directly enable ML and AI orchestration:
- Dynamic task mapping makes it easy to create individual parallel tasks for multiple training jobs or inference requests at runtime, allowing workflows to adapt automatically to incoming data or experiment configurations (see the sketch after this list).
- Airflow 3 has better support for inference execution. Event-driven scheduling enables on-demand execution of inference tasks, so you aren’t constrained by predefined schedules, and the removal of the unique constraint on the logical date allows for simultaneous DAG runs.
- The Airflow AI SDK, an open source package developed by Astronomer, provides a Python SDK that you can use to easily call LLMs and orchestrate agent calls directly from Airflow DAGs.
- On Astro, worker queues allow teams to assign heavy ML workloads to properly resourced workers, isolating them from lightweight orchestration tasks.
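To illustrate the first point, here is a minimal, hypothetical sketch of dynamic task mapping used to fan out inference work at runtime. The request source and model call are placeholders standing in for whatever your pipeline actually uses.

```python
# A minimal sketch of dynamic task mapping: one mapped task instance is
# created per inference request at runtime. All data here is illustrative.
from airflow.sdk import dag, task


@dag(schedule=None)
def batch_inference():
    @task
    def get_requests() -> list[dict]:
        # In practice this might pull pending requests from a queue or table
        return [{"id": 1, "text": "..."}, {"id": 2, "text": "..."}]

    @task
    def run_inference(request: dict) -> dict:
        # Each request becomes its own task instance, run in parallel
        return {"id": request["id"], "prediction": "..."}

    run_inference.expand(request=get_requests())


batch_inference()
```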
Additionally, at the time of writing, AIP-90, or “human in the loop”, is in development and slated for release in Airflow 3.1. It will bring the ability for users to directly interact with orchestration flows by taking actions like choosing a branch, approving or rejecting the output of tasks, or providing input. This functionality is critical for many LLM workflows, and it shows how Airflow is continually evolving to support this type of use case.
In production environments, Airflow is often used as the glue that brings together different components of an ML or GenAI workflow: coordinating feature extraction, model training, evaluation, promotion, inference, and monitoring. For just a couple of examples, see using Airflow for hybrid search for eCommerce, implementing smart content moderation pipelines with Airflow, AI-ready pipelines at Veyer Logistics, or Unlocking the Power of AI at Ford. So while specialized ML orchestration platforms exist, saying that “Airflow isn’t the best choice” for ML or AI workflows misses the reality: many teams are using it today for exactly those workflows, and recent improvements have only made it more capable.
Statement: Airflow is legacy tech, innovation is happening elsewhere
Verdict: Fiction
For this statement, we’ll spare everybody a debate about what “legacy tech” even means and focus on the idea that more innovation is mostly happening elsewhere in the data ecosystem. For Airflow, that idea couldn’t be further from the truth, both from a project development perspective and from a user perspective.
The Airflow project has been very active over the past five years, with more than 10 significant releases, including 2 major releases. During that time the number of contributors to the project has grown to over 3400, surpassing other massive Apache Software Foundation projects like Spark (>2100 contributors) and Kafka (>1200 contributors). This vibrant, active community ensures that Airflow is not just maintained but continuously improved.
Those releases have brought game-changing features — many of which we’ve covered throughout this series — like the highly available scheduler, the TaskFlow API, dynamic task mapping, assets, DAG versioning, remote execution, and more. And these aren’t just incremental improvements or random contributions; they directly address real feedback from users. The continued investment from Astronomer and many other contributors is focused on ensuring Airflow can meet the needs of modern data teams tasked with orchestrating increasingly complex workloads.
The result of this investment shows up in the other side of the argument: Airflow’s users. As highlighted in the 2024 Airflow survey, 82% of teams say they use Airflow beyond its initial scope, and 44% reported significant growth in new use cases. Use of Airflow for managing infrastructure, MLOps, and GenAI workflows is growing significantly year over year.
While there are certainly many new tools popping up to address these growing use cases (we’ve all seen the MAD diagram), that doesn’t mean that projects that have been around for longer are standing still. In fact, the opposite is true here: Airflow is being used more frequently and for a wider range of use cases today than it was five or ten years ago precisely because it has continued to evolve alongside the rapidly changing data ecosystem.
Conclusion
Throughout this series, we’ve revisited some of the most persistent narratives about Airflow—from the challenges of authoring DAGs and local development, to architectural critiques around scaling and reliability, and now, to perceptions about where Airflow fits in the modern data stack.
The pattern is clear: while many of these critiques stem from real limitations in Airflow’s early history, they no longer reflect the capabilities of the project today. The Airflow community has invested heavily in expanding and improving the project—not just in terms of user experience and performance, but in the breadth of use cases it supports.
Here’s a summary of the statements we examined in this post:
| Statement | Verdict | Current features |
|---|---|---|
| Airflow only supports time-based schedules for pipelines | ❌ Fiction | Sensors, assets, event-driven scheduling, the Airflow API |
| Airflow is not a stream processing tool | ✅ Fact | Airflow alone should not be used for streaming use cases, but it can be used in combination with streaming tools like Kafka |
| Airflow isn’t the best choice for ML or AI workflows | ❌ Fiction | Dynamic task mapping, event-driven scheduling, the Airflow AI SDK, worker queues on Astro |
| Airflow is legacy tech, innovation is happening elsewhere | ❌ Fiction | Continual significant releases including Airflow 3, and massive growth in contributors and user base |
As we close this series, the message is simple: Airflow’s relevance today isn’t defined by its origins—it’s defined by how actively it has grown and adapted to support the future of data engineering, machine learning, and AI operations.
Thanks for reading, and we look forward to seeing what the community builds next with Airflow.