Introducing Apache Airflow 2.10

Every couple of months, the Apache Airflow project releases a new version
with numerous features, improvements, and bug fixes that enhance
functionality. The release of Airflow 2.10 brings greater flexibility and
expansion of some of the most widely used features. This release contains
more than 40 great new features, over 80 improvements, and over 40 bug
fixes.

While we are, of course, excited about every new feature and improvement,
we’re particularly jazzed about the new dataset improvements. Datasets are
one of the most popular Airflow features and are often a key part of
implementing rapidly growing use cases like MLOps and GenAI. The updates
in this release make the feature more flexible and easier to use, and we
are sure the community will be quick to adopt them.

But 2.10 is definitely not only about datasets! This blog post will walk
you through everything you need to know about this release, so you don’t
miss out on any of the exciting additions and changes.

Dataset Enhancements

Datasets and data-aware scheduling were originally released in Airflow
2.4. They provide a way for DAGs that
access the same data to have explicit, visible relationships and get
scheduled based on updates to these datasets. Using datasets allows you to
create smaller DAGs instead of large monolithic ones and allows different
teams to maintain their own DAGs, even if data is shared across them,
while gaining increased visibility into DAG dependencies.

In the 2023 Airflow survey, almost 50% of users indicated they have
adopted dataset functionality. The previous release, Airflow 2.9, brought
the biggest updates yet to this important feature, and 2.10 builds
upon those improvements.

Dynamic Dataset Definition

In previous versions of Airflow, dataset inlets and outlets were required
to be set during DAG parsing time; in other words, they were static. This
design helped avoid poorly formed dataset URIs but did not allow the
flexibility of setting inlets and outlets during task execution, which
could be helpful in cases where you wanted to use datasets in combination
with other features like dynamic task mapping.

To make this feature more flexible, Airflow 2.10 brings a new class,
DatasetAlias, that can accept dataset values and is resolved at runtime.
The alias allows you to define downstream schedules or inlets without
knowing the exact name of the dynamic dataset ahead of time. To use a
dataset alias, you simply set it as an outlet for your task and then
associate dataset events to it by defining outlet_events. For example,
you might have:
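
Here is a minimal sketch of what that could look like (the alias name, S3
bucket, and use of the ds logical date are illustrative, not taken from
the release itself):

```python
from airflow.datasets import Dataset, DatasetAlias
from airflow.decorators import task


@task(outlets=[DatasetAlias("my-task-outputs")])
def my_task_with_outlet_events(*, ds, outlet_events):
    # Attach a dataset event to the alias. The concrete URI is only
    # known here at runtime, because it embeds the logical date (ds).
    outlet_events[DatasetAlias("my-task-outputs")].add(
        Dataset(f"s3://my-bucket/my-task/{ds}")
    )
```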

In this case, the ds part of the dataset URI will be filled in at runtime
based on the information passed to the task. Since you don’t know that
information ahead of time, you can schedule a downstream DAG based on the
alias:
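
For instance, a consumer DAG might be scheduled like this (the dag_id and
dates are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.datasets import DatasetAlias
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="downstream_of_alias",
    start_date=datetime(2024, 8, 1),
    # Runs whenever a dataset event is attached to the alias upstream,
    # whichever concrete dataset URI it resolved to.
    schedule=[DatasetAlias("my-task-outputs")],
):
    EmptyOperator(task_id="consume")
```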

This feature is very flexible and is designed to work with older
implementations of datasets as well. Even if you use an alias, you can
still schedule based on a dataset URI, and you can add multiple events to
a single alias.

Add Metadata to Dataset Events

One other benefit of the new dataset alias feature is that you can now
attach metadata to an event using either the extra parameter or the
Metadata class.
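
Here is a rough sketch of both approaches (the dataset URI and the
row_count extra are invented for illustration):

```python
from airflow.datasets import Dataset
from airflow.datasets.metadata import Metadata
from airflow.decorators import task

my_dataset = Dataset("s3://my-bucket/processed/orders")


# Approach 1: set extras on the event via the outlet_events accessor.
@task(outlets=[my_dataset])
def update_orders(*, outlet_events):
    outlet_events[my_dataset].extra = {"row_count": 42}


# Approach 2: yield a Metadata object from inside the task.
@task(outlets=[my_dataset])
def update_orders_with_metadata():
    yield Metadata(my_dataset, {"row_count": 42})
```

Downstream tasks can then read these extras through the inlet_events
accessor.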

This allows you to save information about data that was processed, such as
the number of records processed in that task, a new model accuracy score
after training, or the filenames of any processed files. This metadata can
also be used by tasks in downstream DAGs that interact with the same
dataset.

Dataset UI Updates

To support the new dataset alias feature, the datasets page has gotten a
refresh to focus on dataset events. The new view has richer information
about each dataset event, including the source, DAG runs that were
triggered by that dataset, and extras.

The dependency graph and list of all datasets in that Airflow instance are
now on separate tabs, making it cleaner and easier to navigate.

Dataset events are also now shown in the Details tab of each DAG run and
in the DAG graph.

User Interface Improvements

Nearly every Airflow release brings great UI updates that improve the
experience of working with Airflow, and Airflow 2.10 is particularly
exciting in this regard. In addition to the dataset UI updates mentioned
above, this release brings a highly requested and anticipated dark mode to
Airflow.

By simply toggling the icon on the right side of the navigation bar, you
can switch easily between light and dark mode.

In addition, 2.10 brings other convenient features to the UI, including a
new button to reparse DAGs on demand, thanks to the addition of a DAG
reparsing endpoint to the API.

You also get more visibility in the 2.10 UI, including failed task
dependencies on the details page and a better XCom display, thanks to the
view being rewritten as a proper React-based JSON view.

Lineage Enhancements

Data lineage can help with everything from understanding your data
sources, to troubleshooting job failures, to managing PII, to ensuring
compliance with data regulations. OpenLineage, the industry standard
framework for data lineage, has a robust Airflow integration that allows
you to have more insight into the operation and structure of the complex
data ecosystems that Airflow orchestrates.

The OpenLineage Airflow integration has been around and in use for a
while. However, it previously only gathered lineage information from
explicitly implemented operators. One large gap was the PythonOperator,
which, despite being the most widely used Airflow operator, had no support
for lineage.

Now, with AIP-62, instrumentation has been added to get lineage
information from important hooks, so that popular operators like the
PythonOperator, as well as the TaskFlow API and Object Storage API, can
emit lineage information. This is
a key step forward in closing the gaps for lineage in Airflow that will
translate to real-world benefits for users.

Multiple Executor Configuration

Picking an executor is one of the important choices you must make when
setting up your Airflow instance. Each executor (Celery and Kubernetes
being the most common) has advantages and disadvantages, balancing factors
like latency, isolation, and compute efficiency. In previous versions of
Airflow, you could pick just one executor for your Airflow instance,
potentially leading to tradeoffs for your workflows.

Now, Airflow supports configuring multiple executors concurrently, so you
can have the best of both worlds. Once multiple executors are set up in
your Airflow config, you can assign specific tasks to the one that
optimizes resource utilization, latency, and custom execution
requirements.
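
As a sketch of how this fits together (the executor pairing and the tasks
are illustrative; the first executor listed in the config acts as the
default):

```python
# In airflow.cfg (or via AIRFLOW__CORE__EXECUTOR), list multiple executors;
# the first one is the environment-wide default:
#
#   [core]
#   executor = CeleryExecutor,KubernetesExecutor

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="mixed_executors_demo", start_date=datetime(2024, 8, 1), schedule=None):
    # Route this task to a specific executor with the new per-task
    # executor argument (tasks that don't set it use the default).
    BashOperator(
        task_id="isolated_task",
        executor="KubernetesExecutor",
        bash_command="echo 'running in its own pod'",
    )
    BashOperator(task_id="regular_task", bash_command="echo 'running on Celery'")
```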

Note that if you are an Astronomer customer, Astro does not currently
support configuring multiple executors for one Deployment. However, using
worker queues with the Celery executor offers similar customization for
task execution.

Other Noteworthy Features and Updates

There are lots more notable updates in 2.10 to be aware of, including:

  • Deferrable operators can now start execution directly from the triggerer
    without going to the worker. For certain operators, like sensors, this is
    more efficient and can save teams time and money (see the sketch after
    this list).

  • As part of AIP-64, task instance history is now kept for all task
    instance tries, not only the most recent attempt. This information is
    available to users now, but, excitingly, it is also part of the
    development of DAG versioning, which will come in a future Airflow
    release.

  • Important executor logs are now sent to the task logs. If the executor
    fails to start a task, the relevant error messages will be accessible to
    the user in the task logs, making debugging much easier.
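
For the triggerer-first execution mentioned in the first bullet, here is a
minimal sketch of how an operator can opt in via the start_from_trigger
mechanism added in 2.10 (the one-hour wait sensor is purely illustrative):

```python
from datetime import timedelta

from airflow.sensors.base import BaseSensorOperator
from airflow.triggers.base import StartTriggerArgs


class WaitOneHourSensor(BaseSensorOperator):
    # Opt in: the task is handed straight to the triggerer, skipping the worker.
    start_from_trigger = True
    start_trigger_args = StartTriggerArgs(
        trigger_cls="airflow.triggers.temporal.TimeDeltaTrigger",
        trigger_kwargs={"delta": timedelta(hours=1)},
        next_method="execute_complete",
        next_kwargs=None,
        timeout=None,
    )

    def execute_complete(self, context, event=None) -> None:
        # Nothing left to do once the trigger fires; the task succeeds.
        return
```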

And even these updates barely scratch the surface. Regardless of how you
use Airflow, there’s something for you in 2.10.

Get Started with Airflow 2.10

Airflow 2.10 has way more features and improvements than we can cover in a
single blog post. To learn more, check out the full release notes, and
join us for a webinar on August 22nd that will cover the new release in
more detail.

To try Airflow 2.10 and see all the great features for yourself, get
started with a Free 14-Day Trial of Astro.
We offer same-day support for all new Airflow releases, so you don’t have
to wait to take advantage of the latest and greatest features.

Build, run, & observe your data workflows.
All in one place.

Build, run, & observe
your data workflows.
All in one place.

Try Astro today and get up to $20 in free credits during your 14-day trial.