Join us for Astro Days: NYC on Sept 27!
Webinar Recap

What's New in Airflow 2.3

By Kenten Danas, Field Engineer at Astronomer

Useful links:

1. What is Apache Airflow?

Apache Airflow is one of the world’s most popular data orchestration tools — an open-source platform that lets you programmatically author, schedule, and monitor your data pipelines.

Apache Airflow was created by Maxime Beauchemin in late 2014, and brought into the Apache Software Foundation’s Incubator Program two years later. In 2019, Airflow was announced as a Top-Level Apache Project, and it is now considered the industry’s leading workflow orchestration solution.

Key benefits of Airflow:

  • Proven core functionality for data pipelining
  • An extensible framework
  • Scalability
  • A large, vibrant community

2. Airflow Updates Timeline

whats-new-airflow-2-3-image2

3. What’s new in Airflow 2.3?

  • 2 Major New Features
  • Over 30 Minor New Features
  • Over 200 Merged PRs

4. Major New Features

Dynamic Task Mapping

Dynamic task mapping allows DAGs to take a set of inputs at run time and creates a single task for each one.

This is helpful for use cases like:

  • ETL for processing files or tables, without knowing their final number in advance.
  • Running a particular task per customer, when the list of customers changes frequently.
  • ML applications like hyperparameter tuning, or training a different set of models for each input dataset.

Benefits of Dynamic Task Mapping:

  • Implement tasks that change at run time based on the state of previous tasks
  • Maintain task atomicity
  • Reduce top-level code in your DAG files
  • Maintain history of tasks even when that particular input no longer exists
  • Avoid the pitfalls of dynamic DAGs at a high scale

Dynamic Task Mapping is simple to implement with two new functions:

  • expand(): This function passes the parameter or parameters that you want to map on. A separate parallel task will be created for each input.
  • partial(): This function passes any parameters that remain constant across all mapped tasks generated by expand().

The following task code example uses both of these functions to dynamically generate 3 task runs:

@task
    def add(x: int, y: int):
        return x + y

    added_values = add.partial(y=10).expand(x=[1, 2, 3])

Grid View

The grid view is a UI update replacing the tree view, focusing on task history. The grid view shows runs and tasks but leaves dependency lines to the graph view, and handles task groups better.

Example of an ELT task grid view:

whats-new-airflow-2-3-image1

Benefits of the grid view:

  • Focus on task history
  • More granularity and control than the tree view
  • Dependencies and structure within the graph view.
  • Durations of DAG runs visible to quickly inspect performance changes.
  • Summaries for task groups.
  • Play icon to better indicate manual DAG runs.

5. Demo

Begins at 15:15 in the webinar video.

Demo example: A task called “make_list” makes a list to map over, returning a list of some random number of integers between two and four. That list then gets passed to the downstream task that is going to take it and get mapped accordingly.

whats-new-airflow-2-3-image4

The graph view of two tasks:

whats-new-airflow-2-3-image3

More examples discussed in the Demo and Q&A.

6. Minor New Features

New/Updated Operators

  • BranchPythonOperator Decorator added
  • ShortCircuitOperator configurability updated to respecting downstream trigger rules
  • DummyOperator replaced with EmptyOperator
  • New show_return_value_in_logs parameter for PythonOpertor
  • SmoothOperator added

Connection Updates

  • The requirement that custom connection UI fields be prefixed removed
  • JSON serialization for connections enabled

Other great additions

  • New ALL_SKIPPED trigger rule
  • New Events custom timetable
  • New Airflow db downgrade CLI command
  • dag_id_pattern parameter added to the /dags endpoint

7. Learn more

Getting Apache Airflow Certified

Join the 1000s of other data engineers who have received the Astronomer Certification for Apache Airflow Fundamentals. This exam assesses an understanding of the basics of the Airflow architecture and the ability to create simple data pipelines for scheduling and monitoring tasks.

Keep Your Data Flowing with Astro

Get a demo that’s customized around your unique data orchestration workflows and pain-points.

Get Started