Dynamic DAGs

Hosted By

  • Kenten Danas
  • Viraj Parekh

Since the release of dynamic task mapping in Airflow 2.3, many of the concepts in this webinar have been changed and improved upon. Please check out our newer Dynamic Tasks in Airflow webinar for the latest dynamic DAG best practices, including how dynamic tasks can accomplish many of the same use cases more efficiently.

The simplest way of creating an Airflow DAG is to write it as a static Python file. However, sometimes manually writing DAGs isn’t practical.

Maybe you have hundreds or thousands of DAGs that do similar things, with just a parameter changing between them. Or maybe you need a set of DAGs to load tables, but don’t want to manually update DAGs every time those tables change.

In these cases, and others, it can make more sense to dynamically generate DAGs. Because everything in Airflow is code, you can dynamically generate DAGs using Python alone.

In this webinar, we’ll talk about when you might want to dynamically generate your DAGs, show a couple of methods for doing so, and discuss problems that can arise when implementing dynamic generation at scale.

In this webinar we cover:

Generating DAGs - The Static Way

Most people who have used Airflow are familiar with defining DAGs statically.

You create a Python file, instantiate your DAG, and define your tasks.


But What Actually Makes a DAG?



A dynamically generated DAG is created when each parsing of the DAG file could create different results.

Why is this useful?

Dynamically generating DAGs can be helpful when you have DAGs that follow a similar pattern, and:

Ways to Dynamically Generate DAGs: Single File

Create a Python script that lives in your DAG_FOLDER that generates DAG objects.

You may have a function that creates the DAG based on some parameters, and then a loop that calls that function for each input.

Those parameters may come from:



Ways to Dynamically Generate DAGs: Multiple Files

Create a Python script (or other script) that actually generates DAG .py files, which are then loaded into your Airflow environment.

This is most straightforward if you are parameterizing the same DAG structure and want to automatically read those parameters from YAML, JSON, or a similar format.
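A generator script for the multiple-file method might look like the sketch below. It uses only the Python standard library and runs outside of Airflow; the template contents, the `generate_dag_files` name, and the JSON keys (`dag_id`, `bash_command`) are assumptions made for illustration, and the generated files assume Airflow 2.x.

```python
import json
from pathlib import Path
from string import Template

# Template for each generated DAG file. $dag_id and $bash_command are filled
# in from one JSON config. Values are inserted as-is, so a value containing a
# double quote would produce an invalid file in this simple sketch.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="$dag_id",
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    BashOperator(task_id="run", bash_command="$bash_command")
''')


def generate_dag_files(config_dir, output_dir):
    """Write one DAG .py file per JSON config and return the paths written."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for config_path in sorted(Path(config_dir).glob("*.json")):
        params = json.loads(config_path.read_text())
        target = output_dir / f"{params['dag_id']}.py"
        target.write_text(DAG_TEMPLATE.substitute(params))
        written.append(target)
    return written
```

Calling `generate_dag_files("dag_configs", "dags")` would turn each config in a hypothetical `dag_configs` folder into a standalone DAG file, each of which Airflow then parses like any hand-written static DAG.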



Pros and Cons



Any code in the DAG_FOLDER will be executed on every Scheduler heartbeat. Methods where that code is dynamically generating DAGs, such as the single-file method, are more likely to cause performance issues at scale.

If parsing the DAGs takes longer than the scheduler heartbeat interval, the scheduler can get locked up and tasks won’t be executed.
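One rough way to spot files that are expensive to parse is to time how long each one takes to import. The sketch below is only an approximation of what the scheduler does, and the function names and the threshold value are hypothetical; it imports each file directly, so it should only be pointed at code that is safe to execute.

```python
import importlib.util
import time
from pathlib import Path


def time_dag_file_parse(path):
    """Import one Python file and return how many seconds that took."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs all top-level code in the file
    return time.perf_counter() - start


def slow_dag_files(dag_folder, threshold_seconds):
    """Return {file name: seconds} for files slower than the threshold."""
    timings = {
        path.name: time_dag_file_parse(path)
        for path in sorted(Path(dag_folder).glob("*.py"))
    }
    return {name: secs for name, secs in timings.items() if secs > threshold_seconds}
```

For example, `slow_dag_files("dags", threshold_seconds=5.0)` would list the files in a `dags` folder whose top-level code took more than five seconds to run, which is a reasonable first place to look when a single generator file creates many DAGs.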

Community Tools

A notable tool for dynamically creating DAGs from the community is dag-factory. dag-factory is an open source Python library for dynamically generating Airflow DAGs from YAML files.



Code Examples

This repo contains an Astronomer project with multiple examples showing how to dynamically generate DAGs in Airflow. https://github.com/astronomer/dynamic-dags-tutorial