Trigger DAGs in Airflow

As workflows are being developed and built upon by different team members, they tend to get more complex.

The first level of complexity can usually be handled by some sort of error messaging: send an error notification to a particular person or group when a workflow fails.
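For example, failure emails can be wired in through a DAG's default_args; a minimal sketch (the recipient address is a placeholder):

default_args = {
    'owner': 'airflow',
    'email': ['data-team@example.com'],  # placeholder recipient
    'email_on_failure': True,
    'email_on_retry': False
}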

Branching can be helpful for performing conditional logic: execute one set of tasks or another based on a condition. For situations where that is not enough, the TriggerDagRunOperator can be used to kick off entire DAGs.
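For comparison, a minimal branching sketch (the condition and task ids here are hypothetical, and a dag object is assumed to already exist):

from airflow.operators.python_operator import BranchPythonOperator

def choose_path(**kwargs):
    # Hypothetical condition: return the downstream task id to follow.
    if kwargs['execution_date'].weekday() == 0:
        return 'full_refresh'
    return 'incremental_load'

branch = BranchPythonOperator(task_id='branch',
                              provide_context=True,
                              python_callable=choose_path,
                              dag=dag)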

https://github.com/apache/incubator-airflow/blob/master/airflow/operators/dagrun_operator.py

Define a controller and a target DAG

The TriggerDagRunOperator needs a controller, a task that decides whether to trigger based on some condition, and a target, a DAG that is kicked off or not depending on that condition.

The controller task takes the form of a Python callable:

import pprint
import random

pp = pprint.PrettyPrinter(indent=4)


def conditionally_trigger(context, dag_run_obj):
    """
    This function decides whether or not to trigger the remote DAG.
    """
    rand = random.randint(0, 1)

    # Anything placed on the payload is available to the target DAG
    # via dag_run.conf.
    dag_run_obj.payload = {'message': '{0} was chosen'.format(rand)}
    pp.pprint(dag_run_obj.payload)

    # Returning the dag_run_obj triggers the target DAG;
    # returning nothing skips the trigger.
    if rand == 1:
        return dag_run_obj
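This callable is then handed to a TriggerDagRunOperator in the controller DAG; a minimal sketch, assuming a controller_dag object is already defined and using an illustrative target dag id:

from airflow.operators.dagrun_operator import TriggerDagRunOperator

trigger = TriggerDagRunOperator(
    task_id='conditionally_trigger',
    trigger_dag_id='example_trigger_target_dag',  # must match the target's dag_id
    python_callable=conditionally_trigger,
    dag=controller_dag)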

If the dag_run_obj is returned, the target DAG will be triggered; if the callable returns nothing, the trigger is skipped. The dag_run_obj's payload can also be used to pass parameters along to the target DAG:

def target_function(**kwargs):
    # The controller's payload arrives on the target side as dag_run.conf.
    print("Remotely received value of {} for key=message".format(
        kwargs['dag_run'].conf['message']))

The target DAG's schedule should always be set to None, since it should only run when triggered by the external condition.
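A minimal target DAG might look like the following sketch (the dag id and default_args are illustrative); note schedule_interval=None, and provide_context=True so the callable can read dag_run.conf:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

target_dag = DAG(dag_id='example_trigger_target_dag',
                 default_args=default_args,
                 schedule_interval=None)  # runs only when triggered

run_this = PythonOperator(task_id='run_this',
                          provide_context=True,
                          python_callable=target_function,
                          dag=target_dag)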

Use Cases

Trigger DAGs are a great way to separate the logic of a "safety check" from the logic to execute when those checks fail.

These sorts of checks are a good fail-safe to add to the end of a workflow, downstream of the data ingestion layer.

On the same note, they can be used to monitor Airflow itself.

Metadata Trigger DAGs

Error notifications can be set at various levels throughout a DAG, but propagating them between different DAGs can be valuable for other reasons. Suppose that after 5 DAG failures, you wanted to trigger a systems check.

Sensors and TriggerDAGs

Airflow on Airflow.

As Airflow operations scale up, error reporting gets increasingly difficult: the more failure emails that go out, the less each notification matters. Furthermore, a certain threshold of failures could indicate a deeper issue in another system.

Using a sensor together with a TriggerDagRunOperator can provide a clean solution to this issue: check the database for a threshold of failures, and trigger a follow-up DAG when it is reached.

DagFailureSensor

A sensor can be used to check the metadata database for the status of DagRuns. If the number of failed runs is above a certain threshold (which can differ for each DAG), the next task can trigger a systems check DAG.
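DagFailureSensor is not a built-in Airflow sensor, so it has to be implemented as a custom sensor. A minimal sketch, assuming direct access to the metadata database through Airflow's session (import paths reflect Airflow 1.x):

from datetime import timedelta

from dateutil import parser

from airflow import settings
from airflow.models import DagRun
from airflow.operators.sensors import BaseSensorOperator
from airflow.utils.decorators import apply_defaults
from airflow.utils.state import State


class DagFailureSensor(BaseSensorOperator):
    """Succeeds once dag_name has at least `threshold` failed
    DagRuns within the last `lookback_days` days."""
    template_fields = ('initial_date',)

    @apply_defaults
    def __init__(self, dag_name, initial_date, lookback_days,
                 threshold, *args, **kwargs):
        super(DagFailureSensor, self).__init__(*args, **kwargs)
        self.dag_name = dag_name
        self.initial_date = initial_date
        self.lookback_days = lookback_days
        self.threshold = threshold

    def poke(self, context):
        # initial_date arrives already templated (e.g. '{{ ts }}').
        window_start = (parser.parse(self.initial_date)
                        - timedelta(days=self.lookback_days))
        session = settings.Session()
        failures = (session.query(DagRun)
                    .filter(DagRun.dag_id == self.dag_name,
                            DagRun.state == State.FAILED,
                            DagRun.execution_date >= window_start)
                    .count())
        session.close()
        return failures >= self.threshold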

# 'trigger_dag_target' (an illustrative dag id) names the DAG that the
# TriggerDagRunOperator below kicks off when the threshold is hit.
checks = [
    {'dag_name': 'example_dag',
     'lookback_days': 5,
     'threshold': 3,
     'poke_interval': 200,
     'trigger_dag_target': 'systems_check_dag'},

    {'dag_name': 'dag_one',
     'lookback_days': 2,
     'threshold': 1,
     'poke_interval': 100,
     'trigger_dag_target': 'systems_check_dag'},

    {'dag_name': 'dag_two',
     'lookback_days': 3,
     'threshold': 4,
     'poke_interval': 50,
     'trigger_dag_target': 'systems_check_dag'}
]

The sensor and its trigger can then be wired up for each check as follows:

for check in checks:
    sensor = DagFailureSensor(
        task_id='sensor_task_{0}'.format(check['dag_name']),
        dag_name=check['dag_name'],
        initial_date='{{ ts }}',
        lookback_days=check['lookback_days'],
        threshold=check['threshold'],
        poke_interval=check['poke_interval'])

    trigger = TriggerDagRunOperator(
        task_id='trigger_systems_check_{0}'.format(check['dag_name']),
        trigger_dag_id=check['trigger_dag_target'],
        python_callable=trigger_sys_dag)

    first_task >> sensor >> trigger
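The trigger_sys_dag callable isn't defined in the snippet above; since the sensor has already verified the failure threshold, a minimal version can simply trigger unconditionally and note which check fired:

def trigger_sys_dag(context, dag_run_obj):
    # The upstream sensor confirmed the threshold, so always trigger
    # the systems check; record which check task fired it.
    dag_run_obj.payload = {'triggered_by': context['task'].task_id}
    return dag_run_obj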

(Screenshot: graph view of the system_check_controller DAG.)

Adding Trigger Rules

Depending on the rest of the infrastructure, different "checks" may all trigger the same system-level check.

If that is the case, the TriggerDagRunOperator should be set with a different trigger_rule, so that it fires as soon as any one sensor succeeds:

from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

with dag:
    first_task = DummyOperator(task_id='first_task')

    # ONE_SUCCESS fires the trigger as soon as any upstream sensor
    # succeeds, instead of waiting for all of them.
    trigger = TriggerDagRunOperator(task_id='trigger_systems_check',
                                    trigger_dag_id='total_system_check',
                                    python_callable=trigger_sys_dag,
                                    trigger_rule=TriggerRule.ONE_SUCCESS)

    for check in checks:
        sensor = DagFailureSensor(
            task_id='sensor_task_{0}'.format(check['dag_name']),
            dag_name=check['dag_name'],
            initial_date='{{ ts }}',
            lookback_days=check['lookback_days'],
            threshold=check['threshold'],
            poke_interval=check['poke_interval'])

        first_task >> sensor >> trigger

(Screenshot: graph view of the system_check_controller DAG with a shared trigger task.)

