
From Operators to DagRuns


How Work Gets Executed

Operators become Tasks

  • Operators contain the logic, but once they are added to a DAG file with a task_id and dag, they become tasks.
  • To be explicit: when an operator class is instantiated with a task_id and dag (along with its other settings), it becomes a task within a DAG.
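The relationship above can be sketched with a toy model. The `ToyDAG` and `ToyOperator` classes below are hypothetical stand-ins written for illustration, not Airflow's own classes. They only mimic the idea that instantiating an operator with a task_id and a dag registers it as a task on that DAG.

```python
class ToyDAG:
    """Hypothetical stand-in for a DAG: a named container of tasks."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}  # maps task_id -> operator instance

    def add_task(self, task):
        # task_ids must be unique within one DAG in this toy model.
        if task.task_id in self.tasks:
            raise ValueError(f"duplicate task_id: {task.task_id}")
        self.tasks[task.task_id] = task


class ToyOperator:
    """Hypothetical stand-in for an operator: it holds the logic to run."""

    def __init__(self, task_id, dag, python_callable):
        self.task_id = task_id
        self.dag = dag
        self.python_callable = python_callable
        # Instantiating with a task_id and dag is what makes this a task.
        dag.add_task(self)

    def execute(self):
        return self.python_callable()


dag = ToyDAG("example_dag")
extract = ToyOperator("extract", dag, lambda: "rows")
load = ToyOperator("load", dag, lambda: "loaded")

print(sorted(dag.tasks))  # ['extract', 'load']
```

In this sketch the operator class is reusable logic, while each instance with its own task_id is a distinct task on the DAG.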

Tasks become Task Instances

  • Once a series of tasks is bundled into the same DAG object, the DAG can be executed according to its schedule.
  • The scheduler "taps" the DAG and begins to execute the tasks in the order set by their dependencies.
  • Tasks that get executed have an execution_date and are now called task instances. These are logged in the metadata database.
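As a rough illustration of dependency-ordered execution, the sketch below uses Python's standard `graphlib` module (Python 3.9+) to order tasks so that each one comes after its upstream tasks, then pairs every task with an execution_date. The `upstream` mapping and the `make_task_instances` helper are hypothetical names chosen for this example. The real scheduler is considerably more involved.

```python
from datetime import datetime
from graphlib import TopologicalSorter

# Hypothetical dependency map: each task_id maps to the set of its upstream task_ids.
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def make_task_instances(upstream, execution_date):
    """Return (task_id, execution_date) pairs, upstream tasks first."""
    order = TopologicalSorter(upstream).static_order()
    return [(task_id, execution_date) for task_id in order]


task_instances = make_task_instances(upstream, datetime(2020, 1, 1))
for task_id, execution_date in task_instances:
    print(task_id, execution_date.isoformat())
```

A task only turns into a task instance here once it is paired with a concrete execution_date, which mirrors the distinction drawn in the bullets above.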

DAGs become DAG Runs

  • DAGs that have run or are running (i.e. have an associated execution_date) are referred to as DAG Runs.
  • DAG Runs are logged in the metadata database with their corresponding states.
  • Tasks associated with a DAG Run are called task instances.


A DAG Run is an instantiation of a DAG object in time.

A Task Instance is an instantiation of a Task in time and in a DAG object.
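One way to picture "instantiation in time" is to treat the execution_date as part of the identity of each record. The `ToyDagRun` and `ToyTaskInstance` dataclasses below are hypothetical and are not the models Airflow stores in its metadata database. They are a minimal sketch of the idea that the same DAG produces a separate DAG Run, with its own task instances, for each point in time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ToyDagRun:
    dag_id: str
    execution_date: datetime


@dataclass(frozen=True)
class ToyTaskInstance:
    task_id: str
    dag_id: str
    execution_date: datetime


def instantiate(dag_id, task_ids, execution_date):
    """Create one DAG run and one task instance per task for a point in time."""
    run = ToyDagRun(dag_id, execution_date)
    instances = [ToyTaskInstance(t, dag_id, execution_date) for t in task_ids]
    return run, instances


run_1, tis_1 = instantiate("example_dag", ["extract", "load"], datetime(2020, 1, 1))
run_2, tis_2 = instantiate("example_dag", ["extract", "load"], datetime(2020, 1, 2))

# Same DAG, different points in time: two distinct DAG runs.
print(run_1 != run_2)      # True
print(len(tis_1 + tis_2))  # 4
```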

States

States are used to keep track of how scheduled tasks and DAG Runs are doing. DAG Runs and tasks can have the following states:

DAG States


Running (Lime): The DAG is currently being executed.

Success (Green): The DAG executed successfully.

Failed (Red): The DAG failed.

Task States


None (Light Blue): The task has no associated state. In code, this is represented as Python's None.

Queued (Gray): The task is waiting to be executed.

Scheduled (Tan): The task has been scheduled to run.

Running (Lime): The task is currently being executed.

Failed (Red): The task failed.

Success (Green): The task executed successfully.

Skipped (Pink): The task has been skipped due to an upstream condition.

Shutdown (Blue): The task was externally requested to shut down while it was running.

Removed (Light Gray): The task has been removed from the DAG since the run started.

Retry (Gold): The task is up for retry.

Upstream Failed (Orange): The task will not be run because of a failed upstream dependency.
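To make the Upstream Failed state concrete, here is a minimal sketch of how a failure can propagate downstream. The `ToyState` enum and the `next_state` function are hypothetical and apply one simplified rule (every upstream task must succeed). They are an assumption for illustration, not Airflow's actual scheduling logic.

```python
from enum import Enum


class ToyState(Enum):
    """A subset of the task states listed above, for illustration only."""
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    UP_FOR_RETRY = "up_for_retry"
    UPSTREAM_FAILED = "upstream_failed"


def next_state(upstream_states):
    """Decide what happens to a task given the states of its upstream tasks."""
    blocking = {ToyState.FAILED, ToyState.UPSTREAM_FAILED}
    if any(s in blocking for s in upstream_states):
        # A failure anywhere upstream means this task will not be run.
        return ToyState.UPSTREAM_FAILED
    if all(s is ToyState.SUCCESS for s in upstream_states):
        # Everything upstream succeeded (or there is nothing upstream).
        return ToyState.SCHEDULED
    return None  # upstream work is still in progress, so no state yet


print(next_state([ToyState.SUCCESS, ToyState.FAILED]))  # ToyState.UPSTREAM_FAILED
print(next_state([ToyState.SUCCESS, ToyState.SUCCESS])) # ToyState.SCHEDULED
print(next_state([ToyState.RUNNING]))                   # None
```

Because `UPSTREAM_FAILED` is itself treated as blocking, a single failed task marks every task further down the chain in this model.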
