Operators are the building blocks of Airflow DAGs. They contain the logic of how data is processed in a pipeline. Each task in a DAG is defined by instantiating an operator.
There are many different types of operators available in Airflow. Some operators, such as the PythonOperator, execute general code provided by the user, while other operators perform very specific actions such as transferring data from one system to another.
In this guide, you’ll learn the basics of using operators in Airflow and then implement them in a DAG.
To view all of the available Airflow operators, go to the Airflow Registry.
To get the most out of this guide, you should have an understanding of:
Operators are Python classes that encapsulate the logic to do a unit of work. You can think of an operator as a wrapper around a unit of work: it defines the actions to be completed and abstracts away most of the code you would otherwise need to write. When you create an instance of an operator in a DAG and provide it with its required parameters, it becomes a task.
All operators inherit from the abstract BaseOperator class, which contains the logic to execute the work of the operator within the context of a DAG.
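The relationship between a base class, an operator, and a task can be sketched in plain Python. This is a simplified stand-in and not Airflow's implementation: `ToyBaseOperator`, `ToyGreetOperator`, and the `run` method are hypothetical names invented for illustration. The one detail borrowed from Airflow is that subclasses supply their work in an `execute(context)` method.

```python
from abc import ABC, abstractmethod


class ToyBaseOperator(ABC):
    """Simplified stand-in for a base operator class (NOT Airflow's BaseOperator)."""

    def __init__(self, task_id: str):
        if not task_id:
            raise ValueError("task_id is required")
        self.task_id = task_id

    @abstractmethod
    def execute(self, context: dict):
        """Do the unit of work. Each subclass supplies its own logic."""

    def run(self, context: dict):
        # Shared plumbing lives in the base class (hypothetical method name);
        # the subclass only has to implement execute().
        print(f"Running task {self.task_id!r}")
        return self.execute(context)


class ToyGreetOperator(ToyBaseOperator):
    """Hypothetical operator that wraps one small unit of work."""

    def __init__(self, task_id: str, name: str):
        super().__init__(task_id=task_id)
        self.name = name

    def execute(self, context: dict):
        return f"Hello, {self.name}, on {context['ds']}"


# Instantiating the operator with its parameters yields a task.
greet_task = ToyGreetOperator(task_id="greet_world", name="world")
result = greet_task.run(context={"ds": "2024-01-01"})
```

The design choice to note is that the class describes a reusable kind of work, while each instance, identified by its `task_id`, is one concrete task.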
The following are some of the most frequently used Airflow operators:
Operators typically only require a few parameters. Keep the following considerations in mind when using Airflow operators:
The core Airflow package includes operators such as the PythonOperator and BashOperator. These operators are automatically available in your Airflow environment. All other operators are part of provider packages, which you must install separately. For example, the SnowflakeOperator is part of the Snowflake provider package.
The following example shows how to use multiple operators in a DAG to transfer data from Amazon S3 to Redshift and perform data quality checks.
The code for this example is available in the Astronomer Registry.
The following operators are used in this example:
@task decorator. Included with Airflow.
There are a few things to note about the operators in this example DAG:
Every operator takes a task_id. This is a required parameter, and the value provided is displayed as the name of the task in the Airflow UI. In Airflow 2.9 and later, you can override the task name shown in the UI with the task_display_name parameter, which allows special characters.
Parameters vary by operator. For example, SQL-based operators take a sql parameter for the SQL script to be executed, and the S3ToRedshiftOperator has parameters to define the location and keys of the files being copied from Amazon S3 and the Redshift table receiving the data.
The parameters conn_id, postgres_conn_id, and aws_conn_id all point to the names of the relevant connections stored in Airflow.
The following code shows how each of the operators is instantiated in a DAG file to define the pipeline:
The resulting DAG appears similar to this image:
