Using the BashOperator
The BashOperator is one of the most commonly used operators in Airflow. It executes bash commands or a bash script from within your Airflow DAG.
In this guide you’ll learn:
- When to use the BashOperator.
- How to use the BashOperator and @task.bash decorator.
- How to use the BashOperator, including executing bash commands and bash scripts.
- How to run scripts in non-Python programming languages using the BashOperator.
Assumed knowledge
To get the most out of this guide, you should have an understanding of:
- Airflow operators. See Operators 101.
- Airflow decorators. See Introduction to the TaskFlow API and Airflow decorators.
- Basic bash commands. See the Bash Reference Manual.
How to use the BashOperator and @task.bash decorator
The BashOperator is part of core Airflow and can be used to execute a single bash command, a set of bash commands, or a bash script ending in .sh. The @task.bash decorator can be used to create bash statements using Python functions and is available as of Airflow 2.9.
The following parameters can be provided to the operator and decorator:
- bash_command: Defines a single bash command, a set of commands, or a bash script to execute. This parameter is required.
- env: Defines environment variables in a dictionary for the bash process. By default, the defined dictionary overwrites all existing environment variables in your Airflow environment, including those not defined in the provided dictionary. To change this behavior, you can set the append_env parameter. If you leave this parameter blank, the BashOperator inherits the environment variables from your Airflow environment.
- append_env: Changes the behavior of the env parameter. If you set this to True, the environment variables you define in env are appended to existing environment variables instead of overwriting them. The default is False.
- output_encoding: Defines the output encoding of the bash command. The default is utf-8.
- skip_on_exit_code: Defines which bash exit code should cause the BashOperator to enter a skipped state. The default is 99.
- cwd: Changes the working directory where the bash command is run. The default is None and the bash command runs in a temporary directory.
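The interaction between env and append_env is easy to misread, so here is a minimal sketch of the rule as described above. It is plain Python, not Airflow's actual implementation, and the function name effective_env and the sample values are made up for illustration.

```python
from typing import Mapping, Optional


def effective_env(
    env: Optional[Mapping[str, str]],
    append_env: bool,
    inherited: Mapping[str, str],
) -> dict:
    """Return the variables the bash process would see (illustrative only)."""
    if env is None:
        # No env given: the process inherits the surrounding environment.
        return dict(inherited)
    if append_env:
        # Inherited variables are kept; values from env win on conflict.
        return {**inherited, **env}
    # Default: only the variables defined in env are visible.
    return dict(env)


# Hypothetical values standing in for the Airflow environment.
inherited = {"AIRFLOW_HOME": "/usr/local/airflow", "PATH": "/usr/bin"}
custom = {"MY_NAME": "Ada"}

print(effective_env(None, False, inherited))    # inherits everything
print(effective_env(custom, False, inherited))  # only MY_NAME
print(effective_env(custom, True, inherited))   # inherited plus MY_NAME
```

The practical consequence is that a command referencing $AIRFLOW_HOME or relying on PATH needs either no env at all or append_env set to True.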
The behavior of a BashOperator task is based on the status of the bash shell:
- Tasks succeed if the whole shell exits with an exit code of 0.
- Tasks are skipped if the exit code is 99 (unless otherwise specified in skip_on_exit_code).
- Tasks fail in case of all other exit codes.
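The three rules above can be written as a small lookup. This is a sketch of the documented behavior for a single skip code, not Airflow's source; the function name task_state_for_exit_code is hypothetical.

```python
def task_state_for_exit_code(exit_code: int, skip_on_exit_code: int = 99) -> str:
    """Map a bash exit code to the resulting task state (illustrative only)."""
    if exit_code == 0:
        return "success"
    if exit_code == skip_on_exit_code:
        return "skipped"
    # Every other exit code is treated as a failure.
    return "failed"


for code in (0, 99, 1, 127):
    print(code, task_state_for_exit_code(code))
```

A script can therefore signal "nothing to do" by ending with exit 99, which marks the task as skipped rather than failed.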
Tip
If you expect a non-zero exit from a sub-command, you can add the prefix set -e; to your bash command to make sure that the exit is captured as a task failure.
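To see why the prefix matters, the sketch below runs the same two-step command with and without set -e, using Python's subprocess module rather than Airflow. Without the prefix, the shell reports the exit code of the last command only, so the earlier failure goes unnoticed.

```python
import shutil
import subprocess

# Use bash when it is installed; fall back to sh, which also supports `set -e`.
SHELL = shutil.which("bash") or "sh"


def exit_code_of(command: str) -> int:
    """Run a command string in the shell and return the shell's exit code."""
    return subprocess.run([SHELL, "-c", command], capture_output=True).returncode


# `false` fails, but the final `echo` succeeds, so the shell exits with 0.
without_set_e = exit_code_of("false; echo 'still running'")

# With `set -e` the shell stops at the first failing command and exits non-zero.
with_set_e = exit_code_of("set -e; false; echo 'still running'")

print(without_set_e, with_set_e)
```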
Both the bash_command and the env parameter can accept Jinja templates. However, the input given through Jinja templates to bash_command is not escaped or sanitized. If you are concerned about potentially harmful user input you can use the setup shown in the BashOperator documentation.
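The risk can be illustrated outside Airflow. The sketch below shows one common mitigation: pass the untrusted value to the shell through an environment variable instead of pasting it into the command text. It uses subprocess directly, the variable name MESSAGE and the sample input are made up, and the setup in the BashOperator documentation may differ in detail.

```python
import os
import shutil
import subprocess

SHELL = shutil.which("bash") or "sh"

# Hypothetical user input containing a second, unwanted command.
untrusted = "hello; echo INJECTED"

# Unsafe: the value becomes part of the command text, so the shell
# parses `; echo INJECTED` as an additional command.
unsafe = subprocess.run(
    [SHELL, "-c", f"echo {untrusted}"], capture_output=True, text=True
)

# Safer: the command text is fixed and the value arrives as data in an
# environment variable, which the shell expands but does not re-parse.
safe = subprocess.run(
    [SHELL, "-c", 'printf "%s\\n" "$MESSAGE"'],
    env={**os.environ, "MESSAGE": untrusted},
    capture_output=True,
    text=True,
)

print(unsafe.stdout.splitlines())  # two lines: the injected echo ran
print(safe.stdout.splitlines())    # one line: the input printed literally
```

In a BashOperator, the same idea would mean placing the Jinja template in the env parameter and referencing the variable, quoted, from bash_command.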
When to use the BashOperator
The following are common use cases for the BashOperator and @task.bash decorator in Airflow DAGs:
- Creating and running bash commands based on complex Python logic.
- Running one or more bash commands in your Airflow environment.
- Running a previously prepared bash script.
- Running scripts in a programming language other than Python.
- Running commands to initialize tools that lack specific operator support, for example Soda Core.
Example: Using Python to create bash commands
You can use @task.bash to create bash statements using Python functions. This decorator is especially useful when you want to run bash commands based on complex Python logic, including inputs from upstream tasks. The following example demonstrates how to use the @task.bash decorator to conditionally run different bash commands based on the output of an upstream task.
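As a rough illustration of the pattern, the function below returns a different bash command depending on its input. It is written as a plain Python function so it can run without Airflow. In a DAG file it would carry the @task.bash decorator and receive the upstream task's output as its argument. The function name, the row_count argument, and the echoed messages are all hypothetical.

```python
# In a DAG file (Airflow 2.9 or later) this function would be decorated:
#
#     from airflow.decorators import task
#
#     @task.bash
#     def build_bash_command(row_count: int) -> str: ...
#
# The string it returns is then executed as the bash command.


def build_bash_command(row_count: int) -> str:
    """Choose the bash command to run based on an upstream result."""
    if row_count == 0:
        return 'echo "No new rows, nothing to do."'
    # row_count is an int, so formatting it into the command is safe here.
    return f'echo "Processing {row_count} new rows."'


print(build_bash_command(0))
print(build_bash_command(42))
```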
Example: Execute two bash commands using one BashOperator
The BashOperator can execute any number of bash commands separated by &&.
In this example, you run two bash commands in a single task:
- echo Hello $MY_NAME! prints the environment variable MY_NAME to the console.
- echo $A_LARGE_NUMBER | rev 2>&1 | tee $AIRFLOW_HOME/include/my_secret_number.txt takes the environment variable A_LARGE_NUMBER, pipes it to the rev command which reverses any input, and saves the result in a file called my_secret_number.txt located in the /include directory. The reversed number will also be printed to the console.
The second command uses an environment variable from the Airflow environment, AIRFLOW_HOME. This is only possible because append_env is set to True.
It is also possible to use two separate BashOperators to run the two commands, which can be useful if you want to assign different dependencies to the tasks.
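The two commands described above could be combined into a single set of operator arguments roughly as follows. The arguments are collected in a plain dictionary so the snippet runs without Airflow. The task_id and the values in env are placeholders, not taken from the original example.

```python
# Keyword arguments for a single BashOperator task (illustrative values).
two_commands_task = {
    "task_id": "say_hello_and_create_a_secret_number",  # hypothetical name
    # Two commands joined by &&: the second runs only if the first succeeds.
    "bash_command": (
        "echo Hello $MY_NAME! && "
        "echo $A_LARGE_NUMBER | rev 2>&1 | "
        "tee $AIRFLOW_HOME/include/my_secret_number.txt"
    ),
    # Placeholder values for the two custom variables.
    "env": {"MY_NAME": "Ada", "A_LARGE_NUMBER": "231942"},
    # Keeps AIRFLOW_HOME and the rest of the Airflow environment visible.
    "append_env": True,
}

# In a DAG file: BashOperator(**two_commands_task)
print(two_commands_task["bash_command"])
```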
Example: Execute a bash script
The BashOperator can also be provided with a bash script (ending in .sh) to be executed.
For this example, you run a bash script which iterates over all files in the /include folder and prints their names to the console.
Make sure that your bash script (my_bash_script.sh in this example) is available to your Airflow environment. If you use the Astro CLI, you can make this file accessible to Airflow by placing it in the /include directory of your Astro project.
It is important to make the bash script executable before making it available to your Airflow environment, for example by running chmod +x my_bash_script.sh in the directory that contains the script.
If you use the Astro CLI, you can run this command before running astro dev start, or you can add it to your project’s Dockerfile as a RUN instruction that applies chmod +x to the script's path inside the image.
Astronomer recommends running this command in your Dockerfile for production builds such as Astro Deployments or in production CI/CD pipelines.
After making the script available to Airflow, you only have to provide the path to the script in the bash_command parameter. Be sure to add a space character at the end of the file path. Without it, Airflow treats a value ending in .sh as a Jinja template file to load, and the task can fail with a Jinja exception.
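The reason for the trailing space can be sketched in a few lines. The check below is a simplification, and it assumes the BashOperator treats values ending in .sh or .bash as template files. The function name and script path are illustrative.

```python
# Assumption: file extensions that cause bash_command to be loaded as a
# Jinja template file instead of being used as a literal command.
TEMPLATE_EXTENSIONS = (".sh", ".bash")


def treated_as_template_file(bash_command: str) -> bool:
    """Simplified check: does the value end in a template file extension?"""
    return bash_command.endswith(TEMPLATE_EXTENSIONS)


script = "$AIRFLOW_HOME/include/my_bash_script.sh"

print(treated_as_template_file(script))        # True: looked up as a template
print(treated_as_template_file(script + " "))  # False: run as a command
```

With the trailing space the value no longer ends in .sh, so it is passed to bash as an ordinary command.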
Example: Run a script in another programming language
Using the BashOperator is a straightforward way to run a script in a non-Python programming language in Airflow. You can run a script in any language that can be run with a bash command.
In this example, you run some JavaScript to query a public API providing the current location of the International Space Station. The query result is pushed to XCom so that a second task can extract the latitude and longitude information in a script written in R and print the data to the console.
The following setup is required:
- Install the JavaScript and R language packages at the OS level.
- Write a JavaScript file.
- Write an R script file.
- Make the scripts available to the Airflow environment.
- Execute the files from within a DAG using the BashOperator.
If you use the Astro CLI, the programming language packages can be installed at the OS level by adding them to the packages.txt file of your Astro project.
The following JavaScript file contains code for sending a GET request to the /iss-now path at api.open-notify.org and writing the results to stdout. The BashOperator prints this output to the task logs and pushes the last line of stdout to XCom.
The second task runs a script written in R that uses a regex to filter and print the longitude and latitude information from the API response.
To run these scripts using the BashOperator, ensure that they are accessible to your Airflow environment. If you use the Astro CLI, you can place these files in the /include directory of your Astro project.
The DAG uses the BashOperator to execute both files defined above sequentially.
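One way to wire the two tasks together is sketched below as plain dictionaries of operator arguments, so the snippet runs without Airflow. The task IDs, the script file names, and the variable name ISS_COORDINATES are hypothetical. The sketch also assumes node and Rscript are installed and on the PATH.

```python
INCLUDE_DIR = "$AIRFLOW_HOME/include"

# First task: run the JavaScript file. Its stdout ends up in XCom.
run_javascript = {
    "task_id": "run_javascript",
    "bash_command": f"node {INCLUDE_DIR}/my_java_script.js",
}

# Second task: pull the XCom value with a Jinja template in env and hand it
# to the R script as a quoted argument, so it is passed as data.
run_r = {
    "task_id": "run_r",
    "bash_command": f'Rscript {INCLUDE_DIR}/my_R_script.R "$ISS_COORDINATES"',
    "env": {"ISS_COORDINATES": "{{ ti.xcom_pull(task_ids='run_javascript') }}"},
    # Keeps AIRFLOW_HOME and PATH available alongside ISS_COORDINATES.
    "append_env": True,
}

# In a DAG file: BashOperator(**run_javascript) >> BashOperator(**run_r)
print(run_javascript["bash_command"])
print(run_r["bash_command"])
```

Passing the XCom value through env rather than templating it into bash_command keeps the API response from being interpreted as shell syntax.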