Customize your image on Astronomer Software
The Astro CLI is intended to make it easier to develop with Apache Airflow, whether you're developing on your local machine or deploying code to Astronomer. The following guidelines describe a few of the methods you can use to customize the Docker image that is built every time you run astro dev start locally or astro deploy to push code to Astronomer.
More specifically, this doc includes instructions for how to:
- Add Python and OS-level packages
- Add dependencies
- Run commands on build
- Access the Airflow CLI
- Add environment variables locally
- Build from a private repository
Note: The guidelines below assume that you've initialized a project on Astronomer via astro dev init. If you haven't done so already, refer to our "CLI Quickstart" doc.
Add Python and OS-level dependencies
To add Python packages to an Airflow Deployment, add the packages to the Deployment requirements.txt file. To add OS-level packages to an Airflow Deployment, add them to the Deployment packages.txt file. The requirements.txt and packages.txt files were automatically generated after running astro dev init to initialize your local Astro project.
Add Python dependencies
To add Python packages to an Airflow Deployment, add them to the Deployment requirements.txt file.
To pin a package to a specific version, use the following syntax:
<package-name>==<version>
For example, to use only PyMongo 3.7.2, add the following line to your requirements.txt:
pymongo==3.7.2
If you do not pin a package to a version, the latest version of the package that's publicly available will be installed by default.
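To see which entries in a requirements.txt would float to the latest release, a quick check like the one below can help. This is a minimal sketch and not part of the Astro CLI: the function name find_unpinned and the sample lines are hypothetical, and it only recognizes the exact == pin syntax shown above.

```python
def find_unpinned(lines):
    """Return requirement lines that are not pinned with '=='.

    Hypothetical helper for illustration; it skips blank lines,
    comments, and pip options such as '-r other.txt'.
    """
    unpinned = []
    for raw in lines:
        # Drop inline comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


sample = [
    "pymongo==3.7.2",
    "# comment",
    "requests",
    "",
]
print(find_unpinned(sample))  # ['requests']
```

To run it against a real file, you could pass open("requirements.txt").read().splitlines() instead of the sample list.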
Add OS-level dependencies
To add OS-level packages to an Airflow Deployment, add them to the Deployment packages.txt file.
Rebuild your image
Once you've saved your changes to these files, rebuild your image by running:
astro dev stop
followed by
astro dev start
This process stops your running Docker containers and restarts them with your updated image.
Confirm your package was installed (Optional)
If you added pymongo to your requirements.txt file, for example, you can confirm that it was properly installed by running a docker exec command in your scheduler container.
- Run docker ps to identify the 3 running Docker containers on your machine
- Grab the container ID of your scheduler container
- Run the following:
docker exec -it <scheduler-container-id> pip freeze | grep pymongo
pymongo==3.7.2
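The same check can also be done from Python rather than with pip freeze. The sketch below assumes the image ships Python 3.8 or later (where importlib.metadata is in the standard library); the function name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 'pymongo' matches the example above; swap in your own package name.
    found = installed_version("pymongo")
    print(f"pymongo: {found or 'not installed'}")
```

One way to use it would be to save the script somewhere in your project, rebuild, and run it with python inside the scheduler container.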
Add other dependencies
In the same way you can build Python and OS-level packages into your image, you're free to build additional dependencies and files for your DAGs to use.
In the example below, we'll add a folder of helper_functions with a file (or set of files) that our Airflow DAGs can then use.
Add the folder into your project directory
.
├── airflow_settings.yaml
├── dags
│   └── example-dag.py
├── Dockerfile
├── helper_functions
│   └── helper.py
├── include
├── packages.txt
├── plugins
│   └── example-plugin.py
└── requirements.txt
Rebuild your image
Follow the instructions in the "Rebuild your image" section above.
Confirm your files were added (Optional)
Similar to the pymongo example above, you can confirm that helper.py was properly built into your image by running a docker exec command in your scheduler container.
- Run docker ps to identify the 3 running Docker containers on your machine
- Grab the container ID of your scheduler container
- Run the following:
docker exec -it <scheduler-container-id> /bin/bash
bash-4.4$ ls
Dockerfile airflow_settings.yaml helper_functions logs plugins unittests.cfg
airflow.cfg dags include packages.txt requirements.txt
Notice that the helper_functions folder has been built into your image.
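As an illustration of what such a file might contain, here is a hypothetical helper.py. The function chunk_list and its behavior are invented for this example; this guide doesn't prescribe the file's contents. A DAG could then import it with something like from helper_functions.helper import chunk_list, assuming the project directory is on the Python import path in your image.

```python
# helper_functions/helper.py (hypothetical contents)


def chunk_list(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Quick check that can be run inside the scheduler container.
    print(chunk_list([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

Keeping helpers like this free of Airflow imports makes them easy to test on their own before they're used in a DAG.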
Configure airflow_settings.yaml
When you first initialize a new Astro project on Astronomer, a file titled airflow_settings.yaml is automatically generated. With this file, you can configure and programmatically generate Airflow connections, pools, and variables when you're developing locally.
For security reasons, the airflow_settings.yaml file is currently only for local development and should not be used for pushing up code to Astronomer via astro deploy. For the same reason, we'd recommend adding this file to your .gitignore.
If you're interested in programmatically managing Airflow connections, variables, or environment variables on Astronomer Software, we recommend integrating a secrets backend.
Add Airflow connections, pools, variables
By default, the airflow_settings.yaml file includes the following template:
airflow:
  connections: ## conn_id and conn_type are required
    - conn_id: my_new_connection
      conn_type: postgres
      conn_host: 123.0.0.4
      conn_schema: airflow
      conn_login: user
      conn_password: pw
      conn_port: 5432
      conn_extra:
  pools: ## pool_name, pool_slot, and pool_description are required
    - pool_name: my_new_pool
      pool_slot: 5
      pool_description:
  variables: ## variable_name and variable_value are required
    - variable_name: my_variable
      variable_value: my_value
Make sure to specify all required fields that correspond to the objects you create. If you don't specify them, you will see a build error on astro dev start.
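To catch a missing field before the build does, a pre-check along these lines may be useful. It is a minimal sketch under a few assumptions: the settings have already been parsed into a Python dict (for example with a YAML library of your choice), "required" is interpreted as "the key is present" (the template leaves pool_description empty), and the names REQUIRED_FIELDS and missing_fields are hypothetical rather than part of the Astro CLI.

```python
# Required keys per section, taken from the comments in the template above.
REQUIRED_FIELDS = {
    "connections": ("conn_id", "conn_type"),
    "pools": ("pool_name", "pool_slot", "pool_description"),
    "variables": ("variable_name", "variable_value"),
}


def missing_fields(settings):
    """Return messages for entries that lack a required key.

    `settings` is the already-parsed content of airflow_settings.yaml.
    Only key presence is checked, not whether the value is empty.
    """
    problems = []
    airflow = settings.get("airflow") or {}
    for section, required in REQUIRED_FIELDS.items():
        for index, entry in enumerate(airflow.get(section) or []):
            for key in required:
                if key not in entry:
                    problems.append(f"{section}[{index}] is missing {key}")
    return problems


example = {
    "airflow": {
        "connections": [{"conn_id": "my_new_connection"}],  # conn_type omitted
        "pools": [
            {"pool_name": "my_new_pool", "pool_slot": 5, "pool_description": None}
        ],
        "variables": [
            {"variable_name": "my_variable", "variable_value": "my_value"}
        ],
    }
}
print(missing_fields(example))  # ['connections[0] is missing conn_type']
```

An empty result means every entry carries the keys the template marks as required.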