Orchestrating Azure Container Instances with Airflow
Note: All code in this guide can be found in this GitHub repo.
Azure Container Instances (ACI) is an Azure service for running containers on demand without managing the underlying infrastructure. In this guide, we'll outline how to orchestrate ACI using Airflow and walk through an example DAG.
The easiest way to orchestrate Azure Container Instances with Airflow is to use the AzureContainerInstancesOperator. This operator starts a container group on ACI, runs the container, and removes the container group once its processes complete.
The only prerequisites for using this operator are:
- The apache-airflow-providers-microsoft-azure provider package installed in your Airflow environment (see the note below).
- An Azure resource group to run the container instance in, and a service principal with write access to that resource group.
- An Airflow connection of type Azure Container Instance configured with the service principal's credentials (described below).
This operator can also be used to run existing container instances and make certain updates, including to the Docker image, environment variables, or commands. Some properties of an existing container group, including CPU, memory, and GPU, cannot be updated by the operator; changing them requires deleting the existing container group and recreating it, which can be accomplished using the AzureContainerInstanceHook.
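For example, a minimal sketch of tearing down an existing container group with the hook, so it can be recreated with new resources, might look like the following. This is a sketch only: it assumes the provider version used in this guide (the hook's module path and connection-ID argument have been renamed in later releases), and it reuses the connection, resource group, and container names from the example DAG below.

from airflow.providers.microsoft.azure.hooks.azure_container_instance import AzureContainerInstanceHook

def recreate_container_group():
    # Reuses the Azure Container Instance connection defined later in this guide
    hook = AzureContainerInstanceHook('azure_container_conn_id')
    # 'adf-tutorial' and 'azure-tutorial-container' are the example resource group
    # and container names used in the DAG below
    if hook.exists('adf-tutorial', 'azure-tutorial-container'):
        hook.delete('adf-tutorial', 'azure-tutorial-container')
    # The container group can then be recreated, for example with the
    # AzureContainerInstancesOperator, using new CPU, memory, or GPU settings

A function like this could run in a PythonOperator task ahead of the AzureContainerInstancesOperator task that recreates the container group.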
There are multiple ways to manage containers with Airflow on Azure. The most flexible and scalable method is to use the KubernetesPodOperator. This lets you run any container as a Kubernetes pod, which means you can pass in resource requests and other native Kubernetes parameters. Using this operator requires an AKS cluster (or a hand-rolled Kubernetes cluster).
If you are not running on AKS, ACI can be a great choice:
- There is no cluster to create or manage; container groups are provisioned on demand and torn down when the task finishes.
- You pay only while the container is running, which keeps costs low for intermittent workloads.
- Setup is minimal: a resource group, a service principal, and an Airflow connection are all you need.
With these points in mind, we recommend using ACI with the AzureContainerInstancesOperator for testing or lightweight tasks that don't require scaling. For heavy production workloads, we recommend sticking with AKS and the KubernetesPodOperator.
Using Airflow to create and run an Azure Container Instance is straightforward: you first identify the Azure resource group you want to create the container instance in (or create a new one), then ensure you have an Azure service principal with write access to that resource group. For more information on setting this up, refer to the Azure documentation.
Note: In Airflow 2.0, provider packages are separate from the core of Airflow. If you are running 2.0 with Astronomer, the apache-airflow-providers-microsoft-azure package is already included in our Astronomer Certified Image; if you are not using Astronomer, you may need to install this package separately to use the hooks, operators, and connections described here. To learn more, read the Airflow docs on provider packages.
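If you do need to install the provider yourself, it is available on PyPI, for example:

pip install apache-airflow-providers-microsoft-azure

If you are running an Astronomer project, you can instead add the package to your project's requirements.txt.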
Next, create an Airflow connection of type Azure Container Instance. Specify your Client ID in the Login field, your Client Secret in the Password field, and your Tenant and Subscription IDs in the Extras field as JSON.
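For example, the Extras JSON might look something like this (assuming the tenantId and subscriptionId keys used by the connection type in this provider version; the values are placeholders for your own IDs):

{
    "tenantId": "<your-tenant-id>",
    "subscriptionId": "<your-subscription-id>"
}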
Lastly, define a DAG using the AzureContainerInstancesOperator:
from airflow import DAG
from airflow.providers.microsoft.azure.operators.azure_container_instances import AzureContainerInstancesOperator
from datetime import datetime, timedelta

# Default settings applied to all tasks
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

with DAG('azure_container_instances',
         start_date=datetime(2020, 12, 1),
         max_active_runs=1,
         schedule_interval='@daily',
         default_args=default_args,
         catchup=False
         ) as dag:

    # Create a container group in the adf-tutorial resource group, run the public
    # hello-world image, and clean up the container group when the task finishes
    opr_run_container = AzureContainerInstancesOperator(
        task_id='run_container',
        ci_conn_id='azure_container_conn_id',
        registry_conn_id=None,
        resource_group='adf-tutorial',
        name='azure-tutorial-container',
        image='hello-world:latest',
        region='East US',
        cpu=1,
        memory_in_gb=1.5,
        fail_if_exists=False
    )
The parameters for the operator are:
- ci_conn_id: the Azure Container Instance connection created above.
- registry_conn_id: a connection to a private container registry (such as Azure Container Registry); None here because hello-world is a public Docker Hub image.
- resource_group: the Azure resource group in which to create the container instance.
- name: the name of the container group to create or run.
- image: the Docker image to run in the container.
- region: the Azure region in which to create the container instance.
- cpu: the number of CPU cores allocated to the container.
- memory_in_gb: the memory, in gigabytes, allocated to the container.
- fail_if_exists: whether the task should fail if a container group with the same name already exists in the resource group.
Note that you can also provide the operator with other parameters such as environment variables, volumes, and a command as needed to run the container.
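For instance, a sketch of a task that overrides the container's environment and command might look like the following; the image, variable, and command shown here are hypothetical and not part of the original example.

    opr_run_custom_container = AzureContainerInstancesOperator(
        task_id='run_custom_container',
        ci_conn_id='azure_container_conn_id',
        registry_conn_id=None,
        resource_group='adf-tutorial',
        name='azure-tutorial-container-custom',
        image='python:3.8-slim',
        region='East US',
        cpu=1,
        memory_in_gb=1.5,
        # Environment variables are passed to the container as a dictionary
        environment_variables={'GREETING': 'hello from airflow'},
        # The command is passed as a list of strings, like a Docker CMD
        command=['python', '-c', 'import os; print(os.environ["GREETING"])'],
        fail_if_exists=False
    )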
If we run this DAG, an ACI container group will spin up, run the container with the hello-world image, and spin down. If we look at the Airflow task log, we can see that the printout from the container has propagated to the logs.
From here we can build out our DAG as needed with any other dependent or independent tasks.
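For example, a hypothetical downstream task could be added inside the same DAG context and chained after the container task:

from airflow.operators.bash import BashOperator

# Inside the same `with DAG(...)` block as above:
opr_notify = BashOperator(
    task_id='notify_complete',  # placeholder follow-up task
    bash_command='echo "Container run finished"'
)

# Run the follow-up task only after the container task succeeds
opr_run_container >> opr_notify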