Launch a Pod in an AKS cluster on Azure

If some of your tasks require specific resources such as a GPU, you might want to run them in a different cluster from the one that runs your Airflow instance. When both clusters belong to the same AWS, Azure, or GCP account, you can manage access between them with roles and permissions.

To launch Pods in an external cluster from a local Airflow environment, your Airflow environment must hold credentials that authorize it to launch Pods in that cluster. For managed Kubernetes services from public cloud providers, authentication is federated through the provider's native IAM service. To grant permission to launch Pods on your cluster, you can either include static credentials or use workload identity to authorize your Astro Deployment to the cluster.

This example shows how to configure an Azure Managed Identity (MI) to run a Pod on an AKS cluster from an Airflow instance where cross-account access is not available.

Prerequisites

Setup

Step 1: Set Up Azure Managed Identity

  1. Ensure you have access to a Microsoft Entra ID tenant with Global Administrator or Application Administrator privileges.
  2. Create a user-assigned managed identity on Azure.
  3. Authorize your Astro Deployment to Azure using Azure Managed Identity (MI) by following steps 1 and 2 described in the Deployment workload identity setup.
  4. Confirm that the OIDC credentials appear in the Managed Identity's Federated credentials tab.
  5. From the Managed Identity's Properties tab, note the Client ID. From the Azure Portal, go to Microsoft Entra ID (formerly Azure Active Directory) and note the Tenant ID. You need both the Client ID and the Tenant ID in Step 3 to configure your kubeconfig file.
Step 2: Install dependencies in your Astro Runtime Docker Image

To launch remote Pods on an Azure AKS cluster, you need to add the following packages and dependencies to your Docker image:

  • Azure CLI
  • kubectl
  • kubelogin

To do so, add the following commands to your Dockerfile:

FROM quay.io/astronomer/astro-runtime:X.Y.Z

USER root

# Install the Azure CLI. sudo is not needed because this RUN
# instruction already executes as root.
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash

# Install kubectl and kubelogin
RUN az aks install-cli

USER astro
Step 3: Configure your kubeconfig file

The following is a sample Kubernetes kubeconfig file that allows the Kubernetes command-line tool, kubectl, or other clients to connect to a remote AKS cluster using Azure Workload Identity for authentication. Replace the placeholders in angle brackets with your own values.

# Specifies the version of the Kubernetes API for this configuration file.
# v1 is the standard version used for kubeconfig files.
apiVersion: v1
# List of Kubernetes clusters that this configuration can connect to.
clusters:
- cluster:
    # Base64-encoded certificate for the Kubernetes API server to verify SSL communication.
    certificate-authority-data: <certificate>
    # URL of the Kubernetes API server.
    # This is the endpoint of the remote cluster you want to interact with.
    server: <Azure server address>
  # Name of the cluster, which is referenced in the contexts section.
  name: <AKS cluster>
# List of contexts that define which cluster and user combination to use when interacting with Kubernetes.
contexts:
# Describes the context for connecting to the cluster.
- context:
    # References the cluster from the clusters section.
    cluster: <AKS cluster>
    # Associates the user configuration to be used for authentication with the cluster.
    # This must match a name in the users section below.
    user: <user>
  # The name of the context, which is referenced by current-context.
  name: <AKS cluster>
# Specifies the active context that is used by default when running kubectl commands.
current-context: <AKS cluster>
# Identifies the file type as a Kubernetes Config.
kind: Config
preferences: {}
# List of users and the method they use for authentication.
users:
# Defines the user that is referenced by the context above.
# This user is responsible for authenticating with the Kubernetes cluster.
- name: <user>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --login
      - workloadidentity
      - --tenant-id
      - <Tenant ID from Step 1>
      - --client-id
      - <Client ID from Step 1>
      # The server ID for Azure Kubernetes Service (AKS). This is a static ID representing AKS.
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      # Path to the federated token that the managed identity uses to authenticate.
      - --federated-token-file
      - /var/run/secrets/azure/tokens/azure-identity-token
      - --environment
      - AzurePublicCloud
      command: kubelogin
      # Specifies whether kubectl should pass additional cluster information
      # to the exec plugin beyond what is needed for the authentication token.
      provideClusterInfo: false
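Before wiring this into Airflow, you can sanity check the workload identity federation by requesting a token for the AKS server ID with the azure-identity Python package. This is a minimal sketch, not part of the official setup: it has to run inside your Astro Deployment (where the federated token file is mounted), it assumes the azure-identity package is installed, and the script name verify_federation.py is illustrative.

# verify_federation.py: hypothetical sanity check for workload identity federation
from azure.identity import WorkloadIdentityCredential

# Values from Step 1; the token file path matches --federated-token-file above.
credential = WorkloadIdentityCredential(
    tenant_id="<Tenant ID from Step 1>",
    client_id="<Client ID from Step 1>",
    token_file_path="/var/run/secrets/azure/tokens/azure-identity-token",
)

# Request a token scoped to the static AKS server ID that kubelogin uses.
token = credential.get_token("6dae42f8-4368-4678-94ff-3960e28e3630/.default")
print("Token acquired, expires at:", token.expires_on)

If the token request succeeds, the federation is configured correctly and kubelogin should be able to authenticate with the same Tenant ID and Client ID.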
Step 4: Create an Airflow Connection to use the kubeconfig file

To use the kubeconfig file, you will need to create a new Kubernetes Airflow Connection.

There are multiple ways to pass the kubeconfig file to your Airflow Connection. If your kubeconfig file contains any sensitive information, we recommend storing it as JSON inside the connection, as described in option 3.

  1. External file in the default location: If the kubeconfig file resides in the default location on the machine (~/.kube/config), you can leave all fields empty in the connection configuration, and Airflow automatically uses the kubeconfig from the default location. Add the following COPY command at the end of your Dockerfile to place your kubeconfig file in the default location for the astro user, whose home directory in the Astro Runtime image is /usr/local/airflow. Note that ~ is not expanded in a Dockerfile COPY instruction, so spell out the full path.
COPY kubeconfig /usr/local/airflow/.kube/config
  2. External file with a custom path: You can specify a custom path to the kubeconfig file in the Kube config path field of your Airflow Connection. Add the following COPY command at the end of your Dockerfile to add your kubeconfig file to your Astro Runtime Docker image.
COPY kubeconfig /usr/local/airflow/kubeconfig
  3. JSON format: You can convert the kubeconfig file to JSON and paste it into the Kube config (JSON format) field in the connection configuration. You can use an online converter such as https://jsonformatter.org/yaml-to-json, but remove any sensitive information first, or convert the file locally as shown in the sketch after this list.
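The following sketch converts the kubeconfig to JSON locally, so you don't have to paste cluster credentials into an online tool. It assumes the PyYAML package is available (Airflow environments typically include it) and that the kubeconfig file sits in the current directory; the script name and file path are illustrative.

# convert_kubeconfig.py: convert a kubeconfig YAML file to JSON locally
import json

import yaml  # provided by the PyYAML package

# Read the kubeconfig; adjust the path if your file lives elsewhere.
with open("kubeconfig") as f:
    config = yaml.safe_load(f)

# Print JSON that you can paste into the Kube config (JSON format) field.
print(json.dumps(config, indent=2))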
Step 5: Configure your task

Run a Kubernetes Pod with the Airflow KubernetesPodOperator.

# import the DAG object
from airflow import DAG

# import the KubernetesPodOperator
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from airflow.utils.dates import days_ago

default_args = {
    "owner": "Astronomer",
    "depends_on_past": False,
}

# instantiate the DAG
with DAG(
    dag_id="remote_kpo",
    default_args=default_args,
    schedule_interval=None,
    start_date=days_ago(1),
    tags=["KPO"],
):

    # launch a Pod in the remote Kubernetes cluster
    remote_kpo = KubernetesPodOperator(
        task_id="az_remote_kpo",
        kubernetes_conn_id="<my-az-connection>",
        namespace="<my-aks-namespace>",
        image="debian",
        cmds=["bash", "-cx"],
        # bash -c expects a single command string, so pass the full
        # command as one argument
        arguments=["echo hello world!"],
        name="hello-world",
        get_logs=True,
        in_cluster=False,
    )