If some of your tasks require specific resources such as a GPU, you might want to run them in a different cluster than your Airflow instance. In setups where both clusters belong to the same Google Cloud project, you can manage separate clusters with roles and permissions.
This document shows how to configure a Google Kubernetes Engine (GKE) cluster on Google Cloud and run a Pod on it from an Airflow instance where cross-project access isn’t available.
To launch Pods in external clusters from a local Airflow environment, you must have valid authentication for the external cluster. For managed Kubernetes services from public cloud providers, authentication is federated through the native IAM service. To grant the Astro role permissions to launch Pods on your cluster, you can either include static credentials or use workload identity to authorize the Astro role to your cluster.
Follow Google Cloud’s documentation to prepare a GKE cluster that your Astro Deployment can authenticate to:
Create a GKE cluster if you don’t already have one.
Authorize your Astro Deployment to Google Cloud by following the Deployment workload identity setup.
Grant the service account IAM and Kubernetes RBAC permissions in the namespace where your KubernetesPodOperator tasks run.
At a minimum, provision the following permissions for your service account in your specified namespace:
- container.clusters.get
- container.events.list
- container.pods.get
- container.pods.getLogs
- container.pods.list
- container.pods.create
- container.pods.delete
- container.pods.update
If your Dag uses do_xcom_push=True, also grant the container.pods.exec permission.
To connect to your external GKE cluster, the gcloud CLI and the gke-gcloud-auth-plugin must be available inside your Astro Runtime image.
Install both tools in your Dockerfile. The plugin is distributed as the google-cloud-cli-gke-gcloud-auth-plugin package.
For production deployments, consider pinning the google-cloud-cli-gke-gcloud-auth-plugin version for build reproducibility, or using a multi-stage build with the google/cloud-sdk:slim image to copy only the plugin binary into your final image and reduce its size.
Add the CNCF Kubernetes provider package, apache-airflow-providers-cncf-kubernetes, to your requirements.txt.
kubeconfig file
The following sample Kubernetes kubeconfig file allows the Kubernetes command-line tool, kubectl, or other clients to connect to a remote Kubernetes cluster using Google Cloud workload identity for authentication.
Fetch the certificate-authority-data and cluster-endpoint fields from the GKE cluster details page or using the Google Cloud SDK.
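As a rough illustration of the structure described above, the following Python sketch assembles a kubeconfig that delegates authentication to gke-gcloud-auth-plugin and prints it as JSON. The function name build_gke_kubeconfig, the cluster name, and the placeholder values are hypothetical. The layout follows the standard Kubernetes kubeconfig schema and is an assumption about what your cluster needs, not a copy of any official sample.

```python
import json


def build_gke_kubeconfig(cluster_name, cluster_endpoint, certificate_authority_data):
    """Return a kubeconfig dict that authenticates through gke-gcloud-auth-plugin.

    cluster_endpoint and certificate_authority_data are the two fields you
    fetch from the GKE cluster details page or the Google Cloud SDK.
    """
    # The API server address must be an HTTPS URL.
    if cluster_endpoint.startswith("https://"):
        server = cluster_endpoint
    else:
        server = f"https://{cluster_endpoint}"

    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {
                "name": cluster_name,
                "cluster": {
                    "certificate-authority-data": certificate_authority_data,
                    "server": server,
                },
            }
        ],
        "contexts": [
            {
                "name": cluster_name,
                "context": {"cluster": cluster_name, "user": cluster_name},
            }
        ],
        "current-context": cluster_name,
        "users": [
            {
                "name": cluster_name,
                "user": {
                    # The exec plugin fetches a token each time a client connects,
                    # so no static credential is stored in the file.
                    "exec": {
                        "apiVersion": "client.authentication.k8s.io/v1beta1",
                        "command": "gke-gcloud-auth-plugin",
                        "provideClusterInfo": True,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your own cluster.
    config = build_gke_kubeconfig(
        cluster_name="my-gke-cluster",
        cluster_endpoint="<cluster-endpoint>",
        certificate_authority_data="<certificate-authority-data>",
    )
    print(json.dumps(config, indent=2))
```

Because JSON is also valid YAML, the printed output can be saved as a kubeconfig file for kubectl or pasted into a connection field that expects JSON.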
kubeconfig file
To use the kubeconfig file, create a new Kubernetes Airflow connection.
There are multiple ways to pass the kubeconfig file to your Airflow connection. If your kubeconfig file contains any sensitive information, Astronomer recommends storing it as JSON inside the connection.
Convert the kubeconfig file to JSON format and paste it into the Kube config (JSON format) field in the connection configuration.
In your KubernetesPodOperator task, set kubernetes_conn_id to the connection you created, namespace to the namespace in your GKE cluster where the Pod should run, and in_cluster=False so that the operator uses the connection’s kubeconfig instead of looking for an in-cluster service account.
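The settings described above could look like the following minimal sketch. The connection ID, namespace, Dag ID, image, and command are hypothetical placeholders. The import path assumes a recent apache-airflow-providers-cncf-kubernetes release, and schedule=None assumes Airflow 2.4 or later. The Airflow imports are deferred into build_dag so the task arguments can be inspected without Airflow installed.

```python
from datetime import datetime

# Hypothetical values for illustration; replace them with your own.
GKE_CONN_ID = "gke_external_cluster"  # Airflow connection that holds the kubeconfig
GKE_NAMESPACE = "airflow-tasks"  # namespace where the service account can manage Pods

# Arguments passed to KubernetesPodOperator.
POD_TASK_KWARGS = {
    "task_id": "run_pod_on_gke",
    "name": "hello-from-gke",
    "kubernetes_conn_id": GKE_CONN_ID,
    "namespace": GKE_NAMESPACE,
    # Use the connection's kubeconfig rather than an in-cluster service account.
    "in_cluster": False,
    "image": "ubuntu:22.04",
    "cmds": ["bash", "-cx"],
    "arguments": ["echo hello from the external cluster"],
    "get_logs": True,
}


def build_dag():
    """Create a Dag with a single Pod task that runs on the external cluster."""
    # Imported here so this module can be loaded without Airflow installed.
    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(
        dag_id="gke_external_pod_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        KubernetesPodOperator(**POD_TASK_KWARGS)
    return dag
```

In a Dag file, call build_dag() at module level, for example dag = build_dag(), so that Airflow discovers the Dag when it parses the file.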