Launch a Pod in an external cluster
If some of your tasks require specific resources such as a GPU, you might want to run them in a different Kubernetes cluster than the one running your Airflow instance. In setups where both clusters are used by the same AWS, Azure, or GCP account, you can manage separate clusters with roles and permissions.
Prerequisites
- A network connection between your Astro Deployment and your external cluster.
Setup
EKS cluster on AWS
This example shows how to set up an EKS cluster on AWS and run a Pod on it from an Airflow instance where cross-account access is not available.
Step 1: Set up your external cluster
- Create an EKS cluster IAM role with a unique name and add the following permission policies:
  - AmazonEKSWorkerNodePolicy
  - AmazonEKS_CNI_Policy
  - AmazonEC2ContainerRegistryReadOnly
  Record the ARN of the new role; you will need it in the following steps.
- Update the trust policy of this new role to include the workload identity of your Deployment, as sketched after this list. This step ensures that the role can be assumed by your Deployment.
- If you don't already have a cluster, create a new EKS cluster and assign the new role to it.
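As a sketch of the trust policy update, assuming your Deployment's workload identity is an IAM role ARN (you can copy it from your Deployment's settings in the Astro UI) and your EKS role is named `<your-eks-role-name>`:

```bash
# Sketch only: write a trust policy that lets the Deployment's workload identity
# assume the EKS role, then attach it to the role. Replace the placeholders
# with your own role name and ARN.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<your-deployment-workload-identity-arn>" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam update-assume-role-policy \
  --role-name <your-eks-role-name> \
  --policy-document file://trust-policy.json
```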
Step 2: Retrieve the KubeConfig file from the EKS cluster
- Use a `KubeConfig` file to remotely connect to your new cluster. On AWS, you can run the retrieval command shown after this list. The command creates a new `KubeConfig` file called `my_kubeconfig.yaml`.
- Edit the newly generated `KubeConfig` to instruct the AWS IAM Authenticator for Kubernetes to assume the IAM role you created in Step 1, and ensure that your file matches the structure sketched after this list. Replace `<your assume role arn>` with the IAM role ARN from Step 1.
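A sketch of both pieces follows, assuming your cluster is named `<your-cluster-name>` and you want the file written to `my_kubeconfig.yaml`. First, the retrieval command:

```bash
# Sketch: write the cluster's kubeconfig to my_kubeconfig.yaml.
aws eks update-kubeconfig \
  --region <your-region> \
  --name <your-cluster-name> \
  --kubeconfig my_kubeconfig.yaml
```

After editing, the file should look roughly like the following. The certificate data, server endpoint, names, and region come from your generated file; the `--role-arn` argument in the `users` section is the part you add by hand:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: <your-cluster-arn>
    cluster:
      certificate-authority-data: <your-certificate-authority-data>
      server: <your-cluster-endpoint>
contexts:
  - name: <your-cluster-arn>
    context:
      cluster: <your-cluster-arn>
      user: <your-cluster-arn>
current-context: <your-cluster-arn>
users:
  - name: <your-cluster-arn>
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: aws
        args:
          - --region
          - <your-region>
          - eks
          - get-token
          - --cluster-name
          - <your-cluster-name>
          - --role-arn
          - <your assume role arn>
```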
Step 3: Create a Kubernetes cluster connection
Astronomer recommends creating a Kubernetes cluster connection because it's more secure than adding an unencrypted `kubeconfig` file directly to your Astro project.
- Convert the `kubeconfig` configuration you retrieved from your cluster to JSON format (see the sketch after this step).
- In either the Airflow UI or the Astro Environment Manager, create a new connection with the Kubernetes Cluster Connection type. In the Kube config (JSON format) field, paste the `kubeconfig` configuration you retrieved from your cluster after converting it from `yaml` to `json` format.
- Click Save.
You can now specify this connection in the configuration of any KubernetesPodOperator task that needs to access your external cluster.
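One way to handle the YAML-to-JSON conversion from the first step above is a short Python snippet. This is a sketch only; it assumes PyYAML is installed and that your file is named `my_kubeconfig.yaml`:

```python
# Sketch: convert a kubeconfig file from YAML to JSON so it can be pasted
# into the Kube config (JSON format) field of the connection.
import json

import yaml  # assumes PyYAML is installed: pip install pyyaml

with open("my_kubeconfig.yaml") as f:
    kubeconfig = yaml.safe_load(f)

print(json.dumps(kubeconfig, indent=2))
```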
Step 4: Install the AWS CLI in your Astro environment
To connect to your external EKS cluster, you need to install the AWS CLI in your Astro project.
- Add the snippet shown after this list to your `Dockerfile` to install the AWS CLI.
- Add the `unzip` package to your `packages.txt` file to make the `unzip` command available in your Docker container.
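A sketch of both files follows, assuming the standard AWS CLI v2 Linux installer and an Astro Runtime base image that runs as the `astro` user; adapt the download URL and user switching to your base image if needed. In your `Dockerfile`:

```dockerfile
# Switch to root to install the AWS CLI system-wide, then switch back.
USER root
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install && \
    rm -rf awscliv2.zip aws
USER astro
```

And in your `packages.txt`:

```text
unzip
```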
If you are working locally, you need to restart your Astro project to apply the changes.
Step 5: Configure your task
In your KubernetesPodOperator task configuration, ensure that you set `cluster_context` and `namespace` for your remote cluster. In the following example, the task launches a Pod in an external cluster based on the configuration defined in the `k8s` connection.
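A minimal sketch of such a task is shown below. The `k8s` connection ID comes from Step 3; the context name and namespace are placeholders for your remote cluster's values:

```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Sketch: launch a Pod in the remote cluster defined by the "k8s" connection.
run_pod_in_external_cluster = KubernetesPodOperator(
    task_id="run_pod_in_external_cluster",
    kubernetes_conn_id="k8s",  # the Kubernetes Cluster Connection from Step 3
    cluster_context="<your-cluster-context>",
    namespace="<your-namespace>",
    name="example-pod",
    image="ubuntu",
    cmds=["bash", "-cx"],
    arguments=["echo hello"],
    get_logs=True,
    in_cluster=False,
    startup_timeout_seconds=240,
)
```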
Example dag
The following dag, sketched after the task list below, uses several classes from the Amazon provider package to dynamically spin up and delete Pods for each task in a newly created node group. If your remote Kubernetes cluster already has a node group available, you only need to define your task in the KubernetesPodOperator itself.
The example dag contains 5 consecutive tasks:
- Create a node group according to the user's specifications (in this example, a node group that uses GPU resources).
- Use a sensor to check that the cluster is running correctly.
- Use the KubernetesPodOperator to run any valid Docker image in a Pod on the newly created node group on the remote cluster. The example dag uses the standard `Ubuntu` image to print "hello" to the console using a `bash` command.
- Delete the node group.
- Verify that the node group has been deleted.
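A sketch of such a dag is shown below. The cluster name, node group name, subnets, node role ARN, instance type, and connection IDs are all placeholder assumptions to replace with your own values; the operator and sensor classes come from the Amazon and Kubernetes provider packages:

```python
from datetime import datetime

from airflow.models.dag import DAG
from airflow.providers.amazon.aws.hooks.eks import NodegroupStates
from airflow.providers.amazon.aws.operators.eks import (
    EksCreateNodegroupOperator,
    EksDeleteNodegroupOperator,
)
from airflow.providers.amazon.aws.sensors.eks import EksNodegroupStateSensor
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Placeholders: replace with your own cluster, node group, subnets, and role ARN.
CLUSTER_NAME = "<your-eks-cluster-name>"
NODEGROUP_NAME = "gpu-nodegroup"
SUBNETS = ["<your-subnet-1>", "<your-subnet-2>"]
NODE_ROLE_ARN = "<your-node-instance-role-arn>"

with DAG(
    dag_id="run_pod_in_external_cluster",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # 1. Create a node group; the instance type is where you request GPU resources.
    create_nodegroup = EksCreateNodegroupOperator(
        task_id="create_nodegroup",
        cluster_name=CLUSTER_NAME,
        nodegroup_name=NODEGROUP_NAME,
        nodegroup_subnets=SUBNETS,
        nodegroup_role_arn=NODE_ROLE_ARN,
        create_nodegroup_kwargs={"instanceTypes": ["g4dn.xlarge"]},
        aws_conn_id="aws_default",
    )

    # 2. Wait until the node group is active.
    wait_for_nodegroup = EksNodegroupStateSensor(
        task_id="wait_for_nodegroup",
        cluster_name=CLUSTER_NAME,
        nodegroup_name=NODEGROUP_NAME,
        target_state=NodegroupStates.ACTIVE,
        aws_conn_id="aws_default",
    )

    # 3. Run a Pod on the new node group using the Kubernetes cluster connection.
    run_pod = KubernetesPodOperator(
        task_id="run_pod",
        kubernetes_conn_id="k8s",
        cluster_context="<your-cluster-context>",
        namespace="<your-namespace>",
        name="example-pod",
        image="ubuntu",
        cmds=["bash", "-cx"],
        arguments=["echo hello"],
        get_logs=True,
        in_cluster=False,
    )

    # 4. Delete the node group once the Pod has finished.
    delete_nodegroup = EksDeleteNodegroupOperator(
        task_id="delete_nodegroup",
        cluster_name=CLUSTER_NAME,
        nodegroup_name=NODEGROUP_NAME,
        aws_conn_id="aws_default",
    )

    # 5. Verify that the node group has been deleted.
    confirm_nodegroup_deleted = EksNodegroupStateSensor(
        task_id="confirm_nodegroup_deleted",
        cluster_name=CLUSTER_NAME,
        nodegroup_name=NODEGROUP_NAME,
        target_state=NodegroupStates.NONEXISTENT,
        aws_conn_id="aws_default",
    )

    create_nodegroup >> wait_for_nodegroup >> run_pod >> delete_nodegroup >> confirm_nodegroup_deleted
```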