Run the KubernetesPodOperator on Astro

The KubernetesPodOperator is one of the most customizable Apache Airflow operators. A task using the KubernetesPodOperator runs in a dedicated, isolated Kubernetes Pod that terminates after the task completes. To learn more about the benefits and usage of the KubernetesPodOperator, see the KubernetesPodOperator Learn guide.

On Astro, the infrastructure required to run the KubernetesPodOperator is built into every Deployment and is managed by Astronomer. Astro supports setting a default Pod configuration so that any task Pods without specific resource requests and limits cannot exceed your expected resource usage for the Deployment.

Some task-level configurations differ on Astro from other Airflow environments. Use this document to learn how to configure individual task Pods for different use cases on Astro. To configure the default Pod resources for all KubernetesPodOperator Pods, see Configure Kubernetes Pod resources.

Known limitations

By default, Astro supports a maximum KubernetesPodOperator Pod size of 43 vCPU and 86 GiB of memory. Contact Astro support if you need to run larger jobs with the KubernetesPodOperator.
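To catch oversized Pods in your own tests before a DAG is deployed, you can compare a task's resource limits against this default maximum. The following is a minimal sketch, not an Astro or Airflow API: `parse_cpu`, `parse_memory`, and `fits_astro_default_max` are hypothetical helpers, and they understand only a common subset of Kubernetes quantity suffixes (`m` for CPU; `Ki`, `Mi`, `Gi`, `Ti`, `k`, `M`, `G`, and `T` for memory). The `limits` dictionary is assumed to have the same shape as the `limits` mapping of a Kubernetes `V1ResourceRequirements` object.

```python
from decimal import Decimal

# Astro's default maximum KubernetesPodOperator Pod size (43 vCPU, 86 GiB).
MAX_CPU = Decimal("43")
MAX_MEMORY = Decimal(86) * 1024**3  # bytes

# Subset of Kubernetes quantity suffixes. Binary suffixes are listed first so
# that "512Mi" is matched by "Mi" rather than by "M".
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(value: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" to cores (hypothetical helper)."""
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


def parse_memory(value: str) -> Decimal:
    """Convert a memory quantity such as "512Mi" or "2G" to bytes (hypothetical helper)."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    return Decimal(value)


def fits_astro_default_max(limits: dict) -> bool:
    """Return True if the CPU and memory limits stay within 43 vCPU and 86 GiB."""
    cpu = parse_cpu(limits.get("cpu", "0"))
    memory = parse_memory(limits.get("memory", "0"))
    return cpu <= MAX_CPU and memory <= MAX_MEMORY


print(fits_astro_default_max({"cpu": "4", "memory": "16Gi"}))
```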

Cross-account service accounts are not supported on Pods launched in an Astro cluster. To allow access to external data sources, you can provide credentials and secrets to tasks.
PersistentVolumes (PVs) are not supported on Pods launched in an Astro cluster.
You can’t use an image built for an ARM architecture in the KubernetesPodOperator. To build images using the x86 architecture on a Mac with an Apple chip, include the --platform flag in the FROM command of the Dockerfile that constructs your custom image. For example:
```
FROM --platform=linux/amd64 postgres:latest
```
If you use an ARM image, your KubernetesPodOperator task fails with an error such as: [base] exec /usr/bin/psql: exec format error.
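A Dockerfile can contain several FROM lines, for example in a multi-stage build. The following minimal sketch adds `--platform=linux/amd64` to every FROM line that doesn't already specify a platform, which can be useful if you generate or check Dockerfiles in a script. `host_is_arm` and `pin_amd64` are hypothetical helpers, and `host_is_arm` assumes that `platform.machine()` reports `arm64` or `aarch64` on ARM hosts such as a Mac with an Apple chip.

```python
import platform
import re
from typing import Optional

# Matches a Dockerfile FROM instruction; group 2 holds any flags and the image.
_FROM = re.compile(r"^(\s*FROM\s+)(.*)$", re.IGNORECASE)


def host_is_arm(machine: Optional[str] = None) -> bool:
    """Return True when the build host reports an ARM CPU (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    return machine in {"arm64", "aarch64"}


def pin_amd64(dockerfile: str) -> str:
    """Add --platform=linux/amd64 to each FROM line that lacks a --platform flag."""
    out = []
    for line in dockerfile.splitlines():
        match = _FROM.match(line)
        if match and "--platform" not in match.group(2):
            line = f"{match.group(1)}--platform=linux/amd64 {match.group(2)}"
        out.append(line)
    return "\n".join(out)


if host_is_arm():
    print(pin_amd64("FROM postgres:latest"))
```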

Prerequisites

An Astro project.
An Astro Deployment.

Set up the KubernetesPodOperator on Astro

The following snippet is the minimum configuration you’ll need to create a KubernetesPodOperator task on Astro:

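```python
from airflow.configuration import conf
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Read the Deployment's Kubernetes namespace from the Airflow configuration.
namespace = conf.get("kubernetes", "NAMESPACE")

KubernetesPodOperator(
    namespace=namespace,  # launch the task Pod in the Deployment's namespace
    image="<your-docker-image>",  # image that runs the task
    cmds=["<commands-for-image>"],  # container entrypoint
    arguments=["<arguments-for-image>"],  # arguments passed to the entrypoint
    labels={"<pod-label>": "<label-name>"},
    name="<pod-name>",
    task_id="<task-name>",
    get_logs=True,  # show the container's stdout in the Airflow task logs
    in_cluster=True,  # use the in-cluster Kubernetes configuration
)
```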

For each instantiation of the KubernetesPodOperator, you must specify the following values:

namespace = conf.get("kubernetes", "NAMESPACE"): Every Deployment runs in its own Kubernetes namespace within a cluster. Setting this variable reads the Deployment's namespace from its Airflow configuration so that you can pass it to the operator.
image: The Docker image that the operator uses to run the task's commands and arguments. Astro assumes that this value is an image tag that's publicly available on Docker Hub. To pull an image from a private registry, see Pull images from a Private Registry.
in_cluster: If you don't specify a Kubernetes connection with the KubernetesPodOperator's kubernetes_conn_id parameter, set in_cluster=True to run the task in the Deployment's Astro cluster.
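These requirements can be checked before a DAG is deployed. The following is a minimal sketch using hypothetical helpers that are not part of Airflow or Astro: `missing_astro_settings` inspects the keyword arguments you plan to pass to the KubernetesPodOperator, and `image_registry` applies a simplified version of Docker's image-reference rule (the first path component counts as a registry host only if it contains `.` or `:` or is `localhost`) to show whether an image resolves to Docker Hub or to another registry that might need private registry configuration.

```python
from typing import Any, List, Mapping

DOCKER_HUB = "docker.io"


def image_registry(image: str) -> str:
    """Return the registry host an image reference resolves to (hypothetical helper)."""
    first, sep, _rest = image.partition("/")
    # Without a "/" or a host-like first component, Docker Hub is implied.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return DOCKER_HUB


def missing_astro_settings(kwargs: Mapping[str, Any]) -> List[str]:
    """List required KubernetesPodOperator values that are missing (hypothetical helper)."""
    problems = []
    if not kwargs.get("namespace"):
        problems.append("namespace is not set")
    if not kwargs.get("image"):
        problems.append("image is not set")
    # Without a Kubernetes connection, the task must run in the Deployment's cluster.
    if not kwargs.get("kubernetes_conn_id") and kwargs.get("in_cluster") is not True:
        problems.append("set in_cluster=True or pass kubernetes_conn_id")
    return problems


task_kwargs = {"namespace": "my-namespace", "image": "postgres:latest", "in_cluster": True}
print(missing_astro_settings(task_kwargs), image_registry(task_kwargs["image"]))
```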
