Run the KubernetesPodOperator on Astro
The KubernetesPodOperator is one of the most customizable Apache Airflow operators. A task using the KubernetesPodOperator runs in a dedicated, isolated Kubernetes Pod that terminates after the task completes. To learn more about the benefits and usage of the KubernetesPodOperator, see the KubernetesPodOperator Learn guide.
On Astro, the infrastructure required to run the KubernetesPodOperator is built into every Deployment and is managed by Astronomer. Astro supports setting a default Pod configuration so that any task Pods without specific resource requests and limits cannot exceed your expected resource usage for the Deployment.
Some task-level configurations will differ on Astro compared to other Airflow environments. Use this document to learn how to configure individual task Pods for different use cases on Astro. To configure the default Pod resources for all KubernetesPodOperator Pods, see Configure Kubernetes Pod resources.
Known limitations
- Cross-account service accounts are not supported on Pods launched in an Astro cluster. To allow access to external data sources, you can provide credentials and secrets to tasks.
- PersistentVolumes (PVs) are not supported on Pods launched in an Astro cluster.
- You can’t use an image built for an ARM architecture in the KubernetesPodOperator. To build images using the x86 architecture on a Mac with an Apple chip, include the --platform flag in the FROM command of the Dockerfile that constructs your custom image, for example FROM --platform=linux/amd64 <base-image>, where <base-image> is a placeholder for your own base image. If you use an ARM image, your KPO task will fail with the error: base] exec /usr/bin/psql: exec format error
Prerequisites
- An Astro project.
- An Astro Deployment.
Set up the KubernetesPodOperator on Astro
The following snippet is the minimum configuration you’ll need to create a KubernetesPodOperator task on Astro:
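A minimal sketch of such a task, assuming the apache-airflow-providers-cncf-kubernetes package is installed in the project. The helper names (kpo_task_kwargs, make_kpo_task) and the image, command, Pod name, and task ID values are hypothetical placeholders chosen for illustration, not values prescribed by Astro.

```python
from typing import Any, Dict


def kpo_task_kwargs(namespace: str) -> Dict[str, Any]:
    """Return keyword arguments for a minimal KubernetesPodOperator task.

    All values except `namespace` and `in_cluster` are placeholders.
    """
    return {
        "namespace": namespace,           # the Deployment's Kubernetes namespace
        "image": "ubuntu:22.04",          # placeholder: a public Docker Hub image tag
        "cmds": ["echo"],                 # placeholder command run in the Pod
        "arguments": ["hello from the Pod"],
        "name": "example-pod",            # placeholder Pod name
        "task_id": "example_kpo_task",    # placeholder Airflow task ID
        "get_logs": True,                 # forward Pod stdout to the task logs
        "in_cluster": True,               # run in the Deployment's own cluster
    }


def make_kpo_task():
    """Build the operator; call this inside a DAG definition."""
    # Imported lazily so this module can be loaded without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    namespace = conf.get("kubernetes", "NAMESPACE")
    return KubernetesPodOperator(**kpo_task_kwargs(namespace))
```

Keeping the arguments in a plain dictionary is only a convenience for this sketch: it lets you inspect them without an Airflow installation. In a DAG file you can equally pass the same keyword arguments to KubernetesPodOperator directly.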
For each instantiation of the KubernetesPodOperator, you must specify the following values:
- namespace = conf.get("kubernetes", "NAMESPACE"): Every Deployment runs in its own Kubernetes namespace within a cluster. Information about this namespace can be programmatically imported as long as you set this variable.
- image: This is the Docker image that the operator will use to run its defined task, commands, and arguments. Astro assumes that this value is an image tag that’s publicly available on Docker Hub. To pull an image from a private registry, see Pull images from a Private Registry.
- in_cluster: If a Connection object is not passed to the KubernetesPodOperator’s kubernetes_conn_id parameter, specify in_cluster=True to run the task in the Deployment’s Astro cluster.
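The in_cluster rule above can be sketched as a small helper. The function name cluster_kwargs and the connection ID used in the usage note are hypothetical; only the in_cluster and kubernetes_conn_id parameter names come from the operator itself.

```python
from typing import Any, Dict, Optional


def cluster_kwargs(kubernetes_conn_id: Optional[str] = None) -> Dict[str, Any]:
    """Choose how the operator locates the cluster to launch the Pod in."""
    if kubernetes_conn_id:
        # A connection was supplied, so let it determine the target cluster.
        return {"kubernetes_conn_id": kubernetes_conn_id}
    # No connection supplied: run in the Deployment's own Astro cluster.
    return {"in_cluster": True}
```

For example, cluster_kwargs() yields the in_cluster=True setting, while cluster_kwargs("my_k8s_conn") passes the connection ID through instead. Either result can be unpacked into the KubernetesPodOperator call alongside the other arguments.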