Run the KubernetesPodOperator on Astro
The KubernetesPodOperator is one of the most customizable Apache Airflow operators. A task using the KubernetesPodOperator runs in a dedicated, isolated Kubernetes Pod that terminates after the task completes. To learn more about the benefits and usage of the KubernetesPodOperator, see the KubernetesPodOperator Learn guide.
On Astro, the infrastructure required to run the KubernetesPodOperator is built into every Deployment and is managed by Astronomer. Astro supports setting a default Pod configuration so that any task Pods without specific resource requests and limits cannot exceed your expected resource usage for the Deployment.
Some task-level configurations will differ on Astro compared to other Airflow environments. Use this document to learn how to configure individual task Pods for different use cases on Astro. To configure the default Pod resources for all KubernetesPodOperator Pods, see Configure Kubernetes Pod resources.
Known limitations
- Cross-account service accounts are not supported on Pods launched in an Astro cluster. To allow access to external data sources, you can provide credentials and secrets to tasks.
- PersistentVolumes (PVs) are not supported on Pods launched in an Astro cluster.
- You can’t use an image built for an ARM architecture in the KubernetesPodOperator. To build images using the x86 architecture on a Mac with an Apple chip, include the --platform flag in the FROM command of the Dockerfile that constructs your custom image, for example FROM --platform=linux/amd64 followed by your base image. If you use an ARM image, your KPO task will fail with the error: base] exec /usr/bin/psql: exec format error
Prerequisites
- An Astro project.
- An Astro Deployment.
Set up the KubernetesPodOperator on Astro
The following snippet is the minimum configuration you’ll need to create a KubernetesPodOperator task on Astro:
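A minimal sketch of such a task, assuming a recent cncf.kubernetes provider in which the operator is importable from airflow.providers.cncf.kubernetes.operators.pod. The task name, Pod name, image, and command are placeholders chosen for illustration, and the Airflow imports are kept inside the function so the argument helper can be checked without Airflow installed.

```python
def minimal_pod_kwargs(namespace: str) -> dict:
    """Return keyword arguments for a minimal KubernetesPodOperator task."""
    return {
        "task_id": "hello_pod",        # hypothetical task name
        "name": "hello-pod",           # hypothetical Pod name
        "namespace": namespace,        # the Deployment's own namespace
        "image": "ubuntu:22.04",       # placeholder public Docker Hub image
        "cmds": ["echo"],
        "arguments": ["hello from the Pod"],
        "in_cluster": True,            # no kubernetes_conn_id is passed
        "get_logs": True,              # stream container logs to the task log
    }


def build_task():
    """Create the task; call this inside a `with DAG(...)` block."""
    # Airflow is imported lazily so the helper above runs without it.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    namespace = conf.get("kubernetes", "NAMESPACE")
    return KubernetesPodOperator(**minimal_pod_kwargs(namespace))
```

In a dag file, you would call build_task() inside the dag definition so that the task is attached to that dag.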
For each instantiation of the KubernetesPodOperator, you must specify the following values:
- namespace = conf.get("kubernetes", "NAMESPACE"): Every Deployment runs in its own Kubernetes namespace within a cluster. Information about this namespace can be programmatically imported as long as you set this variable.
- image: This is the Docker image that the operator will use to run its defined task, commands, and arguments. Astro assumes that this value is an image tag that’s publicly available on Docker Hub. To pull an image from a private registry, see Run images from a private registry.
- in_cluster: If a Connection object is not passed to the KubernetesPodOperator’s kubernetes_conn_id parameter, specify in_cluster=True to run the task in the Deployment’s Astro cluster.
Configure task-level Pod resources
Astro automatically allocates resources to Pods created by the KubernetesPodOperator. Unless otherwise specified in your task-level configuration, the amount of resources your task Pod can use is defined by your default Pod resource configuration. To further optimize your resource usage, Astronomer recommends specifying compute resource requests and limits for each task.
To do so, define a kubernetes.client.models.V1ResourceRequirements object and provide that to the container_resources argument of the KubernetesPodOperator. For example:
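The sketch below is one way to write this, assuming the kubernetes Python client and a recent cncf.kubernetes provider are installed. The helper resource_spec and the task and Pod names are hypothetical. Requests and limits are set to the same values (800m of CPU and 3Gi of memory) so that the Pod receives a fixed allocation.

```python
def resource_spec(cpu: str, memory: str) -> dict:
    """Build equal requests and limits so the Pod gets a fixed allocation."""
    return {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }


def build_task():
    """Create the task; call this inside a `with DAG(...)` block."""
    # Imported lazily so resource_spec can be used without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )
    from kubernetes.client import models as k8s

    compute_resources = k8s.V1ResourceRequirements(
        **resource_spec("800m", "3Gi")
    )
    return KubernetesPodOperator(
        task_id="resource_pod",          # hypothetical task name
        name="resource-pod",             # hypothetical Pod name
        namespace=conf.get("kubernetes", "NAMESPACE"),
        image="ubuntu:22.04",            # placeholder public image
        cmds=["echo"],
        arguments=["running with fixed resources"],
        in_cluster=True,
        container_resources=compute_resources,
    )
```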
Applying the previous code example ensures that when this dag runs, it launches a Kubernetes Pod with exactly 800m of CPU and 3Gi of memory as long as that infrastructure is available in your Deployment. After the task finishes, the Pod will terminate gracefully.
Mount a temporary directory
To run a KubernetesPodOperator task that uses your Deployment’s ephemeral storage, mount an emptyDir volume to the KubernetesPodOperator. For example:
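A minimal sketch, assuming the kubernetes Python client models V1Volume, V1EmptyDirVolumeSource, and V1VolumeMount. The volume name, mount path, and shell command are hypothetical values chosen for illustration.

```python
VOLUME_NAME = "cache-volume"   # hypothetical volume name
MOUNT_PATH = "/cache"          # hypothetical mount path inside the container


def scratch_command(mount_path: str) -> list:
    """Shell command that writes to, then reads from, the mounted directory."""
    target = f"{mount_path}/scratch.txt"
    return ["sh", "-c", f"echo temporary > {target} && cat {target}"]


def build_task():
    """Create the task; call this inside a `with DAG(...)` block."""
    # Imported lazily so scratch_command can be used without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )
    from kubernetes.client import models as k8s

    # An emptyDir volume exists only for the lifetime of the Pod.
    volume = k8s.V1Volume(
        name=VOLUME_NAME,
        empty_dir=k8s.V1EmptyDirVolumeSource(),
    )
    volume_mount = k8s.V1VolumeMount(name=VOLUME_NAME, mount_path=MOUNT_PATH)

    return KubernetesPodOperator(
        task_id="scratch_pod",           # hypothetical task name
        name="scratch-pod",              # hypothetical Pod name
        namespace=conf.get("kubernetes", "NAMESPACE"),
        image="ubuntu:22.04",            # placeholder public image
        cmds=scratch_command(MOUNT_PATH),
        in_cluster=True,
        get_logs=True,
        volumes=[volume],
        volume_mounts=[volume_mount],
    )
```

Because the volume is an emptyDir, anything written under the mount path is discarded when the Pod terminates.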
Run the current Airflow image
You can run your Deployment’s current Airflow image with the KubernetesPodOperator by using the ASTRONOMER_AIRFLOW_IMAGE environment variable. This variable lets you run KPO tasks with the same Runtime image that your Deployment uses and ensures that the task retrieves the correct image URL, even if underlying cluster infrastructure changes alter the base image repository URL.
The following code example shows how you can configure your KPO with ASTRONOMER_AIRFLOW_IMAGE:
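A minimal sketch, assuming the variable is visible to the process that parses the dag. The helper current_airflow_image, the task and Pod names, and the command are hypothetical. The helper fails loudly when the variable is missing, which is one possible design choice for catching misconfiguration outside Astro.

```python
import os


def current_airflow_image() -> str:
    """Return the Deployment's Runtime image URL from the environment."""
    image = os.environ.get("ASTRONOMER_AIRFLOW_IMAGE")
    if not image:
        raise RuntimeError("ASTRONOMER_AIRFLOW_IMAGE is not set")
    return image


def build_task():
    """Create the task; call this inside a `with DAG(...)` block."""
    # Imported lazily so the helper above can be used without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    return KubernetesPodOperator(
        task_id="runtime_image_pod",     # hypothetical task name
        name="runtime-image-pod",        # hypothetical Pod name
        namespace=conf.get("kubernetes", "NAMESPACE"),
        image=current_airflow_image(),   # same image the Deployment runs
        cmds=["airflow", "version"],     # illustrative command only
        in_cluster=True,
        get_logs=True,
    )
```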
Run images from a private registry
By default, the KubernetesPodOperator expects to pull a Docker image that’s hosted publicly. If your images are hosted on the container registry native to your cloud provider, you can grant access to the images directly. Otherwise, if you are using any other private registry, you need to create a Kubernetes Secret containing credentials to the registry, then specify the Kubernetes Secret in your dag.
Prerequisites
- An Astro project.
- An Astro Deployment.
- Access to a private Docker registry.
Step 1: Create a Kubernetes Secret
To run Docker images from a private registry on Astro, a Kubernetes Secret that contains your registry credentials must be added to your Deployment’s namespace. Injecting this secret gives your tasks access to the Docker images in your private registry.
Retrieve a config.json file that contains your Docker credentials by following the Docker documentation. The generated file looks similar to the following:
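For orientation only, the Python sketch below prints a file with the general shape that Docker commonly writes: an auths map keyed by registry, where auth is the base64 encoding of username:password. The registry URL and credentials are placeholders, and some Docker setups store credentials in an external credential helper instead, so treat the file produced by your own Docker login as authoritative.

```python
import base64
import json


def docker_config(registry: str, username: str, password: str) -> str:
    """Render a Docker-style config.json body for one registry."""
    # Docker stores the credential pair as base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": token}}}, indent=2)


# Placeholder values; never commit real credentials to a project.
print(docker_config("https://index.docker.io/v1/", "my-user", "my-password"))
```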
Submit a request to Astronomer Support to create a Kubernetes Secret that enables pulling images from your private registry. Astronomer Support can provide you with instructions on how to generate and securely send the credentials.
Step 2: Specify the Kubernetes Secret in your dag
Once Astronomer has added the Kubernetes Secret to your Deployment, you will be notified and provided with the name of the secret.
After you receive the name of your Kubernetes Secret from Astronomer, you can run images from your private registry by importing models from kubernetes.client and configuring image_pull_secrets in your KubernetesPodOperator instantiation:
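A minimal sketch, assuming the kubernetes Python client model V1LocalObjectReference. The registry host, repository, tag, and task names are placeholders, and the secret name is whatever Astronomer provides to you.

```python
def image_ref(registry: str, repository: str, tag: str) -> str:
    """Compose a fully qualified image reference for a private registry."""
    return f"{registry}/{repository}:{tag}"


def build_task(secret_name: str):
    """Create a task that pulls its image with the named pull secret.

    Call this inside a `with DAG(...)` block.
    """
    # Imported lazily so image_ref can be used without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )
    from kubernetes.client import models as k8s

    return KubernetesPodOperator(
        task_id="private_image_pod",     # hypothetical task name
        name="private-image-pod",        # hypothetical Pod name
        namespace=conf.get("kubernetes", "NAMESPACE"),
        # Placeholder registry, repository, and tag.
        image=image_ref("registry.example.com", "team/app", "1.0"),
        # Name of the Kubernetes Secret that Astronomer added.
        image_pull_secrets=[k8s.V1LocalObjectReference(name=secret_name)],
        in_cluster=True,
        get_logs=True,
    )
```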
Use secret environment variables with the KubernetesPodOperator
Astro environment variables marked as secrets are stored in a Kubernetes Secret called env-secrets. To use a secret value in a KubernetesPodOperator task, you pull the value from env-secrets and mount it to the Pod running your task as a new Kubernetes Secret.
- Add the following import to your dag file:
- Define a Kubernetes Secret in your dag instantiation using the following format:
- Reference the key for the environment variable, formatted as $VARIABLE_KEY, in the task using the KubernetesPodOperator.
In the following example, a secret named MY_SECRET is pulled from env-secrets and printed to logs.
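The three steps above can be sketched as follows, assuming the Secret class is importable from airflow.providers.cncf.kubernetes.secret (older Airflow releases expose it as airflow.kubernetes.secret). The helper secret_spec and the task and Pod names are hypothetical.

```python
VARIABLE_KEY = "MY_SECRET"  # key of the secret Astro environment variable


def secret_spec(variable_key: str) -> dict:
    """Arguments that expose one key of env-secrets as a Pod env variable."""
    return {
        "deploy_type": "env",           # inject as an environment variable
        "deploy_target": variable_key,  # variable name inside the Pod
        "secret": "env-secrets",        # Kubernetes Secret that holds the value
        "key": variable_key,            # key to read from that Secret
    }


def build_task():
    """Create the task; call this inside a `with DAG(...)` block."""
    # Imported lazily so secret_spec can be used without Airflow installed.
    from airflow.configuration import conf
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )
    from airflow.providers.cncf.kubernetes.secret import Secret

    secret_env = Secret(**secret_spec(VARIABLE_KEY))

    return KubernetesPodOperator(
        task_id="secret_pod",            # hypothetical task name
        name="secret-pod",               # hypothetical Pod name
        namespace=conf.get("kubernetes", "NAMESPACE"),
        image="ubuntu:22.04",            # placeholder public image
        # Prints the value to the task log, for demonstration only.
        cmds=["sh", "-c", f"echo ${VARIABLE_KEY}"],
        secrets=[secret_env],
        in_cluster=True,
        get_logs=True,
    )
```

Printing a secret is useful only for verifying the wiring; a production task would read the variable without logging it.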
Use the @task.kubernetes decorator
The @task.kubernetes decorator provides a TaskFlow alternative to the traditional KubernetesPodOperator and lets you run a specified task in its own Kubernetes Pod. The Docker image that you provide to the decorator’s image parameter must support executing Python scripts.
As with regular @task-decorated functions, you can pass XComs to the Python script running in the dedicated Kubernetes Pod. If do_xcom_push is set to True in the decorator parameters, the value returned by the decorated function is pushed to XCom.
Astronomer recommends using the @task.kubernetes decorator instead of the KubernetesPodOperator when using XCom with Python scripts in a dedicated Kubernetes Pod.
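A minimal sketch of the decorator, assuming Airflow 2.4 or later with a cncf.kubernetes provider that ships @task.kubernetes. The dag name, task names, Pod name, and image are hypothetical; the image only needs a Python interpreter.

```python
from datetime import datetime

POD_IMAGE = "python:3.11-slim"  # placeholder image that can run Python scripts


def build_dag():
    """Define a dag that passes XComs into and out of a dedicated Pod."""
    # Imported lazily so this module can be loaded without Airflow installed.
    from airflow.configuration import conf
    from airflow.decorators import dag, task

    namespace = conf.get("kubernetes", "NAMESPACE")

    @dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
    def kubernetes_decorator_example():
        @task
        def extract():
            return [1, 2, 3]

        @task.kubernetes(
            image=POD_IMAGE,
            namespace=namespace,
            in_cluster=True,
            name="sum-pod",          # hypothetical Pod name
            get_logs=True,
            do_xcom_push=True,       # push the return value to XCom
        )
        def total(values):
            # Runs inside the Pod, so it must not rely on names defined
            # elsewhere in the dag file.
            return sum(values)

        @task
        def report(result):
            print(f"Total computed in the Pod: {result}")

        report(total(extract()))

    return kubernetes_decorator_example()
```

In a dag file, assign the result at module level, for example example_dag = build_dag(), so that Airflow can discover the dag.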