Run the KubernetesPodOperator on Astronomer Software
The KubernetesPodOperator is one of the most customizable Apache Airflow operators. A task using the KubernetesPodOperator runs in a dedicated, isolated Kubernetes Pod that terminates after the task completes. To learn more about the benefits and usage of the KubernetesPodOperator, see the KubernetesPodOperator Learn guide.
This guide explains how to accomplish specific goals with the KubernetesPodOperator on Astronomer Software.
Prerequisites
- A running Airflow Deployment on Astronomer Software
Set Up the KubernetesPodOperator
Import the operator
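- Install the apache-airflow-providers-cncf-kubernetes package, for example with pip install apache-airflow-providers-cncf-kubernetes.
- Import the KubernetesPodOperator in your DAG file with from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator. Older versions of the provider package expose the operator from the operators.kubernetes_pod module instead.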
Specify parameters
Instantiate the operator based on your image and setup.
For each instantiation of the KubernetesPodOperator, you must specify the following values:
- namespace = conf.get("kubernetes", "NAMESPACE"): Every Deployment runs in its own Kubernetes namespace. Setting this variable lets your DAG read that namespace programmatically from the Airflow configuration.
- image: The Docker image that the operator uses to run its defined task, commands, and arguments. The value you specify is assumed to be an image tag that’s publicly available on Docker Hub. To pull an image from a private registry, read Pulling images from a private registry.
- in_cluster=True: When this value is set, your task runs within the cluster from which it’s instantiated. This ensures that the Kubernetes Pod running your task has the correct permissions within the cluster.
- is_delete_operator_pod=True: This setting ensures that once a KubernetesPodOperator task is complete, the Kubernetes Pod that ran that task is terminated, so no unused Pods are left in your cluster taking up resources.
Add resources to your Deployment on Astronomer
Pods launched by the KubernetesPodOperator run entirely on the resources allocated with the Extra Capacity slider on your Deployment’s Configure page in the Software UI, rather than on a Celery worker (or on scheduler resources if you use the Local executor). Raising the slider increases your namespace’s resource quota so that Airflow has permission to launch Pods in your Deployment’s namespace.
Astronomer recommends starting with 10 AU of Extra Capacity and scaling up as needed. If Extra Capacity is set to 0, tasks that use the KubernetesPodOperator fail with a permissions error.
On Astronomer Software, the largest node a single Pod can occupy depends on the size of your underlying node pool.
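When sizing Extra Capacity, it can help to translate AU into the CPU and memory quantities that Kubernetes uses. The sketch below assumes that 1 AU equals 0.1 CPU (100m) and 384 MiB of memory, which matches the request and limit values that appear later in this guide; confirm the ratio for your Astronomer Software version. The max_concurrent_pods helper is hypothetical and ignores anything else that counts against the namespace quota.

```python
# Assumption (verify for your Astronomer Software version):
# 1 AU = 0.1 CPU (100 millicores) and 384 MiB of memory.
AU_CPU_MILLICORES = 100
AU_MEMORY_MIB = 384


def au_to_quantities(au: int) -> dict:
    """Convert a whole number of AU into Kubernetes CPU and memory quantity strings."""
    if au < 0:
        raise ValueError("AU must be non-negative")
    return {
        "cpu": f"{au * AU_CPU_MILLICORES}m",
        "memory": f"{au * AU_MEMORY_MIB}Mi",
    }


def max_concurrent_pods(extra_capacity_au: int, pod_au: int) -> int:
    """Hypothetical helper: how many equally sized Pods fit in the Extra Capacity quota.

    Ignores any other workloads that count against the same namespace quota.
    """
    if pod_au <= 0:
        raise ValueError("pod_au must be positive")
    return extra_capacity_au // pod_au


if __name__ == "__main__":
    # The recommended starting point of 10 AU.
    print(au_to_quantities(10))  # {'cpu': '1000m', 'memory': '3840Mi'}
    print(max_concurrent_pods(10, 1))  # 10
```

With Extra Capacity at 0, max_concurrent_pods returns 0 for any Pod size, which is the situation that produces the permissions error described above.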
Define resources per task
A notable advantage of the KubernetesPodOperator is that you can control compute resources in the task definition. If you use the Kubernetes executor, note that these resources are separate from the executor_config parameter. In that case, executor_config defines only the Airflow worker that launches your Kubernetes task.
To define these resources, build a V1ResourceRequirements object with the models module from kubernetes.client. This object lets you specify memory and CPU requests and limits for a given task and its corresponding Kubernetes Pod. For more information, read the Kubernetes documentation on requests and limits.
After you create the object, pass it to the resources parameter of the task. For example, a task that runs the hello-world image from Docker Hub with these resources launches a Pod in your Airflow Deployment’s namespace with the requests you defined. Once the task finishes, the Pod is gracefully terminated.
requests={"cpu": "100m", "memory": "384Mi"}, limits={"cpu": "100m", "memory": "384Mi"}
Pulling images from a private registry
By default, the KubernetesPodOperator looks for images hosted publicly on Docker Hub. You can also pull images from a private registry.
To pull images from a private registry on Astronomer Software:
- Retrieve a config.json file that contains your Docker credentials by following the Docker documentation. The generated file lists each registry you’ve authenticated with under an auths key.
- Follow the Kubernetes documentation to create a secret based on your credentials.
- In your DAG code, import models from kubernetes.client and set image_pull_secrets to a list that references your Kubernetes secret. After configuring this value, you can pull an image from your private registry the same way you would from a public registry.
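Steps 1 and 2 can be sketched with the Python standard library. This assumes a config.json that stores a base64-encoded username:password pair under auths (if docker login uses a credential helper, no auth value is written to the file) and produces a Secret of type kubernetes.io/dockerconfigjson, the type Kubernetes uses for registry credentials. The registry, username, password, secret name, and namespace are hypothetical placeholders.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> dict:
    """Build a config.json-style dict with one registry entry.

    Docker stores the credentials as base64("username:password") under "auth".
    """
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"auths": {registry: {"auth": auth}}}


def registry_secret_manifest(name: str, namespace: str, config: dict) -> dict:
    """Wrap a Docker config in a Secret of type kubernetes.io/dockerconfigjson."""
    payload = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" in a Secret must be base64-encoded.
        "data": {".dockerconfigjson": payload},
    }


if __name__ == "__main__":
    # Hypothetical placeholders; never commit real credentials to your DAG repository.
    config = docker_config_json("registry.example.com", "my-user", "my-password")
    manifest = registry_secret_manifest("my-registry-secret", "my-namespace", config)
    print(json.dumps(manifest, indent=2))
```

You could save the printed manifest to a file and apply it with kubectl apply -f, using your Deployment’s namespace. The kubectl create secret docker-registry command builds an equivalent secret directly from a username and password if you prefer not to write the manifest yourself.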
Local testing
Astronomer recommends testing your DAGs locally before pushing them to a Deployment on Astronomer. For more information, read How to run the KubernetesPodOperator locally. That guide provides information on how to use MicroK8s or Docker for Kubernetes to run tasks with the KubernetesPodOperator in a local environment.