Configure tasks to run with the Kubernetes executor
The Kubernetes executor runs each Airflow task in a dedicated Kubernetes Pod. On Astro, you can customize these Pods on a per-task basis using a `pod_override` configuration. If a task doesn’t contain a `pod_override` configuration, it runs using the default Pod as configured in your Deployment resource settings.
Prerequisites
- An Astro Deployment using Astro Runtime version 8.1.0 or later.
Deployments on Astro don’t support customizing worker Pods through airflow_local_settings.py and will fail to start up new Pods if this file is included in your Astro project.

Customize a task’s Kubernetes Pod
For each task running with the Kubernetes executor, you can customize its individual worker Pod and override the defaults used in Astro by configuring a `pod_override` file.
- Add the following import to your dag file (shown after this list):
- Add a `pod_override` configuration to the dag file containing the task. See the kubernetes-client GitHub repository for a list of all possible settings you can include in the configuration.
- Specify the `pod_override` in the task’s parameters.
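The import referenced in the first item is typically the Kubernetes Python client’s models module, which provides the `V1Pod` classes used in a `pod_override` configuration (the `k8s` alias is a common convention, not a requirement):

```python
from kubernetes.client import models as k8s
```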
See the following example of a `pod_override` configuration.
Example: Set CPU or memory limits and requests
You can request a specific amount of resources for a Kubernetes worker Pod so that a task always has enough resources to run successfully. When requesting resources, make sure that your requests don’t exceed the resource limits in your Deployment’s max pod size.
The following example shows how you can use a `pod_override` configuration in your dag code to request custom resources for a task:
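This sketch assumes a `PythonOperator` task; the dag ID, task ID, and callable names are illustrative placeholders rather than Astro defaults:

```python
import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s

# Request and limit 0.5 CPUs and 1024Mi of memory for this task's worker Pod.
custom_resources = k8s.V1ResourceRequirements(
    requests={"cpu": "0.5", "memory": "1024Mi"},
    limits={"cpu": "0.5", "memory": "1024Mi"},
)


def resource_intensive_work():
    print("Running in a worker Pod with custom resource requests and limits")


with DAG(
    dag_id="example_pod_override_resources",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
):
    PythonOperator(
        task_id="resource_intensive_task",
        python_callable=resource_intensive_work,
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        # "base" is the name of the worker container in the default Pod template.
                        k8s.V1Container(name="base", resources=custom_resources)
                    ]
                )
            )
        },
    )
```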
When this dag runs, it launches a Kubernetes Pod with exactly 0.5 CPUs and 1024Mi of memory, as long as that infrastructure is available in your Deployment. After the task finishes, the Pod terminates gracefully.
For Astro environments, if you set resource requests to be less than the maximum limit, Astro automatically requests the maximum limit that you set. This means that you might consume more resources than you expected if you set the limit much higher than the resource request you need. Check your Billing and usage to view your resource use and associated charges.
Use secret environment variables in worker Pods
On Astro Deployments, secret environment variable values are stored in a Kubernetes secret called `env-secrets`. These environment variables are available to your worker Pods, and you can access them in your tasks just like any other environment variable. For example, you can use `os.environ["<your-secret-env-var-key>"]` or `os.getenv("<your-secret-env-var-key>", None)` in your dag code to access the variable value.
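For example, a task’s callable might read a secret value like this (`MY_SECRET` is a placeholder for your own variable key):

```python
import os


def read_secret():
    # Read a secret environment variable set in your Deployment.
    secret_value = os.getenv("MY_SECRET", None)
    print("MY_SECRET is set:", secret_value is not None)
```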
However, if you can’t use Python, or you’re working with predefined code that expects specific environment variable keys, you must pull the secret value from `env-secrets` and mount it to the Pod running your task as a new Kubernetes Secret.
- Add the following import to your dag file (shown in the snippet after this list):
- Define a Kubernetes `Secret` in your dag instantiation using the format shown in the snippet after this list.
- Specify the `Secret` in the `secret_key_ref` section of your `pod_override` configuration, as shown in the full example at the end of this section.
- In the task where you want to use the secret value, add a task-level argument that passes the environment variable name to the task. See the full example at the end of this section.
- In the executable for the task, call the secret value using `os.environ[env_name]`.
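The following snippet sketches the import and the `Secret` definition referenced above, assuming the `Secret` helper from the Airflow Kubernetes provider; `<VARIABLE_KEY>` is a placeholder for your secret environment variable’s key:

```python
from airflow.providers.cncf.kubernetes.secret import Secret

# Pull the value of <VARIABLE_KEY> from the env-secrets Kubernetes secret and
# expose it to the worker Pod as an environment variable with the same name.
MY_K8S_SECRET = Secret(
    deploy_type="env",
    deploy_target="<VARIABLE_KEY>",
    secret="env-secrets",
    key="<VARIABLE_KEY>",
)
```

The task-level argument and the `secret_key_ref` section appear in context in the full example below, which passes the environment variable name to the task’s callable through `op_kwargs`.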
In the following example, a secret named `MY_SECRET` is pulled from `env-secrets` and printed to logs.
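A runnable sketch of that example follows, assuming a `PythonOperator` task; the dag ID and task ID are illustrative, while `env-secrets` and the worker container name `base` come from Astro’s defaults:

```python
import os

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.cncf.kubernetes.secret import Secret
from kubernetes.client import models as k8s

# The Secret records which key to pull from env-secrets; its deploy_target is
# reused below so the variable key only needs to be defined in one place.
MY_K8S_SECRET = Secret(
    deploy_type="env",
    deploy_target="MY_SECRET",
    secret="env-secrets",
    key="MY_SECRET",
)


def print_env(env_name):
    # The secret value is available as a regular environment variable in the Pod.
    print(os.environ[env_name])


with DAG(
    dag_id="example_secret_env_var",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
):
    PythonOperator(
        task_id="print_env_secret",
        python_callable=print_env,
        # Task-level argument: pass the environment variable name to the callable.
        op_kwargs={"env_name": MY_K8S_SECRET.deploy_target},
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            env=[
                                # Mount the secret value into the worker Pod as MY_SECRET.
                                k8s.V1EnvVar(
                                    name="MY_SECRET",
                                    value_from=k8s.V1EnvVarSource(
                                        secret_key_ref=k8s.V1SecretKeySelector(
                                            name="env-secrets", key="MY_SECRET"
                                        )
                                    ),
                                )
                            ],
                        )
                    ]
                )
            )
        },
    )
```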