Version: v0.10.0

KubernetesPodOperator on Astronomer


The KubernetesPodOperator lets you launch Kubernetes Pods natively from Airflow, each running a Docker container. It uses the Kubernetes Python client to generate the Kubernetes API request. This lets Airflow act as an orchestrator for your jobs, no matter what language they're written in.

Description

The KubernetesPodOperator works the same way as the Docker Operator: all you need to do is supply a Docker image to run. Astronomer Cloud is a multi-tenant install of Astronomer Enterprise that sits on top of Google Kubernetes Engine (GKE). As a Cloud customer, you do NOT need to manage the Kubernetes overhead; we'll take care of that for you. If you are an Enterprise customer, everything is configured to run when you deploy the platform.

Note: The Docker Operator is NOT supported on Astronomer for security reasons (we'd have to expose the Docker socket through to containers with a mount and let an unmanaged container run on the host machine).

Usage on Astronomer

Make sure you are running Astronomer Airflow 1.10.x.

If you're running Airflow 1.9, check out this forum post to upgrade.

Specify Parameters

You can import the operator from Airflow's contrib folder, as you would any other contrib operator:

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

Instantiate the operator based on your image and setup:

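k = KubernetesPodOperator(
    namespace='astronomer-cloud-frigid-vacuum-0996',  # your deployment's namespace
    image="ubuntu:16.04",
    cmds=["bash", "-cx"],
    # bash -c runs a single command string; extra list items would become $0, $1, ...
    arguments=["echo 10 && echo pwd"],
    labels={"foo": "bar"},
    name="airflow-test-pod",
    is_delete_operator_pod=True,  # delete the pod once the task finishes
    in_cluster=True,  # look inside the cluster for the Kubernetes config
    task_id="task-two",
    get_logs=True)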

For Astronomer Cloud, your Kubernetes namespace will be astronomer-cloud-<deployment name> (e.g. astronomer-cloud-frigid-vacuum-0996).

For Astronomer Enterprise, it will be <base namespace>-<deployment name> (e.g. astronomer-frigid-vacuum-0996).
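
The two naming patterns above can be captured in a small helper. This is only a sketch: the function name `deployment_namespace`, its `platform` argument, and the `base_namespace` default are hypothetical and not part of Astronomer's or Airflow's API.

```python
def deployment_namespace(release_name, platform="cloud", base_namespace="astronomer"):
    """Build the Kubernetes namespace for a deployment (hypothetical helper).

    release_name is the deployment name, e.g. "frigid-vacuum-0996".
    base_namespace only applies to Enterprise and is an assumed default.
    """
    if platform == "cloud":
        return "astronomer-cloud-" + release_name
    if platform == "enterprise":
        return base_namespace + "-" + release_name
    raise ValueError("platform must be 'cloud' or 'enterprise'")
```

The result can then be passed as the `namespace` argument of the operator instead of a hard-coded string.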

Set the in_cluster parameter to True in your code. This will tell your task to look inside the cluster for the Kubernetes config. In this setup, the workers are tied to a role with the right privileges in the cluster.
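
To make "looking inside the cluster" more concrete: a process running in a pod normally sees the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables and a mounted service account token. The sketch below checks for those markers using only the standard library. The function name `looks_in_cluster` is hypothetical, and this is an illustration of the idea rather than how Airflow itself decides.

```python
import os

# Standard mount point for a pod's service account credentials.
SERVICE_ACCOUNT_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def looks_in_cluster(env=None, token_exists=None):
    """Return True if the process appears to be running inside a Kubernetes pod.

    env and token_exists can be supplied for testing; by default the real
    environment and filesystem are inspected.
    """
    env = os.environ if env is None else env
    if token_exists is None:
        token_exists = os.path.isfile(os.path.join(SERVICE_ACCOUNT_DIR, "token"))
    has_service_env = bool(env.get("KUBERNETES_SERVICE_HOST")) and bool(
        env.get("KUBERNETES_SERVICE_PORT")
    )
    return has_service_env and token_exists
```

On Astronomer the workers run inside the cluster, so `in_cluster=True` is the right setting; a check like this is mainly useful when the same DAG file is also run from a laptop.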

Set the is_delete_operator_pod parameter to True in your code. This deletes completed pods from the namespace as they finish, keeping Airflow below its resource quotas.

Add Resources to your Deployment on Astronomer

The KubernetesPodOperator will launch pods on the resources allocated to it in the Extra Capacity section of your deployment's Configure page in the Astronomer UI. Pods will only run on the resources configured there. Adding Extra Capacity increases your namespace's resource quotas so that Airflow is permitted to launch pods in the namespace.

For Extra Capacity, we recommend starting with 10 AU and scaling up from there as needed. If it's set to 0, you'll get a permissions error:

ERROR - Exception when attempting to create Namespace Pod.
Reason: Forbidden
"Failure","message":"pods is forbidden: User \"system:serviceaccount:astronomer-cloud-solar-orbit-4143:solar-orbit-4143-worker-serviceaccount\" cannot create pods in the namespace \"datarouter\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

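When triaging a 403 like the one above, it can help to pull out which service account made the request and which namespace it targeted. The following is a rough sketch using a regular expression; `parse_forbidden_message` is a hypothetical helper, and the message layout is assumed to match the sample log. In that sample the target namespace ("datarouter") differs from the service account's namespace, so it may also be worth confirming that the operator's `namespace` argument matches your deployment.

```python
import re


def parse_forbidden_message(message):
    """Extract the service account and target namespace from a 403 message.

    Returns a dict, or None if the message does not match the assumed layout.
    """
    # Raw JSON logs escape quotes as \" ; normalize them first.
    text = message.replace('\\"', '"')
    account = re.search(r'system:serviceaccount:([^:"\s]+):([^"\s]+)', text)
    target = re.search(r'in the namespace "([^"]+)"', text)
    if not account or not target:
        return None
    return {
        "account_namespace": account.group(1),
        "service_account": account.group(2),
        "target_namespace": target.group(1),
    }
```
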
On Astronomer Cloud, the most a single pod can occupy is 13.01 GB of memory and 3.92 CPU. We'll be introducing larger options in Astronomer v0.9, so stay tuned.

On Enterprise, it will depend on the size of your underlying node pool.

Note: If you need your limit range increased, contact your system admin if you are an Enterprise customer, or Astronomer if you are a Cloud customer.
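
As an illustration of the Astronomer Cloud ceiling quoted above, the sketch below validates a requested size before building a resources dictionary. The helper names are hypothetical, "GB" is assumed to mean decimal gigabytes, and the dictionary keys (`request_cpu`, `request_memory`, `limit_cpu`, `limit_memory`) are assumed to match what the contrib operator's `resources` argument accepts in Airflow 1.10.x; check them against your Airflow version.

```python
CLOUD_MAX_CPU = 3.92         # CPU cores, from the Cloud limit quoted in this doc
CLOUD_MAX_MEMORY_GB = 13.01  # memory in GB (assumed decimal), from the same limit


def fits_cloud_pod_limit(cpu, memory_gb):
    """Return True if the requested size is positive and within the Cloud limit."""
    return 0 < cpu <= CLOUD_MAX_CPU and 0 < memory_gb <= CLOUD_MAX_MEMORY_GB


def pod_resources(cpu, memory_gb):
    """Build a resources dict with identical requests and limits.

    Raises ValueError if the size exceeds the Cloud per-pod limit.
    """
    if not fits_cloud_pod_limit(cpu, memory_gb):
        raise ValueError("requested pod size is outside the Cloud per-pod limit")
    cpu_quantity = "{}m".format(int(round(cpu * 1000)))           # millicores
    memory_quantity = "{}M".format(int(round(memory_gb * 1000)))  # decimal megabytes
    return {
        "request_cpu": cpu_quantity,
        "request_memory": memory_quantity,
        "limit_cpu": cpu_quantity,
        "limit_memory": memory_quantity,
    }
```

Setting requests equal to limits is a deliberate simplification here; on Enterprise the constants would need to reflect your own node pool.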

Using a Private Registry

By default, the KubernetesPodOperator will look for images hosted publicly on Docker Hub. If you want to pull from a private registry, you'll have to create a dockerconfigjson secret from your existing Docker credentials. If you're an Astronomer Cloud customer, reach out to us and we can get this secret added for you.

For Enterprise customers, follow the official Kubernetes doc to add that secret to the right namespace.

Note: The KubernetesPodOperator only supports passing in image pull secrets as of Airflow 1.10.2.
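
For reference, a dockerconfigjson secret is a Kubernetes Secret of type `kubernetes.io/dockerconfigjson` whose `.dockerconfigjson` field holds a base64-encoded Docker config. The sketch below builds such a manifest as a Python dictionary using only the standard library. The function names, the secret name, and the registry address are placeholders chosen for illustration; the official Kubernetes doc remains the authoritative procedure.

```python
import base64
import json


def docker_config_json(registry, username, password):
    """Return the Docker config JSON string for one registry."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return json.dumps(config)


def image_pull_secret_manifest(name, namespace, registry, username, password):
    """Return a Secret manifest (as a dict) of type kubernetes.io/dockerconfigjson."""
    payload = docker_config_json(registry, username, password).encode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret data values must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(payload).decode("ascii")},
    }
```

Dumping the returned dictionary with `json.dumps` gives a manifest that could be applied to the deployment's namespace. On Airflow 1.10.2 or later, the secret's name would then be referenced through the operator's `image_pull_secrets` argument.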

Local Testing

Follow our CLI doc on using Microk8s or Docker for Kubernetes.