Version:

v0.10.0

Migrating An Airflow Deployment to Astronomer


The Metadata Database

To migrate a pre-existing Airflow deployment to Astronomer, its metadata needs to be moved from the old deployment's database into the database of a deployment on Astronomer.

The Airflow metadata db stores the full history of DagRuns and Task Instances, along with all other metadata associated with your Airflow instance (Connections, Variables, XComs, and more). Migrating it gives you the stability and features of running on Astronomer Enterprise while keeping all the data from your old instance.

Prerequisites

Before starting, ensure that:

  • Both Airflow deployments run the same version of Airflow. This ensures that the metadata schema is the same on both sides. The steps below are not guaranteed to work otherwise.
  • There is enough room on your Astronomer cluster for a new deployment. This migration will only work for a new Airflow deployment on Astronomer, not a pre-existing one.
  • The Airflow db you are migrating from is using Postgres as the backend.
  • You have the fernet key for the old Airflow deployment.
  • You have kubectl access to the Astronomer cluster.
  • All DAGs on the old Airflow deployment are turned off.

Output old data

Dump all metadata from the old Airflow deployment (substituting your own values for the placeholders in braces):

pg_dump --host={host} --dbname={dbname} --schema=airflow --data-only --blobs --username={username} --file={pg_dump_file_path} --table=connection --table=dag --table=dag_pickle --table=dag_run --table=dag_stats --table=import_error --table=known_event --table=sla_miss --table=slot_pool --table=task_fail --table=task_instance --table=task_reschedule --table=variable --table=xcom

The file the command outputs will be used later.

Note: This does not migrate the Users table.
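The long list of --table flags in the command above is easy to mistype. As a rough sketch, the flags can be generated from a single list of table names. The TABLES variable and the build_table_flags helper are names made up for this illustration, and the command is only printed for review, not executed:

```shell
# Tables copied by the pg_dump command above.
TABLES="connection dag dag_pickle dag_run dag_stats import_error known_event sla_miss slot_pool task_fail task_instance task_reschedule variable xcom"

# Build the repeated --table=<name> flags from the list (hypothetical helper).
build_table_flags() {
  flags=""
  for t in $TABLES; do
    flags="$flags --table=$t"
  done
  # Strip the leading space left by the first iteration.
  printf '%s' "${flags# }"
}

# Print the full command for review instead of running it;
# the {placeholders} still need to be replaced with real values.
echo "pg_dump --host={host} --dbname={dbname} --schema=airflow --data-only --blobs --username={username} --file={pg_dump_file_path} $(build_table_flags)"
```

Keeping the table names in one place makes it simpler to compare the list against the tables that actually exist in your Airflow version.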

Create a new deployment

On your Astronomer cluster, create a new Airflow deployment. This will ultimately be the deployment your data gets migrated into.

Swap the Fernet Key

Once the deployment is created, list the secrets in its namespace:

:~$ kubectl get secrets -n <deployment_namespace>
NAME                                                        TYPE                                  DATA   AGE
amateur-sunspot-5075-airflow-metadata                       Opaque                                1      6d
amateur-sunspot-5075-airflow-result-backend                 Opaque                                1      6d
amateur-sunspot-5075-broker-url                             Opaque                                1      6d
amateur-sunspot-5075-elasticsearch                          Opaque                                1      6d
amateur-sunspot-5075-env                                    Opaque                                0      6d
amateur-sunspot-5075-fernet-key                             Opaque                                1      6d
amateur-sunspot-5075-pgbouncer-config                       Opaque                                2      6d
amateur-sunspot-5075-pgbouncer-stats                        Opaque                                1      6d
amateur-sunspot-5075-redis-password                         Opaque                                1      6d
amateur-sunspot-5075-registry                               kubernetes.io/dockerconfigjson        1      6d
amateur-sunspot-5075-scheduler-serviceaccount-token-dt7jr   kubernetes.io/service-account-token   3      6d
amateur-sunspot-5075-worker-serviceaccount-token-577cf      kubernetes.io/service-account-token   3      6d
default-token-cq752                                         kubernetes.io/service-account-token   3      6d
excited-armadillo-houston-jwt-certificate                   Opaque                                2      6d

The fernet key generated by Astronomer needs to be replaced with the fernet key from the old Airflow deployment. Open the fernet-key secret for editing:

kubectl edit secret {new deployment name}-fernet-key -n <deployment_namespace> -o yaml

apiVersion: v1
data:
  fernet-key: UmxaNGNFRTNiRkpaYmxaWllWRmliRTF1VkRaQk1rWkthRlZZYkRoQ1NsWT0=
kind: Secret
metadata:
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/hook-weight: "0"
  creationTimestamp: "2019-08-02T15:54:33Z"
  labels:
    chart: airflow
    heritage: Tiller
    release: amateur-sunspot-5075
    workspace: cjx0ygngg000y0a54bor7v2ww
  name: amateur-sunspot-5075-fernet-key
  namespace: astro-amateur-sunspot-5075
  resourceVersion: "136939754"
  selfLink: /api/v1/namespaces/astro-amateur-sunspot-5075/secrets/amateur-sunspot-5075-fernet-key
  uid: d2070031-b53d-11e9-8fcd-42010a960151
type: Opaque

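Replace the fernet-key value in the secret with the fernet key from the old deployment. Values under data in a Kubernetes Secret are base64-encoded, so encode the old key the same way before pasting it in. This ensures that the Astronomer deployment can decrypt the Connections and other encrypted fields from the pre-existing Airflow deployment.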
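A minimal sketch of preparing the value to paste into the editor, given that the data field of a Kubernetes Secret holds base64-encoded values. The encode_secret_value helper and the OLD_FERNET_KEY environment variable are hypothetical names used only for this example:

```shell
# Base64-encode a value for the data field of a Kubernetes Secret.
# printf is used because echo would append a newline to the key before encoding.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# OLD_FERNET_KEY is a hypothetical variable holding the old deployment's key.
if [ -n "${OLD_FERNET_KEY:-}" ]; then
  encode_secret_value "$OLD_FERNET_KEY"
  echo
fi
```

The printed string is what goes after fernet-key: in the secret. Decoding the value currently stored there with base64 --decode is a quick way to confirm the expected format before overwriting it.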

Connect to the deployment database

To find the connection credentials for the metadata db of the new Astronomer deployment, look up the airflow-metadata secret in the list of secrets above:

:~$ kubectl get secret <deployment-name>-airflow-metadata -o yaml -n <namespace>
apiVersion: v1
data:
  connection: <encrypted_connection>
kind: Secret
metadata:
  creationTimestamp: "2019-08-02T15:54:33Z"
  labels:
    chart: airflow
    heritage: Tiller
    release: amateur-sunspot-5075
    workspace: cjx0ygngg000y0a54bor7v2ww
  name: amateur-sunspot-5075-airflow-metadata
  namespace: astro-amateur-sunspot-5075
  resourceVersion: "136939769"
  selfLink: /api/v1/namespaces/astro-amateur-sunspot-5075/secrets/amateur-sunspot-5075-airflow-metadata
  uid: d219125f-b53d-11e9-8fcd-42010a960151
type: Opaque

Grab the value under connection, which is base64-encoded rather than encrypted, and decode it:

:~$ echo <encrypted_connection> | base64 --decode
<decrypted_secret>

The decoded value has the form user:password@host:port/db. You can also connect to this database using the Postgres admin password.
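To turn the decoded string into arguments for psql, it can be split with plain shell parameter expansion. This is a sketch under assumptions: parse_conn and the PG_* variable names are invented for this example, the sample connection string is made up, and a leading scheme such as postgresql:// is dropped if one happens to be present:

```shell
# Split a decoded connection string of the form user:password@host:port/db
# into its parts (hypothetical helper, sets PG_* variables).
parse_conn() {
  conn=${1#*://}            # drop an optional scheme prefix
  userinfo=${conn%@*}       # everything before the last @
  hostpart=${conn##*@}      # everything after the last @
  PG_USER=${userinfo%%:*}
  PG_PASSWORD=${userinfo#*:}
  hostport=${hostpart%%/*}
  PG_DB=${hostpart#*/}
  PG_HOST=${hostport%%:*}
  PG_PORT=${hostport##*:}
}

# Example values are made up for illustration.
parse_conn 'airflow:s3cret@db.example.internal:5432/airflow_db'

# Print the psql invocation for review; psql reads the password from PGPASSWORD.
echo "PGPASSWORD=<password> psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PG_DB"
```

The sketch does not URL-decode anything, so a password containing percent-encoded characters would need to be decoded separately.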

Migrate the data

Once you are connected, clear the connection table in the new database and then restore the data from the file that was generated with the pg_dump command:

psql -d {dbname} -c "TRUNCATE TABLE airflow.connection;"
psql {dbname} < {pg_dump_file_path}
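Before running the restore, it can help to confirm which tables the dump file actually contains. The sketch below assumes the dump is in pg_dump's default plain-text format, where each table's rows are introduced by a COPY statement. The list_dumped_tables helper is a name made up for this example:

```shell
# Print each table that has a COPY block in a plain-format pg_dump file.
# Usage: list_dumped_tables <path to dump file>
list_dumped_tables() {
  # The second field of a "COPY <table> (...) FROM stdin;" line is the table name.
  grep '^COPY ' "$1" | awk '{print $2}' | sort -u
}
```

Running list_dumped_tables against the dump file and comparing the result with the --table flags passed to pg_dump shows whether any table was skipped. A table with no rows should still appear, since the COPY statement is written even when there is no data to follow it.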

Finally, force the pods to restart:

kubectl delete --all pods --namespace=<deployment_namespace>

Now you can turn on your DAGs in the Astronomer deployment and they will pick up right where they left off!

