Set up disaster recovery | Astronomer Documentation

Set up DR on a new cluster

You can enable DR when creating a new dedicated cluster on AWS through the Astro UI. See Create a dedicated Astro cluster for configuration steps and field descriptions. After the cluster is created, complete the steps in Required steps after enabling DR before you trigger a failover.

Terraform support for DR cluster creation is planned for general availability (GA).

DR cluster creation is also supported using the Astro API.

Set up DR on an existing cluster

To enable DR on an existing dedicated AWS cluster, submit a support request through the Astro UI. Astronomer processes the request during your specified maintenance window.

Prerequisites

Organization Owner role with organization.clusters.update permission
An AWS dedicated cluster that is not already DR-enabled

Submit a DR enablement request

Open the support request form

You can open the form in two ways:

From the Disaster Recovery tab: In the Astro UI, go to Organization Settings > Clusters, select your cluster, open the Disaster Recovery tab, then click Enable Disaster Recovery.
From the support menu: In the Astro UI, open Support and select Enable AWS Data Plane Disaster Recovery as the request type.

Configure the request

Complete the following fields:

Cluster: Select the AWS cluster to enable DR on. Only eligible clusters appear — the cluster must be an AWS dedicated cluster that isn’t already DR-enabled.
Failover Region: Select the AWS region for the secondary cluster.
DR VPC Subnet Range: (Optional) Specify a VPC subnet range for the secondary cluster. Leave blank to use the same range as the primary cluster.
DR Pod CIDR Range: (Optional) Specify a Pod CIDR range for the secondary cluster. Leave blank to use the same range as the primary cluster.
Task Logs Replication SLA: Enable to guarantee a 15-minute RPO for task logs. Additional charges apply. See Task Logs Replication SLA.
Maintenance Window: Select a date and time for the maintenance window. The date must be at least 5 days from today. Weekends are not available.
Additional Details: (Optional) Include any additional context or requirements.
CC Emails: (Optional) Add email addresses to copy on the support ticket.
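
The Maintenance Window rule (at least 5 days out, no weekends) can be checked before you open the form. The following is a minimal sketch, not part of the Astro UI or API. It assumes "5 days" means calendar days, and the function names are hypothetical.

```python
from datetime import date, timedelta

# From the form's rule: the date must be at least 5 days from today.
# Assumption: these are calendar days, not business days.
MIN_LEAD_DAYS = 5


def is_valid_window_date(candidate: date, today: date) -> bool:
    """Return True if candidate is a weekday at least MIN_LEAD_DAYS after today."""
    far_enough = candidate >= today + timedelta(days=MIN_LEAD_DAYS)
    is_weekday = candidate.weekday() < 5  # Monday=0 ... Friday=4
    return far_enough and is_weekday


def earliest_window_date(today: date) -> date:
    """Return the first date that satisfies both rules."""
    candidate = today + timedelta(days=MIN_LEAD_DAYS)
    # Skip forward past Saturday (5) and Sunday (6).
    while candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

For example, if today is Monday, January 6, 2025, the fifth day out falls on a Saturday, so the earliest selectable date under these assumptions is Monday, January 13, 2025.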

Submit the request

Click Submit. Astronomer will confirm the maintenance window and contact you before beginning the conversion.

Running, scheduled, and event-triggered tasks are affected during the maintenance window. Tasks may fail and require a retry. Plan for 1-2 hours of downtime during the maintenance window.

The conversion process has two phases:

Database migration: Migrates the metadata database. This phase requires no downtime.
Infrastructure switch: Enables cross-region replication. This phase requires approximately 1-2 hours of maintenance downtime.

Enabling DR consumes additional Astro credits. Secondary cluster resources and data replication incur ongoing charges based on your cluster configuration.

Disable DR

Disabling DR deprovisions the secondary cluster and deletes all compute and data stores in the secondary region. This stops data replication and can’t be undone without re-enabling DR.

Disabling DR is also supported using the Astro API.

Prerequisites

Organization Owner role with organization.clusters.update permission
The cluster must be running from the primary region. You can’t disable DR on a cluster that has failed over to the secondary region.

Open the DR tab

In the Astro UI, go to Organization Settings > Clusters, select your DR-enabled cluster, and open the Disaster Recovery tab.

Disable DR

On the Disaster Recovery tab, click Disable. Alternatively, open the cluster’s actions menu (⋯) at the top right of the page and select Disable Disaster Recovery…. Enter the cluster name to confirm, then click Disable Disaster Recovery.

This action is irreversible without re-enabling DR. Ensure you no longer need cross-region failover capability before disabling DR.

Required steps after enabling DR

After Astronomer creates the secondary cluster, complete the following steps before triggering a failover:

Networking and DNS: Configure all required networking and DNS customizations for the secondary cluster. See Networking considerations.
imagePullSecrets: If your Deployments use the KubernetesPodOperator (KPO), configure imagePullSecrets on the secondary cluster. See Pull images from a private registry.
Customer-managed workload identity: If your Deployments use customer-managed workload identities, configure the appropriate workload identity and update IAM trust relationships for the secondary cluster. See Workload identity.
Dag logic: Update your Dag logic to handle the ASTRONOMER_IS_DR_ENV environment variable for secondary-specific connections or configurations. See Prepare Dags for disaster recovery.
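
As an illustration of the Dag logic step, the sketch below selects a connection ID based on ASTRONOMER_IS_DR_ENV. It is a rough example under stated assumptions: this page does not specify the variable's value format, so the truthy strings checked here are a guess, and the helper names and connection IDs are hypothetical. See Prepare Dags for disaster recovery for the supported pattern.

```python
import os


def is_dr_env(environ=None) -> bool:
    """Return True when the code appears to run on the secondary (DR) cluster.

    Assumption: ASTRONOMER_IS_DR_ENV holds a truthy string such as "true".
    The exact value format is not stated on this page.
    """
    env = os.environ if environ is None else environ
    value = env.get("ASTRONOMER_IS_DR_ENV", "")
    return value.strip().lower() in {"true", "1", "yes"}


def pick_conn_id(primary_conn_id: str, secondary_conn_id: str, environ=None) -> str:
    """Choose the connection ID that matches the region the Dag runs in."""
    return secondary_conn_id if is_dr_env(environ) else primary_conn_id


# Hypothetical connection IDs, for illustration only.
warehouse_conn_id = pick_conn_id("warehouse_primary", "warehouse_secondary")
```

Resolving the connection ID in one helper keeps region-specific branching out of individual tasks, so the same Dag file can run unchanged on either cluster.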