Set up disaster recovery
Set up DR on a new cluster
You can enable DR when you create a new dedicated cluster on AWS in the Astro UI. See Create a dedicated Astro cluster for configuration steps and field descriptions. After the cluster is created, complete the steps in Required steps after enabling DR before you trigger a failover.
API and Terraform support for creating DR clusters is planned for general availability (GA).
Set up DR on an existing cluster
To enable DR on an existing dedicated AWS cluster, submit a support request through the Astro UI. Astronomer processes the request during your specified maintenance window.
Prerequisites
- Organization Owner role with the organization.clusters.update permission
- An AWS dedicated cluster that is not already DR-enabled
Submit a DR enablement request
Open the support request form
You can open the form in two ways:
- From the cluster details page: In the Astro UI, go to Organization Settings > Clusters, select your cluster, then click Enable Disaster Recovery.
- From the support menu: In the Astro UI, open Support and select Enable AWS Data Plane Disaster Recovery as the request type.
Configure the request
Complete the following fields:
- Cluster: Select the AWS cluster to enable DR on. Only eligible clusters appear. The cluster must be an AWS dedicated cluster that isn't already DR-enabled.
- Failover Region: Select the AWS region for the secondary cluster.
- DR VPC Subnet Range: (Optional) Specify a VPC subnet range for the secondary cluster. Leave blank to use the same range as the primary cluster.
- DR Pod CIDR Range: (Optional) Specify a Pod CIDR range for the secondary cluster. Leave blank to use the same range as the primary cluster.
- Task Logs Replication SLA: Select this option to guarantee a 15-minute recovery point objective (RPO) for task logs. Additional charges apply. See Task Logs Replication SLA.
- Maintenance Window: Select a date and time for the maintenance window. The date must be at least 5 days from today. Weekends are not available.
- Additional Details: (Optional) Include any additional context or requirements.
- CC Emails: (Optional) Add email addresses to copy on the support ticket.
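The form enforces the maintenance window rules itself, but if you want to plan a date in advance, the following minimal sketch mirrors the two stated rules: the date is at least 5 days from today and does not fall on a weekend. It assumes "5 days" means calendar days and that Saturday and Sunday are the weekend days. It ignores time of day and time zones, which the Astro UI may treat differently. The function names are hypothetical helpers, not part of any Astro API.

```python
from datetime import date, timedelta

# Minimum lead time stated for the form; assumed to be calendar days.
MIN_LEAD_DAYS = 5


def is_valid_window_date(candidate: date, today: date) -> bool:
    """Return True if candidate is at least MIN_LEAD_DAYS after today and is a weekday."""
    far_enough = (candidate - today).days >= MIN_LEAD_DAYS
    is_weekday = candidate.weekday() < 5  # Monday=0 ... Friday=4
    return far_enough and is_weekday


def earliest_window_date(today: date) -> date:
    """Return the first date that satisfies both rules."""
    candidate = today + timedelta(days=MIN_LEAD_DAYS)
    while candidate.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Monday, June 3, 2024: five days later is a Saturday, so the earliest date is Monday, June 10.
    print(earliest_window_date(date(2024, 6, 3)))
```

Running the script prints `2024-06-10`. Always confirm the final date in the form, which is the authoritative check.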
The maintenance window affects running, scheduled, and event-triggered tasks, which may fail and require a retry. Plan for 1-2 hours of downtime.
The conversion process has two phases:
- Database migration: Migrates the metadata database. This phase requires no downtime.
- Infrastructure switch: Enables cross-region replication. This phase requires approximately 1-2 hours of maintenance downtime.
Enabling DR consumes additional Astro credits. The secondary cluster's resources and data replication incur ongoing charges based on your cluster configuration.
Disable DR
To disable cross-region disaster recovery on an existing cluster, contact Astronomer support. Your cluster must be running in its primary region before you can disable DR.
Required steps after enabling DR
After Astronomer creates the secondary cluster, complete the following steps before triggering a failover:
- Networking and DNS: Configure all required networking and DNS customizations for the secondary cluster. See Networking considerations.
- imagePullSecrets: If your Deployments use Kubernetes Pod Operators (KPOs), configure imagePullSecrets on the secondary cluster. See Pull images from a private registry.
- Customer-managed workload identity: If your Deployments use customer-managed workload identities, configure the appropriate workload identity and update IAM trust relationships for the secondary cluster. See Workload identity.
- Dag logic: Update your Dag logic to handle the ASTRONOMER_IS_DR_ENV environment variable for secondary-specific connections or configurations. See Prepare Dags for disaster recovery.
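The Dag logic step can be sketched in plain Python. This is a minimal example, not Astronomer's reference implementation. It assumes that ASTRONOMER_IS_DR_ENV is set to a truthy string such as "true" on the secondary cluster and is unset or "false" on the primary; check Prepare Dags for disaster recovery for the exact values. The connection IDs warehouse_primary and warehouse_dr and the helper function names are hypothetical placeholders for your own Deployment's connections.

```python
import os

# Hypothetical connection IDs; replace with connections defined in your Deployment.
PRIMARY_CONN_ID = "warehouse_primary"
SECONDARY_CONN_ID = "warehouse_dr"


def is_dr_env() -> bool:
    """Return True when ASTRONOMER_IS_DR_ENV indicates the secondary cluster.

    Assumption: the variable holds a string such as "true" on the secondary
    cluster and is unset or "false" on the primary cluster.
    """
    value = os.environ.get("ASTRONOMER_IS_DR_ENV", "")
    return value.strip().lower() in {"true", "1", "yes"}


def warehouse_conn_id() -> str:
    """Pick the connection ID that matches the cluster the Dag is running on."""
    return SECONDARY_CONN_ID if is_dr_env() else PRIMARY_CONN_ID


if __name__ == "__main__":
    print(f"Using connection: {warehouse_conn_id()}")
```

Inside a Dag, you could pass the result of warehouse_conn_id() as the connection argument of an operator or hook, so the same Dag code resolves to a region-appropriate connection on either cluster.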