Trigger failover and failback

Trigger failover

Running, scheduled, and event-triggered tasks may be impacted during the failover window. Tasks may fail and require a retry.
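Because tasks may fail during the window, it can help to confirm that your Dags define task retries before you trigger a failover. The following is a minimal sketch, assuming your Dags accept a shared `default_args` dictionary. `retries` and `retry_delay` are standard Airflow task arguments, but the values shown are illustrative assumptions, not recommendations.

```python
from datetime import timedelta

# Retry settings intended to be passed as `default_args` to a Dag.
# The values below are illustrative assumptions; tune them to your workloads.
default_args = {
    "retries": 3,                         # retry attempts after the first failure
    "retry_delay": timedelta(minutes=5),  # fixed wait before each retry
}

# Rough span over which retries are spread, ignoring task run time.
retry_window = default_args["retries"] * default_args["retry_delay"]
print(retry_window)
```

If the retry span is shorter than the disruption you expect, tasks can exhaust their retries and need a manual rerun after failover completes.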

1. Open the cluster details page

In the Astro UI, go to Organization Settings > Clusters and select your primary cluster. In the Disaster Recovery section, confirm the status badge shows Active: [region] (primary).

2. Initiate failover

Click the actions menu in the Disaster Recovery section and select Failover to Secondary. Follow the prompts to confirm.

The secondary cluster is promoted to active, and all Deployments and data become available in the secondary cluster.

3. Validate Deployments

After failover completes, check the health and status of your Deployments, starting with mission-critical ones. Validate that your Dags and tasks are running as expected, and retry any failed tasks.
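If you track many Deployments, the validation pass can be sketched as a small triage routine. This is a hypothetical illustration only: the `DeploymentStatus` record, its `status` strings, and the `mission_critical` flag are assumptions made for this example and are not an Astro API. Populate the records from whatever source you use to read Deployment health.

```python
from dataclasses import dataclass


@dataclass
class DeploymentStatus:
    """Hypothetical record of one Deployment's health after failover."""
    name: str
    status: str                     # assumed status string, e.g. "HEALTHY"
    mission_critical: bool = False  # assumed flag you maintain yourself


def triage(deployments, healthy_status="HEALTHY"):
    """Return Deployments that are not healthy, mission-critical ones first."""
    unhealthy = [d for d in deployments if d.status != healthy_status]
    # False sorts before True, so mission-critical Deployments lead the list.
    return sorted(unhealthy, key=lambda d: (not d.mission_critical, d.name))


# Sample data for illustration only.
deployments = [
    DeploymentStatus("analytics", "HEALTHY"),
    DeploymentStatus("billing", "UNHEALTHY", mission_critical=True),
    DeploymentStatus("adhoc-reports", "UNHEALTHY"),
]

for d in triage(deployments):
    print(f"{d.name}: {d.status}")
```

Sorting mission-critical Deployments first mirrors the order in which this step suggests checking them.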

Trigger failback

After the primary region recovers, you can fail back to the original primary cluster.

Running, scheduled, and event-triggered tasks may be impacted during the failback window. Tasks may fail and require a retry.

1. Open the cluster details page

In the Astro UI, go to Organization Settings > Clusters and select your original primary cluster. The Region field shows [region] (Failed over to [secondary region]), confirming the cluster has failed over.

2. Initiate failback

Click the actions menu in the Disaster Recovery section and select Failback to Primary. Follow the prompts to confirm.

Universal Metrics Export in DR pairs

If you have Universal Metrics Export (UME) configured, the same UME configuration applies to both the primary and secondary clusters. Metrics exported from each cluster include a cloud_region attribute so you can distinguish data from each cluster in your metrics system.
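As a rough illustration of how the `cloud_region` attribute separates data from the two clusters, the sketch below groups exported samples by region. The sample structure, the metric name `task_runs`, and the region values are assumptions for this example. Only the `cloud_region` attribute name comes from the UME behavior described above.

```python
from collections import defaultdict


def group_by_region(samples):
    """Group metric samples by their cloud_region attribute."""
    grouped = defaultdict(list)
    for sample in samples:
        # Samples without the attribute are collected under "unknown".
        region = sample["attributes"].get("cloud_region", "unknown")
        grouped[region].append(sample)
    return dict(grouped)


# Hypothetical samples; real exports depend on your metrics system.
samples = [
    {"name": "task_runs", "value": 12, "attributes": {"cloud_region": "us-east-1"}},
    {"name": "task_runs", "value": 3, "attributes": {"cloud_region": "us-west-2"}},
    {"name": "task_runs", "value": 7, "attributes": {"cloud_region": "us-east-1"}},
]

totals = {
    region: sum(s["value"] for s in group)
    for region, group in group_by_region(samples).items()
}
print(totals)
```

In most metrics systems you would express the same grouping as a label filter or a group-by clause on `cloud_region` rather than in application code.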

After failover, update your UME settings if needed to reflect the new active cluster.