# Debug an Astro Private Cloud upgrade
Use this guide when your Astro Private Cloud (APC) control plane or data plane Pods are not progressing to a healthy state after an upgrade.
## Helm upgrade fails with patch conflict
- Cause: Environment variable ordering in the Houston Deployment changed between 0.37.x and 1.0, causing Helm’s strategic merge patch to fail.
- Symptoms: The upgrade fails with an error containing `cannot patch "astronomer-houston" with kind Deployment` and references to environment variable ordering.
- Solution: Delete the Houston Deployment with `--cascade=orphan` to preserve running Pods, then retry the Helm upgrade:
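A sketch of the recovery, assuming the default `astronomer` release name, namespace, and chart; substitute your own values file and chart version:

```shell
# Remove the Deployment object but leave its Pods running, so the
# platform stays up while Helm recreates the Deployment cleanly.
kubectl delete deployment astronomer-houston -n astronomer --cascade=orphan

# Retry the upgrade; Helm now creates the Deployment instead of patching it.
helm upgrade astronomer astronomer/astronomer -n astronomer -f values.yaml
```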
## JetStream Pods stuck in Pending or CrashLoopBackOff
- Cause: You did not delete STAN volume claims or PVCs before migration.
- Solution: Delete existing PVCs for STAN:
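A minimal sketch, assuming STAN runs in the `astronomer` namespace; list the claims first and confirm the names before deleting:

```shell
# Identify the STAN volume claims.
kubectl get pvc -n astronomer | grep stan

# Delete each claim returned above; the name here is a placeholder.
kubectl delete pvc <stan_pvc_name> -n astronomer
```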
## Prisma `migrate deploy` or database migration job times out
- Cause: Database locks or concurrent transactions, often because of Prometheus or monitoring jobs.
- Solution:
- Check for locks:
- Terminate long-running transactions:
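Both checks can be sketched with `psql` against the Houston database; `$DATABASE_URL` and the `<pid>` placeholder are assumptions to fill in for your environment:

```shell
# Check for long-running or blocking transactions.
psql "$DATABASE_URL" <<'SQL'
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;
SQL

# Terminate a long-running backend using a pid from the query above.
psql "$DATABASE_URL" -c "SELECT pg_terminate_backend(<pid>);"
```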
## Houston Worker logs show `NATS: connection timeout`
- Cause: Houston tries to connect to STAN before JetStream is ready.
- Solution: Restart the Houston Worker Pod to reconnect to JetStream:
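For example, with a rollout restart; the worker Deployment name is an assumption, so confirm it first with `kubectl get deployments -n astronomer`:

```shell
# Recreate the worker Pods so they re-establish the JetStream connection.
kubectl rollout restart deployment/astronomer-houston-worker -n astronomer
```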
## Airflow 3 Pods fail with `ModuleNotFoundError: airflow.providers.cncf`
- Cause: Cluster configuration override missing or not applied.
- Solution:
- Reapply the override function in the Airflow 3 migration guide.
- Ensure `minimumAstroRuntimeVersion` is set to `3.1-2` or higher in the override config.
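To confirm the override actually reached the cluster, you can inspect the live Helm values; the release name and namespace are assumptions:

```shell
# Show the applied values and check the runtime version setting.
helm get values astronomer -n astronomer --all | grep -B2 -A2 minimumAstroRuntimeVersion
```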
## Airflow Deployments fail with Postgres connection errors after upgrade due to missing database name
- Cause: The `astronomer-bootstrap` secret `connection` string does not include a database name suffix (for example, `/main` on AWS RDS or `/postgres` on AKS), causing connection failures. Verify the correct database name by logging in to your database.
- Solution:
- Patch the `astronomer-bootstrap` secret so `connection` ends with `/<database_name>`, then run a Helm upgrade.
- The secret is in the `astronomer` namespace (or the namespace where you install APC).
- In CP/DP mode, patch the secret in both the Control Plane and Data Plane clusters.
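A sketch of the patch for AWS RDS with a database named `main`; the host, credentials, database name, release name, and values file are placeholders for your environment:

```shell
# Append the database name to the connection string in the bootstrap secret.
kubectl patch secret astronomer-bootstrap -n astronomer --type merge \
  -p '{"stringData":{"connection":"postgres://<user>:<password>@<host>:5432/main"}}'

# Roll the change out with a Helm upgrade.
helm upgrade astronomer astronomer/astronomer -n astronomer -f values.yaml
```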
## Houston migration fails due to duplicate workspace labels
- Cause: The upgrade includes a migration that adds a unique constraint to the `Workspace.label` column. If your database contains workspaces with duplicate labels, the migration fails with Prisma error `P3009`. Once the migration is marked as failed, all subsequent migrations are blocked, preventing Houston from starting.
- Symptoms: Houston database migration Pods fail with `Error: P3009` and the message `migrate found failed migrations in the target database`. Houston API and worker Pods enter CrashLoopBackOff.
- Solution:
- Connect to your Houston database and check for duplicate workspace labels:
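A sketch of the duplicate check, assuming Prisma's default quoted `"Workspace"` table name and a `$DATABASE_URL` pointing at the Houston database:

```shell
# List labels used by more than one workspace.
psql "$DATABASE_URL" <<'SQL'
SELECT label, COUNT(*) AS copies
FROM "Workspace"
GROUP BY label
HAVING COUNT(*) > 1;
SQL
```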
- Resolve the duplicates by renaming one of the workspace labels. Use the workspace ID as a suffix to guarantee uniqueness:
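One way to rename, keeping the first row of each duplicate group and suffixing the rest with their workspace ID; the `id` and `label` column names are assumptions, so check your schema first:

```shell
psql "$DATABASE_URL" <<'SQL'
-- Suffix every duplicate after the first with its own id.
UPDATE "Workspace" w
SET label = w.label || '-' || w.id
FROM (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY label ORDER BY id) AS rn
  FROM "Workspace"
) d
WHERE w.id = d.id
  AND d.rn > 1;
SQL
```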
- Verify the failed migration did not partially apply by checking whether the unique constraint already exists:
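For example, by listing the indexes on the table:

```shell
# A unique index mentioning "label" indicates the constraint was applied.
psql "$DATABASE_URL" <<'SQL'
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'Workspace'
  AND indexdef ILIKE '%label%';
SQL
```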
If the unique constraint on `label` is not present, delete the failed migration record so it can be retried. Back up your database before making this change.
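A sketch of removing the failed record from Prisma's migration history table; run it only after a backup:

```shell
# Delete migration records that neither finished nor were rolled back.
psql "$DATABASE_URL" <<'SQL'
DELETE FROM _prisma_migrations
WHERE finished_at IS NULL
  AND rolled_back_at IS NULL;
SQL
```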
- Re-run the Helm upgrade:
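For example, assuming the default release name, namespace, and chart:

```shell
helm upgrade astronomer astronomer/astronomer -n astronomer -f values.yaml
```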
## Partially applied database migration
- Cause: Prisma migration interrupted during the JetStream upgrade.
- Solution:
- Run migration manually:
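One way to trigger the migration by hand, assuming a running Houston Pod that ships the Prisma CLI; the Deployment name and entrypoint are assumptions, so confirm both for your APC version:

```shell
kubectl exec -it deploy/astronomer-houston -n astronomer -- \
  npx prisma migrate deploy
```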
- Verify the `Cluster` and `Deployment` tables exist.
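For example:

```shell
# Both table names should appear in the result.
psql "$DATABASE_URL" <<'SQL'
SELECT table_name
FROM information_schema.tables
WHERE table_name IN ('Cluster', 'Deployment');
SQL
```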