Debug an Astro Private Cloud upgrade

Use this guide when your Astro Private Cloud (APC) control plane or data plane Pods are not progressing to a healthy state after an upgrade.

Helm upgrade fails with patch conflict

  • Cause: Environment variable ordering in the Houston Deployment changed between 0.37.x and 1.0, causing Helm’s strategic merge patch to fail.
  • Symptoms: The upgrade fails with an error containing cannot patch "astronomer-houston" with kind Deployment and references to environment variable ordering.
  • Solution: Delete the Houston Deployment with --cascade=orphan to preserve running Pods, then retry the Helm upgrade:
$kubectl delete deployment/<release-name>-houston -n astronomer --cascade=orphan
$helm upgrade -f values.yaml -n astronomer astronomer astronomer/astronomer --version 1.0.x

JetStream Pods stuck in Pending or CrashLoopBackOff

  • Cause: You did not delete STAN volume claims or PVCs before migration.
  • Solution: Delete existing PVCs for STAN:
1kubectl delete pvc -l app.kubernetes.io/name=stan

Prisma migrate deploy or database migration job times out

  • Cause: Database locks or concurrent transactions, often because of Prometheus or monitoring jobs.
  • Solution:
    • Check for locks:
    SELECT * FROM pg_locks pl JOIN pg_stat_activity psa ON pl.pid = psa.pid;
    • Terminate long-running transactions:
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'active' AND now() - query_start > interval '5 minutes';

Houston Worker logs show NATS: connection timeout

  • Cause: Houston tries to connect to STAN before JetStream is ready.
  • Solution: Restart the Houston Worker Pod to reconnect to JetStream:
1kubectl rollout restart deploy/<release-name>-houston-worker

Airflow 3 Pods fail with ModuleNotFoundError: airflow.providers.cncf

  • Cause: Cluster configuration override missing or not applied.
  • Solution:
    • Reapply the override function in the Airflow 3 migration guide.
    • Ensure minimumAstroRuntimeVersion is set to 3.1-2 or higher in the override config.

Airflow Deployments fail with Postgres connection errors after upgrade due to missing database name

  • Cause: The astronomer-bootstrap secret connection string does not include a database name suffix (for example, /main on AWS RDS or /postgres on AKS), causing connection failures. Verify the correct database name by logging in to your database.
  • Solution:
    • Patch the astronomer-bootstrap secret so connection ends with /<database_name>, then run a Helm upgrade.
    • The secret is in the astronomer namespace (or the namespace where you install APC).
    • In CP/DP mode, patch in both the Control Plane and Data Plane clusters.
$# Namespace where APC is installed
$NAMESPACE=astronomer
$# Database name suffix; for AWS the default database is usually "main", for AKS "postgres"
$DB_NAME=main
$
$# Read current connection string, append database name, and update the secret
$CURRENT=$(kubectl -n "$NAMESPACE" get secret astronomer-bootstrap -o jsonpath='{.data.connection}' | base64 -d)
$NEW="${CURRENT%/}/$DB_NAME"
$kubectl -n "$NAMESPACE" patch secret astronomer-bootstrap --type=merge -p "{\"data\":{\"connection\":\"$(printf '%s' "$NEW" | base64 -w0)\"}}"
$
$# Apply the update with a Helm upgrade
$helm upgrade -f values.yaml -n "$NAMESPACE" astronomer astronomer/astronomer

Houston migration fails due to duplicate workspace labels

  • Cause: The upgrade includes a migration that adds a unique constraint to the Workspace.label column. If your database contains workspaces with duplicate labels, the migration fails with Prisma error P3009. Once the migration is marked as failed, all subsequent migrations are blocked, preventing Houston from starting.

  • Symptoms: Houston database migration pods fail with Error: P3009 and the message migrate found failed migrations in the target database. Houston API and worker pods enter CrashLoopBackOff.

  • Solution:

    1. Connect to your Houston database and check for duplicate workspace labels:

      1SET search_path TO "houston$default";
      2SELECT label, COUNT(*) FROM "Workspace" GROUP BY label HAVING COUNT(*) > 1;
    2. Resolve the duplicates by renaming one of the workspace labels. Use the workspace ID as a suffix to guarantee uniqueness:

      1UPDATE "Workspace" SET label = label || '-' || id
      2WHERE id = '<workspace-id-to-rename>';
    3. Verify the failed migration did not partially apply by checking whether the unique constraint already exists:

      1SELECT constraint_name FROM information_schema.table_constraints
      2WHERE table_schema = 'houston$default'
      3 AND table_name = 'Workspace'
      4 AND constraint_type = 'UNIQUE';

      If the unique constraint on label is not present, delete the failed migration record so it can be retried. Back up your database before making this change.

      1DELETE FROM "houston$default"."_prisma_migrations"
      2 WHERE migration_name = '20250918072802_added_unique_constraint_to_ws_label';
    4. Re-run the Helm upgrade:

      $helm upgrade -f values.yaml -n astronomer astronomer astronomer/astronomer --version 1.0.x

Partially applied database migration

  • Cause: Prisma migration interrupted during the JetStream upgrade.
  • Solution:
    • Run migration manually:
    1npx prisma migrate deploy
    • Verify the Cluster and Deployment tables exist.