Debug an Astro Private Cloud 2.0 upgrade

Use this guide when your Astro Private Cloud (APC) control plane or data plane Pods aren’t progressing to a healthy state after upgrading to 2.0.

Helm upgrade fails with patch conflict

  • Cause: Environment variable ordering in the APC Deployment changed between versions, causing Helm’s strategic merge patch to fail.
  • Symptoms: The upgrade fails with an error containing cannot patch "astronomer-houston" with kind Deployment and references to environment variable ordering.
  • Solution: Delete the APC Deployment with --cascade=orphan to preserve running Pods, then retry the Helm upgrade:
kubectl delete deployment/<release-name>-houston -n astronomer --cascade=orphan
helm upgrade -f migrated-values.yaml -n astronomer astronomer astronomer/astronomer --version 2.0.x

APC API crashes with “Configuration property isn’t defined”

  • Cause: The APC Docker image contains an older default.yaml configuration file that doesn’t include keys added in 2.0. If the Helm ConfigMap doesn’t explicitly set these new keys, the APC API fails at startup when it tries to read them.
  • Symptoms: APC Pods enter CrashLoopBackOff with errors like:
Error: Configuration property "deployments.runtimeManagement.astroRuntimeReleasesFile" is not defined
  • Solution: Ensure the APC Docker image matches the chart version you are deploying. If you built a custom APC Docker image, rebuild it with the 2.0 codebase that includes the updated config/default.yaml. Then restart the APC Pods:
kubectl rollout restart deploy/<release-name>-houston -n astronomer

JetStream Pods stuck in Pending or CrashLoopBackOff

  • Cause: STAN PersistentVolumeClaims (PVCs) weren’t deleted before the upgrade (applies when upgrading from 0.37.x).
  • Solution: Delete existing PVCs for STAN:
kubectl delete pvc -l app.kubernetes.io/name=stan

Prisma migrate deploy or database migration job times out

  • Cause: Database locks or concurrent transactions, often because of Prometheus or monitoring jobs.
  • Solution:
    • Check for locks:

      SELECT * FROM pg_locks pl JOIN pg_stat_activity psa ON pl.pid = psa.pid;
    • Terminate long-running transactions:

      SELECT pg_terminate_backend(pid)
      FROM pg_stat_activity
      WHERE state = 'active' AND now() - query_start > interval '5 minutes';
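
For intuition, the interval '5 minutes' comparison above is just an age check on each transaction. A rough shell analog with sample (made-up) timestamps, no database involved:

```shell
# Compare a query's age against a 5-minute (300-second) threshold.
now=$(date +%s)
query_start=$((now - 600))   # sample: a query that started 10 minutes ago
age=$((now - query_start))   # seconds the query has been running
[ "$age" -gt 300 ] && echo "would terminate"   # → would terminate
```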

APC Worker logs show NATS: connection timeout

  • Cause: The APC API tries to connect to STAN before JetStream is ready (applies when upgrading from 0.37.x).
  • Solution: Restart the APC Worker Pod to reconnect to JetStream:
kubectl rollout restart deploy/<release-name>-houston-worker -n astronomer

Airflow 3 Pods fail with ModuleNotFoundError: airflow.providers.cncf

  • Cause: Cluster configuration override missing or not applied.
  • Solution:
    • Reapply the override function in the Airflow 3 migration guide.
    • Ensure minimumAstroRuntimeVersion is set to 3.1-2 or higher in the override config.

Airflow Deployments fail with Postgres connection errors after upgrade due to missing database name

  • Cause: The astronomer-bootstrap secret connection string doesn’t include a database name suffix (for example, /main on AWS RDS or /postgres on AKS), causing connection failures.
  • Solution:
    • Verify the correct database name by logging in to your database.
    • Patch the astronomer-bootstrap secret so connection ends with /<database_name>, then run a Helm upgrade.
    • The secret is in the astronomer namespace (or the namespace where you install APC).
    • In control plane/data plane (CP/DP) mode, patch the secret in both the control plane and data plane clusters.
# Namespace where APC is installed
NAMESPACE=astronomer
# Database name suffix; for AWS the default database is usually "main", for AKS "postgres"
DB_NAME=main

# Read current connection string, append database name, and update the secret
CURRENT=$(kubectl -n "$NAMESPACE" get secret astronomer-bootstrap -o jsonpath='{.data.connection}' | base64 -d)
NEW="${CURRENT%/}/$DB_NAME"
kubectl -n "$NAMESPACE" patch secret astronomer-bootstrap --type=merge -p "{\"data\":{\"connection\":\"$(printf '%s' "$NEW" | base64 -w0)\"}}"

# Apply the update with a Helm upgrade
helm upgrade -f values.yaml -n "$NAMESPACE" astronomer astronomer/astronomer
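
The ${CURRENT%/}/$DB_NAME line does the real work in the script above. A standalone sketch with a sample (hypothetical) connection string shows the strip-and-append behavior and the re-encoding step:

```shell
# Sample value only; a real value comes from the astronomer-bootstrap secret.
DB_NAME=main
CURRENT="postgres://user:pass@db.example.com:5432/"   # may or may not end in "/"
NEW="${CURRENT%/}/$DB_NAME"   # strip one trailing slash if present, then append the DB name
echo "$NEW"                   # → postgres://user:pass@db.example.com:5432/main
ENCODED=$(printf '%s' "$NEW" | base64 -w0)   # re-encode for the secret patch
```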

APC API migration fails due to duplicate workspace labels

  • Cause: The upgrade includes a migration that adds a unique constraint to the Workspace.label column. If your database contains workspaces with duplicate labels, the migration fails with Prisma error P3009. Once the migration is marked as failed, all subsequent migrations are blocked, preventing the APC API from starting.
  • Symptoms: APC database migration Pods fail with Error: P3009 and the message migrate found failed migrations in the target database. APC API and worker Pods enter CrashLoopBackOff.
  • Solution:

Back up the APC database before you run DELETE on _prisma_migrations or rely on a retry of this migration.

1. Check for duplicate workspace labels

Connect to your APC database and check for duplicate workspace labels:

SET search_path TO "houston$default";
SELECT label, COUNT(*) FROM "Workspace" GROUP BY label HAVING COUNT(*) > 1;
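
The GROUP BY ... HAVING COUNT(*) > 1 pattern flags any label that appears more than once. The same idea on sample (made-up) labels, without a database:

```shell
# uniq -d prints only lines that occur more than once in sorted input.
printf '%s\n' "Data Team" "ML Team" "Data Team" | sort | uniq -d
# → Data Team
```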

2. Resolve the duplicate labels

Rename one of the duplicate workspace labels, using the workspace ID as a suffix to guarantee uniqueness:

UPDATE "Workspace" SET label = label || '-' || id
WHERE id = '<workspace-id-to-rename>';
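
The label || '-' || id expression simply appends the workspace ID, which is already unique. With sample (made-up) values:

```shell
# Workspace IDs are unique, so "<label>-<id>" is guaranteed unique too.
label="Data Team"
id="ckx123abc"   # hypothetical workspace ID
echo "${label}-${id}"   # → Data Team-ckx123abc
```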

3. Check whether the migration partially applied

Before clearing the failed Prisma record, check whether the unique constraint was applied:

SELECT constraint_name FROM information_schema.table_constraints
WHERE table_schema = 'houston$default'
  AND table_name = 'Workspace'
  AND constraint_type = 'UNIQUE';
  • If the unique constraint on label isn’t listed, the migration didn’t complete. After fixing duplicate labels, you can clear the failed migration and retry. Back up your database first, then delete only that failed migration row:
DELETE FROM "houston$default"."_prisma_migrations"
WHERE migration_name = '20250918072802_added_unique_constraint_to_ws_label';
  • If a unique constraint on label is present, the schema change may have been applied even though Prisma recorded a failure. Don’t delete the _prisma_migrations row or re-run the migration without DBA or Astronomer support — you can create conflicting schema or migration state.

4. Re-run the Helm upgrade

helm upgrade -f migrated-values.yaml -n astronomer astronomer astronomer/astronomer --version 2.0.x

Values migration script reports unexpected errors

  • Cause: The Python migration script requires Python 3.10+ and the ruamel.yaml package.
  • Solution:
    • Verify your Python version: python3 --version
    • Install or upgrade the dependency: pip install ruamel.yaml
    • Run the script in dry-run mode first to preview changes: ./bin/migrate-helm-chart-values-1x-to-2x.py --dry-run my-values.yaml
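
If you want to check the 3.10 minimum in a script rather than by eye, sort -V does a version-aware comparison. A sketch with a sample version string (replace the have value with your actual python3 --version output):

```shell
need="3.10"
have="3.11.4"   # sample; substitute the version reported by python3 --version
# sort -V orders dotted versions numerically; if the minimum sorts first, we're OK.
lowest=$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)
if [ "$lowest" = "$need" ]; then echo "Python version OK"; else echo "Python too old"; fi
```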

Partially applied database migration

  • Cause: Prisma migration interrupted during the upgrade.
  • Solution:
    • Run the migration manually:

      npx prisma migrate deploy
    • Verify that the Cluster and Deployment tables exist.

Vector Pods not running (upgrading from 0.37.x)

  • Cause: The fluentd key wasn’t renamed to vector in your migrated values file, or custom Fluentd resources were incompatible.
  • Solution: Verify that your migrated values file uses vector (not fluentd) as the top-level key. Re-run the migration script if needed, or manually rename the key.
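
If you prefer not to re-run the migration script, a minimal sketch of the manual rename, assuming fluentd: is a top-level key in your migrated values file (a throwaway file is created here purely for illustration):

```shell
# Create a sample values file for illustration only.
cat > /tmp/migrated-values.yaml <<'EOF'
fluentd:
  enabled: true
EOF
# Rename the top-level fluentd key to vector; nested keys are left as-is.
sed -i 's/^fluentd:/vector:/' /tmp/migrated-values.yaml
head -n1 /tmp/migrated-values.yaml   # → vector:
```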