Debug an Astro Private Cloud installation

Use this guide when your Astro Private Cloud (APC) control plane or data plane Pods are not progressing to a healthy state after installation.

Ensure platform components are reaching full availability

Work through the following checks, from controllers down to individual containers, to isolate possible causes when Pods don’t reach the READY state.

1. Verify controllers and ReplicaSets

  1. List Deployments, StatefulSets, and ReplicaSets in your namespace and confirm the latest ReplicaSet or StatefulSet shows the expected number of available replicas:

    $ kubectl get deployment,statefulset,replicaset -n <astronomer namespace>
  2. Identify the most recent ReplicaSet for the component that is failing, with results sorted by creation timestamp:

    $ kubectl get replicaset -n <astronomer namespace> --sort-by=.metadata.creationTimestamp
  3. Inspect the returned ReplicaSet for status and events that may be preventing Pods from launching:

    $ kubectl describe replicaset <replicaset-name> -n <astronomer namespace>

    Resolve issues such as insufficient resources, pull errors, or missing secrets, then re-check the ReplicaSet until .status.availableReplicas matches .spec.replicas.

2. Examine Pods and namespace events

  1. List Pod status:

    $ kubectl get pods -n <astronomer namespace>
  2. Describe a failing Pod to view events, container status, and scheduling details:

    $ kubectl describe pod <pod-name> -n <astronomer namespace>
  3. Review recent events in the namespace for additional context:

    $ kubectl get events -n <astronomer namespace> --sort-by=.lastTimestamp

3. Inspect container logs

If a Pod continues to restart or is stuck in CrashLoopBackOff, gather logs for each container:

$ kubectl logs <pod-name> -c <container-name> -n <astronomer namespace>

If the container restarts quickly, use --previous to view logs from the last attempt:

$ kubectl logs <pod-name> -c <container-name> -n <astronomer namespace> --previous

Use the collected errors to adjust your configuration, for example, by fixing database credentials or registry access. After remediation, re-run kubectl get pods to confirm all Pods report READY status. If problems persist, collect the relevant logs and events and contact Astronomer support.
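
If remediation doesn't resolve the issue, the following sketch first confirms whether the ReplicaSet from step 1 has reached its desired replica count and then gathers the logs and events you would attach to a support request. It is a minimal example: the output file names are arbitrary, and <replicaset-name> and <pod-name> are placeholders for whichever objects are failing.

$ kubectl get replicaset <replicaset-name> -n <astronomer namespace> \
    -o jsonpath='{.status.availableReplicas}/{.spec.replicas}{"\n"}'
$ kubectl get pods -n <astronomer namespace> -o wide > pods.txt
$ kubectl get events -n <astronomer namespace> --sort-by=.lastTimestamp > events.txt
$ kubectl describe pod <pod-name> -n <astronomer namespace> > pod-describe.txt
$ kubectl logs <pod-name> --all-containers -n <astronomer namespace> > pod-logs.txt

The first command prints available versus desired replicas, so you can confirm the controller has recovered before collecting the rest.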

Houston Pods stuck in CrashLoopBackOff

The Houston API connects directly to the control plane database during startup. If the Houston Pods restart repeatedly:

  1. List Pods to verify their status:

    $ kubectl get pods -n <astronomer namespace>
  2. Test connectivity to the database from inside the cluster:

    $ kubectl run psql --rm -it --restart=Never --namespace <astronomer namespace> \
        --image bitnami/postgresql --command -- \
        psql $(kubectl get secret -n <astronomer namespace> <platform-release-name>-houston-backend \
        --template='{{.data.connection | base64decode }}' | sed 's/?.*//g')

    If the connection times out, investigate networking or firewall rules between Kubernetes nodes and the Postgres host. A lightweight reachability check is sketched after these steps.

  3. Confirm the astronomer-bootstrap secret contains the correct connection string:

    $ kubectl get secret astronomer-bootstrap -n <astronomer namespace> -o yaml

    Decode the connection value and fix any typos. After updating the secret, delete the Houston and Grafana Pods so they pick up the change.
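
If you want to rule out basic network reachability without opening a full psql session, a minimal check is to run pg_isready from the same image; <postgres-host> and <postgres-port> are placeholders for your database endpoint:

$ kubectl run pg-check --rm -it --restart=Never --namespace <astronomer namespace> \
    --image bitnami/postgresql --command -- pg_isready -h <postgres-host> -p <postgres-port>

To decode the stored connection string, and to restart Houston and Grafana after correcting it, a sketch along these lines should work. The component label selectors are assumptions, so confirm them first with kubectl get pods --show-labels:

$ kubectl get secret astronomer-bootstrap -n <astronomer namespace> \
    --template='{{.data.connection | base64decode }}'
$ kubectl delete pod -l component=houston -n <astronomer namespace>
$ kubectl delete pod -l component=grafana -n <astronomer namespace>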

x509 “certificate signed by unknown authority” while pulling images

If image pulls fail with a certificate error, for example while registry certificates are being synced, restart the Houston Pods and then the platform registry Pod. Ensure any custom certificate authorities are configured under global.privateCaCerts and applied with helm upgrade.
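
As a sketch, assuming the CA certificate lives in a Kubernetes secret named private-root-ca (created from a PEM file) and that the platform was installed from the astronomer/astronomer Helm chart, the configuration might look like the following; adjust the secret name, key, release name, chart reference, and namespace to match your installation:

$ kubectl create secret generic private-root-ca \
    --from-file=cert.pem=./private-root-ca.pem \
    -n <astronomer namespace>
$ helm upgrade <platform-release-name> astronomer/astronomer \
    --set "global.privateCaCerts[0]=private-root-ca" \
    --reuse-values -n <astronomer namespace>

After the upgrade completes, restart the Houston Pods and the registry Pod as described above so they pick up the new certificate bundle.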

Houston worker showing NATS timeout errors after installation

After installing or upgrading APC, you might encounter issues where Deployments appear in the Astro CLI and database, but their Kubernetes namespaces are not created. Houston logs might show UnhandledPromiseRejectionWarning: NatsError: TIMEOUT.

This occurs when the NATS JetStream cluster has not yet elected a metadata leader before the Houston worker Pods attempt to set up streams and consumers.
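
Before restarting anything, you can check that the NATS Pods themselves are running and look for leader-election messages in their logs. This is a rough check: the component=nats label is an assumption, so adjust the selector (or grep Pod names) to match your installation:

$ kubectl get pods -l component=nats -n <astronomer namespace>
$ kubectl logs -l component=nats -n <astronomer namespace> --tail=100 | grep -i leader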

To resolve:

  1. Verify Houston worker Pods are showing NATS timeout errors:

    $ kubectl logs -l component=houston-worker -n <astronomer namespace>
  2. Restart the Houston worker Pods to allow them to reconnect after the NATS leader election completes:

    $ kubectl rollout restart deployment <platform-release-name>-houston-worker -n <astronomer namespace>
  3. Confirm Deployment namespaces are created:

    $ kubectl get namespaces

After the Houston worker Pods restart, they successfully create the necessary Kubernetes resources for your Deployments.