Troubleshoot data plane registration

Use this guide to diagnose and resolve errors that occur when you register a data plane cluster in Astro Private Cloud (APC) 1.x.

“Commander Metadata service unavailable”

Registration Failed
Cannot fetch dataplane metadata details. Reason: Commander Metadata service unavailable.

This error occurs when the control plane Houston API cannot reach Commander’s /metadata HTTP endpoint on the data plane during the registration handshake. Registration cannot complete until Houston successfully fetches and validates this metadata.

How registration works

When you register a data plane, Houston makes an outbound HTTPS GET request to Commander’s metadata endpoint and validates the JSON response before creating the cluster record. The failure can occur at any point along that path: DNS resolution, network connectivity, TLS trust, or ingress routing.
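The handshake can be pictured with a short sketch. Everything below — the function names, the required fields, and the validation rules — is an illustration inferred from the sample metadata payload shown later in this guide, not Houston's actual implementation:

```python
import json
from urllib.request import urlopen

# Hypothetical sketch of the fetch-and-validate step. Field names mirror
# the sample /metadata payload shown later in this guide; Houston's real
# validation logic may differ.
REQUIRED_FIELDS = {"mode", "healthStatus", "baseDomain", "commander"}

def validate_metadata(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - payload.keys()]
    if payload.get("mode") != "data":
        problems.append("mode is not 'data'")
    if payload.get("healthStatus") != "HEALTHY":
        problems.append("data plane is not HEALTHY")
    return problems

def fetch_metadata(base_url: str) -> dict:
    # Houston appends /metadata to the base URL entered at registration.
    with urlopen(base_url.rstrip("/") + "/metadata", timeout=10) as resp:
        return json.load(resp)
```

Any failure before `validate_metadata` returns an empty list — DNS, TLS, routing, or a missing field — surfaces as the registration error above.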

Correct metadata URL format

In APC 1.x, the chart creates two separate ingresses for Commander on the data plane:

Ingress                      Hostname                                Port   Protocol   Purpose
commander-api-ingress        commander.<domainPrefix>.<baseDomain>   443    gRPC       Houston-Commander control channel
commander-metadata-ingress   <domainPrefix>.<baseDomain>             443    HTTPS      Registration metadata endpoint

The metadata URL you enter during registration must point to the metadata ingress (the bare data plane domain). Houston appends /metadata to this URL internally when it makes the fetch call, so don’t include the path in the registration form:

https://<domainPrefix>.<baseDomain>

Example: If your control plane base domain is apc.example.com and your data plane domainPrefix is dp-01:

  • Correct: https://dp-01.apc.example.com
  • Incorrect: https://commander.dp-01.apc.example.com (this is the gRPC API ingress and returns a 404)
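As a quick illustration of the rule, the URL Houston ultimately fetches can be derived from the base URL you enter. `metadata_url` is a hypothetical helper for illustration, not Houston's code:

```python
def metadata_url(base: str) -> str:
    # Hypothetical helper: Houston appends /metadata to the bare
    # data plane domain entered in the registration form.
    return base.rstrip("/") + "/metadata"

print(metadata_url("https://dp-01.apc.example.com"))
# → https://dp-01.apc.example.com/metadata
```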

Diagnose the error

Work through the following steps in order to isolate the cause.

Step 1: Verify Commander is running

On the data plane cluster, confirm Commander is healthy:

$ kubectl get pods -n <data-plane-namespace> -l component=commander

Expected output: 1/1 Running. If Commander is crashlooping, check its logs:

$ kubectl logs -n <data-plane-namespace> -l component=commander

Step 2: Verify DNS resolves to the correct IP

The data plane has its own NGINX ingress controller with its own load balancer IP, separate from the control plane’s load balancer. Confirm that the DNS for the data plane domain resolves to the data plane’s load balancer, not the control plane’s.

Find the data plane’s load balancer IP:

$ kubectl get svc -n <data-plane-namespace> -l component=nginx

Look for the EXTERNAL-IP on the LoadBalancer-type service. In a co-located setup (control plane and data plane on the same cluster), the service name typically includes -dp-nginx and has a different IP from the control plane’s NGINX service.

Then verify DNS:

$ dig +short <domainPrefix>.<baseDomain>

The returned IP must match the data plane’s load balancer EXTERNAL-IP. A mismatch (for example, the data plane subdomain pointing to the control plane load balancer IP) causes all requests to return 404 because the control plane’s NGINX has no ingress rules for the data plane’s hostnames.

Co-located deployments

When the control plane and data plane share a single cluster, they use separate namespaces and separate NGINX load balancers. DNS for *.<domainPrefix>.<baseDomain> must be a distinct wildcard record pointing to the data plane’s load balancer IP. Verify this record exists and doesn’t share the control plane’s IP.

Step 3: Test the metadata endpoint directly

After DNS resolves correctly, confirm the endpoint returns a valid 200 response:

$ curl -sk -w "\nHTTP Status: %{http_code}\n" \
> "https://<domainPrefix>.<baseDomain>/metadata"

A healthy response returns JSON similar to the following:

{
  "kubernetesVersion": "v1.32.x",
  "mode": "data",
  "dataplaneChartVersion": "1.x.x",
  "cloudProvider": "GCP",
  "healthStatus": "HEALTHY",
  "baseDomain": "dp-01.apc.example.com",
  "commander": {
    "version": "1.x.x",
    "url": "commander.dp-01.apc.example.com:443",
    "status": "HEALTHY",
    "airflowChartVersion": "1.x.x"
  }
}

The cloudProvider field may return "local" on Kubernetes clusters and certain co-located environments. This is expected and doesn’t affect registration.

If this request returns a 404, continue to Step 4. If it times out or the connection is refused, skip to Step 5.
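The branching above can be summarized programmatically. `classify_probe` is a hypothetical helper that maps the probe result to the next troubleshooting step, mirroring the guidance in this section:

```python
from typing import Optional

def classify_probe(status: Optional[int]) -> str:
    """Map the result of probing /metadata to the next step.

    `status` is the HTTP status code, or None if the request timed out
    or the connection was refused. Hypothetical helper mirroring the
    decision points in this guide.
    """
    if status == 200:
        return "healthy: retry registration"
    if status == 404:
        return "Step 4: check the metadata ingress"
    if status is None:
        return "Step 5: check network connectivity"
    return f"unexpected status {status}: check Commander logs"
```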

Step 4: Verify the metadata ingress exists and is configured correctly

Check that the metadata ingress exists in the data plane namespace:

$ kubectl get ingress -n <data-plane-namespace> | grep metadata

Describe it to verify the hostname and path:

$ kubectl describe ingress -n <data-plane-namespace> <release-name>-commander-metadata-ingress

Confirm the following:

  • Host matches <domainPrefix>.<baseDomain> exactly.
  • Path is /metadata.
  • Backend points to the Commander service on port 8880 (the HTTP port, not the gRPC port 50051).
  • Ingress class annotation (kubernetes.io/ingress.class) matches the data plane’s NGINX class name.

If the ingress is missing or misconfigured, re-run helm upgrade on the data plane release with the correct global.plane.domainPrefix and global.baseDomain values.

Step 5: Verify network connectivity from the control plane

Even if the endpoint works from your workstation, Houston on the control plane must also reach it. Test from inside the Houston Pod:

$ kubectl exec -it -n <cp-namespace> deploy/<cp-release>-houston -- sh

Inside the Pod, run:

$ wget -qO- https://<domainPrefix>.<baseDomain>/metadata

If this fails, check for firewall rules or network policies blocking HTTPS egress from the control plane to the data plane’s load balancer IP.
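If you can't get a shell in the Houston Pod, a plain TCP probe from any machine on the control plane's network narrows things down. This is a generic connectivity check, not an APC tool:

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_reach("<domainPrefix>.<baseDomain>")` returns False from the control plane's network but True from your workstation, look for an egress firewall rule or NetworkPolicy between the two.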

Step 6: Verify TLS certificate trust

If the data plane uses a private certificate authority (CA) or a certificate chain that Houston doesn’t trust, the HTTPS request fails with a TLS error. Test with certificate verification disabled:

$ # Inside the Houston Pod
$ wget --no-check-certificate -qO- https://<domainPrefix>.<baseDomain>/metadata

If this succeeds but the unmodified request fails, add the data plane’s CA certificate to the control plane’s global.privateCaCerts Helm values and re-deploy.
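In Astronomer's Helm charts, global.privateCaCerts typically takes a list of Secret names, where each Secret holds the CA certificate under a cert.pem key. Confirm the exact shape against your APC version's values reference before applying; the Secret name below is a placeholder:

```yaml
# Control plane values.yaml fragment (placeholder Secret name).
# The Secret must exist in the control plane namespace and contain
# the data plane's CA certificate under the key cert.pem.
global:
  privateCaCerts:
    - private-root-ca
```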

Summary checklist

Before retrying registration, verify all of the following:

  • Commander Pod is 1/1 Running in the data plane namespace.
  • dig <domainPrefix>.<baseDomain> returns the data plane’s NGINX load balancer IP (not the control plane’s).
  • curl https://<domainPrefix>.<baseDomain>/metadata returns HTTP 200 with valid JSON.
  • The metadata URL entered in the registration form is https://<domainPrefix>.<baseDomain> (no /metadata suffix; Houston appends it internally).
  • The metadata ingress exists and its backend targets Commander’s HTTP port (8880).
  • Houston on the control plane can reach the data plane’s load balancer over HTTPS.