Troubleshoot data plane registration

Use this guide to diagnose and resolve errors that occur when you register a data plane cluster in Astro Private Cloud (APC) 2.x.

“Commander metadata service unavailable”

Registration Failed
Cannot fetch dataplane metadata details. Reason: Commander Metadata service unavailable.

This error occurs when the control plane APC API can’t reach the deployment orchestrator’s /metadata HTTP endpoint on the data plane during the registration handshake. Registration can’t complete until the APC API successfully fetches and validates this metadata.

How registration works

When you register a data plane, the APC API makes an outbound HTTPS GET request to the deployment orchestrator’s metadata endpoint and validates the JSON response before creating the cluster record. The failure can occur at any point along that path: DNS resolution, network connectivity, TLS trust, or ingress routing.
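
The handshake can be sketched as follows. This is an illustrative model, not the APC API's actual implementation; the required field names are assumed from the example /metadata response shown later in this guide.

```python
# Sketch of the registration-time metadata fetch and validation.
# Function names and required fields are assumptions based on the
# example /metadata response in this guide, not APC internals.
import json
from urllib.request import urlopen

REQUIRED_FIELDS = {"mode", "dataplaneChartVersion", "healthStatus", "commander"}

def metadata_url(registration_url: str) -> str:
    """Append /metadata to the URL entered in the registration form."""
    return registration_url.rstrip("/") + "/metadata"

def validate_metadata(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - payload.keys())]
    if payload.get("mode") != "data":
        problems.append(f"mode is {payload.get('mode')!r}, expected 'data'")
    if payload.get("healthStatus") != "HEALTHY":
        problems.append(f"healthStatus is {payload.get('healthStatus')!r}")
    return problems

def fetch_and_validate(registration_url: str) -> list[str]:
    """DNS, connectivity, TLS, and routing can all fail inside this one call."""
    with urlopen(metadata_url(registration_url), timeout=10) as resp:
        payload = json.load(resp)
    return validate_metadata(payload)
```

Any exception raised by the single `urlopen` call corresponds to one of the failure points listed above, which is why the diagnosis steps below test each layer separately.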

Correct metadata URL format

In APC 2.x, the chart creates two separate ingresses for the deployment orchestrator on the data plane:

Ingress                    | Hostname                              | Port | Protocol | Purpose
commander-api-ingress      | commander.<domainPrefix>.<baseDomain> | 443  | gRPC     | APC API–deployment orchestrator control channel
commander-metadata-ingress | <domainPrefix>.<baseDomain>           | 443  | HTTPS    | Registration metadata endpoint

The metadata URL you enter during registration must point to the metadata ingress (the bare data plane domain). The APC API appends /metadata to this URL internally when it makes the fetch call, so don’t include the path in the registration form:

https://<domainPrefix>.<baseDomain>

Example: If your control plane base domain is apc.example.com and your data plane domainPrefix is dp-01:

  • Correct: https://dp-01.apc.example.com
  • Incorrect: https://commander.dp-01.apc.example.com (this is the gRPC API ingress and returns a 404)
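
Both mistakes can be caught before submitting the form with a quick lint of the URL. This helper is illustrative only; it simply encodes the rules above and is not part of APC.

```python
# Illustrative pre-flight check for the registration metadata URL,
# encoding the rules above. Not part of APC itself.
from urllib.parse import urlparse

def check_registration_url(url: str) -> list[str]:
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use https")
    if parsed.hostname and parsed.hostname.startswith("commander."):
        problems.append("host points at the gRPC API ingress; drop the 'commander.' prefix")
    if parsed.path.rstrip("/").endswith("/metadata"):
        problems.append("drop the /metadata path; the APC API appends it internally")
    return problems
```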

Diagnose the error

Work through the following steps in order to isolate the cause.

Step 1: Verify the deployment orchestrator is running

On the data plane cluster, confirm the deployment orchestrator is healthy:

$ kubectl get pods -n <data-plane-namespace> -l component=commander

Expected output: 1/1 Running. If the deployment orchestrator is crashlooping, check its logs:

$ kubectl logs -n <data-plane-namespace> -l component=commander

Step 2: Verify DNS resolves to the correct IP

The data plane has its own NGINX ingress controller with its own load balancer IP, separate from the control plane’s load balancer. Confirm that the DNS for the data plane domain resolves to the data plane’s load balancer, not the control plane’s.

Find the data plane’s load balancer IP:

$ kubectl get svc -n <data-plane-namespace> -l component=nginx

Look for the EXTERNAL-IP on the LoadBalancer-type service. In a co-located setup (control plane and data plane on the same cluster), the service name typically includes -dp-nginx and has a different IP from the control plane’s NGINX service.

Then verify DNS:

$ dig +short <domainPrefix>.<baseDomain>

The returned IP must match the data plane’s load balancer EXTERNAL-IP. A mismatch (for example, the data plane subdomain pointing to the control plane load balancer IP) causes all requests to return 404 because the control plane’s NGINX has no ingress rules for the data plane’s hostnames.

Co-located deployments

When the control plane and data plane share a single cluster, they use separate namespaces and separate NGINX load balancers. DNS for *.<domainPrefix>.<baseDomain> must be a distinct wildcard record pointing to the data plane’s load balancer IP. Verify this record exists and doesn’t share the control plane’s IP.
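
The comparison in this step reduces to three cases. The helper below is illustrative; the IPs passed in come from the kubectl and dig commands shown above.

```python
# Classify the dig answer against the two load balancer IPs.
# Illustrative helper; pass in the EXTERNAL-IP values reported by
# kubectl and the address returned by dig.
def diagnose_dns(resolved_ip: str, dp_lb_ip: str, cp_lb_ip: str) -> str:
    if resolved_ip == dp_lb_ip:
        return "ok: data plane domain resolves to the data plane load balancer"
    if resolved_ip == cp_lb_ip:
        return "mismatch: domain points at the control plane load balancer (expect 404s)"
    return "mismatch: domain points at an unknown IP; check the DNS record"
```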

Step 3: Test the metadata endpoint directly

After DNS resolves correctly, confirm the endpoint returns a valid 200 response:

$ curl -sk -w "\nHTTP Status: %{http_code}\n" \
> "https://<domainPrefix>.<baseDomain>/metadata"

A healthy response returns JSON similar to the following:

{
  "kubernetesVersion": "v1.32.x",
  "mode": "data",
  "dataplaneChartVersion": "2.x.x",
  "cloudProvider": "GCP",
  "healthStatus": "HEALTHY",
  "baseDomain": "dp-01.apc.example.com",
  "commander": {
    "version": "2.x.x",
    "url": "commander.dp-01.apc.example.com:443",
    "status": "HEALTHY",
    "airflowChartVersion": "2.x.x"
  }
}

The cloudProvider field may return "local" on Kubernetes clusters and certain co-located environments. This is expected and doesn’t affect registration.

If this request returns a 404, continue to Step 4. If it times out or the connection is refused, skip to Step 5.
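
Summarized as code, the triage branching looks like the following. The outcome labels and the Step 6 mapping for TLS failures are assumptions drawn from this guide's flow, not actual curl output.

```python
# Map the outcome of the curl test to the next diagnostic step,
# following the triage flow in this guide. Outcome labels are
# illustrative, not literal curl output.
def next_step(outcome: str) -> str:
    if outcome == "200":
        return "endpoint healthy: retry registration"
    if outcome == "404":
        return "Step 4: verify the metadata ingress exists and is configured correctly"
    if outcome in ("timeout", "connection refused"):
        return "Step 5: verify network connectivity from the control plane"
    if outcome == "tls error":
        return "Step 6: verify TLS certificate trust"
    return f"unexpected outcome {outcome!r}: check deployment orchestrator logs"
```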

Step 4: Verify the metadata ingress exists and is configured correctly

Check that the metadata ingress exists in the data plane namespace:

$ kubectl get ingress -n <data-plane-namespace> | grep metadata

Describe it to verify the hostname and path:

$ kubectl describe ingress -n <data-plane-namespace> <release-name>-commander-metadata-ingress

Confirm the following:

  • Host matches <domainPrefix>.<baseDomain> exactly.
  • Path is /metadata.
  • Backend points to the deployment orchestrator service on port 8880 (the HTTP port, not the gRPC port 50051).
  • Ingress class annotation (kubernetes.io/ingress.class) matches the data plane’s NGINX class name.

If the ingress is missing or misconfigured, re-run helm upgrade on the data plane release with the correct global.plane.domainPrefix and global.baseDomain values.
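
The four checks above can be expressed against the Ingress object itself. The sketch below is illustrative and assumes a simplified dict in the shape of a Kubernetes networking.k8s.io/v1 Ingress (as returned by `kubectl get ingress -o json`).

```python
# Validate the metadata ingress against the checklist above.
# `ingress` is a simplified dict shaped like a Kubernetes
# networking.k8s.io/v1 Ingress object; this helper is illustrative.
def check_metadata_ingress(ingress: dict, expected_host: str, nginx_class: str) -> list[str]:
    problems = []
    annotations = ingress.get("metadata", {}).get("annotations", {})
    if annotations.get("kubernetes.io/ingress.class") != nginx_class:
        problems.append("ingress class annotation does not match the data plane NGINX class")
    for rule in ingress.get("spec", {}).get("rules", []):
        if rule.get("host") != expected_host:
            problems.append(f"host {rule.get('host')!r} != {expected_host!r}")
        for path in rule.get("http", {}).get("paths", []):
            if path.get("path") != "/metadata":
                problems.append(f"unexpected path {path.get('path')!r}")
            port = path.get("backend", {}).get("service", {}).get("port", {}).get("number")
            if port != 8880:
                problems.append(f"backend port {port} != 8880 (HTTP port; 50051 is gRPC)")
    return problems
```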

Step 5: Verify network connectivity from the control plane

Even if the endpoint works from your workstation, the APC API on the control plane must also reach it. Test from inside the APC API (Houston) Pod:

$ kubectl exec -it -n <cp-namespace> deploy/<cp-release>-houston -- sh

Inside the Pod, run:

$ wget -qO- https://<domainPrefix>.<baseDomain>/metadata

If this fails, check for firewall rules or network policies blocking HTTPS egress from the control plane to the data plane’s load balancer IP.

Step 6: Verify TLS certificate trust

If the data plane uses a private certificate authority (CA) or a certificate chain that the APC API doesn’t trust, the HTTPS request fails with a TLS error. Test with certificate verification disabled:

# Inside the Houston Pod
$ wget --no-check-certificate -qO- https://<domainPrefix>.<baseDomain>/metadata

If this succeeds but the unmodified request fails, add the data plane’s CA certificate to the control plane’s global.privateCaCerts Helm values and re-deploy.
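
Before re-deploying, it's worth sanity-checking that the PEM bundle you plan to supply via global.privateCaCerts actually contains certificate blocks. A minimal sketch; this only checks PEM framing, not that the CA actually signed the data plane certificate.

```python
# Count PEM certificate blocks in a bundle, to sanity-check the value
# supplied to global.privateCaCerts before re-deploying. Checks PEM
# framing only; it cannot verify the chain of trust.
def count_pem_certs(bundle: str) -> int:
    begins = bundle.count("-----BEGIN CERTIFICATE-----")
    ends = bundle.count("-----END CERTIFICATE-----")
    if begins != ends:
        raise ValueError(f"unbalanced PEM markers: {begins} BEGIN vs {ends} END")
    return begins
```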

Summary checklist

Before retrying registration, verify all of the following:

  • deployment orchestrator Pod is 1/1 Running in the data plane namespace.
  • dig <domainPrefix>.<baseDomain> returns the data plane’s NGINX load balancer IP (not the control plane’s).
  • curl https://<domainPrefix>.<baseDomain>/metadata returns HTTP 200 with valid JSON.
  • The metadata URL entered in the registration form is https://<domainPrefix>.<baseDomain> (no /metadata suffix; the APC API appends it internally).
  • The metadata ingress exists and its backend targets the deployment orchestrator’s HTTP port (8880).
  • The APC API on the control plane can reach the data plane’s load balancer over HTTPS.