Astro Private Cloud data plane architecture
The Astro Private Cloud (APC) data plane hosts the execution layer for your Airflow Deployments. When you set global.plane.mode: data in your values.yaml file, the Helm chart deploys only the runtime-facing components and relies on a separate control plane for user management, configuration, and registry and token orchestration. This document summarizes the data plane’s responsibilities, the services that run in this mode, and how those services integrate with the control plane. For the management plane, see Control Plane Architecture, or review Unified Architecture if you want to run both planes in a single cluster.
Responsibilities
A data plane cluster focuses on:
- Running customer Airflow Deployments: The deployment orchestrator installs and upgrades each Deployment’s runtime chart using configuration synced from the APC API.
- Serving Airflow ingress: The ingress controller (Data plane NGINX) exposes deployments.<domain-prefix>.<base-domain> (and any per-Deployment vanity hostnames) to route user traffic into the correct Airflow namespace.
- Collecting telemetry: The metrics collector (Prometheus) scrapes Deployment namespaces and either exposes a federate endpoint or remote-writes metrics to the control plane. Optional logging stacks (the log forwarder Vector and the log store Elasticsearch) gather task logs.
- Handling image distribution: The platform registry (Registry) stores runtime and Dag images local to the data plane and syncs credentials using the secret distribution job (Config Syncer).
- Maintaining secure connectivity: Config Syncer distributes the APC API-issued tokens and certificates so Deployments can authenticate back to the APC API, the registry, and other platform services.
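As a minimal illustration, a data plane values.yaml sets the plane mode described above. This is a sketch: the base domain is a placeholder, and any key other than global.plane.mode is an assumption based on this document, not a verified chart default.

```yaml
# Minimal data plane values.yaml sketch (illustrative only).
global:
  baseDomain: example.com   # placeholder base domain
  plane:
    mode: data              # deploy only the runtime-facing components
```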
Core components in data plane mode
When global.plane.mode is set to data or unified, APC enables the following charts:
- The deployment orchestrator (charts/astronomer/templates/commander/*): Polls the APC API for desired state and applies/rolls back Helm releases for each Deployment.
- Config Syncer (charts/astronomer/templates/config-syncer/*): Periodically mirrors platform secrets (registry credentials, APC API tokens, runtime settings) into each Airflow namespace.
- Data plane NGINX (charts/nginx/templates/dataplane/*): Provides ingress for the per-Deployment Airflow UI and API as well as for platform components like the Registry, Prometheus, and Elasticsearch. This component isn’t installed when OpenShift is enabled.
- Platform registry (Registry) (charts/astronomer/templates/registry/*): Optional container registry for runtime/Dag images when the bundled registry is enabled.
- Vector (charts/vector/templates/*): DaemonSet, or a sidecar container if global.loggingSidecar is enabled, that tails task logs and ships them to Elasticsearch or an external destination.
- Elasticsearch (charts/elasticsearch/templates/*): Optional log storage for Deployments. Only deployed in data plane or unified modes when enabled.
- Cluster state exporter (kube-state-metrics) (charts/kube-state/templates/*): Scrapes namespace-level object metadata for Prometheus.
- Metrics gateway (Prometheus federation/auth) (charts/prometheus/templates/prometheus-federation-*): Adds the auth proxy, Service, and federation jobs necessary for the control plane to scrape metrics securely.
- Auxiliary services: External-es proxy, namespace pool RBAC, and other helper charts that only make sense near the workloads.
Prometheus, Postgres, and other shared charts still exist. In data mode, Prometheus pushes or exposes metrics back to the control plane rather than aggregating globally.
Network endpoints
Data plane ingress typically includes:
- deployments.<domain-prefix>.<base-domain>: Airflow web UIs and APIs for each Deployment. The deployment orchestrator configures path-based routing for these Deployments.
- registry.<domain-prefix>.<base-domain>: When hosting the registry in the data plane.
- Deployment orchestrator endpoints for registering clusters and creating and updating Deployments.
- Optional vanity hostnames per Deployment managed by the deployment orchestrator’s Helm releases.
- Data plane metrics collector (Prometheus) serves a federate endpoint scraped by the control plane.
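The federate endpoint above is consumed by a standard Prometheus federation scrape job on the control plane side. A sketch, using Prometheus's documented federation configuration; the job name, target hostname, and match[] selector are placeholders, not APC defaults:

```yaml
# Illustrative control plane scrape job for data plane federation.
scrape_configs:
  - job_name: dataplane-federation
    metrics_path: /federate
    honor_labels: true            # keep the data plane's original labels
    params:
      "match[]":
        - '{job=~".+"}'           # forward all series; narrow this in production
    scheme: https
    static_configs:
      - targets:
          - prometheus.data.example.com   # placeholder federate host
```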
Outbound connections include:
- To the APC API: the deployment orchestrator, Config Syncer, and cronjobs call the control plane API over TLS.
- To external registries/log stores: Depending on how runtime images and logs are hosted.
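These outbound paths can be enforced with a standard Kubernetes NetworkPolicy. A hedged sketch that allows TLS egress from platform Pods; the namespace and pod labels here are assumptions for illustration, not APC defaults:

```yaml
# Illustrative egress policy for platform components that call the APC API
# and external registries over TLS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-control-plane-egress
  namespace: astronomer           # placeholder platform namespace
spec:
  podSelector:
    matchLabels:
      tier: astronomer            # placeholder label for platform Pods
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443               # TLS to the APC API and external endpoints
```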
Control plane integration
Data planes authenticate against the APC API using service accounts and tokens that the control plane provisions. The typical workflow is:
- Registration: A platform admin registers a data plane entry in the APC API, which generates unique tokens for the deployment orchestrator and Config Syncer.
- Secret distribution: During install, you provide the APC API tokens through astronomer.houston.config.dataplane.* values. The secret distribution job (Config Syncer) keeps runtime secrets fresh.
- Deployment lifecycle: The APC API pushes requests to the deployment orchestrator, which reconciles Deployments. If the control plane issues upgrade or scale instructions, the deployment orchestrator applies them locally.
- Telemetry forwarding: The metrics collector (Prometheus) and log forwarder (Vector) transport metrics and logs to the control plane (or third-party sinks) so administrators have a single-pane view.
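The deployment lifecycle step amounts to a desired-state reconcile loop: compare what the control plane wants with what is installed, then act on the difference. A simplified Python sketch of that idea; the function, field names, and version strings are hypothetical, not the orchestrator's actual API:

```python
# Simplified reconcile loop in the spirit of the deployment orchestrator:
# diff desired state from the control plane against installed Helm releases
# and return the actions needed to converge. All names are illustrative.

def reconcile(desired: dict, installed: dict) -> list:
    """Return (action, deployment, version) tuples to converge on desired."""
    actions = []
    for name, version in desired.items():
        if name not in installed:
            actions.append(("install", name, version))
        elif installed[name] != version:
            actions.append(("upgrade", name, version))
    for name in installed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

if __name__ == "__main__":
    desired = {"etl-prod": "12.1.0", "ml-train": "12.2.0"}
    installed = {"etl-prod": "12.0.0", "legacy": "11.9.0"}
    for action in reconcile(desired, installed):
        print(action)
```

Because the orchestrator retains its last-known desired state, this loop can keep reconciling existing releases even when the control plane is unreachable, which is the behavior described in the outage section below.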
Behavior during a control plane outage
In a split deployment, data plane clusters operate independently from the control plane at runtime. If the APC API becomes unavailable, your existing Airflow workloads continue without interruption, but all management operations and external access to Airflow stop.
What stops working:
- Authentication: Sign-in to both the Astro Private Cloud UI and Airflow UIs fails because authentication flows depend on the APC API.
- Astro Private Cloud UI, APC API, and Astro CLI: All management interfaces become unavailable, including deployment creation, configuration changes, and user management.
- New data plane registration: Registration requires the APC API.
- Deployment changes: the deployment orchestrator can’t receive new instructions from the APC API, so you can’t create, update, or delete Deployments.
- External Airflow API access: Requests to the Airflow REST API from outside the data plane cluster fail because they route through the APC API-authenticated ingress.
What keeps running:
- Existing Airflow Deployments: All Airflow components (scheduler, webserver, workers, triggerer) remain running. The APC API isn’t in the runtime execution path for Dags.
- Scheduled Dag runs: The Airflow scheduler continues to trigger Dags on schedule, and in-flight tasks complete normally.
- Internal Airflow communication: Task execution, XCom, and connections to external data sources continue to function.
- Data plane infrastructure: The deployment orchestrator retains the last-known desired state and continues to reconcile existing Helm releases. Config Syncer, NGINX ingress, and other data plane components remain running.
- Local metrics collection: Prometheus on the data plane continues scraping Airflow and platform metrics locally.
While Airflow keeps running, you can’t make changes or access Airflow UIs through the standard ingress during a control plane outage. If you need emergency access to a running Deployment, use kubectl to interact directly with the data plane cluster.
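For example, you can port-forward past the unavailable ingress to reach a Deployment's webserver. The namespace and Service name below are placeholders; discover the real ones with kubectl get:

```
# List Airflow namespaces and Services, then port-forward past the ingress.
kubectl get namespaces
kubectl get svc -n <deployment-namespace>
kubectl port-forward svc/<webserver-service> 8080:8080 -n <deployment-namespace>
# The Airflow UI is then reachable at http://localhost:8080
```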
When the control plane recovers, the deployment orchestrator and Config Syncer reconnect to the APC API automatically. Prometheus federation or remote-write resumes, and any metrics gap during the outage appears in your monitoring dashboards. No manual intervention is required on the data plane side unless tokens expired during the outage. See Register a data plane for token management details.
Monitoring and alerting
Data plane Prometheus scrapes:
- Airflow Deployments, including scheduler, webserver/API server, workers.
- Platform Pods, like the deployment orchestrator, Config Syncer, NGINX, and Vector.
- Kube-state-metrics for namespace-wide object counts.
Use Alertmanager rules provided by the chart or integrate with the control plane Alertmanager to drive notifications.
Watch for deployment orchestrator heartbeat failures, Config Syncer errors, and Prometheus remote-write issues—these often indicate connectivity problems back to the APC API.
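A hedged alerting-rule sketch for the remote-write symptom. The expression uses a standard Prometheus self-monitoring metric; the group name, threshold, and labels are assumptions to adapt for your environment:

```yaml
# Illustrative Prometheus alerting rule for stalled remote-write
# from the data plane to the control plane.
groups:
  - name: dataplane-connectivity
    rules:
      - alert: RemoteWriteFailing
        expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Data plane Prometheus is failing to remote-write metrics"
```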
Comparison to other modes
Split deployments confine the impact of a failure to workload execution, letting you scale your data planes independently and enforce network boundaries around sensitive data resources.
- For the management plane overview see Control Plane Architecture.
- For a single-cluster footprint see Unified Architecture.
Next steps
- Deploy a data plane using the Install data plane guide.
- Register the data plane with your control plane and verify deployment orchestrator heartbeat.
- Configure DNS, TLS certificates, and networking policies based on the ingress endpoints.
- Integrate telemetry (metrics, logs) with your central observability tools.