Configure platform resources

Astro Private Cloud (APC) runs on Kubernetes and requires careful resource planning for both control plane and data plane components. This guide covers resource configuration for all platform components.

Architecture overview

APC uses a control plane/data plane architecture:

  • Control Plane: Houston API, Astro UI, Registry, Config Syncer.
  • Data Plane: Commander, Registry, Airflow deployments.
  • Unified Mode: All components in a single cluster.
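
If the Helm chart exposes a plane-mode setting, selecting a mode might look like the sketch below. The `global.plane.mode` key is an assumption for illustration; verify the actual key against your chart's values reference.

```yaml
# Hypothetical values snippet -- assumes the chart exposes a
# global.plane.mode key for choosing the deployment topology.
global:
  plane:
    mode: unified   # assumed alternatives: control, data
```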

Control plane components

Houston API

Houston is the GraphQL API that manages platform operations.

```yaml
houston:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
```

Scaling recommendations

| Platform Size | Replicas | CPU Request | Memory Request |
|---|---|---|---|
| Small (< 10 deployments) | 2 | 250m | 512Mi |
| Medium (10-50 deployments) | 2 | 500m | 1Gi |
| Large (50+ deployments) | 3 | 1000m | 2Gi |

Houston worker

Background job processor for asynchronous operations. The worker uses the same houston.resources as the Houston API.

```yaml
houston:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  worker:
    replicas: 2
```

Astro UI

Web interface for platform management.

```yaml
astroUI:
  replicas: 2
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
```

Data plane components

Commander

Manages Airflow deployment provisioning and Kubernetes operations.

```yaml
commander:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
```

Registry

Docker image registry for Airflow deployments.

```yaml
registry:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
  persistence:
    enabled: true
    size: 100Gi
```

Storage backend options

  • Local PersistentVolume (default)
  • Google Cloud Storage (GCS)
  • Azure Blob Storage
  • Amazon S3
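
As a sketch, backing the registry with object storage instead of a local PersistentVolume might look like the following. The `s3.*` key names here are assumptions for illustration, not confirmed chart values; check your chart's values reference for the exact keys.

```yaml
# Hypothetical sketch: registry backed by Amazon S3 instead of a PersistentVolume.
# The s3.* key names are assumed for illustration -- verify against your chart.
registry:
  persistence:
    enabled: false        # disable the local PersistentVolume
  s3:
    enabled: true
    bucket: my-registry-bucket   # example bucket name
    region: us-east-1
```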

Ingress and networking

NGINX ingress controller

```yaml
nginx:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  serviceType: LoadBalancer
```

Database

PostgreSQL

Houston metadata database.

```yaml
postgresql:
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1000m"
      memory: "1Gi"
  persistence:
    enabled: true
    size: 8Gi
```

Production configuration

```yaml
postgresql:
  replication:
    enabled: true
    slaveReplicas: 2
    synchronousCommit: "on"
```

Resource sizing examples

Development environment

```yaml
houston:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

astroUI:
  replicas: 1
  resources:
    requests:
      cpu: "50m"
      memory: "128Mi"

commander:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"

nginx:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
```

Production environment

```yaml
houston:
  replicas: 3
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"

astroUI:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

commander:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"

nginx:
  replicas: 3
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"

postgresql:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
  persistence:
    size: 50Gi
```

High availability configuration

```yaml
houston:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1

astroUI:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1

commander:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1
```

Node affinity and tolerations

Dedicated platform nodes

```yaml
houston:
  nodeSelector:
    node-type: platform
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "platform"
      effect: "NoSchedule"
```

Spread across zones

```yaml
houston:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                component: houston
            topologyKey: topology.kubernetes.io/zone
```
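
An alternative sketch uses the standard Kubernetes topologySpreadConstraints API, which enforces a maximum skew rather than a soft preference. This assumes the chart passes the field through to the Houston pod spec, which is not confirmed here.

```yaml
# Alternative sketch using the standard Kubernetes topologySpreadConstraints API.
# Assumes the chart forwards this field to the Houston pod spec.
houston:
  topologySpreadConstraints:
    - maxSkew: 1                                  # allow at most 1 pod difference between zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway           # soft constraint; use DoNotSchedule to make it hard
      labelSelector:
        matchLabels:
          component: houston
```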

Monitor resource usage

```shell
# View pod resource usage
kubectl top pods -n astronomer

# View node resource usage
kubectl top nodes
```
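
To compare live usage against what you have configured, one option is kubectl's custom-columns output (standard kubectl syntax; for brevity this sketch reads only the first container of each pod).

```shell
# List each pod's first-container CPU/memory requests alongside its name
kubectl get pods -n astronomer -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory
```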

Troubleshooting

Out of memory (OOMKilled)

Symptom: Pods restart with OOMKilled status.

Solution: Increase memory limits:

```yaml
houston:
  resources:
    limits:
      memory: "4Gi"
```
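
To find which containers were killed, the last termination reason is recorded in pod status; this uses standard kubectl jsonpath output.

```shell
# Print pod name and last container termination reason, keeping only OOMKilled entries
kubectl get pods -n astronomer \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
  | grep OOMKilled
```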

CPU throttling

Symptom: Slow response times, high latency.

Solution: Increase CPU limits or add replicas:

```yaml
houston:
  replicas: 3
  resources:
    limits:
      cpu: "2000m"
```

Pending pods

Symptom: Pods stuck in Pending state.

Solution:

  1. Check node resources: kubectl describe nodes.
  2. Reduce resource requests or add nodes.
  3. Check for taints/tolerations mismatches.
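
The scheduler records the reason a pod cannot be placed as events, which the following standard kubectl commands surface.

```shell
# Show the events explaining why a pod is Pending (replace <pod-name>)
kubectl describe pod <pod-name> -n astronomer

# Or list only warning events for the namespace, newest last
kubectl get events -n astronomer --field-selector type=Warning --sort-by=.lastTimestamp
```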

Best practices

  • Set both requests and limits for predictable scheduling.
  • Use Pod Disruption Budgets for high availability.
  • Monitor resource usage before scaling.
  • Size based on workload, not just component count.
  • Plan for growth with 20-30% headroom.
  • Use separate node pools for platform components.
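
The headroom guideline can be sanity-checked with quick arithmetic. Using the CPU requests from the production example above (replicas times per-pod millicores, summed, plus 25% headroom):

```shell
# Total CPU requests from the production example, in millicores, plus 25% headroom:
# houston 3x1000, astroUI 2x250, commander 2x500, nginx 3x1000, postgresql 1x500
echo $(( (3*1000 + 2*250 + 2*500 + 3*1000 + 1*500) * 125 / 100 ))
# prints 10000 (i.e. plan node capacity for about 10 cores of requests)
```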