Configure platform resources

Astro Private Cloud (APC) runs on Kubernetes and requires careful resource planning for both control plane and data plane components. This guide covers resource configuration for all platform components.

Architecture overview

APC uses a control plane/data plane architecture:

  • Control Plane: Houston API, Astro UI, Registry, Config Syncer.
  • Data Plane: Commander, Registry, Airflow deployments.
  • Unified Mode: All components in a single cluster.
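
If the Helm chart exposes a plane-mode setting, selecting a mode might look like the sketch below. The `global.plane.mode` key is an assumption for illustration; verify the actual key against your chart's values reference.

```yaml
# Hypothetical values snippet -- assumes the chart exposes a
# global.plane.mode key for choosing the deployment topology.
global:
  plane:
    mode: unified   # assumed alternatives: control, data
```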

Control plane components

Houston API

Houston is the GraphQL API that manages platform operations.

```yaml
houston:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
```

Scaling recommendations

| Platform Size | Replicas | CPU Request | Memory Request |
|---|---|---|---|
| Small (< 10 deployments) | 2 | 250m | 512Mi |
| Medium (10-50 deployments) | 2 | 500m | 1Gi |
| Large (50+ deployments) | 3 | 1000m | 2Gi |

Houston worker

Background job processor for asynchronous operations. The worker uses the same houston.resources as the Houston API.

```yaml
houston:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  worker:
    replicas: 2
```

Astro UI

Web interface for platform management.

```yaml
astroUI:
  replicas: 2
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
```

Data plane components

Commander

Manages Airflow deployment provisioning and Kubernetes operations.

```yaml
commander:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
```

Registry

Docker image registry for Airflow deployments.

```yaml
registry:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
  persistence:
    enabled: true
    size: 100Gi
```

Storage backend options

  • Local PersistentVolume (default)
  • Google Cloud Storage (GCS)
  • Azure Blob Storage
  • Amazon S3
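
As a sketch, backing the registry with object storage instead of a local PersistentVolume might look like the following. The `s3.*` key names here are assumptions for illustration, not confirmed chart values; check your chart's values reference for the exact keys.

```yaml
# Hypothetical sketch: registry backed by Amazon S3 instead of a PersistentVolume.
# The s3.* key names are assumed for illustration -- verify against your chart.
registry:
  persistence:
    enabled: false        # disable the local PersistentVolume
  s3:
    enabled: true
    bucket: my-registry-bucket   # example bucket name
    region: us-east-1
```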

Ingress and networking

NGINX ingress controller

```yaml
nginx:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  serviceType: LoadBalancer
```

Database

PostgreSQL

Houston metadata database.

```yaml
postgresql:
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1000m"
      memory: "1Gi"
  persistence:
    enabled: true
    size: 8Gi
```

Production configuration

```yaml
postgresql:
  replication:
    enabled: true
    slaveReplicas: 2
    synchronousCommit: "on"
```

Resource sizing examples

Development environment

```yaml
houston:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

astroUI:
  replicas: 1
  resources:
    requests:
      cpu: "50m"
      memory: "128Mi"

commander:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"

nginx:
  replicas: 1
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"
```

Production environment

```yaml
houston:
  replicas: 3
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"

astroUI:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

commander:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"

nginx:
  replicas: 3
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"

postgresql:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
  persistence:
    size: 50Gi
```

High availability configuration

```yaml
houston:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1

astroUI:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1

commander:
  replicas: 3
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1
```

Node affinity and tolerations

Dedicated platform nodes

```yaml
houston:
  nodeSelector:
    node-type: platform
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "platform"
      effect: "NoSchedule"
```

Spread across zones

```yaml
houston:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                component: houston
            topologyKey: topology.kubernetes.io/zone
```
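
An alternative sketch uses the standard Kubernetes topologySpreadConstraints API, which enforces a maximum skew rather than a soft preference. This assumes the chart passes the field through to the Houston pod spec, which is not confirmed here.

```yaml
# Alternative sketch using the standard Kubernetes topologySpreadConstraints API.
# Assumes the chart forwards this field to the Houston pod spec.
houston:
  topologySpreadConstraints:
    - maxSkew: 1                                  # allow at most 1 pod difference between zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway           # soft constraint; use DoNotSchedule to make it hard
      labelSelector:
        matchLabels:
          component: houston
```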

Monitor resource usage

```shell
# View pod resource usage
kubectl top pods -n astronomer

# View node resource usage
kubectl top nodes
```
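
To compare live usage against what you have configured, one option is kubectl's custom-columns output (standard kubectl syntax; for brevity this sketch reads only the first container of each pod).

```shell
# List each pod's first-container CPU/memory requests alongside its name
kubectl get pods -n astronomer -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory
```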

Troubleshooting

Out of memory (OOMKilled)

Symptom: Pods restart with OOMKilled status.

Solution: Increase memory limits:

```yaml
houston:
  resources:
    limits:
      memory: "4Gi"
```
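
To find which containers were killed, the last termination reason is recorded in pod status; this uses standard kubectl jsonpath output.

```shell
# Print pod name and last container termination reason, keeping only OOMKilled entries
kubectl get pods -n astronomer \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
  | grep OOMKilled
```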

CPU throttling

Symptom: Slow response times, high latency.

Solution: Increase CPU limits or add replicas:

```yaml
houston:
  replicas: 3
  resources:
    limits:
      cpu: "2000m"
```

Pending pods

Symptom: Pods stuck in Pending state.

Solution:

  1. Check node resources: kubectl describe nodes.
  2. Reduce resource requests or add nodes.
  3. Check for taints/tolerations mismatches.
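
The scheduler records the reason a pod cannot be placed as events, which the following standard kubectl commands surface.

```shell
# Show the events explaining why a pod is Pending (replace <pod-name>)
kubectl describe pod <pod-name> -n astronomer

# Or list only warning events for the namespace, newest last
kubectl get events -n astronomer --field-selector type=Warning --sort-by=.lastTimestamp
```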

Best practices

  • Set both requests and limits for predictable scheduling.
  • Use Pod Disruption Budgets for high availability.
  • Monitor resource usage before scaling.
  • Size based on workload, not just component count.
  • Plan for growth with 20-30% headroom.
  • Use separate node pools for platform components.
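
The headroom guideline can be sanity-checked with quick arithmetic. Using the CPU requests from the production example above (replicas times per-pod millicores, summed, plus 25% headroom):

```shell
# Total CPU requests from the production example, in millicores, plus 25% headroom:
# houston 3x1000, astroUI 2x250, commander 2x500, nginx 3x1000, postgresql 1x500
echo $(( (3*1000 + 2*250 + 2*500 + 3*1000 + 1*500) * 125 / 100 ))
# prints 10000 (i.e. plan node capacity for about 10 cores of requests)
```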