The Airflow API server serves the Airflow UI, the Airflow REST API, and the task execution API that Astro and Celery workers use to fetch and report on tasks. By default, every Airflow 3 Deployment runs with two API server replicas. For workloads with high task concurrency, large Dags, or spikes in UI and REST API traffic, you can enable horizontal autoscaling so that Astro adds API server replicas when load is high and removes them when load decreases.
Use API server autoscaling to:
When you enable autoscaling, Astro provisions a Kubernetes Horizontal Pod Autoscaler (HPA) that tracks CPU utilization across the running API server replicas. The HPA adds or removes replicas to keep the average CPU utilization at 80 percent of the per-replica CPU request.
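To illustrate how the HPA arrives at a replica count, the following Python sketch applies the standard Kubernetes HPA formula, `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`, with the 80 percent target described above. The helper name `desired_replicas` is hypothetical. The 10 percent tolerance is the upstream Kubernetes default and is assumed here because this section doesn't document Astro's exact HPA settings. The result is not yet clamped to the configured replica range.

```python
import math


def desired_replicas(current_replicas: int,
                     avg_cpu_utilization: float,
                     target_utilization: float = 0.80,
                     tolerance: float = 0.10) -> int:
    """Return the replica count the standard Kubernetes HPA formula would request.

    avg_cpu_utilization is the average CPU usage across running replicas,
    expressed as a fraction of the per-replica CPU request (0.8 == 80 percent).
    Hypothetical helper; the tolerance mirrors the upstream Kubernetes default
    and is an assumption about Astro's configuration.
    """
    ratio = avg_cpu_utilization / target_utilization
    # Within the tolerance band, the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Two replicas averaging 160 percent of their CPU request: ratio 2.0, so 4 replicas.
print(desired_replicas(2, 1.6))  # 4
# Four replicas averaging 40 percent: ratio 0.5, so scale down to 2 replicas.
print(desired_replicas(4, 0.4))  # 2
```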
The HPA scales between a minimum of two replicas and a maximum set by API Server Max Replicas, which you can configure from 2 to 10 in the Astro UI. Setting API Server Max Replicas to 2 runs a fixed two-replica configuration and effectively prevents scaling.

The HPA evaluates utilization continuously, and replica changes typically take effect within one to two minutes. The same high-availability and anti-affinity rules that apply to other Deployment components also apply to API server replicas. See Enable high availability.
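The configured replica range acts as a clamp on whatever the HPA recommends. This minimal sketch assumes a fixed minimum of two replicas (the default API server replica count) and the 2 to 10 range for API Server Max Replicas. The function and constant names are hypothetical.

```python
MIN_REPLICAS = 2            # Assumed floor: the default two API server replicas.
ALLOWED_MAX = range(2, 11)  # API Server Max Replicas accepts values from 2 to 10.


def bounded_replicas(desired: int, max_replicas: int = 10) -> int:
    """Clamp an HPA recommendation to the API server replica range.

    Hypothetical helper; not part of any Astro API.
    """
    if max_replicas not in ALLOWED_MAX:
        raise ValueError(f"API Server Max Replicas must be 2-10, got {max_replicas}")
    return max(MIN_REPLICAS, min(desired, max_replicas))


# With the default maximum of 10, a recommendation of 14 is capped at 10.
print(bounded_replicas(14))                 # 10
# With a maximum of 2, every recommendation resolves to 2: a fixed configuration.
print(bounded_replicas(7, max_replicas=2))  # 2
```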
The following table shows the approximate API server replica counts you can expect at different levels of task concurrency. Actual counts can vary based on Airflow UI and REST API traffic, Dag complexity, and other workload characteristics.
API server autoscaling is supported on Airflow 3 Deployments running on Astro:
API server autoscaling works with all executors: Astro, Celery, and Kubernetes.
API server autoscaling is not available for Airflow 2 Deployments. Airflow 2 uses the Airflow webserver, which doesn’t support horizontal autoscaling on Astro.
1. In the Astro UI, select a Workspace, click Deployments, and then select a Deployment.
2. In the Advanced section, set the API Server Autoscaling toggle to on or off.
   - When enabling autoscaling, the API Server Max Replicas list becomes editable. Select a value from 2 to 10. The default is 10, which gives Astro the largest possible scaling headroom. Choose a lower value to cap your replica count at a known maximum.
   - When disabling autoscaling, Astro preserves the previous API Server Max Replicas value, so you can re-enable autoscaling later with the same configuration.
After you enable autoscaling, the Deployment Details page shows the configured range under API Server Autoscaling, for example, Enabled (2-10 replicas).
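The enable and disable behavior described in the steps above, including the preserved maximum and the Details page summary, can be modeled with a small Python class. This is a hypothetical sketch for reasoning about the settings, not an Astro API or SDK. It assumes autoscaling starts disabled and doesn't model the disabled-state text on the Details page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiServerAutoscaling:
    """Hypothetical model of the API Server Autoscaling settings; not an Astro API."""

    enabled: bool = False
    max_replicas: int = 10  # Default API Server Max Replicas value.

    def enable(self, max_replicas: Optional[int] = None) -> None:
        """Turn autoscaling on, optionally choosing a new maximum from 2 to 10."""
        if max_replicas is not None:
            if not 2 <= max_replicas <= 10:
                raise ValueError("API Server Max Replicas must be between 2 and 10")
            self.max_replicas = max_replicas
        self.enabled = True

    def disable(self) -> None:
        """Turn autoscaling off; max_replicas is kept so re-enabling restores it."""
        self.enabled = False

    def details_label(self) -> Optional[str]:
        """Mirror the Details summary, for example 'Enabled (2-10 replicas)'."""
        if not self.enabled:
            return None  # The disabled-state text isn't described, so none is invented.
        return f"Enabled (2-{self.max_replicas} replicas)"


settings = ApiServerAutoscaling()
settings.enable(max_replicas=6)
print(settings.details_label())  # Enabled (2-6 replicas)
settings.disable()
settings.enable()  # No new value chosen, so the preserved maximum of 6 is reused.
print(settings.details_label())  # Enabled (2-6 replicas)
```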
API server replicas are sized the same as an A5 worker. Each Deployment includes two API server replicas at no additional cost. When autoscaling adds replicas above the included two, Astro charges for the additional replicas based on their uptime at the A5 unit rate, similar to chargeback for the Kubernetes executor (KE) and KubernetesPodOperator (KPO). The charges appear on your invoice under Runtime Compute as API Server.
If autoscaling is enabled but the Deployment never exceeds two replicas, no additional API server charges apply.
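To estimate billable usage, count only replica time above the included two. The following sketch sums A5-equivalent hours from a hypothetical timeline of (replica count, hours) segments. Multiply the result by your A5 unit rate to approximate the API Server line item. The helper name and the timeline format are assumptions, and the sketch ignores any billing granularity or rounding that Astro may apply.

```python
from typing import Iterable, Tuple

INCLUDED_REPLICAS = 2  # Two API server replicas are included at no additional cost.


def billable_replica_hours(timeline: Iterable[Tuple[int, float]]) -> float:
    """Sum A5-equivalent hours for API server replicas above the included two.

    timeline: (replica_count, hours_at_that_count) segments. Hypothetical helper;
    actual invoices are computed by Astro from recorded replica uptime.
    """
    return sum(max(0, replicas - INCLUDED_REPLICAS) * hours
               for replicas, hours in timeline)


# A day at 2 replicas, except a 3-hour peak at 5 and a 1.5-hour shoulder at 3.
day = [(2, 19.5), (5, 3.0), (3, 1.5)]
print(billable_replica_hours(day))  # 10.5
```

A Deployment that stays at two replicas all day, such as `[(2, 24.0)]`, yields zero billable hours, which matches the no-charge case above.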