Configure API server autoscaling

Preview

This feature is in Preview.

Airflow 3

This feature is only available for Airflow 3.x Deployments.

The Airflow API server serves the Airflow UI, the Airflow REST API, and the task execution API that Astro and Celery workers use to fetch and report on tasks. By default, every Airflow 3 Deployment runs with two API server replicas. For workloads with high task concurrency, large Dags, or spikes in UI and REST API traffic, you can enable horizontal autoscaling so that Astro adds API server replicas when load is high and removes them when load decreases.

Use API server autoscaling to:

Sustain workloads with thousands of concurrent tasks without saturating a fixed pair of API server replicas.
Support smooth Airflow 2 to Airflow 3 upgrades for environments that pushed the previous webserver to its limits.
Cap your spend by setting an explicit maximum replica count.

How autoscaling works

When you enable autoscaling, Astro provisions a Kubernetes Horizontal Pod Autoscaler (HPA) that tracks CPU utilization across the running API server replicas. The HPA adds or removes replicas to keep the average CPU utilization at 80 percent of the per-replica CPU request.

When average CPU utilization is consistently above the 80 percent target, Astro adds replicas, up to your configured API Server Max Replicas value.
When average CPU utilization is below the target, Astro removes replicas, down to the minimum of two.
The minimum replica count is fixed at two. This preserves API availability during a Pod restart and is the same as the default replica count when autoscaling is disabled.
The maximum replica count is configurable from 2 to 10 in the Astro UI. Setting API Server Max Replicas to 2 runs a fixed two-replica configuration and effectively prevents scaling.

The HPA evaluates utilization continuously, and replica changes typically take effect within one to two minutes. The same high-availability and anti-affinity rules that apply to other Deployment components apply to API server replicas. See Enable high availability.
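
The scaling decision described above can be sketched with the replica formula from the upstream Kubernetes HPA documentation: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured range. This is an illustrative model, not Astro's implementation. The function name is hypothetical, and the 10 percent tolerance is the upstream Kubernetes default, which Astro is only assumed to keep. The sketch also ignores the HPA's scale-down stabilization behavior.

```python
MIN_REPLICAS = 2          # fixed minimum stated on this page
TARGET_CPU_PERCENT = 80   # CPU utilization target stated on this page
TOLERANCE_PERCENT = 10    # upstream Kubernetes default; assumed, not confirmed for Astro


def desired_replicas(current_replicas: int, avg_cpu_percent: int, max_replicas: int = 10) -> int:
    """Return the replica count an HPA-style controller would aim for.

    avg_cpu_percent is the average CPU usage across replicas, expressed as a
    percentage of the per-replica CPU request.
    """
    if not MIN_REPLICAS <= max_replicas <= 10:
        raise ValueError("max_replicas must be between 2 and 10")

    # Skip scaling when utilization is within the tolerance band around the target.
    if abs(avg_cpu_percent - TARGET_CPU_PERCENT) * 100 <= TOLERANCE_PERCENT * TARGET_CPU_PERCENT:
        desired = current_replicas
    else:
        # Integer ceiling division avoids floating-point rounding surprises.
        desired = -(-current_replicas * avg_cpu_percent // TARGET_CPU_PERCENT)

    # Clamp to the fixed minimum and the configured maximum.
    return max(MIN_REPLICAS, min(max_replicas, desired))
```

For example, under these assumptions, two replicas averaging 120 percent of their CPU request would scale to three, and four replicas averaging 40 percent would scale back to two.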

Expected replica counts

The following table shows the approximate API server replica counts you can expect at different levels of task concurrency. Actual counts can vary based on Airflow UI and REST API traffic, Dag complexity, and other workload characteristics.

Concurrent tasks	API server replicas
~500	3
~1,000	4
~2,000	6
~3,000	7
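
To estimate a replica count for a concurrency level between the rows above, one option is linear interpolation over the table. The sketch below is a rough planning aid only. The function name is hypothetical, the extra point of two replicas at zero tasks is an assumption based on the fixed minimum, and the function refuses to extrapolate beyond the last row because the table gives no data there.

```python
import math
from bisect import bisect_right

# (concurrent tasks, approximate replicas) copied from the table above.
# The (0, 2) point is an assumption: an idle Deployment sits at the fixed minimum.
POINTS = [(0, 2), (500, 3), (1000, 4), (2000, 6), (3000, 7)]


def estimate_replicas(concurrent_tasks: int, max_replicas: int = 10) -> int:
    """Estimate the API server replica count by interpolating between table rows."""
    if concurrent_tasks < 0 or concurrent_tasks > POINTS[-1][0]:
        raise ValueError("concurrency is outside the range covered by the table")

    task_counts = [tasks for tasks, _ in POINTS]
    i = bisect_right(task_counts, concurrent_tasks)
    if i == len(POINTS):
        # Exactly on the last row.
        estimate = float(POINTS[-1][1])
    else:
        (x0, y0), (x1, y1) = POINTS[i - 1], POINTS[i]
        estimate = y0 + (y1 - y0) * (concurrent_tasks - x0) / (x1 - x0)

    # Round up to a whole replica and respect the configured range.
    return max(2, min(max_replicas, math.ceil(estimate)))
```

With these points, roughly 1,500 concurrent tasks interpolates to five replicas. Actual counts still depend on UI and REST API traffic and on Dag complexity.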

Prerequisites

API server autoscaling is supported on the following Airflow 3 Deployments on Astro:

Deployments on standard or dedicated clusters.
Remote Execution Deployments.

API server autoscaling works with all executors: Astro, Celery, and Kubernetes.

API server autoscaling is not available for Airflow 2 Deployments. Airflow 2 uses the Airflow webserver, which doesn’t support horizontal autoscaling on Astro.

Enable or disable API server autoscaling

Open the Deployment

In the Astro UI, click Deployments, then select a Deployment.

Edit the Deployment

Click the Deployment’s More actions menu (⋯), then select Edit Deployment.

Toggle API server autoscaling

In the Advanced section, set the API Server Autoscaling toggle to on or off.

When you enable autoscaling, the API Server Max Replicas list becomes editable. Select a value from 2 to 10. The default is 10, which gives Astro the most scaling headroom. Choose a lower value to cap your replica count at a known maximum.

When you disable autoscaling, Astro preserves the previous API Server Max Replicas value, so you can re-enable autoscaling later with the same configuration.

Save the configuration

Click Update Deployment. When you enable autoscaling, Astro applies the new HPA without restarting the running API server replicas. When you disable autoscaling, Astro removes the HPA and the Deployment runs with two API server replicas.

After you enable autoscaling, the Deployment Details page shows the configured range under API Server Autoscaling, for example Enabled (2-10 replicas).
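
The toggle behavior in the steps above can be modeled as a small value object. This is an illustrative sketch only: the class and field names are hypothetical and do not correspond to any Astro API or configuration schema. It captures three rules from this page: the maximum must be between 2 and 10, disabling autoscaling pins the Deployment at two replicas, and the stored maximum survives a disable and re-enable cycle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ApiServerScaling:
    """Hypothetical model of the two settings in the Advanced section."""

    autoscaling_enabled: bool = False
    max_replicas: int = 10  # default described on this page

    def __post_init__(self) -> None:
        if not 2 <= self.max_replicas <= 10:
            raise ValueError("max_replicas must be between 2 and 10")

    def replica_range(self) -> tuple[int, int]:
        """Return the (minimum, maximum) replica count currently in effect."""
        if self.autoscaling_enabled:
            return (2, self.max_replicas)
        # Autoscaling off: the Deployment runs a fixed pair of replicas.
        return (2, 2)


# Disabling and re-enabling keeps the stored maximum.
enabled = ApiServerScaling(autoscaling_enabled=True, max_replicas=6)
disabled = replace(enabled, autoscaling_enabled=False)
re_enabled = replace(disabled, autoscaling_enabled=True)
```

Here `disabled.replica_range()` is `(2, 2)` while `re_enabled.replica_range()` is `(2, 6)` again, mirroring how Astro preserves the previous API Server Max Replicas value.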

Billing

API server replicas are sized the same as an A5 worker. Each Deployment includes two API server replicas at no additional cost. When autoscaling adds replicas above the included two, Astro charges for the additional replicas based on their uptime at the A5 unit rate, similar to the chargeback for Kubernetes executor (KE) and KubernetesPodOperator (KPO) workloads. The charges appear on your invoice under Runtime Compute as API Server.

If autoscaling is enabled but the Deployment never exceeds two replicas, no additional API server charges apply.
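
The billing rule above amounts to counting replica-hours beyond the included two. The following sketch shows that arithmetic under stated assumptions: the function names and the sample format are hypothetical, and the hourly rate is a placeholder argument, not a real A5 price. Check your contract or invoice for the actual rate and billing granularity.

```python
INCLUDED_REPLICAS = 2  # replicas included with every Deployment at no extra cost


def billable_replica_hours(samples):
    """Sum the replica-hours above the included two.

    samples is a list of (hours, replica_count) pairs, where each pair states
    how long the Deployment ran at that replica count.
    """
    return sum(hours * max(0, replicas - INCLUDED_REPLICAS) for hours, replicas in samples)


def additional_charge(samples, a5_hourly_rate):
    """Multiply billable replica-hours by a caller-supplied (placeholder) A5 rate."""
    return billable_replica_hours(samples) * a5_hourly_rate


# One day: 20 hours at 2 replicas, 3 hours at 4 replicas, 1 hour at 6 replicas.
day = [(20, 2), (3, 4), (1, 6)]
extra_hours = billable_replica_hours(day)  # 0 + 3*2 + 1*4 = 10 replica-hours
```

A Deployment that never leaves two replicas produces zero billable replica-hours, which matches the statement that no additional API server charges apply in that case.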

Limitations

The minimum replica count is fixed at two. You can’t scale the API server below two replicas.
The maximum replica count is capped at ten in the Astro UI. If your workload requires more than ten replicas, Astronomer recommends moving some workloads to another Deployment.