Apache Airflow®, Airflow, and the Airflow logo are trademarks of the Apache Software Foundation. Copyright © Astronomer 2026. All rights reserved.

Configure API server autoscaling

This feature is in Preview and is available only for Airflow 3.x Deployments.

The Airflow API server serves the Airflow UI, the Airflow REST API, and the task execution API that Astro and Celery workers use to fetch and report on tasks. By default, every Airflow 3 Deployment runs with two API server replicas. For workloads with high task concurrency, large Dags, or spikes in UI and REST API traffic, you can enable horizontal autoscaling so that Astro adds API server replicas when load is high and removes them when load decreases.

Use API server autoscaling to:

  • Sustain workloads with thousands of concurrent tasks without saturating a fixed pair of API server replicas.
  • Support smooth Airflow 2 to Airflow 3 upgrades for environments that pushed the previous webserver to its limits.
  • Cap your spend by setting an explicit maximum replica count.

How autoscaling works

When you enable autoscaling, Astro provisions a Kubernetes Horizontal Pod Autoscaler (HPA) that tracks CPU utilization across the running API server replicas. The HPA adds or removes replicas to keep the average CPU utilization at 80 percent of the per-replica CPU request.

  • When average CPU utilization is consistently above the 80 percent target, Astro adds replicas, up to your configured API Server Max Replicas value.
  • When average CPU utilization is below the target, Astro removes replicas, down to the minimum of two.
  • The minimum replica count is fixed at two. This preserves API availability during a Pod restart and is the same as the default replica count when autoscaling is disabled.
  • The maximum replica count is configurable from 2 to 10 in the Astro UI. Setting API Server Max Replicas to 2 runs a fixed two-replica configuration and effectively prevents scaling.

The HPA evaluates utilization continuously, and replica changes typically take effect within one to two minutes. The same high-availability and anti-affinity rules that apply to other Deployment components apply to API server replicas. See Enable high availability.
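The scaling decision described above can be sketched with the standard Kubernetes HPA calculation, `desired = ceil(current * currentUtilization / targetUtilization)`, clamped to the minimum and maximum replica counts. This is a minimal illustration, not Astro's implementation: the function name `desired_replicas` is hypothetical, and the sketch omits the tolerance band and stabilization behavior that a real HPA applies before changing the replica count.

```python
import math

MIN_REPLICAS = 2           # fixed minimum stated on this page
TARGET_UTILIZATION = 80    # percent of the per-replica CPU request


def desired_replicas(current_replicas: int,
                     avg_utilization_pct: float,
                     max_replicas: int = 10) -> int:
    """Approximate the HPA decision for one evaluation cycle.

    Hypothetical helper: applies the documented Kubernetes HPA ratio and
    clamps the result to the range [2, max_replicas].
    """
    if not MIN_REPLICAS <= max_replicas <= 10:
        raise ValueError("max_replicas must be between 2 and 10")
    raw = math.ceil(current_replicas * avg_utilization_pct / TARGET_UTILIZATION)
    return max(MIN_REPLICAS, min(max_replicas, raw))


if __name__ == "__main__":
    # Two replicas averaging 120 percent of their CPU request.
    print(desired_replicas(2, 120))
    # Four replicas averaging 40 percent of their CPU request.
    print(desired_replicas(4, 40))
```

With a maximum of 2, the clamp always returns 2, which matches the fixed two-replica behavior described in the list above.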

Expected replica counts

The following table shows the approximate API server replica counts you can expect at different levels of task concurrency. Actual counts can vary based on Airflow UI and REST API traffic, Dag complexity, and other workload characteristics.

Concurrent tasks    API server replicas
~500                3
~1,000              4
~2,000              6
~3,000              7
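For rough capacity planning, the table can be turned into a lookup that interpolates between rows. The sketch below assumes linear interpolation, rounded up, which this page does not claim. The name `estimate_replicas` is hypothetical, and the function refuses to guess outside the 500 to 3,000 range that the table covers.

```python
# Approximate figures copied from the table above: (concurrent tasks, replicas).
EXPECTED = [(500, 3), (1000, 4), (2000, 6), (3000, 7)]


def estimate_replicas(concurrent_tasks: int) -> int:
    """Estimate API server replicas by interpolating between table rows.

    Assumption: replica count grows linearly between rows. Real counts
    also depend on UI and REST API traffic and on Dag complexity.
    """
    if not EXPECTED[0][0] <= concurrent_tasks <= EXPECTED[-1][0]:
        raise ValueError("concurrent_tasks is outside the range covered by the table")
    for (t0, r0), (t1, r1) in zip(EXPECTED, EXPECTED[1:]):
        if t0 <= concurrent_tasks <= t1:
            # Integer ceiling division avoids floating-point rounding surprises.
            extra = -(-(r1 - r0) * (concurrent_tasks - t0) // (t1 - t0))
            return r0 + extra
    raise AssertionError("unreachable: range was validated above")


if __name__ == "__main__":
    for tasks in (500, 750, 1500, 3000):
        print(tasks, estimate_replicas(tasks))
```

Treat the result as a starting point for choosing API Server Max Replicas, not as a prediction.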

Prerequisites

API server autoscaling is supported on the following Airflow 3 Deployments on Astro:

  • Deployments on standard or dedicated clusters.
  • Remote Execution Deployments.

API server autoscaling works with all executors: Astro, Celery, and Kubernetes.

API server autoscaling is not available for Airflow 2 Deployments. Airflow 2 uses the Airflow webserver, which doesn’t support horizontal autoscaling on Astro.

Enable or disable API server autoscaling

1. Open the Deployment

In the Astro UI, select a Workspace, click Deployments, and then select a Deployment.

2. Edit the Deployment

Click the Options menu of the Deployment, and then select Edit Deployment.

3. Toggle API server autoscaling

In the Advanced section, set the API Server Autoscaling toggle to on or off.

When you enable autoscaling, the API Server Max Replicas list becomes editable. Select a value from 2 to 10. The default is 10, which gives Astro the most scaling headroom. Choose a lower value to cap your replica count at a known maximum.

When you disable autoscaling, Astro preserves the previous API Server Max Replicas value, so you can re-enable autoscaling later with the same configuration.

4. Save the configuration

Click Update Deployment. When you enable autoscaling, Astro applies the new HPA without restarting the running API server replicas. When you disable autoscaling, Astro removes the HPA and the Deployment runs with two API server replicas.

After you enable autoscaling, the Deployment Details page shows the configured range under API Server Autoscaling, for example Enabled (2-10 replicas).
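The rules in these steps (a maximum from 2 to 10, a default of 10, and a preserved maximum when the toggle is off) can be modeled in a few lines. This is an illustrative sketch only: the class `ApiServerAutoscaling` and its fields are hypothetical and do not correspond to any Astro API or CLI object, and the "Disabled" label is an assumption because this page shows only the enabled form.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ApiServerAutoscaling:
    """Hypothetical model of the two settings in the Advanced section."""
    enabled: bool = False
    max_replicas: int = 10  # default stated on this page

    def __post_init__(self) -> None:
        if not 2 <= self.max_replicas <= 10:
            raise ValueError("API Server Max Replicas must be from 2 to 10")

    def replica_range(self) -> tuple:
        # With autoscaling off, the Deployment runs two replicas.
        return (2, self.max_replicas) if self.enabled else (2, 2)

    def label(self) -> str:
        if not self.enabled:
            return "Disabled"  # assumed wording, not shown on this page
        return f"Enabled (2-{self.max_replicas} replicas)"


if __name__ == "__main__":
    settings = ApiServerAutoscaling(enabled=True, max_replicas=6)
    print(settings.label())
    # Turning the toggle off keeps max_replicas for later re-enabling.
    off = replace(settings, enabled=False)
    print(off.replica_range(), off.max_replicas)
```

Keeping `max_replicas` on the object while `enabled` is false mirrors how Astro preserves the previous value when you turn autoscaling off.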

Billing

API server replicas are sized the same as an A5 worker. Each Deployment includes two API server replicas at no additional cost. When autoscaling adds replicas beyond the included two, Astro charges for each additional replica based on its uptime at the A5 unit rate, similar to the chargeback for Kubernetes executor and KubernetesPodOperator (KE/KPO) workloads. The charges appear on your invoice under Runtime Compute as API Server.

If autoscaling is enabled but the Deployment never exceeds two replicas, no additional API server charges apply.
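As a worked example of the billing rule, the sketch below totals the replica-hours above the two included replicas and multiplies them by a rate. It assumes charges accrue per replica-hour, which this page does not specify. The function names are hypothetical, and the rate passed in the usage example is a placeholder, not an actual A5 price.

```python
INCLUDED_REPLICAS = 2  # each Deployment includes two replicas at no extra cost


def billable_replica_hours(samples):
    """Sum replica-hours above the included two.

    samples: iterable of (replica_count, hours_at_that_count) pairs.
    """
    return sum(max(0, count - INCLUDED_REPLICAS) * hours
               for count, hours in samples)


def estimated_charge(samples, a5_hourly_rate):
    """Multiply billable replica-hours by a caller-supplied A5 rate."""
    return billable_replica_hours(samples) * a5_hourly_rate


if __name__ == "__main__":
    # One day: 20 hours at 2 replicas, 3 hours at 4, 1 hour at 6.
    day = [(2, 20), (4, 3), (6, 1)]
    print(billable_replica_hours(day))    # (4-2)*3 + (6-2)*1 replica-hours
    print(estimated_charge(day, 0.5))     # 0.5 is a placeholder rate
```

A Deployment that stays at two replicas for the whole period produces zero billable replica-hours, consistent with the statement above.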

Limitations

  • The minimum replica count is fixed at two. You can’t scale the API server below two replicas.
  • The maximum replica count is capped at ten in the Astro UI. If your workload requires more than ten replicas, Astronomer recommends moving some workloads to another Deployment.