Airflow system components

A Deployment in Astro Private Cloud (APC) consists of multiple components that work together to orchestrate and execute your data pipelines. Each component has a specific role and configuration options.

Core components

Scheduler

The scheduler is the heart of Airflow. It monitors all Dags and tasks, triggers task instances when their dependencies are complete, and submits tasks to the executor for execution.

Default configuration:

```yaml
scheduler:
  enabled: true
  replicas: 1
  terminationGracePeriodSeconds: 10
  livenessProbe:
    initialDelaySeconds: 10
    timeoutSeconds: 20
    failureThreshold: 5
    periodSeconds: 60
```

Key responsibilities:

  • Schedule tasks based on dependencies and triggers.
  • Monitor task states and handle retries.
  • Manage pools and task queues.
  • Parse Dag files and create Dag runs. When the Dag processor is enabled, Dag parsing moves to the Dag processor and the scheduler handles only scheduling.

Webserver

The webserver provides the Airflow UI for monitoring Dags, viewing logs, triggering runs, and managing configurations.

Default configuration:

```yaml
webserver:
  enabled: true
  replicas: 1
  terminationGracePeriodSeconds: 30
  allowPodLogReading: true
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 5
    failureThreshold: 5
    periodSeconds: 10
```

Key features:

  • Dag visualization and monitoring.
  • Task log viewing.
  • Variable and connection management.
  • User authentication and authorization.

Workers

Workers execute tasks. They run as a persistent deployment only when you use the Celery Executor. With the Kubernetes Executor, Airflow launches ephemeral task pods instead.

Default configuration (Celery Executor):

```yaml
workers:
  enabled: true
  replicas: 1
  terminationGracePeriodSeconds: 600
```
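
For long-running Celery tasks, you might scale workers and extend the termination grace period so in-flight tasks can drain during rollouts. This sketch uses the keys from the default configuration above; the values are illustrative, not recommendations:

```yaml
workers:
  enabled: true
  replicas: 3
  # Give long-running tasks up to 20 minutes to finish before shutdown
  terminationGracePeriodSeconds: 1200
```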

Triggerer

The triggerer handles deferrable operators, allowing tasks to release worker slots while waiting for external events.

Default configuration:

```yaml
triggerer:
  enabled: true
  replicas: 1
  terminationGracePeriodSeconds: 60
  livenessProbe:
    initialDelaySeconds: 10
    timeoutSeconds: 20
    failureThreshold: 5
    periodSeconds: 60
```
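
If your Deployment runs many deferrable operators, you can scale the triggerer and give it more resources. This sketch uses the `replicas` key shown above and the same `resources` schema used elsewhere in this document; the values are illustrative:

```yaml
triggerer:
  replicas: 2  # up to the default limit of 2 replicas
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "500m"
      memory: "1Gi"
```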

Dag processor

The Dag processor parses Dag files and updates the metadata database with Dag definitions. Available in Airflow 2.3+ and mandatory in Airflow 3+.

Default configuration:

```yaml
dagProcessor:
  enabled: ~ # Auto-enabled for Airflow 3+
  replicas: 1
  terminationGracePeriodSeconds: 60
  waitForMigrations:
    enabled: true
```

In Airflow 3, the Dag processor is automatically enabled and required for Dag discovery.
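
On Airflow 2.3+, where the component is optional, you can enable it explicitly to offload Dag parsing from the scheduler. This sketch reuses the keys from the default configuration above:

```yaml
dagProcessor:
  enabled: true  # explicit opt-in on Airflow 2.3+; automatic on Airflow 3+
  replicas: 1
```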

API server (Airflow 3+)

The API server is a new component in Airflow 3 that provides the REST API, separated from the webserver for better scalability.

Default configuration:

```yaml
apiServer:
  enabled: true
  allowPodLogReading: true
```

The platform manages the number of API server replicas.

Supporting components

Redis

Message broker for Celery Executor. Handles task queue communication between scheduler and workers.

```yaml
redis:
  enabled: true
  persistence:
    enabled: true
    size: 1Gi
```

StatsD exporter

Collects and exports Airflow metrics for monitoring systems like Prometheus.

```yaml
statsd:
  enabled: true
  terminationGracePeriodSeconds: 30
```

PgBouncer (optional)

Connection pooler that sits between Airflow components and the metadata database. Reduces the number of direct database connections opened by the scheduler, webserver, and workers.

PgBouncer is only enabled when the cluster uses PostgreSQL and pgbouncer.enabled is set to true in your platform configuration. It is disabled when the cluster uses MySQL.

```yaml
pgbouncer:
  enabled: true # depends on cluster database type and platform config
```

Flower

Web UI for monitoring Celery workers. Only active when using Celery Executor.

```yaml
flower:
  enabled: true
```

Airflow 2 vs Airflow 3 components

| Component | Airflow 2 | Airflow 3 |
| --- | --- | --- |
| Scheduler | Required | Required |
| Webserver | Required (includes API) | Required (UI only) |
| API server | N/A | Required |
| Dag processor | Optional (2.3+) | Required |
| Workers | Celery Executor only | Celery Executor only |
| Triggerer | Optional (2.2+) | Optional |
| Redis | Celery Executor only | Celery Executor only |
| Flower | Celery Executor only | Celery Executor only |
| StatsD exporter | Required | Required |
| PgBouncer | Optional (PostgreSQL only) | Optional (PostgreSQL only) |

Resource recommendations

Small workloads (< 50 Dags)

```yaml
scheduler:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "500m"
      memory: "1Gi"

webserver:
  resources:
    requests:
      cpu: "500m"
      memory: "1920Mi"
    limits:
      cpu: "500m"
      memory: "1920Mi"
```

Medium workloads (50-200 Dags)

```yaml
scheduler:
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"

dagProcessor:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "500m"
      memory: "1Gi"
```

Large workloads (200+ Dags)

```yaml
scheduler:
  replicas: 2
  resources:
    requests:
      cpu: "2000m"
      memory: "4Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"

dagProcessor:
  replicas: 2
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
```

Scaling components

Horizontal scaling

The following components support multiple replicas. Default limits apply unless your platform administrator overrides them in the platform configuration.

  • Scheduler: Up to 4 replicas by default.
  • API server: Up to 4 replicas by default.
  • Dag processor: Up to 3 replicas by default.
  • Workers: Up to 10 replicas by default.
  • Triggerer: Up to 2 replicas by default.
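
Combining the `replicas` keys shown in the default configurations above, a horizontally scaled Deployment might look like the following. The values are illustrative and must stay within the per-component limits listed:

```yaml
scheduler:
  replicas: 2
dagProcessor:
  replicas: 2
workers:
  replicas: 5
triggerer:
  replicas: 2
```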

Vertical scaling

Increase resources for:

  • Scheduler: Complex dependencies or high task volume. When the Dag processor is enabled, the scheduler focuses on scheduling only.
  • Dag processor: Large number of Dag files or complex parsing requirements.
  • Workers: Memory-intensive tasks.
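
For example, to give Celery workers more headroom for memory-intensive tasks, raise requests and limits together using the same `resources` schema as the recommendations above. The values here are illustrative:

```yaml
workers:
  resources:
    requests:
      cpu: "1000m"
      memory: "4Gi"
    limits:
      cpu: "1000m"
      memory: "4Gi"
```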