For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
      • AstroFully-managed data operations, powered by Apache Airflow.
      • Astro Private CloudRun Airflow-as-a-service in your environment.
      • Professional ServicesExpert Airflow services for your enterprise's success.
    • Tools
      • Cosmos
      • Orbiter
      • CLI
      • AI SDK
      • Agents
      • Blueprint
      • UpdatesThe State of Airflow 2026See the insights from over 5,800 data practitioners in the full report. Download Now ➔
  • Customers
  • Docs
    • Insights
      • Blog
      • Webinars
      • Resource Library
      • Events
    • Education
      • Academy
      • What is Airflow?
  • Pricing
Get Started Free
    • Astro Private Cloud overview
    • Astro Private Cloud features
        • Database architecture
        • Database connection behavior reference
        • Access the Airflow database
        • Bring your own Airflow database
        • Backfill permissions

Product

  • Platform Overview
  • Astro
  • Astro Observe
  • Astro Private Cloud
  • Security & Trust
  • Pricing

Tools & Services

  • Cosmos
  • Docs
  • Professional Services
  • Product Updates

Use Cases

  • AI Ops
  • Data Observability
  • ETL/ELT
  • ML Ops
  • Operational Analytics
  • All Use Cases

Industries

  • Financial Services
  • Gaming
  • Retail
  • Manufacturing
  • Healthcare
  • All Industries

Resources

  • Academy
  • eBooks & Guides
  • Blog
  • Webinars
  • Events
  • The Data Flowcast Podcast
  • All Resources

Airflow

  • What is Airflow
  • Airflow on Astro
  • Airflow 3.0
  • Airflow Upgrades
  • Airflow Use Cases
  • Airflow 2.x End of Life

Company

  • Our Story
  • Customers
  • Newsroom
  • Careers
  • Contact

Support

  • Knowledge Base
  • Status
  • Contact Support
GitHubYouTubeLinkedInx
  • Legal
  • Privacy
  • Terms of Service
  • Consent Preferences

  • Do Not Sell or Share My Personal information
  • Limit the Use Of My Sensitive Personal Information

Apache Airflow®, Airflow, and the Airflow logo are trademarks of the Apache Software Foundation. Copyright © Astronomer 2026. All rights reserved.

LogoLogo
On this page
  • Database components
  • APC database
  • Airflow metadata database
  • Database provisioning
  • Automatic provisioning
  • External database
  • Connection pooling (PgBouncer)
  • Configuration
  • Pool sizes
  • Kerberos support
  • High availability
  • PostgreSQL replication
  • Default and recommended for production
  • PostgreSQL
  • PgBouncer (platform level)
  • PgBouncer (per Deployment)
  • Backup and recovery
  • Connection strings
  • APC database
  • Airflow database
  • Monitoring
  • Best practices
Airflow administrationWork with the Airflow database

Database architecture

Built with

Astro Private Cloud (APC) uses PostgreSQL as the primary database system for both platform management (the APC API) and Apache Airflow deployments. The following sections describe the database architecture, provisioning, connection pooling, and high availability configuration.

Database components

APC database

The APC API uses a PostgreSQL database to store platform configuration and state.

Key tables:

  • User - System users with authentication credentials
  • Workspace - Organizational units for grouping deployments
  • Deployment - Airflow deployment configurations
  • Cluster - Data plane cluster properties including cloud provider, region, and configuration
  • RoleBinding - User and service account role assignments
  • ServiceAccount - API keys for programmatic access
  • DeployRevision - Deployment version history
  • TaskUsage - Task execution metrics

Airflow metadata database

Each Airflow Deployment gets its own isolated PostgreSQL database containing two schemas:

SchemaPurpose
airflowAirflow scheduler and webserver metadata
celeryCelery task results backend

The database name is derived from the Deployment release name:

deployment-name → deployment_name_airflow

Each schema has a separate database user with permissions scoped to that schema. the deployment orchestrator provisions the database and both schemas automatically when you create a Deployment. For more information about what Airflow stores in the metadata database, see Understanding the Airflow metadata database.

Database provisioning

Automatic provisioning

By default, the deployment orchestrator automatically provisions databases for new Deployments. No additional configuration is required.

External database

To use pre-existing or managed databases, set skipAirflowDatabaseProvisioning to true in the upsertDeployment mutation:

1mutation {
2 upsertDeployment(
3 workspaceId: "<workspace-id>"
4 label: "<my-deployment-label>"
5 skipAirflowDatabaseProvisioning: true
6 ) {
7 id
8 }
9}

When using external databases, provide the connection string in your Deployment configuration. For complete setup steps with connection string examples, see Bring your own Airflow database. For supported PostgreSQL versions, see the Apache Airflow database backend documentation.

Connection pooling (PgBouncer)

APC uses PgBouncer for connection pooling to reduce database connection overhead. Airflow can open many database connections due to its distributed nature, and each PostgreSQL connection creates a dedicated OS process. PgBouncer reduces this overhead by maintaining a pool of reusable server connections. For more detail on why PgBouncer is recommended, see the Apache Airflow Helm chart production guide.

PgBouncer is only used with the embedded PostgreSQL database. When you provide a metadataConnectionString URI through the upsertDeployment mutation, Airflow connects directly to the external database and bypasses PgBouncer. If you use an external database and need connection pooling, configure it through your managed database service or deploy a separate PgBouncer instance.

PgBouncer operates in transaction pool mode by default, which means server connections are returned to the pool after each transaction completes. This mode doesn’t support session-level features such as prepared statements or SET commands that persist across transactions.

Configuration

1pgbouncer:
2 enabled: true
3 port: 6543
4 resources:
5 requests:
6 memory: 256Mi
7 cpu: 250m
8 limits:
9 memory: 256Mi
10 cpu: 250m

Pool sizes

Pool sizes are configured per Deployment in the Airflow chart values:

1pgbouncer:
2 metadataPoolSize: 10 # Airflow metadata connections per pool
3 resultBackendPoolSize: 5 # Celery result backend connections per pool
4 maxClientConn: 100 # Maximum client connections to PgBouncer

These defaults match the Apache Airflow Helm chart defaults and work for most Deployments. To determine whether you need to adjust pool sizes, monitor PgBouncer’s cl_waiting metric. If clients consistently wait for connections, increase pool sizes or scale PgBouncer replicas.

Airflow also maintains its own SQLAlchemy connection pool (default pool_size: 5, max_overflow: 10) between each Airflow component and PgBouncer. With these defaults, each Airflow process can open up to 15 simultaneous database connections. The full connection chain is: Airflow component → SQLAlchemy pool → PgBouncer → PostgreSQL.

Kerberos support

PgBouncer in APC includes Kerberos support:

1pgbouncer:
2 image:
3 repository: quay.io/astronomer/ap-pgbouncer-krb
4 tag: 1.25.0-3

For complete Kerberos database setup instructions, see Configure Kerberos authentication for Airflow databases.

High availability

PostgreSQL replication

Astronomer doesn’t recommend using internal PostgreSQL instance. If you need production database resiliency, use an externally managed database service such as Amazon RDS, Google Cloud SQL, or Azure Database for PostgreSQL. Managed services provide built-in replication, automated failover, and backup capabilities that are more reliable than running replication within the Helm chart.

Replication is disabled by default. The following example shows a recommended production configuration that enables streaming replication with synchronous commit:

1postgresql:
2 replication:
3 enabled: true
4 slaveReplicas: 2
5 synchronousCommit: "on"
6 numSynchronousReplicas: 1
7 user: repl_user
8
9### Persistence
10
11```yaml
12postgresql:
13 persistence:
14 enabled: true
15 size: 8Gi
16 storageClass: "" # Use default storage class

Default and recommended for production

The following table compares the chart default values with starting points that Astronomer recommends for production environments. The Default column reflects the values shipped in the Helm chart. The Recommended for production column reflects general guidance based on Astronomer operational experience. Adjust these values based on the number of Deployments, workload scale, and observed resource utilization in your environment.

For additional production guidance, see the Apache Airflow Helm chart production guide. Astronomer also recommends using a managed database service such as Amazon RDS, Google Cloud SQL, or Azure Database for PostgreSQL rather than the embedded PostgreSQL container for production workloads.

PostgreSQL

SettingDefaultRecommended for production
postgresql.resources.requests.cpu250m1000m. Adjust based on query load
postgresql.resources.requests.memory256Mi2Gi. Adjust based on active connection count
postgresql.resources.limits.cpu1000m2000m. Adjust based on query load
postgresql.resources.limits.memory2Gi4Gi. Adjust based on active connection count
postgresql.persistence.enabledtruetrue
postgresql.persistence.size8Gi50Gi or more. Scale based on the number of Deployments and retention policy
postgresql.replication.enabledfalsetrue
postgresql.replication.slaveReplicas12
postgresql.replication.synchronousCommit"off""on"
postgresql.replication.numSynchronousReplicas01

PgBouncer (platform level)

SettingDefaultRecommended for production
pgbouncer.resources.requests.cpu250m250m
pgbouncer.resources.requests.memory256Mi256Mi
pgbouncer.resources.limits.cpu250m500m. Adjust based on connection throughput
pgbouncer.resources.limits.memory256Mi512Mi. Adjust based on connection count

PgBouncer (per Deployment)

These values match the Apache Airflow Helm chart defaults. No official formula ties pool sizes to a specific workload metric such as Dag count. Instead, adjust pool sizes based on observed connection utilization.

SettingDefaultRecommended for production
pgbouncer.metadataPoolSize1010. Increase if cl_waiting is consistently non-zero
pgbouncer.resultBackendPoolSize55. Increase for Deployments with high Celery task throughput
pgbouncer.maxClientConn100100. Increase if Deployments run many concurrent Airflow components

Backup and recovery

For database backup and restore procedures, including size estimation, pg_dump/mysqldump commands, and restore steps, see Access Airflow database.

Connection strings

APC database

The APC database connection depends on whether you use an external database or the in-cluster PostgreSQL. You configure this during control plane installation.

For external databases, provide the connection through houston.backendConnection in your Helm values:

1houston:
2 backendConnection:
3 user: <username>
4 pass: <password>
5 host: <external-database-host>
6 port: 5432
7 db: <database-name>

Alternatively, provide a pre-existing Kubernetes secret name through houston.backendSecretName. The secret must contain a connection key with the full connection URI.

To retrieve the active APC database connection string, read the astronomer-bootstrap secret:

$kubectl get secret -n astronomer astronomer-bootstrap \
> -o jsonpath='{.data.connection}' | base64 -d

The in-cluster PostgreSQL option (global.postgresql.enabled: true) is only for development or proof-of-concept environments and isn’t supported in production.

Airflow database

Both the metadata and result backend connections use the same database with different schemas and credentials:

# Metadata connection (airflow schema)
postgresql://<metadata-user>:<password>@<host>:5432/<deployment>_airflow
# Result backend connection (celery schema)
postgresql://<celery-user>:<password>@<host>:5432/<deployment>_airflow

When PgBouncer is enabled, Airflow components connect through PgBouncer on port 6543 instead of directly to PostgreSQL on port 5432.

To retrieve the connection string for a specific Deployment, see Access Airflow database.

Monitoring

Monitor the following metrics to track database and connection pool health:

MetricDescription
pg_stat_activityActive database connections
pg_database_sizeDatabase size on disk
pgbouncer_pools_*PgBouncer connection pool statistics, including cl_waiting

Best practices

  • Enable replication for production Deployments.
  • Use PgBouncer to manage connection overhead.
  • Monitor connection pools for cl_waiting and database size growth.
  • Configure regular backups. See Access Airflow database.
  • Size pools based on observed connection utilization, not Dag count.
  • Use separate credentials for each Deployment.
  • Enable SSL for database connections in production.
  • Use a managed database service for production workloads instead of the in-cluster PostgreSQL container.