# Access Airflow database

Each Airflow Deployment in Astro Private Cloud (APC) has its own PostgreSQL metadata database. You can connect to this database to run queries, troubleshoot issues, or perform maintenance tasks.

## Database architecture

Each Deployment has a single PostgreSQL database containing two schemas:

| Schema | Purpose |
|--------|---------|
| `airflow` | Airflow metadata, including Dags, task instances, connections, and variables |
| `celery` | Celery task results when using the Celery Executor |

## Retrieve the connection string

Each Deployment stores its database connection string in a Kubernetes secret named `<release-name>-metadata`. Run the following command to retrieve it:

```bash
kubectl get secret -n <deployment-namespace> \
  <release-name>-metadata -o jsonpath='{.data.connection}' | base64 -d
```

This returns a connection URI in the following format:

```text
postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=prefer
```
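If you need the individual fields rather than the full URI, they can be split out with shell parameter expansion. A minimal sketch, using a made-up URI in place of the real secret value:

```shell
# Hypothetical connection URI; substitute the value retrieved from the secret.
uri="postgresql://airflow:s3cr3t@example-release-pgbouncer.astronomer.svc.cluster.local:6543/example-release-metadata?sslmode=prefer"

rest="${uri#postgresql://}"     # drop the scheme
userpass="${rest%%@*}"          # user:password
hostpart="${rest#*@}"           # host:port/database?params
hostport="${hostpart%%/*}"      # host:port

username="${userpass%%:*}"
password="${userpass#*:}"
host="${hostport%%:*}"
port="${hostport##*:}"
database="${hostpart#*/}"; database="${database%%\?*}"

echo "user=${username} host=${host} port=${port} db=${database}"
```

These values can then be passed directly to `psql` or exported as environment variables.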

## Connect to the database

### Use `kubectl exec`

Run the following command to open an Airflow metadata database shell from the scheduler pod:

```bash
kubectl exec -it -n <deployment-namespace> \
  deployment/<release-name>-scheduler -c scheduler -- \
  airflow db shell
```

This command uses the Airflow metadata database connection that is already configured in the Deployment.

### Connect from your local computer

If the database is accessible as a Kubernetes service, you can use `kubectl port-forward` to connect from your local computer. Identify the database service and namespace from the `<host>` field in the connection string, then forward the port:

```bash
kubectl port-forward -n <database-namespace> \
  svc/<database-service> <local-port>:<database-port>
```

If PgBouncer is enabled, forward port 6543 from the PgBouncer service in the Deployment namespace instead.
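For a cluster-internal host, the service and namespace needed for the port-forward can usually be read straight off the DNS name, which follows the `<service>.<namespace>.svc.cluster.local` pattern. A sketch with a hypothetical host value:

```shell
# Hypothetical host taken from the connection string.
host="example-release-pgbouncer.astronomer.svc.cluster.local"

service="${host%%.*}"      # first label: the Service name
rest="${host#*.}"
namespace="${rest%%.*}"    # second label: the namespace

echo "kubectl port-forward -n ${namespace} svc/${service} 5432:5432"
```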

Then connect using the credentials from the connection string:

```bash
psql -h localhost -p <local-port> -U <username> -d <database>
```

If the database is an external service (for example, Amazon RDS or Google Cloud SQL), connect directly using the host and port from the connection string instead of port-forwarding.

## Common queries

After you connect to the database, use the following queries to inspect Airflow metadata.

### Dag information

```sql
SELECT dag_id, is_active, is_paused, last_parsed_time
FROM dag ORDER BY dag_id;

SELECT dag_id, run_id, state, start_date
FROM dag_run ORDER BY start_date DESC LIMIT 20;
```

### Task instance status

```sql
SELECT dag_id, task_id, state, start_date, try_number
FROM task_instance
ORDER BY start_date DESC LIMIT 50;
```

### Task failures

```sql
SELECT dag_id, task_id, state, start_date
FROM task_instance
WHERE state = 'failed'
  AND start_date > NOW() - INTERVAL '24 hours';
```

## External database configuration

To use an external database instead of the APC-managed database, create the Deployment using the `upsertDeployment` Houston API mutation with the following fields:

- `skipAirflowDatabaseProvisioning`: Set to `true` so Commander does not provision a database for this Deployment.
- `metadataConnection` or `metadataConnectionJson`: The connection string or JSON object pointing to your external database.
- `resultBackendConnection` or `resultBackendConnectionJson`: The connection string or JSON object for the Celery result backend.

For a complete example of the `upsertDeployment` mutation payload with external database configuration, see Bring your own Airflow database.
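As an illustration only, the mutation variables for the string-valued fields might be shaped like the following fragment. The field names come from the list above; every value is a placeholder:

```json
{
  "skipAirflowDatabaseProvisioning": true,
  "metadataConnection": "postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=prefer",
  "resultBackendConnection": "postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=prefer"
}
```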

## Back up and restore

For production environments, run backup and restore commands from your local computer or a dedicated admin computer. Do not run backup or restore commands from inside Airflow containers.

Before you run a backup, estimate the database size and confirm that your local computer has enough free disk space.

```sql
SELECT
  schemaname,
  pg_size_pretty(SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass))) AS total_size
FROM pg_tables
GROUP BY schemaname
ORDER BY SUM(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) DESC;
```
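Once you have a size estimate, a quick check like the following can confirm the free space on the backup destination. The 10 GB figure is an arbitrary placeholder; substitute the size the query reported:

```shell
required_kb=$((10 * 1024 * 1024))                  # placeholder: 10 GB, in KB
available_kb=$(df -Pk . | awk 'NR==2 {print $4}')  # free KB on the current filesystem

if [ "$available_kb" -ge "$required_kb" ]; then
  echo "Enough space for the backup."
else
  echo "Free up disk space before running pg_dump."
fi
```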

### Back up the database

```bash
pg_dump -h <host> -p <port> -U <username> -d <database> -n airflow > backup.sql
```

### Restore the database

```bash
psql -h <host> -p <port> -U <username> -d <database> < backup.sql
```

Replace the placeholders with values from the connection string. See Retrieve the connection string.

## Security best practices

Direct database access bypasses Airflow's security model. Use it with caution, and only for troubleshooting or maintenance tasks.

- Use read-only access for monitoring.
- Never expose database ports publicly.
- Use SSL for all connections.
- Rotate credentials regularly.
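The read-only recommendation above can be implemented with a dedicated PostgreSQL role. A sketch, using a hypothetical role name and password:

```sql
-- Hypothetical role name and password; replace both.
CREATE ROLE airflow_readonly WITH LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA airflow TO airflow_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA airflow TO airflow_readonly;

-- Also cover tables created after this point, such as those added by
-- future Airflow schema migrations.
ALTER DEFAULT PRIVILEGES IN SCHEMA airflow
  GRANT SELECT ON TABLES TO airflow_readonly;
```

Monitoring tools can then connect as `airflow_readonly` instead of the Deployment's primary database user.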