Documentation

Query the Airflow Database


Overview

At Astronomer, each Airflow deployment is eqiupped with a PostgreSQL database that serves as Airflow's underlying metadata database and your Airflow scheduler's source of truth.

Astronomer hosts that database for Astronomer Cloud customers (each securely isolated from the rest) and our team helps Enterprise customers get their own set up during our platform installation.

Access Your Airflow Deployment's Postgres

To successfully leverage Airflow's "Ad Hoc Query" feature (acessible from the Airflow UI), you'll need to confirm access to the database in the first place.

Local Deployment

When developing locally, you'll need to set up your local Postgres connection.

1. Add libstdc++ to packages.txt

Airflow uses Pandas to power the Ad Hoc Query engine, which requires this additional dependency.

libstdc++ is a top-level Alpine-level package that belongs in packages.txt, a file automatically generated upon initializing an Airflow project with Astronomer's CLI.

2. Find the airflow_db Connection Object

On the Airflow UI, navigate over to Admin > Connections.

The airflow_db connection is generated by default.

Connections Page

3. Edit the Connection

In the airflow_db connection object:

  • Change the Conn Type to Postgres

  • Add the following connection information:

    ```
    host: postgres
    schema: postgres
    login: postgres
    password: postgres
    port (optional): 5432
    ```

    Your connection should look something like:

Local Connections Details Page

The port will be set to 5342 by default but if you’ve set a custom port, you’ll need to update that here as well.

4. Connect to the DB via a PostgreSQL Client

With the connection information above, you should be able to connect to the Airflow database from any PostgreSQL client.

A Remote Deployment

To set up a connection for a remote deployment on Astronomer Cloud or Enterprise, start with Steps 1 + 2 above.

3. Edit the Connection

The same way you followed the steps above for a local deployment, add the following connection details to the airflow_db connection object:

host: {release-name}-pgbouncer
schema: {release-name}-metadata
login: {release_name}_airflow
port: 6543

For example, if your deployment name was quasaric-sun-9051, the connection information would be the following:

Remote Connections Details Page

4. Pull your Postgres Password

Astronomer Cloud: If you're a Cloud customer, reach out to us at support@astronomer.io for the Password to your Deployment's Database.

Astronomer Enterprise: Instructions here. Your deployment's Postgres credentials are stored as a Kubernetes Secret in your deployment's Namespace, which you can access via kubectl.

5. Run an Ad Hoc Query to Confirm Access

To test your connection, try to query your deployment's metadata database via Airflow's Ad Hoc Query page.

  • Navigate to Data Profiling > Ad Hoc Query on the Airflow UI
  • Select airflow_db as the database from the menu
  • Run a sample query

Ad Hoc Query Page

This particular query will render total completed task count for your deployment. For a list of handy queries to reference, check out Useful SQL queries for Apache Airflow.