Use a MySQL or PostgreSQL database for metadata or storage

You can use the Houston API to create Astro Private Cloud Deployments that use pre-created databases, external to the Airflow Deployment, as both the metadata database and the result backend.

Prerequisites

  • Workspace Admin user privileges and a Workspace ID
  • (Optional) A MySQL or PostgreSQL database
  • (Optional) An existing Deployment
If you connect a Deployment that already has Dag data to a new external database, you must migrate that historical data to the new database. Otherwise, information about past Deployment activity, such as task instances and Dag runs, won't be displayed, because the Deployment now reads from a different database.

Step 1: Enable manual connection strings

In Astro Private Cloud 1.x, manualConnectionStrings.enabled is a deployments.* setting. Cluster configuration is the final layer for deployments.* keys, so the cluster value wins when both values.yaml and a cluster override are set. See Configure Astro Private Cloud for the precedence rules.

Choose one of the following options based on the scope you need:

  • Option A: To enable manual connection strings for one data plane cluster, update the cluster's Configuration Override.
  • Option B: To enable manual connection strings as the platform default for clusters that don't have their own saved value for this key, update values.yaml and run a Helm upgrade.

Option A: Update the Configuration Override (single cluster)

  1. In the Astro UI, open Clusters, select your data plane cluster, then click Edit on Configuration Override. To automate this with the Houston API, use the updateCluster mutation with deploymentsConfigOverride. See Update data plane cluster configurations.

  2. Add the following to Configuration Override:

    {
      "manualConnectionStrings": {
        "enabled": true
      }
    }
  3. Click Update cluster to apply the change. The override is deep-merged with the cluster’s existing configuration.
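The deep merge described in step 3 can be sketched as follows. This Python function is an assumption about the merge semantics (nested objects merged key by key, all other values replaced), meant to show why the override changes only manualConnectionStrings.enabled and leaves sibling keys intact. It isn't Houston's code, and the exampleSetting key is hypothetical.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical existing cluster configuration.
existing = {
    "manualConnectionStrings": {"enabled": False},
    "exampleSetting": {"nested": 1},  # hypothetical sibling key, kept as-is
}
override = {"manualConnectionStrings": {"enabled": True}}

print(deep_merge(existing, override))
# {'manualConnectionStrings': {'enabled': True}, 'exampleSetting': {'nested': 1}}
```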

Option B: Update values.yaml (platform default)

Use this option only when no cluster has saved an override for deployments.manualConnectionStrings.enabled. If a cluster already has a saved value for this key, the cluster value wins and you must update the cluster configuration as in Option A.

  1. Open your values.yaml file.

  2. Add the following under astronomer.houston.config:

    astronomer:
      houston:
        config:
          deployments:
            manualConnectionStrings:
              enabled: true
  3. Push the configuration change. See Apply a config change. If the astronomer-houston pods don’t roll automatically after the Helm upgrade, restart them manually so they pick up the new configuration.

Already-installed clusters

The values.yaml settings in this step only take effect during the initial cluster installation. For clusters that are already registered, Houston resolves Deployment configuration directly from the cluster’s database record (Cluster.config.deployments) and ignores the Helm-derived ConfigMap. To enable manual connection strings on an existing cluster, apply the change through System Admin → Clusters → Edit → Cluster Deployment Configuration in Astronomer instead.

Step 2: (Optional) Create your database

If you need to create a new database, replace astro-db-name with your own database name.

    CREATE DATABASE astro-db-name;

Step 3: Add a user account to your database for the connection

Replace astro-user-name and astro-user-password with your own values. You can also use an existing database for this step.

PostgreSQL usernames must be lowercase.
  1. Create a user with a password for Astro Private Cloud to use to access the database.
    CREATE USER astro-user-name WITH PASSWORD 'astro-user-password';
  2. Grant all privileges on the database to the user.
    GRANT ALL PRIVILEGES ON DATABASE astro-db-name TO astro-user-name;
  3. Grant USAGE and CREATE privileges on the public schema to astro-user-name:
    GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;

Next, connect to the database you created (astro-db-name in this example) and run the following statements:

  1. Grant all privileges on all tables, sequences, and functions to the user.
    GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO astro-user-name;
    GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO astro-user-name;
    GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO astro-user-name;
  2. Set default privileges for the user, so that the user automatically has access to any new tables, sequences, or functions.
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO astro-user-name;
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO astro-user-name;
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON FUNCTIONS TO astro-user-name;
    GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;
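If you provision several databases, you might generate the statements from Steps 2 and 3 instead of editing them by hand. The following Python sketch is one way to do that; the function names are hypothetical. It double-quotes identifiers, because PostgreSQL doesn't accept hyphens in unquoted identifiers such as the astro-db-name placeholder, and it rejects uppercase usernames to follow the lowercase requirement above. Review the generated SQL before you run it.

```python
from typing import List, Tuple

KINDS = ("TABLES", "SEQUENCES", "FUNCTIONS")


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier so names like astro-db-name are valid."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal (assumes standard_conforming_strings is on)."""
    return "'" + value.replace("'", "''") + "'"


def provisioning_sql(db_name: str, user: str, password: str,
                     create_database: bool = True) -> Tuple[List[str], List[str]]:
    """Return (statements for your initial connection, statements to run inside db_name)."""
    if user != user.lower():
        raise ValueError("PostgreSQL usernames must be lowercase")
    db, role = quote_ident(db_name), quote_ident(user)
    initial = [f"CREATE DATABASE {db};"] if create_database else []
    initial += [
        f"CREATE USER {role} WITH PASSWORD {quote_literal(password)};",
        f"GRANT ALL PRIVILEGES ON DATABASE {db} TO {role};",
    ]
    # Schema grants apply to the database you are connected to, so these
    # statements must run after you connect to db_name.
    inside_db = [f"GRANT USAGE, CREATE ON SCHEMA public TO {role};"]
    inside_db += [f"GRANT ALL PRIVILEGES ON ALL {k} IN SCHEMA public TO {role};" for k in KINDS]
    inside_db += [
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON {k} TO {role};"
        for k in KINDS
    ]
    return initial, inside_db


initial, inside_db = provisioning_sql("astro-db-name", "astro-user-name", "astro-user-password")
print("\n".join(initial))
print("-- connect to astro-db-name, then run:")
print("\n".join(inside_db))
```

If you quote a name once, quote it the same way everywhere, including in your connection strings' database name.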

Step 4: Retrieve database host information

Retrieve the connection information for your external database. For example, on AWS, see Finding the connection information for an RDS for MySQL DB instance to retrieve your endpoint.
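On AWS, you can also look up the endpoint programmatically. The following Python sketch uses the boto3 RDS client's describe_db_instances call, whose response carries the host in DBInstances[0].Endpoint.Address and the port in Endpoint.Port. It assumes boto3 is installed and AWS credentials are configured; the helper names and the instance identifier in the comment are hypothetical.

```python
from typing import Any, Dict, Tuple


def endpoint_from_rds_response(response: Dict[str, Any]) -> Tuple[str, int]:
    """Extract (host, port) from a DescribeDBInstances response for one instance."""
    instances = response.get("DBInstances", [])
    if len(instances) != 1:
        raise ValueError(f"expected exactly one DB instance, got {len(instances)}")
    endpoint = instances[0].get("Endpoint")
    if not endpoint:
        # Defensive: treat a missing Endpoint as "instance not ready yet".
        raise ValueError("instance has no endpoint yet")
    return endpoint["Address"], int(endpoint["Port"])


def lookup_rds_endpoint(instance_id: str) -> Tuple[str, int]:
    """Call AWS. boto3 is imported lazily so the parser above works without it."""
    import boto3

    rds = boto3.client("rds")
    response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return endpoint_from_rds_response(response)


# Example (hypothetical instance ID):
# host, port = lookup_rds_endpoint("astro-external-db")
```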

Step 5: Compose a connection string for your database

You need connection strings that define how Astro Private Cloud connects your Airflow Deployment to your external databases. You use these values for metadataConnection and resultBackendConnection when you create, update, or upsert your Deployment.

Use the values for your astro-user-name, astro-user-password, astro-db-name, and the host information you retrieved to compose the connection strings in the following format, depending on whether you want to define a result backend connection or a metadata database connection.

PgBouncer is enabled by default

For PostgreSQL Deployments, PgBouncer is enabled by default in Astro Private Cloud. When PgBouncer is enabled, URI-style connection strings (postgresql://...) are rejected during upsert, and you must use the JSON format (metadataConnectionJson and resultBackendConnectionJson) instead. Use the URI format only if you have explicitly disabled PgBouncer for your Deployment.

With PgBouncer disabled

  • metadataConnection:

    postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
  • resultBackendConnection:

    db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
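Passwords often contain characters such as @, :, or / that break URI parsing. The following Python sketch composes both URI strings and percent-encodes the user name and password with urllib.parse.quote. It assumes the credentials are decoded downstream the way SQLAlchemy-style URLs are, so test a connection before you rely on it; the function name is hypothetical.

```python
from typing import Dict
from urllib.parse import quote


def compose_connection_uris(user: str, password: str, host: str, db: str,
                            port: int = 5432) -> Dict[str, str]:
    """Build metadataConnection and resultBackendConnection URIs (PgBouncer disabled)."""
    # safe="" also encodes "/" so it can't be mistaken for a path separator.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    base = f"postgresql://{credentials}@{host}:{port}/{db}"
    return {
        "metadataConnection": base,
        # The result backend string needs the "db+" prefix for Celery workers.
        "resultBackendConnection": "db+" + base,
    }


uris = compose_connection_uris("astro-user-name", "p@ss:w/rd", "host", "astro-db-name")
print(uris["metadataConnection"])
# postgresql://astro-user-name:p%40ss%3Aw%2Frd@host:5432/astro-db-name
```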
Celery Executor

The connection string validation regex doesn't check the resultBackendConnection format, which must include the db+ prefix. The Celery executor's workers require this prefix: if the connection string doesn't include db+, the Celery worker pods fail. Validation for this prefix isn't implemented because it would complicate the format validation logic across different scenarios, so make sure you include it yourself.
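Because this prefix isn't validated, a small client-side check can catch the mistake before you call the API. The following Python sketch is a hypothetical helper, not part of Astro Private Cloud. Its error message shows only the URI scheme so that the password doesn't end up in logs.

```python
def check_result_backend(uri: str, executor: str) -> None:
    """Raise ValueError if a Celery Deployment's result backend URI lacks db+."""
    if executor == "CeleryExecutor" and not uri.startswith("db+"):
        scheme = uri.split("://", 1)[0]  # avoid echoing credentials
        raise ValueError(
            "resultBackendConnection must start with 'db+' for the Celery "
            f"executor, got {scheme}://..."
        )


check_result_backend("db+postgresql://u:p@host:5432/astro-db-name", "CeleryExecutor")  # OK
```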

With PgBouncer enabled

If PgBouncer is enabled and you use PostgreSQL, you must configure metadataConnectionJson and resultBackendConnectionJson instead. Because PgBouncer is enabled by default in Astro Private Cloud, this is the typical path.

Use the same astro-user-name, astro-user-password, astro-db-name, and host values to compose the JSON connection objects for the metadata database connection and the result backend connection.

  • metadataConnectionJson:

    "metadataConnectionJson": {
      "user": "astro-user-name",
      "pass": "astro-user-password",
      "protocol": "postgresql",
      "host": "host",
      "port": 5432,
      "db": "astro-db-name"
    },
  • resultBackendConnectionJson:

    "resultBackendConnectionJson": {
      "user": "astro-user-name",
      "pass": "astro-user-password",
      "protocol": "postgresql",
      "host": "host",
      "port": 5432,
      "db": "astro-db-name"
    },
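The JSON objects above can also be built programmatically, which avoids hand-editing quotes and commas. In this Python sketch, the helper name connection_json is hypothetical; the keys (user, pass, protocol, host, port, db) are the ones shown above, and the port is a number rather than a string, as in the examples. In this format the password travels as its own field rather than inside a URI.

```python
import json
from typing import Any, Dict


def connection_json(user: str, password: str, host: str, db: str,
                    port: int = 5432, protocol: str = "postgresql") -> Dict[str, Any]:
    """Build one connection object with the keys used by the *Json fields."""
    return {"user": user, "pass": password, "protocol": protocol,
            "host": host, "port": port, "db": db}


conn = connection_json("astro-user-name", "astro-user-password", "host", "astro-db-name")
variables = {
    "metadataConnectionJson": conn,
    # Same database for both here; pass different values to split them.
    "resultBackendConnectionJson": dict(conn),
}
print(json.dumps(variables, indent=2))
```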

Step 6: Add to Deployment configuration

Use the Houston API to create your Deployment configuration.

Required: skipAirflowDatabaseProvisioning

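When you point a Deployment at an external database, you must set skipAirflowDatabaseProvisioning: true in the upsertDeployment mutation. Without this flag, Commander overwrites the host value in your metadataConnection (or metadataConnectionJson) with the data plane database host before running helm install, regardless of the host you submitted, and your Deployment silently lands on the platform database instead of your external one, with no error. Setting skipAirflowDatabaseProvisioning: true skips automatic database provisioning and preserves the host you provided.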
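The rules on this page for manual connection input can be checked before you send the mutation. The following Python sketch is a hypothetical pre-flight helper, not part of the Houston API: it flags a missing skipAirflowDatabaseProvisioning flag and URI-style strings used while PgBouncer is enabled. It assumes you already know whether PgBouncer is enabled for the Deployment.

```python
from typing import Any, Dict, List

URI_KEYS = ("metadataConnection", "resultBackendConnection")
JSON_KEYS = ("metadataConnectionJson", "resultBackendConnectionJson")


def preflight_errors(variables: Dict[str, Any], pgbouncer_enabled: bool = True) -> List[str]:
    """Return problems found in upsertDeployment variables; an empty list means OK."""
    errors: List[str] = []
    uri_used = [key for key in URI_KEYS if variables.get(key)]
    json_used = [key for key in JSON_KEYS if variables.get(key)]
    # Any manual connection input needs the flag, or the host gets overwritten.
    if (uri_used or json_used) and variables.get("skipAirflowDatabaseProvisioning") is not True:
        errors.append("set skipAirflowDatabaseProvisioning to true")
    # With PgBouncer enabled, URI-style strings are rejected during upsert.
    if pgbouncer_enabled and uri_used:
        errors.append("PgBouncer is enabled; replace " + ", ".join(uri_used)
                      + " with the *Json fields")
    return errors
```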

The following example shows the mutation and query variables for upsertDeployment. See Houston API code examples for examples of how to use the update and upsert options to configure your Deployment.

Create a new Deployment

    mutation upsertDeployment(
      $workspaceUuid: Uuid!
      $releaseName: String
      $namespace: String!
      $type: String!
      $label: String!
      $description: String
      $version: String
      $airflowVersion: String
      $runtimeVersion: String
      $executor: ExecutorType
      $workers: Workers
      $webserver: Webserver
      $scheduler: Scheduler
      $triggerer: Triggerer
      $config: JSON
      $properties: JSON
      $dagDeployment: DagDeployment
      $astroUnitsEnabled: Boolean
      $rollbackEnabled: Boolean
      $metadataConnection: String
      $resultBackendConnection: String
      $metadataConnectionJson: JSON
      $resultBackendConnectionJson: JSON
      $skipAirflowDatabaseProvisioning: Boolean
    ) {
      upsertDeployment(
        workspaceUuid: $workspaceUuid
        releaseName: $releaseName
        namespace: $namespace
        type: $type
        label: $label
        airflowVersion: $airflowVersion
        description: $description
        version: $version
        executor: $executor
        workers: $workers
        webserver: $webserver
        scheduler: $scheduler
        triggerer: $triggerer
        config: $config
        properties: $properties
        runtimeVersion: $runtimeVersion
        dagDeployment: $dagDeployment
        astroUnitsEnabled: $astroUnitsEnabled
        rollbackEnabled: $rollbackEnabled
        metadataConnection: $metadataConnection
        resultBackendConnection: $resultBackendConnection
        metadataConnectionJson: $metadataConnectionJson
        resultBackendConnectionJson: $resultBackendConnectionJson
        skipAirflowDatabaseProvisioning: $skipAirflowDatabaseProvisioning
      ) {
        id
        config
        urls {
          type
          url
          __typename
        }
        properties
        description
        label
        releaseName
        namespace
        status
        type
        version
        workspace {
          id
          label
          __typename
        }
        airflowVersion
        runtimeVersion
        desiredAirflowVersion
        dagDeployment {
          type
          nfsLocation
          repositoryUrl
          branchName
          syncInterval
          syncTimeout
          ephemeralStorage
          dagDirectoryLocation
          rev
          sshKey
          knownHosts
          __typename
        }
        createdAt
        updatedAt
        __typename
      }
    }

Example query variables (JSON connection format)

    {
      "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
      "skipAirflowDatabaseProvisioning": true,
      "metadataConnectionJson": {
        "user": "astro-user-name",
        "pass": "astro-user-password",
        "protocol": "postgresql",
        "host": "host",
        "port": 5432,
        "db": "astro-db-name"
      },
      "resultBackendConnectionJson": {
        "user": "astro-user-name",
        "pass": "astro-user-password",
        "protocol": "postgresql",
        "host": "postgres-db-lb.external-postgres.svc.cluster.local",
        "port": 5432,
        "db": "astro-db-name"
      },
      "namespace": "",
      "type": "airflow",
      "config": {
        "executor": "CeleryExecutor",
        "workers": {},
        "webserver": {},
        "scheduler": {
          "replicas": 1
        },
        "triggerer": {}
      },
      "executor": "CeleryExecutor",
      "workers": {},
      "webserver": {},
      "scheduler": {
        "replicas": 1
      },
      "triggerer": {},
      "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
      "description": "",
      "runtimeVersion": "11.6.0",
      "properties": {
        "extra_capacity": {
          "cpu": 1000,
          "memory": 3840
        }
      },
      "astroUnitsEnabled": false,
      "rollbackEnabled": true,
      "dagDeployment": {
        "type": "dag_deploy",
        "nfsLocation": "",
        "repositoryUrl": "",
        "branchName": "",
        "syncInterval": 1,
        "syncTimeout": 120,
        "ephemeralStorage": 2,
        "dagDirectoryLocation": "",
        "rev": "",
        "sshKey": "",
        "knownHosts": ""
      }
    }

Example query variables (URI connection format, PgBouncer disabled)

    {
      "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
      "skipAirflowDatabaseProvisioning": true,
      "metadataConnection": "postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
      "resultBackendConnection": "db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
      "namespace": "",
      "type": "airflow",
      "config": {
        "executor": "CeleryExecutor",
        "workers": {},
        "webserver": {},
        "scheduler": {
          "replicas": 1
        },
        "triggerer": {}
      },
      "executor": "CeleryExecutor",
      "workers": {},
      "webserver": {},
      "scheduler": {
        "replicas": 1
      },
      "triggerer": {},
      "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
      "description": "",
      "runtimeVersion": "11.6.0",
      "properties": {
        "extra_capacity": {
          "cpu": 1000,
          "memory": 3840
        }
      },
      "astroUnitsEnabled": false,
      "rollbackEnabled": true,
      "dagDeployment": {
        "type": "dag_deploy",
        "nfsLocation": "",
        "repositoryUrl": "",
        "branchName": "",
        "syncInterval": 1,
        "syncTimeout": 120,
        "ephemeralStorage": 2,
        "dagDirectoryLocation": "",
        "rev": "",
        "sshKey": "",
        "knownHosts": ""
      }
    }
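To send the mutation outside the GraphQL playground, you can POST it as JSON with the variables above. This Python sketch uses only the standard library. The endpoint URL pattern (https://houston.<base-domain>/v1), the Authorization header format, and the helper names are assumptions; confirm them against your installation before use.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Houston is served at https://houston.<base-domain>/v1.
HOUSTON_URL = "https://houston.example.com/v1"  # hypothetical base domain


def build_upsert_request(mutation: str, variables: Dict[str, Any], token: str,
                         url: str = HOUSTON_URL) -> urllib.request.Request:
    """Wrap a GraphQL mutation and its variables in a POST request (not sent here)."""
    body = json.dumps({"query": mutation, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token goes in the Authorization header as-is.
            "Authorization": token,
        },
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and return the GraphQL data, raising on GraphQL errors."""
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

For example, data = send(build_upsert_request(mutation_text, variables, token)), where mutation_text holds the upsertDeployment mutation and variables holds one of the example objects.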