Create a Databricks connection in Airflow
Databricks is a unified data and analytics platform built around Apache Spark that provides fully managed Spark clusters and interactive workspaces.
This guide provides the basic setup for creating a Databricks connection. For a complete integration tutorial, see Orchestrate Databricks jobs with Airflow.
Prerequisites
- An Airflow environment with the Airflow Databricks provider (apache-airflow-providers-databricks) installed.
- A Databricks account.
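To confirm the first prerequisite, you can check whether the provider package is present in the Python environment that Airflow runs in. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is illustrative and not part of Airflow or the provider.

```python
from importlib import metadata
from typing import Optional

# Distribution name taken from the prerequisites above.
PROVIDER_DIST = "apache-airflow-providers-databricks"


def installed_version(dist_name: str = PROVIDER_DIST) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print(f"{PROVIDER_DIST} is not installed in this environment")
    else:
        print(f"{PROVIDER_DIST} {version} is installed")
```

Run the script with the same interpreter that runs Airflow, otherwise the result may describe a different environment.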
Astro users can also create connections using the Astro Environment Manager, which stores connections in an Astro-managed secrets backend. These connections can be shared across multiple deployed and local Airflow environments. See Create Airflow connections in the Astro UI.
Connect with an OAuth Connection
An OAuth connection from Airflow to Databricks requires the following information:
- Host: Databricks URL
- Service Principal Client ID / Login: Service Principal Client ID
- Service Principal Client Secret / Password: Service Principal Client Secret
Complete the following steps to retrieve these values:
- In the Databricks Cloud UI, copy the URL of your Databricks workspace. It should be formatted as either https://dbc-75fc7ab7-96a6.cloud.databricks.com/ or https://your-org.cloud.databricks.com/.
- Create a service principal in Databricks and copy the Client ID and Client Secret. See Authorize service principal access to Databricks with OAuth.
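Besides the Airflow UI, a connection can be supplied as a JSON document in an `AIRFLOW_CONN_<CONN_ID>` environment variable (supported in Airflow 2.3 and later). The sketch below maps the three OAuth values onto that format. It rests on several assumptions: the connection ID `databricks_default`, the helper name `build_oauth_connection_json`, and the placeholder credentials are all illustrative, and the `service_principal_oauth` extra is the flag the Databricks provider documentation describes for using a Client ID and Client Secret as login and password. Check your provider version's documentation before relying on it.

```python
import json


def build_oauth_connection_json(host: str, client_id: str, client_secret: str) -> str:
    """Serialize an OAuth service principal connection in Airflow's JSON format."""
    connection = {
        "conn_type": "databricks",
        "host": host,                # Databricks workspace URL
        "login": client_id,          # Service Principal Client ID
        "password": client_secret,   # Service Principal Client Secret
        # Assumption: provider-specific extra that enables service principal OAuth.
        "extra": {"service_principal_oauth": True},
    }
    return json.dumps(connection)


if __name__ == "__main__":
    # Placeholder values; in practice, load the secret from a secrets manager.
    value = build_oauth_connection_json(
        host="https://your-org.cloud.databricks.com/",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )
    # The variable must be set in the environment of the Airflow processes.
    print(f"AIRFLOW_CONN_DATABRICKS_DEFAULT='{value}'")
```

Connections defined through environment variables do not appear in the Airflow UI, so this approach suits local development and automated deployments more than interactive setup.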
Connect with a Personal Access Token
A Personal Access Token (PAT) connection from Airflow to Databricks requires the following information:
- Host: Databricks URL
- Personal Access Token / Password: Personal access token
Complete the following steps to retrieve these values:
- In the Databricks Cloud UI, copy the URL of your Databricks workspace. It should be formatted as either https://dbc-75fc7ab7-96a6.cloud.databricks.com/ or https://your-org.cloud.databricks.com/.
- Generate and copy a personal access token. For a user token, follow the Databricks documentation to generate a new token. For a service principal token, see Manage personal access tokens for a service principal.
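A personal access token connection needs only the host and the password field. As a rough sketch, the same values can be expressed in Airflow's JSON connection format (Airflow 2.3 and later) and exported as an environment variable. The connection ID `databricks_default`, the helper name `build_pat_connection_json`, and the placeholder token are assumptions made for illustration.

```python
import json


def build_pat_connection_json(host: str, token: str) -> str:
    """Serialize a personal access token connection in Airflow's JSON format."""
    if not token:
        raise ValueError("A personal access token is required")
    connection = {
        "conn_type": "databricks",
        "host": host,        # Databricks workspace URL
        "password": token,   # Personal access token; login is left empty
    }
    return json.dumps(connection)


if __name__ == "__main__":
    # Placeholder token; never commit a real token to source control.
    value = build_pat_connection_json(
        host="https://your-org.cloud.databricks.com/",
        token="<personal-access-token>",
    )
    print(f"AIRFLOW_CONN_DATABRICKS_DEFAULT='{value}'")
```

The helper rejects an empty token so that a missing secret fails at configuration time instead of surfacing later as an authentication error in a task.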
See also
- Apache Airflow Databricks provider package documentation
- Databricks modules in the Astronomer Registry