Tip
The key information from this and other Snowflake guides is available as an Astronomer Cheat Sheet.
Snowflake is one of the most commonly used data warehouses, and orchestrating Snowflake queries as part of a data pipeline is one of the most common Airflow use cases. Two Airflow provider packages, the Snowflake Airflow provider and the Common SQL provider contain hooks and operators that make it easy to interact with Snowflake from Airflow.
This tutorial covers an example of executing Snowflake operations with Airflow, including:
Additionally, More on the Airflow Snowflake integration offers further information on general best practices and considerations when interacting with Snowflake from Airflow.
This tutorial takes approximately 20 minutes to complete.
To get the most out of this tutorial, make sure you have an understanding of:
Use the Astro CLI to create and run an Airflow project on your local machine.
Create a new Astro project:
In the requirements.txt file, add the Snowflake Airflow provider and the Common SQL provider.
Run the following command to start your Airflow project:
There are different options to authenticate to Snowflake. The SnowflakeAPIOperator used in this tutorial requires you to use key-pair authentication, which is the preferred method. This method requires you to generate a public/private key pair, add the public key to your role in Snowflake, and use the private key in your Airflow connection.
Info
For Astro customers, Astronomer recommends to take advantage of the Astro Environment Manager to store connections in an Astro-managed secrets backend. These connections can be shared across multiple deployed and local Airflow environments. See Manage Astro connections in branch-based deploy workflows.
In your terminal, run the following command to generate a private RSA key using OpenSSL. Note that while there are other options to generate a key pair, Snowflake has specific requirements for the key format and may not accept keys generated with other tools. Make sure to write down the key passphrase as you will need it later.
Generate the associated public key using the following command:
Format the private key. Version 6.3.0+ of the Airflow Snowflake provider requires the private key to be base64 encoded. You can create a base64 encoded key with the following script:
Note
If you on version 6.2.2 or older of the Airflow Snowflake provider, you need to provide the private key without any coding conversions but with newlines encoded as
\n. You can use the script below to format the key correctly:
In the Snowflake UI, run the following SQL command to add the public key to your user. You can paste the public key directly from the rsa_key.pub file without needing to modify it.
In the Airflow UI, go to Admin -> Connections and click + to create a new connection. Choose the Snowflake connection type and enter the following information:
Connection ID: snowflake_conn
Schema: Your Snowflake schema. The example DAG uses DEMO_SCHEMA.
Login: Your Snowflake user name. Make sure to capitalize the user name as the SnowflakeAPIOperator requires it.
Password: Your private key passphrase.
Extra: Enter the following JSON object with your own Snowflake account identifier, database, your role in properly capitalized format, and your warehouse.
Tip
When using JSON format to set your connection use the following parameters:
The DAG you will create in Step 4 runs multiple SQL statements against your Snowflake data warehouse. While it is possible to add SQL statements directly in your DAG file it is common practice to store them in separate files. When initializing your Astro project with the Astro CLI, an include folder was created. The contents of this folder will automatically be mounted into the Dockerfile, which makes it the standard location in which supporting files are stored.
Create a folder called sql in your include folder.
Create a new file in include/sql called insert_data.sql and copy the following code:
This file contains one SQL statement that inserts a row into a table. The database, schema and table names are parameterized so that you can pass them to the operator at runtime.
The SnowflakeSqlApiOperator can run multiple SQL statements in a single task. Create a new file in include/sql called multiple_statements_query.sql and copy the following code:
This file contains two SQL statements that insert multiple rows into a table.
Tip
When running SQL statements from Airflow operators, you can store the SQL code in individual SQL files, in a combined SQL file, or as strings in a Python module. Astronomer recommends storing lengthy SQL statements in a dedicated file to keep your DAG files clean and readable.
Create a new file in your dags directory called my_snowflake_dag.py.
Copy and paste the code below into the file:
The DAG completes the following steps:
ID column and that it contains at least 3 distinct values. To learn more about SQL check operators, see Run data quality checks using SQL check operators.The chain() method at the end of the DAG sets the dependencies. This method is commonly used over bitshift operators (>>) to make it easier to read dependencies between many tasks.
In the Airflow UI, click the play button to manually run your DAG.
Open the logs for the data_quality_check task to see the results of the data quality check, confirming that the table was created and populated correctly.
This section provides additional information on orchestrating actions in Snowflake with Airflow.
Several open source packages contain operators used to orchestrate Snowflake in Airflow.
The Common SQL provider package contains operators that you can use with several SQL databases, including Snowflake:
SnowflakeOperator.The Snowflake provider package contains:
deferrable=True.prefix parameter will be added to the full stage path defined in Snowflake.The following are some best practices and considerations to keep in mind when orchestrating Snowflake queries from Airflow:
USE Warehouse XYZ; statements within your queries. If you are an Astro customer, use the Astro Environment Manager to define your base connection and add overrides for specific deployments and tasks.Congratulations! You’ve connected Airflow to Snowflake and executed Snowflake queries from your Airflow DAGs. You’ve also learned about best practices and considerations when orchestrating Snowflake queries from Airflow.