ETL with Snowflake: How to Build Reliable Data Pipelines at Scale
Learn how to build reliable ETL and ELT pipelines into Snowflake. Watch the on-demand webinar for production-ready patterns and orchestration best practices.
Recorded on November 13, 2025
What you'll learn
A practical walkthrough of building ELT pipelines into Snowflake
ETL and ELT into Snowflake are among the most common pipelines data teams build, and among the most common to quietly break. In this session, Volker builds a full pipeline live: ingesting data from an API, loading it into Snowflake, validating it with quality checks, and pausing for a human review before it lands in production.
ETL or ELT for Snowflake?
Volker explains that most Snowflake pipelines aren't pure ETL or pure ELT; they're a hybrid. The session covers when each pattern makes sense and how teams blend them without making a mess of the architecture.
Getting data into Snowflake, cleanly
The demo walks through a reliable loading pattern: stage files in S3, point an external Snowflake Stage at the bucket, and use Airflow's CopyFromExternalStageToSnowflakeOperator to bring data in. From there, deduplication and validation logic runs before anything reaches a production table.
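As a rough sketch of that pattern, assuming the external stage already exists and an Airflow connection named snowflake_default is configured, the copy step could look like this (the schema, table, and file pattern are placeholders, not the demo's exact values):

```python
# Hedged sketch: copy files from an existing external stage into a staging table.
# The stage, schema, table, and pattern below are illustrative placeholders.
from airflow.providers.snowflake.transfers.copy_into_snowflake import (
    CopyFromExternalStageToSnowflakeOperator,
)

copy_into_staging = CopyFromExternalStageToSnowflakeOperator(
    task_id="copy_into_staging",
    snowflake_conn_id="snowflake_default",
    schema="STAGING",
    table="ORDERS_RAW",
    stage="RAW_STAGE",  # external stage pointing at the S3 bucket
    pattern=".*[.]csv",  # only pick up CSV files from the stage
    file_format="(TYPE = 'CSV', SKIP_HEADER = 1)",
)
```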
Orchestration that earns your trust
Snowflake handles storage and compute brilliantly, but it doesn't orchestrate. Volker shows how Airflow fits on top with explicit dependencies, asset-aware scheduling that triggers downstream Dags the moment upstream data is ready, and observability that reveals failures before they cause downstream issues.
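As a rough illustration of what that looks like in Airflow 3, here is a hedged sketch of a load Dag that declares the Snowflake table as an asset and a transform Dag scheduled on it; the asset URI, Dag names, and task bodies are invented for the example:

```python
# Hedged sketch of asset-aware scheduling in Airflow 3: the load Dag declares
# the Snowflake table as an outlet, and the transform Dag is scheduled on that
# asset, so it runs the moment the load completes.
from pendulum import datetime

from airflow.sdk import Asset, dag, task

orders_raw = Asset("snowflake://analytics/staging/orders_raw")


@dag(start_date=datetime(2025, 1, 1), schedule="@daily")
def load_orders():
    @task(outlets=[orders_raw])
    def load():
        ...  # ingest from the API and copy into Snowflake

    load()


@dag(start_date=datetime(2025, 1, 1), schedule=[orders_raw])
def transform_orders():
    @task
    def transform():
        ...  # run transformations inside Snowflake

    transform()


load_orders()
transform_orders()
```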
Catching bad data before dashboards do
Airflow's built-in quality check operators run as gates between load and production. The webinar shows column-level and table-level checks in action, including a deliberately failing run that fires a Discord alert with the exact column and value that broke.
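For reference, gates like these can be expressed with the check operators from Airflow's common SQL provider; the connection id, table, and thresholds below are assumptions rather than the webinar's exact checks:

```python
# Illustrative quality gates between the load step and the production table.
from airflow.providers.common.sql.operators.sql import (
    SQLColumnCheckOperator,
    SQLTableCheckOperator,
)

column_checks = SQLColumnCheckOperator(
    task_id="column_checks",
    conn_id="snowflake_default",
    table="STAGING.ORDERS_RAW",
    column_mapping={
        "order_id": {"null_check": {"equal_to": 0}, "unique_check": {"equal_to": 0}},
        "amount": {"min": {"geq_to": 0}},  # no negative order amounts
    },
)

table_checks = SQLTableCheckOperator(
    task_id="table_checks",
    conn_id="snowflake_default",
    table="STAGING.ORDERS_RAW",
    checks={"at_least_one_row": {"check_statement": "COUNT(*) >= 1"}},
)
```

If a check fails, the task fails, downstream promotion never runs, and an on-failure notification (the Discord alert in the demo) can carry the details of the failing check.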
Adding a human in the loop
Airflow 3.1's Human-in-the-Loop operator pauses a Dag mid-run and surfaces a form in the UI. Volker uses it to let a stakeholder add or correct rows before the data lands, with no custom UI, spreadsheet uploads, or shadow processes.
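A minimal sketch of that kind of gate, assuming the human-in-the-loop operators that ship with Airflow 3.1's standard provider; the module path and wiring are assumptions, and the form-based variant shown in the webinar adds input fields on top of this plain approval step:

```python
# Hedged sketch: pause the Dag until a person signs off on the staged data.
# Module path and operator choice are assumptions based on the Airflow 3.1
# standard provider; the form-entry variant would collect corrected rows too.
from airflow.providers.standard.operators.hitl import ApprovalOperator

review_staged_orders = ApprovalOperator(
    task_id="review_staged_orders",
    subject="Review staged orders before they are promoted to production",
)

# quality_checks >> review_staged_orders >> promote_to_production
```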
Snowflake ETL FAQ
What is ETL with Snowflake?
Extract data from your sources, transform it, and load it into Snowflake.
Most teams now run ELT instead, which means you load raw data first and transform it inside Snowflake. In practice, real pipelines are usually a hybrid.
What is the best ETL tool for Snowflake?
Apache Airflow, run on Astronomer.
It's the most widely adopted orchestrator for Snowflake pipelines because it gives you:
- Native Snowflake operators
- Dependency management between tasks
- Built-in data quality checks
- End-to-end observability
What is the difference between ETL and ELT in Snowflake?
Where the transform happens.
ELT is more common today because Snowflake's compute handles transformations at scale.
- ETL — transform before loading into Snowflake
- ELT — load raw, then transform inside Snowflake (usually in SQL; see the sketch below)
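For the ELT case, the transform step can be expressed as an Airflow task that simply runs SQL inside Snowflake; the connection id and table names below are placeholders:

```python
# Hedged sketch of an in-warehouse transform: Airflow submits the SQL,
# Snowflake's compute does the work. Table names are illustrative.
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

transform_orders = SQLExecuteQueryOperator(
    task_id="transform_orders",
    conn_id="snowflake_default",
    sql="""
        CREATE OR REPLACE TABLE ANALYTICS.ORDERS AS
        SELECT DISTINCT order_id, customer_id, amount, order_date
        FROM STAGING.ORDERS_RAW
        WHERE amount IS NOT NULL;
    """,
)
```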
How do you load data into Snowflake?
A reliable pattern looks like this:
- Stage files in S3
- Point an external Snowflake Stage at that location
- Use Airflow's CopyFromExternalStageToSnowflakeOperator to load
- Run dedup and validation before promoting to production tables (see the sketch below for the stage setup and task ordering)
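As a sketch of how those steps chain together, assuming a storage integration already grants Snowflake access to the bucket (the integration name, bucket path, and task names are placeholders):

```python
# Hedged sketch: ensure the external stage exists, then run the load and checks.
# The storage integration, bucket, and downstream task names are placeholders;
# the COPY, dedup, and validation tasks are assumed to be defined as shown earlier.
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

create_stage = SQLExecuteQueryOperator(
    task_id="create_stage",
    conn_id="snowflake_default",
    sql="""
        CREATE STAGE IF NOT EXISTS STAGING.RAW_STAGE
        URL = 's3://my-ingest-bucket/orders/'
        STORAGE_INTEGRATION = S3_INT
        FILE_FORMAT = (TYPE = 'CSV', SKIP_HEADER = 1);
    """,
)

# create_stage >> copy_into_staging >> dedup >> quality_checks >> promote
```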
Does Snowflake have built-in ETL?
Snowflake handles ingestion, but not orchestration.
- Snowpipe handles continuous loading
- Stages handle file staging
For dependencies, quality gates, and alerting, most teams add Airflow on top. Snowflake handles storage and compute; Airflow handles coordination.
How do you orchestrate Snowflake pipelines?
With Airflow Dags — each step a task with explicit dependencies.
Airflow 3's asset-aware scheduling triggers downstream Dags automatically when upstream data is ready. That's more reliable than time-based scheduling, which breaks the moment a step runs long.
What are best practices for Snowflake ETL in production?
A short list that prevents most outages:
- Plan the data flow before writing any Dag code
- Keep SQL in separate files from orchestration logic
- Use driver-level parameter substitution to prevent SQL injection (see the sketch after this list)
- Run Airflow's quality check operators before data hits production
- Authenticate Snowflake with key pair or OAuth — not passwords
- Use asset-aware scheduling so dependencies stay explicit
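Two of those practices, keeping SQL out of the Dag file and binding values at the driver level, can look roughly like this; the file path, connection id, and parameter are assumptions:

```python
# Illustrative only: SQL lives in its own file and values are bound by the
# Snowflake driver instead of being formatted into the query string.
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

promote_orders = SQLExecuteQueryOperator(
    task_id="promote_orders",
    conn_id="snowflake_default",
    # Path to a .sql file resolved from the Dag folder or template_searchpath;
    # inside the file, reference the value as %(min_order_date)s.
    sql="sql/promote_orders.sql",
    parameters={"min_order_date": "2025-01-01"},
)
```

Because the value travels through the driver's parameter binding rather than string formatting, a malformed or malicious input can't change the shape of the query.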