DAG Writing Best Practices in Apache Airflow

Summary:

Learn best practices for writing DAGs in Apache Airflow, with a repo of example DAGs that you can run with the Astro CLI. In Airflow, pipelines are defined as directed acyclic graphs (DAGs). Understanding these best practices at a high level will help you build your data pipelines correctly.

In this webinar we covered:

  • High-Level Best Practices When Writing DAGs
  • Idempotency
  • Use Airflow as an Orchestrator
  • Incremental Record Filtering
  • And More...

Recap Preview

High-Level Best Practices When Writing DAGs

Idempotency

Data pipelines are a messy business, with many components that can fail.

Idempotent DAGs, where re-running a DAG for the same interval produces the same result, let you deliver results faster when something breaks and can save you from losing data down the road.
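
As a rough sketch of what idempotency can look like inside a task, the function below replaces one day's rows instead of appending to them, so running the same day twice leaves the table unchanged. The table `daily_sales`, the function `load_day`, and the use of SQLite as a stand-in for a warehouse are assumptions made for illustration and do not come from the webinar. In an Airflow task, `run_date` would typically be derived from the DAG run's logical date rather than from the current time.

```python
import sqlite3


def load_day(conn, run_date, amounts):
    """Load one day's rows so that re-running the same day gives the same table.

    Hypothetical helper: the table and column names are illustrative only.
    """
    with conn:  # single transaction: delete and insert commit together or roll back
        conn.execute("DELETE FROM daily_sales WHERE sale_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (sale_date, amount) VALUES (?, ?)",
            [(run_date, amount) for amount in amounts],
        )


# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, amount REAL)")

load_day(conn, "2024-01-01", [10.0, 20.0])
load_day(conn, "2024-01-01", [10.0, 20.0])  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(row_count)
```

A plain append would have left four rows after the retry; the delete-then-insert pattern keeps the count at two, which is what makes the retry safe.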

Use Airflow as an Orchestrator

Airflow was designed to be an orchestrator, not an execution framework.

In practice, this means:

  • DO use Airflow to orchestrate jobs with other tools
  • DO offload heavy processing to execution frameworks (e.g. Spark)
  • DO use an ELT framework wherever possible
  • DO use intermediary data storage
  • DON’T pull large datasets into a task and process them with pandas (it’s tempting, we know)
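
One way to read the guidance above is that a task should tell another system to do the work rather than do it in its own memory. The sketch below assumes raw rows have already been landed in a staging table (the intermediary storage) and pushes the aggregation down to the database as SQL, so no rows are pulled into the task. SQLite stands in for a warehouse, and `staging_orders`, `order_totals`, and `transform_in_database` are hypothetical names chosen for illustration.

```python
import sqlite3


def transform_in_database(conn):
    """Aggregate staged rows with SQL so the data never enters task memory.

    Hypothetical helper: the table and column names are illustrative only.
    """
    with conn:  # single transaction for the rebuild of the target table
        conn.execute("DELETE FROM order_totals")
        conn.execute(
            """
            INSERT INTO order_totals (customer_id, total)
            SELECT customer_id, SUM(amount)
            FROM staging_orders
            GROUP BY customer_id
            """
        )


# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE order_totals (customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?)",
    [(1, 5.0), (1, 7.5), (2, 3.0)],
)

transform_in_database(conn)

# Only the small aggregated result is read back, here just to inspect it.
totals = dict(conn.execute("SELECT customer_id, total FROM order_totals"))
print(totals)
```

The task only issues statements and waits for them to finish; the heavy lifting stays in the system that holds the data, which is the orchestrator role described above.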

Hosted By

Kenten Danas

Field Engineer

Viraj Parekh

Field CTO