Webinar Recap

Intro to Apache Airflow

Topics covered in this webinar:

  • What is Airflow
  • Core Components
  • Core Concepts
  • Flexibility of Pipelines as Code
  • Getting Airflow up and Running
  • Demo DAGs

What is Airflow?

Definition: Apache Airflow is a way to programmatically author, schedule, and monitor data pipelines.

  • Airflow is the de facto standard for data orchestration
  • Born inside Airbnb, open-sourced, and graduated to a top-level Apache Software Foundation project
  • Leveraged by 1M+ data engineers around the globe to programmatically author, schedule, and monitor data pipelines
  • Deployed by thousands of companies as an unbiased data control plane, translating business rules into the pipelines that power their data processing

Core Components

Executors

  • Local: good for local and development environments
  • Celery: good for a high volume of short tasks
  • Kubernetes: good for autoscaling and task-level configuration

There is also the Sequential executor, which runs a single task at a time and offers no parallelism.
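The executor is chosen in Airflow's configuration. As a minimal sketch, the snippet below selects the Local executor in airflow.cfg; the same setting can also be overridden with the AIRFLOW__CORE__EXECUTOR environment variable.

```ini
[core]
# Which executor runs your tasks. Swap in CeleryExecutor or
# KubernetesExecutor for distributed production deployments.
executor = LocalExecutor
```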

Task

  • Instance of an Operator

Task Instance

  • Represents a specific run of a task: DAG + task + point in time

Flexibility of Data Pipelines-as-Code
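Because Airflow pipelines are plain Python, a DAG and its tasks are just code. The sketch below (DAG id and callables are illustrative; the API shown is Airflow 2.0) ties the concepts above together: two operators instantiated inside a DAG become tasks, with an explicit dependency between them, and each scheduled run of a task becomes a task instance.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _transform():
    print("transforming data")


# Each operator instantiated inside the DAG context becomes a task;
# every scheduled run of a task is a task instance (DAG + task + point in time).
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = PythonOperator(task_id="transform", python_callable=_transform)

    extract >> transform  # extract must succeed before transform runs
```

Because the pipeline is code, it can be parameterized, generated in loops, reviewed in pull requests, and versioned like any other software.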

Getting Started with Apache Airflow

The easiest way to get started with Apache Airflow 2.0 and its providers is the Astronomer CLI. You can get up and running with Airflow by following our Quickstart Guide.
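For reference, a typical quickstart with the Astronomer CLI looks like the following, assuming the CLI and Docker are already installed:

```shell
astro dev init    # scaffold a new Airflow project with an example DAG
astro dev start   # start the webserver, scheduler, and metadata database locally
```

Once the containers are up, the Airflow UI is available at http://localhost:8080 by default.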

Join the thousands of other data engineers who have received the Astronomer Certification for Apache Airflow Fundamentals. This exam assesses an understanding of the basics of the Airflow architecture and the ability to create basic data pipelines for scheduling and monitoring tasks.

To access the demo DAGs used in the Intro to Airflow webinar, visit this GitHub repository.
