Topics that will be discussed:
- What is Airflow
- Core Components
- Core Concepts
- Flexibility of Pipelines as Code
- Getting Airflow up and Running
- Demo DAGs
What is Airflow?
Definition: Apache Airflow is a way to programmatically author, schedule and monitor data pipelines.
- Airflow is the de facto standard for data orchestration
- Born inside Airbnb, open-sourced, and graduated to a Top-Level Apache Software Foundation project
- Leveraged by 1M+ data engineers around the globe to programmatically author, schedule, and monitor data pipelines
- Deployed by 1000s of companies as the unbiased data control plane, translating business rules to power their data processing fabric
Executors
- Local: good for local and development environments
- Celery: good for a high volume of short tasks
- Kubernetes: good for autoscaling and task-level configuration
- Sequential: the default, but runs only one task at a time (no parallelism)
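The executor is selected in the Airflow configuration. As a sketch, the relevant airflow.cfg entry (settable equivalently via the AIRFLOW__CORE__EXECUTOR environment variable) looks like this; the choice of LocalExecutor here is just an example:

```ini
[core]
# Which executor runs your tasks; SequentialExecutor is the default.
# Other built-in options: LocalExecutor, CeleryExecutor, KubernetesExecutor.
executor = LocalExecutor
```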
Task Instance
- An instance of an Operator
- Represents a specific run of a task: DAG + task + point in time
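The "DAG + task + point in time" identity can be pictured with a tiny plain-Python sketch. This is not Airflow's actual TaskInstance class; the class and field names below are illustrative only:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TaskInstanceKey:
    """Illustrative stand-in: a task instance is uniquely identified by
    its DAG, its task, and the logical point in time it ran for."""
    dag_id: str
    task_id: str
    execution_date: datetime


# Two runs of the same task on different days are different task instances.
monday = TaskInstanceKey("daily_etl", "extract", datetime(2021, 1, 4))
tuesday = TaskInstanceKey("daily_etl", "extract", datetime(2021, 1, 5))
print(monday == tuesday)  # False: same DAG and task, different point in time
```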
Flexibility of Data Pipelines-as-Code
Getting Started with Apache Airflow
The easiest way to get started with Apache Airflow 2.0 and its providers is the Astronomer CLI. You can get up and running with Airflow by following our Quickstart Guide.
Join the thousands of other data engineers who have earned the Astronomer Certification for Apache Airflow Fundamentals. This exam assesses your understanding of the basics of the Airflow architecture and your ability to create basic data pipelines for scheduling and monitoring tasks.
To access the demo DAGs used in the Intro to Airflow webinar, visit this GitHub repository.