Version: Airflow 3.x

Apache Airflow® Quickstart - Learn Airflow

Learning Airflow: An introduction to Airflow's lean and dynamic pipelines-as-Python-code.

Step 1: Clone the Astronomer Quickstart repository

  1. Create a new directory for your project and open it:

    mkdir airflow-quickstart-learning && cd airflow-quickstart-learning
  2. Clone the repository and open it:

    git clone -b learning-airflow-3 --single-branch https://github.com/astronomer/airflow-quickstart.git && cd airflow-quickstart/learning-airflow

    Your directory should have the following structure:

    .
    ├── Dockerfile
    ├── README.md
    ├── dags
    │   ├── example_astronauts.py
    │   └── example_extract_astronauts.py
    ├── include
    ├── packages.txt
    ├── requirements.txt
    ├── solutions
    │   └── example_astronauts_solution.py
    └── tests
        └── dags
            └── test_dag_integrity.py

Step 2: Start up Airflow and explore the UI

  1. Start the project using the Astro CLI:

    astro dev start

    The CLI will let you know when all Airflow services are up and running.

tip

At this time, Safari will not work properly with the UI. If Safari is your default browser, use Chrome to open Airflow 3.0.

  2. If it doesn't launch automatically, navigate your browser to localhost:8080 and sign in to the Airflow UI using username admin and password admin.

  3. Explore the Home screen and DAGs page to get a sense of the metadata available about the DAG, run, and all task instances. For a deep-dive into the UI's features, see An introduction to the Airflow UI.

    For example, the Home screen will look like this screenshot:

    Airflow UI Home View

    And the DAGs page will look like this screenshot:

    Airflow UI DAGs view

    As you start to trigger DAG runs of the example_astronauts DAG, the DAG view will look like this screenshot:

    Example Astronauts DAG run view

    You can clearly see information such as the Schedule, Last Run (and status), Next Run, and a visual of the recent runs.

  4. Once you have triggered a few runs of the example_astronauts DAG, you should notice that it has also triggered runs of the example_extract_astronauts DAG. If you go to the Assets screen, you can see the current_astronauts Asset, which has 1 consuming DAG and 1 producing Task.

    Example Asset View

Step 3: Explore the project

This Astro project introduces you to the basics of orchestrating pipelines with Airflow. You'll see how easy it is to:

  • Get data from data sources.
  • Generate tasks automatically and in parallel.
  • Trigger downstream workflows automatically.

You'll have a lean, dynamic pipeline serving a common use case: extracting data from an API and loading it into a database!

warning

This project uses DuckDB, an in-memory database. Although this type of database is great for learning Airflow, your data is not guaranteed to persist between executions!

For production applications, use a persistent database instead (consider DuckDB's hosted option MotherDuck or another database like Postgres, MySQL, or Snowflake).
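
The difference comes down to how you connect. As a rough, hedged sketch (the file path and MotherDuck database name below are hypothetical and not part of this project's code):

import duckdb

# Ephemeral: the data lives only as long as this connection.
ephemeral_con = duckdb.connect(":memory:")

# Persistent: the data is written to a local file and survives restarts.
persistent_con = duckdb.connect("include/astronauts.db")

# Hosted: MotherDuck connections use an "md:" prefix plus a database name
# and require an account and token, so this line is left commented out.
# hosted_con = duckdb.connect("md:my_database")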

Pipeline structure

An Airflow instance can have any number of DAGs (directed acyclic graphs), which are your data pipelines in Airflow. This project has two:

example_astronauts

This DAG queries the list of astronauts currently in space from the Open Notify API, prints assorted data about the astronauts, and loads data into an in-memory database.

Tasks in the DAG are Python functions decorated using Airflow's TaskFlow API, which makes it easy to turn arbitrary Python code into Airflow tasks, automatically infer dependencies, and pass data between tasks.
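
To make this concrete, here is a minimal, hedged sketch of the pattern (simplified stand-in code, not the project's example_astronauts file): two decorated functions become tasks, and passing one function's return value into the other is enough for Airflow to infer the dependency and pass the data.

from airflow.sdk import dag, task


@dag(schedule=None)
def taskflow_sketch():
    @task
    def get_names() -> list[str]:
        # Placeholder values; in example_astronauts this step calls the
        # Open Notify API instead.
        return ["astronaut-1", "astronaut-2"]

    @task
    def count_astronauts(names: list[str]) -> None:
        print(f"{len(names)} astronauts are currently in space.")

    # Passing get_names()'s output into count_astronauts() creates the
    # dependency get_names >> count_astronauts automatically.
    count_astronauts(get_names())


taskflow_sketch()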

  • get_astronaut_names and get_astronaut_numbers make a JSON array and an integer available, respectively, to downstream tasks in the DAG.

  • print_astronaut_craft and print_astronauts make use of this data in different ways. print_astronaut_craft uses dynamic task mapping to create a parallel task instance for each astronaut in the list retrieved from the API. Airflow lets you do this with just two lines of code:

    print_astronaut_craft.partial(greeting="Hello! :)").expand(
        person_in_space=get_astronaut_names()
    )

    The key feature is the expand() function, which makes the DAG automatically adjust the number of mapped tasks each time it runs (a runnable sketch of this pattern follows this list).

  • create_astronauts_table_in_duckdb and load_astronauts_in_duckdb create a DuckDB database table for some of the data and load the data into it, respectively.
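
To see partial() and expand() in context, here is a stripped-down, hedged sketch of the dynamic task mapping pattern described above (it simplifies the real tasks, which work with dictionaries returned by the API rather than plain strings):

from airflow.sdk import dag, task


@dag(schedule=None)
def mapping_sketch():
    @task
    def get_astronaut_names() -> list[str]:
        # Placeholder values; example_astronauts fetches the real list
        # from the Open Notify API.
        return ["astronaut-1", "astronaut-2", "astronaut-3"]

    @task
    def print_astronaut_craft(greeting: str, person_in_space: str) -> None:
        print(f"{greeting} {person_in_space}")

    # partial() pins the arguments that are identical for every mapped
    # instance; expand() creates one task instance per list element at runtime.
    print_astronaut_craft.partial(greeting="Hello! :)").expand(
        person_in_space=get_astronaut_names()
    )


mapping_sketch()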

example_extract_astronauts

This DAG queries the database you created for astronaut data in example_astronauts and prints out some of this data. A single line of code in this DAG, its Asset-based schedule, makes it run automatically when the other DAG completes a run.

DAG Dependencies

Airflow makes it easy to create cross-workflow dependencies. Assets are collections of logically related data that you define in your DAG code, reducing the code required to create cross-DAG dependencies. For example, with an import and a single line of code, you can schedule a DAG to run whenever another DAG in the same Airflow environment updates an Asset.

The example_astronauts DAG creates the Asset with the needed data that example_extract_astronauts uses as a run trigger, instead of a standard schedule. The lines of code in the example_astronauts DAG that create the Asset for the trigger are:

from airflow.sdk import Asset

And

@task(
    outlets=[Asset(_DUCKDB_TABLE_NAME)]
)
def get_astronaut_names(**context) -> list[dict]:

The line of code used to create that schedule trigger inside the example_extract_astronauts DAG is:

schedule=[Asset(_DUCKDB_TABLE_NAME)]
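
Putting the two pieces together, here is a hedged, minimal sketch of the producer/consumer pattern (DAG, task, and Asset names are shortened for illustration and don't match the project files exactly):

from airflow.sdk import Asset, dag, task

astronauts_asset = Asset("current_astronauts")


@dag(schedule="@daily")
def producer():
    @task(outlets=[astronauts_asset])
    def load_astronauts() -> None:
        # When this task finishes successfully, Airflow records an update
        # to the Asset listed in its outlets.
        ...

    load_astronauts()


@dag(schedule=[astronauts_asset])  # run whenever the Asset is updated
def consumer():
    @task
    def read_astronauts() -> None:
        # In example_extract_astronauts, this is where the DuckDB table
        # would be queried.
        ...

    read_astronauts()


producer()
consumer()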

Next Steps

Run Airflow on Astro

The easiest way to run Airflow in production is with Astro. To get started, create an Astro trial. During your trial signup, you will have the option of choosing the same template project you worked with in this quickstart.

Further Reading

Here are a few guides that may help you learn more about the topics discussed in this quickstart:

  • Find more info about DAGs in Airflow 3.0 here.
  • Check out our guide on DAG Versioning to see how you can easily manage new iterations of your DAG code in Airflow 3.
  • Now that you've gone through the basics of learning Airflow, here is more detailed information about the Airflow Components.
