Airflow vs. Luigi
Created by Airbnb data engineer Maxime Beauchemin, Airflow is an open source workflow management system for authoring, scheduling, and monitoring workflows as DAGs (directed acyclic graphs). Workflows are defined in Python, and Airflow is currently the most popular open source workflow management tool on the market.
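To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; the task names and the `execution_order` helper are hypothetical, and the standard library's `graphlib` stands in for the dependency resolution an orchestrator performs.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A cycle makes the graph invalid as a workflow: no task could start first.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print("cycle detected:", cycle_detected)
```

The "acyclic" part is what guarantees a valid run order exists: `extract` always comes first and `load` always comes last, while `transform` and `validate` have no dependency on each other and could run in parallel.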
Luigi is a Python package developed by ex-Spotify engineer Erik Bernhardsson. Originally created to run the complex pipelines that powered Spotify's music recommendation engine, Luigi aims to generalize the complexities of workflow orchestration as much as possible, so it can be extended with other kinds of tasks such as Hive queries, Spark jobs, and more. Luigi lets you parallelize workflows, but it has no built-in way to trigger runs on a schedule, so users still have to rely on cron for scheduling jobs.
As pointed out by Quora user Angela Zhang, Airflow and Luigi have a few key differences that are worth noting.
Airflow:
- Easy-to-use UI (+)
- Built-in scheduler (+)
- Easy testing of DAGs (+)
- Separates output data and task state (+)
- Strong and active community (+)
Luigi:
- Creating and testing tasks is difficult (-)
- The UI is challenging to navigate (-)
- Not scalable due to tight coupling with cron jobs; the number of worker processes is bounded by the number of cron workers assigned to a job (-)
- Re-running pipelines is not possible (-)
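One way to read the "separates output data and task state" point in the list above: by default, Luigi treats a task as complete when its output target exists, while Airflow records the state of each task run in its metadata database. The sketch below contrasts the two ideas in plain Python and uses neither library; `StateStore`, `target_based_is_complete`, and the task and file names are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def target_based_is_complete(output_path: Path) -> bool:
    """Target-style check: the task counts as done if its output exists."""
    return output_path.exists()


class StateStore:
    """Hypothetical stand-in for a metadata database of task-run states."""

    def __init__(self):
        self._states = {}

    def mark(self, task_id, run_date, state):
        self._states[(task_id, run_date)] = state

    def is_complete(self, task_id, run_date):
        return self._states.get((task_id, run_date)) == "success"

    def clear(self, task_id, run_date):
        # Forget the recorded state without touching any output data.
        self._states.pop((task_id, run_date), None)


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "report.csv"
    output.write_text("id,value\n1,42\n")

    store = StateStore()
    store.mark("build_report", "2024-01-01", "success")

    done_by_target = target_based_is_complete(output)
    done_by_state = store.is_complete("build_report", "2024-01-01")

    # Request a re-run by clearing the recorded state only.
    store.clear("build_report", "2024-01-01")
    rerun_needed_by_state = not store.is_complete("build_report", "2024-01-01")
    output_still_present = output.exists()

    # The target-style check still sees the old file and reports "done".
    rerun_needed_by_target = not target_based_is_complete(output)
```

With state kept apart from data, a re-run is requested by clearing a record. With the target-style check, the old output has to be removed or replaced before the task is considered pending again, which is one plausible reason re-running pipelines feels harder in that model.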
While both Luigi and Airflow are viable options for workflow management, the Airflow community has grown much stronger than Luigi's in recent years. As a result, Airflow features have been developing at a much quicker pace, and we've seen a "snowball effect" of companies migrating from Luigi to Airflow to reap the benefits of that community. Check out the charts below for code contributions to the two projects, and note the scale of the y-axis in each:
We also interviewed Luigi creator Erik Bernhardsson for our Airflow Podcast. He had some interesting thoughts on the directions of Luigi and Airflow, so definitely give it a listen if you haven't heard it yet.