Deploying Kedro Pipelines to Apache Airflow


Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine learning code.

While Kedro is an excellent option for data engineers and data scientists looking to author their data pipelines and projects with software engineering practices, it can extend even further to integrate with Apache Airflow for distributed scheduling and execution of the resultant pipelines.

The Plugin

In close partnership with the team at Kedro, we’ve recently extended the kedro-airflow plugin to accommodate a significantly improved developer experience. With this plugin, you can translate your Kedro pipeline into a clean, legible, and well-structured Apache Airflow DAG with one simple command:

kedro airflow create

This makes for a super clean experience for anyone looking to deploy their Kedro pipelines to a distributed scheduler for workflow orchestration.


To use the plugin, you’ll need the following running on your machine or a fresh virtual environment:

Try it Out

We’ve added some additional functionality to the plugin that makes for a great integration with Astronomer. To give it a try, we’ll use the astro-airflow-iris starter that’s included in the Kedro project; The steps below walk through spinning up a fresh Kedro project and running your pipelines as DAGs on a local Airflow environment.

Create an Astro-Kedro project

  1. kedro new --starter astro-airflow-iris to build your starter directory.
  2. cd <kedro-project-directory>
  3. kedro install
  4. kedro package

Prepare and run the project in Astro

  1. cp src/dist/*.whl ./

  2. kedro catalog create --pipeline=__default__

  3. Edit your conf/base/catalog/__default__.yml and configure datasets to be persisted, e.g.

        type: pickle.PickleDataSet
        filepath: data/05_model_input/example_train_x.pkl
        type: pickle.PickleDataSet
        filepath: data/05_model_input/example_train_y.pkl
        type: pickle.PickleDataSet
        filepath: data/05_model_input/example_test_x.pkl
        type: pickle.PickleDataSet
        filepath: data/05_model_input/example_test_y.pkl
        type: pickle.PickleDataSet
        filepath: data/06_models/example_model.pkl
        type: pickle.PickleDataSet
        filepath: data/07_model_output/example_predictions.pkl`
  4. Make sure you have the kedro-airflow plugin installed, then run pip install kedro-airflow

  5. kedro airflow create -t dags/

  6. Make sure you have the Astro CLI installed and have Docker running on your machine, then run astro dev start to fire up a local Airflow instance and visualize your DAGs.

We’re proud to partner with the Kedro team on bringing this plugin experience into the world and look forward to extending it to improve the developer experience even more. Please get in touch if you’d like to talk to us about how you use Kedro and Airflow together!

Modern,Cloud-Native Data Orchestration powered by Apache Airflow

Start building your next-generation data platform with Astro.

Get Started