Deploying Kedro Pipelines to Apache Airflow
Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine learning code.
While Kedro is an excellent option for data engineers and data scientists who want to author their data pipelines and projects with software engineering practices, it can also integrate with Apache Airflow for distributed scheduling and execution of the resulting pipelines.
In close partnership with the team at Kedro, we've recently extended the kedro-airflow plugin to provide a significantly improved developer experience. With this plugin, you can translate your Kedro pipeline into a clean, legible, and well-structured Apache Airflow DAG with one simple command:
kedro airflow create
This makes for a super clean experience for anyone looking to deploy their Kedro pipelines to a distributed scheduler for workflow orchestration.
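Conceptually, a Kedro pipeline already encodes the dependencies an Airflow DAG needs: a node that consumes a dataset has to run after the node that produces it. The sketch below illustrates that idea in plain Python. It is not the plugin's actual code.

- Each node is reduced to an (inputs, outputs) pair.
- The node names and the raw input example_iris_data are hypothetical stand-ins, named to match the astro-iris datasets in the catalog example later in this post.
- The function derives producer-to-consumer edges and prints them in Airflow's upstream >> downstream notation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical node definitions: node name -> (inputs, outputs).
# Dataset names follow the astro-iris catalog; node names and
# "example_iris_data" are illustrative assumptions.
NODES: Dict[str, Tuple[List[str], List[str]]] = {
    "split_data": (
        ["example_iris_data"],
        ["example_train_x", "example_train_y", "example_test_x", "example_test_y"],
    ),
    "train_model": (["example_train_x", "example_train_y"], ["example_model"]),
    "predict": (["example_model", "example_test_x"], ["example_predictions"]),
    "report_accuracy": (["example_predictions", "example_test_y"], []),
}


def task_dependencies(
    nodes: Dict[str, Tuple[List[str], List[str]]]
) -> List[Tuple[str, str]]:
    """Return sorted (upstream, downstream) pairs implied by shared datasets."""
    # Map each dataset to the node that produces it.
    producer = {out: name for name, (_, outputs) in nodes.items() for out in outputs}
    edges: Set[Tuple[str, str]] = set()
    for name, (inputs, _) in nodes.items():
        for ds in inputs:
            # Raw inputs have no producing node, so they add no edge.
            if ds in producer:
                edges.add((producer[ds], name))
    return sorted(edges)


if __name__ == "__main__":
    for upstream, downstream in task_dependencies(NODES):
        # In Airflow, "a >> b" means task a runs before task b.
        print(f"{upstream} >> {downstream}")
```

Run as a script, it prints lines such as split_data >> train_model. In a real DAG file, the same relationships would be declared between Airflow task objects rather than strings.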
To use the plugin, you'll need the following running on your machine or a fresh virtual environment:
We've added functionality to the plugin that makes for a great integration with Astronomer. To give it a try, we'll use the astro-iris starter that's included in the Kedro project. The steps below walk through spinning up a fresh Kedro project and running your pipelines as DAGs in a local Airflow environment.
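Run the following to build your starter directory:
kedro new --starter astro-iris
Then change into the new project directory, install its dependencies, package the project, and copy the built wheel into the project root:
cd <kedro-project-directory>
kedro install
kedro package
cp src/dist/*.whl ./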
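The cp step assumes a Unix-like shell. If you are working somewhere without cp, such as the Windows command prompt, Python's standard library can do the same copy. The helper name copy_wheels is hypothetical. Like the command above, it assumes kedro package wrote its wheel to src/dist under the project root.

```python
import shutil
from pathlib import Path
from typing import List


def copy_wheels(project_dir: Path) -> List[Path]:
    """Copy every wheel in <project_dir>/src/dist into <project_dir>.

    Hypothetical helper mirroring `cp src/dist/*.whl ./` run from the project root.
    """
    dist_dir = project_dir / "src" / "dist"
    copied: List[Path] = []
    for wheel in sorted(dist_dir.glob("*.whl")):
        target = project_dir / wheel.name
        shutil.copy2(wheel, target)  # copy2 also preserves file metadata
        copied.append(target)
    if not copied:
        # cp fails when the glob matches nothing; fail loudly here as well.
        raise FileNotFoundError(f"No .whl files found in {dist_dir}; run `kedro package` first")
    return copied
```

From the project root, call copy_wheels(Path.cwd()). Like cp, it raises an error if no wheel has been built yet.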
Next, create a catalog file for the datasets in the default pipeline:
kedro catalog create --pipeline=__default__
Edit your conf/base/catalog/__default__.yml and configure the datasets to be persisted. Because each Kedro node runs as a separate Airflow task, datasets passed from one node to the next cannot be held in memory and must be written to disk. For example:
example_train_x:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_train_x.pkl
example_train_y:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_train_y.pkl
example_test_x:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_test_x.pkl
example_test_y:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/example_test_y.pkl
example_model:
  type: pickle.PickleDataSet
  filepath: data/06_models/example_model.pkl
example_predictions:
  type: pickle.PickleDataSet
  filepath: data/07_model_output/example_predictions.pkl
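Before generating the DAG, it can help to check that no dataset was left in memory. The helper below is a hypothetical sketch, not part of Kedro or kedro-airflow. It works under these assumptions:

- The catalog has already been loaded into a plain dictionary, for example with a YAML parser.
- The node definitions are hypothetical and named after the astro-iris datasets.

It flags every dataset that one node produces and another consumes when that dataset is either missing from the catalog or explicitly declared as a MemoryDataSet. A missing entry counts because Kedro falls back to an in-memory dataset for it.

```python
from typing import Dict, List, Tuple

# Hypothetical node definitions: node name -> (inputs, outputs).
# Node names and "example_iris_data" are illustrative assumptions.
NODES: Dict[str, Tuple[List[str], List[str]]] = {
    "split_data": (
        ["example_iris_data"],
        ["example_train_x", "example_train_y", "example_test_x", "example_test_y"],
    ),
    "train_model": (["example_train_x", "example_train_y"], ["example_model"]),
    "predict": (["example_model", "example_test_x"], ["example_predictions"]),
    "report_accuracy": (["example_predictions", "example_test_y"], []),
}

# Hypothetical catalog as a dict, e.g. after parsing __default__.yml.
# example_model is still in memory and example_predictions is missing,
# so both should be flagged.
CATALOG: Dict[str, Dict[str, str]] = {
    name: {"type": "pickle.PickleDataSet", "filepath": f"data/05_model_input/{name}.pkl"}
    for name in ("example_train_x", "example_train_y", "example_test_x", "example_test_y")
}
CATALOG["example_model"] = {"type": "MemoryDataSet"}

# Type strings treated as in-memory (short and fully qualified forms).
IN_MEMORY_TYPES = {"MemoryDataSet", "kedro.io.MemoryDataSet"}


def unpersisted_intermediates(
    nodes: Dict[str, Tuple[List[str], List[str]]],
    catalog: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return intermediate datasets that would not survive between Airflow tasks."""
    produced = {ds for _, outputs in nodes.values() for ds in outputs}
    consumed = {ds for inputs, _ in nodes.values() for ds in inputs}
    flagged: List[str] = []
    for ds in sorted(produced & consumed):  # datasets passed between nodes
        entry = catalog.get(ds)
        # A missing entry means Kedro would use an in-memory dataset.
        if entry is None or entry.get("type") in IN_MEMORY_TYPES:
            flagged.append(ds)
    return flagged


if __name__ == "__main__":
    print(unpersisted_intermediates(NODES, CATALOG))
```

Suppose example_model is switched to a pickle.PickleDataSet and an entry for example_predictions is added, as in the catalog above. The helper then returns an empty list.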
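If you don't have the kedro-airflow plugin installed, install it:
pip install kedro-airflow
Then generate the Airflow DAG for your project, writing it to the dags/ directory:
kedro airflow create -t dags/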
Finally, run the following to fire up a local Airflow instance and visualize your DAGs:
astro dev start
We're proud to partner with the Kedro team on bringing this plugin to the community, and we look forward to extending it to improve the developer experience even more. Please get in touch if you'd like to talk to us about how you use Kedro and Airflow together!