Integrate OpenLineage and Airflow
Data lineage is the practice of tracking and visualizing data from its origin through every place it flows and is consumed downstream. Lineage is growing in importance as companies rely on increasingly complex data ecosystems to make business-critical decisions. Data lineage can help with everything from understanding your data sources, to troubleshooting job failures, to managing PII, to ensuring compliance with data regulations.
Data lineage integrates naturally with Apache Airflow. Because Airflow often serves as the central orchestrator for an organization’s data pipelines, it is an ideal place to collect lineage and understand how data moves through, and interacts within, those pipelines.
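To make the idea of lineage concrete, it can help to picture it as a directed graph: jobs read input datasets and write output datasets, and lineage is the set of paths through that graph. The following is a minimal sketch in plain Python, not how OpenLineage or Airflow implement lineage. The job names, dataset names, and helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical jobs (for example, Airflow tasks). Each one reads some
# datasets and writes others. All names here are made up for illustration.
JOBS = {
    "extract_orders": {
        "inputs": ["postgres.orders"],
        "outputs": ["raw.orders"],
    },
    "clean_orders": {
        "inputs": ["raw.orders"],
        "outputs": ["staging.orders"],
    },
    "build_report": {
        "inputs": ["staging.orders", "staging.customers"],
        "outputs": ["reports.daily_revenue"],
    },
}


def build_edges(jobs, reverse=False):
    """Map each dataset to the datasets directly derived from it.

    With reverse=True, map each dataset to the datasets it is derived from.
    """
    edges = defaultdict(set)
    for spec in jobs.values():
        for src in spec["inputs"]:
            for dst in spec["outputs"]:
                if reverse:
                    edges[dst].add(src)
                else:
                    edges[src].add(dst)
    return edges


def reachable(start, edges):
    """Breadth-first search: every dataset reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream_of(dataset, jobs=JOBS):
    """Datasets that would be affected if `dataset` changed or broke."""
    return reachable(dataset, build_edges(jobs))


def upstream_of(dataset, jobs=JOBS):
    """Datasets that `dataset` ultimately depends on (its origins)."""
    return reachable(dataset, build_edges(jobs, reverse=True))


if __name__ == "__main__":
    print(sorted(downstream_of("postgres.orders")))
    print(sorted(upstream_of("reports.daily_revenue")))
```

In this sketch, `downstream_of` answers the troubleshooting question "what is affected if this source fails?", while `upstream_of` answers the compliance question "where did the data in this report come from?".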
In this guide, you’ll learn about core data lineage concepts and understand how lineage works with Airflow.
Astro offers robust support for extracting and visualizing data lineage. To learn more, see Data lineage on Astro.
For more on this topic, see also:
- Webinar: OpenLineage and Airflow: A Deeper Dive.
Assumed knowledge
To get the most out of this guide, make sure you have an understanding of:
- Airflow fundamentals, such as writing DAGs and defining tasks. See Get started with Apache Airflow.
- Airflow operators. See Operators 101.