Over the last decade, Apache Airflow has established itself as the de facto standard for orchestrating data pipelines, powering everything from analytics dashboards to machine learning and GenAI use cases. Its flexibility and scalability have made it the go-to orchestration tool for data engineers and analysts alike, allowing them to automate complex workflows and ensure data consistency and reliability.
Today, we’re excited to announce an investment to make Airflow more accessible, built on great work started by the community. We strongly believe that full participation in the orchestration process allows businesses to get the most out of their data, and are committed to making that easy and intuitive.
Expanding Airflow’s Reach: The Growing Need for Orchestration
At most companies, Airflow adoption starts with the data engineer. As reliable data delivery becomes core to more processes within companies, the range of personas who need to orchestrate workflows grows. This includes data analysts who are more comfortable with SQL and tools such as dbt, data scientists with strong math and statistical backgrounds, and IT teams that want to modernize from legacy orchestration tools. All of these personas need to be able to easily take queries, models, scripts, and other forms of data-centric business logic and turn them into production-quality pipelines that meet the new use cases and demands of the business.
As a result, data teams often create their own abstraction layers to help users write Airflow DAGs without needing to learn Airflow. These abstraction layers democratize orchestration by exposing a “low code” interface that lets users create Airflow DAGs without necessarily knowing Airflow (or even Python!).
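To make the idea concrete, here is a minimal sketch of what such an abstraction layer might look like on Airflow 2.x. The declarative spec format and the build_dag() helper are hypothetical illustrations, not part of Airflow or any specific vendor tool: an analyst edits a simple spec (typically a YAML file, inlined here as a dict to keep the example self-contained), and a small factory function turns it into a real DAG.

```python
# Hypothetical "low code" abstraction layer: a declarative spec in, an Airflow DAG out.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# What an analyst might write, without knowing Airflow (or Python).
PIPELINE_SPEC = {
    "dag_id": "daily_revenue_report",
    "schedule": "@daily",
    "tasks": [
        {"name": "extract_orders", "command": "python extract_orders.py"},
        {"name": "build_report", "command": "dbt run --select revenue",
         "depends_on": ["extract_orders"]},
    ],
}


def build_dag(spec: dict) -> DAG:
    """Turn a declarative pipeline spec into an Airflow DAG."""
    dag = DAG(
        dag_id=spec["dag_id"],
        schedule=spec["schedule"],
        start_date=datetime(2024, 1, 1),
        catchup=False,
    )
    tasks = {}
    # Create one task per entry in the spec.
    for task_spec in spec["tasks"]:
        tasks[task_spec["name"]] = BashOperator(
            task_id=task_spec["name"],
            bash_command=task_spec["command"],
            dag=dag,
        )
    # Wire up the dependencies declared in the spec.
    for task_spec in spec["tasks"]:
        for upstream in task_spec.get("depends_on", []):
            tasks[upstream] >> tasks[task_spec["name"]]
    return dag


# Airflow discovers DAG objects defined at module top level.
dag = build_dag(PIPELINE_SPEC)
```

In practice, teams layer validation, naming conventions, and shared operators on top of a factory like this, which is exactly where the maintenance burden described next begins to accumulate.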
While these layers offer immediate benefits, maintaining a custom-built, proprietary abstraction layer on top of a fast-moving open source project is not an easy task.
At an Airflow Meetup in 2023, the team at the New York Times put it best: