The New, Faster Way to Deploy Airflow DAGs to Astro
A new feature that allows you to deploy only DAGs to Astro is now in Public Preview and available to all Astro customers in the latest version of the Astro CLI. This is yet another exciting way Astro is enabling teams to be more productive with data.
For many data teams, the ability to push Airflow DAGs separate from environment files means:
- Speed – Deploying only DAGs is faster and results in a feedback loop that quickly helps you answer the questions, “Is my DAG code live?” and “Are there any errors?”
- Reliability – Deploying only DAGs does not require restarting your workers and schedulers. This means a more efficient use of your resources and no downtime for your Deployment.
- Flexibility in CI/CD – You can now have one set of users pushing DAGs and another set of users managing the rest of the environment. Or not — it’s up to you.
This post covers the problem we set out to solve, why we solved it, how this new feature works, and where we’re going next. We’re excited about the DAG-only deploy feature — and even more thrilled about what we can build on top of it.
To skip the story and get started, see Deploy DAGs only in the Astro documentation. Once you try it, give us feedback. We’d love to hear from you.
Astro’s Deployment Model (Before)
Historically, deploying code to Astro has looked something like this:
- You create an Astro project with your DAGs and “everything else” — all of the files that define your Airflow environment on Astro. That includes your base Astro Runtime image and Airflow version, your Python packages, OS-level packages, plugins, and more.
- You then run `astro deploy` with the Astro CLI.
- The Astro CLI builds your Astro project into a Docker image and pushes that image to a Docker registry in the Astro control plane.
- Astro restarts all Airflow components in your Deployment and applies the new Docker image to each of them. (Tasks are not interrupted and workers have always had a graceful restart).
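In CLI terms, those steps map to a short sequence of commands. Here’s a minimal sketch, assuming you’re starting from a fresh project; the deployment ID is a placeholder you’d replace with your own:

```sh
# Scaffold an Astro project (creates dags/, Dockerfile, requirements.txt, packages.txt, and more).
astro dev init

# Authenticate to your Astro organization.
astro login

# Build the project into a Docker image and push it to a Deployment.
# <deployment-id> is a placeholder; omit it to choose a Deployment interactively.
astro deploy <deployment-id>
```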
If you know anything about software, reading these four steps probably leaves you wondering: “Do you really have to restart all components of a Deployment to apply a quick change to a Python file?”
And indeed, this deployment model has resulted in a few obvious challenges:
- Building a Docker image and reinstalling every single Python or OS-level package in your environment every time you push code can take 5+ minutes. That’s 5 minutes too long.
- For teams that manage more than 100 DAGs and push changes to those DAGs multiple times a day, Airflow components might be constantly terminating and restarting — and thus operating at half capacity. This is a waste of resources and limits how many tasks you can run at once.
- Teams with more rigorous development practices, or teams migrating from existing Airflow environments, have less flexibility in where they store their code and how they push it to Astro.
As active users of Astro and close partners to our customers, we at Astronomer understand how painful this can be. And we’re excited to share the solution we came up with.
Deploying Only DAGs
With Astro CLI 1.7, you can now enable DAG-only deploys for your Deployment and run `astro deploy --dags` to push only DAGs to Astro.
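In practice, that looks something like the sketch below. It assumes DAG-only deploys are already enabled for your Deployment (you can toggle this in the Deployment settings or with `astro deployment update`; check the Astro docs for the exact flag in your CLI version):

```sh
# Push only the contents of the dags/ directory to an existing Deployment.
# Assumes DAG-only deploys are enabled for that Deployment and that you're
# already authenticated with `astro login`.
astro deploy --dags

# Everything else in the project (Dockerfile, requirements.txt, packages.txt, plugins)
# still goes through a regular image-based deploy when it changes:
astro deploy
```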
With DAG-only deploys enabled:
- DAGs are bundled into a tarball file, pushed to Astro, and mounted as a volume to the Airflow workers and schedulers. This now happens in as little as 30 seconds.
- The rest of your Astro project is still bundled into a Docker image. When you add a Python package or make a change to these files, the Astro CLI still builds a Docker image. But only when you need it to.
Because 95% of the code pushed to Astro is deployed through automation, we’ve built example CI/CD scripts with GitHub Actions that run one command or the other depending on which files change. When someone on your team modifies a DAG file, the Astro CLI runs `astro deploy --dags`. When someone on your team changes any other directory, the CLI triggers a process that pushes your DAGs (as a tarball) and the rest of your Astro project (as a Docker image).
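Here’s a rough sketch of that decision logic as a plain shell step you could drop into a GitHub Actions job. The base ref (`origin/main`) and the `dags/` path are assumptions to adjust for your own repository; authentication (for example, an Astro API token in your CI secrets) is omitted for brevity:

```sh
# Determine which files changed relative to the main branch.
CHANGED_FILES=$(git diff --name-only origin/main...HEAD)

if echo "$CHANGED_FILES" | grep -qv '^dags/'; then
  # Something outside dags/ changed: build the image and deploy the full project.
  astro deploy
else
  # Only files under dags/ changed: push just the DAGs as a tarball.
  astro deploy --dags
fi
```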
Currently, our CI/CD scripts assume that:
- You have one GitHub repository for each Astro Workspace.
- You have two to three branches in that repository that each correspond to an Astro Deployment in that Workspace. For example, your `dev` branch corresponds with your “Dev” Deployment.
- Utility files, such as custom operator code or reusable SQL functions, are not included in your DAGs directory and are built with the rest of your Astro project.
Everything else, Astro takes care of. And if you’re not already familiar with Astro, we’d love to show you a demo.
Coming Soon
We’re excited about the foundation we’ve delivered in the Astro CLI, and we plan to enhance the developer experience further over the next six months with the following features:
- Operational observability on the status of your code push, so that you never have to ask, “Is my new DAG code live yet?”
- Easier-to-find CI/CD scripts in the GitHub Actions Marketplace.
- Policies that allow admins to restrict code pushes to be triggered only by CI/CD.
- In-product branch deploys that allow you to natively tie feature branches in GitHub to Deployments on Astro.
- The ability to deploy multiple DAG directories to Astro instead of only one.
As we make these advancements in Astro, we’d love to hear from you. If your data team is running Airflow and struggling with how to organize your code and develop automated deployment processes that are easy, fast, and secure — talk to us. And stay tuned for more from our product team.