Meet the Astronomer Data Team

  • Taylor Merrick

Hi everyone! We’re Astronomer’s in-house data team. Our work centers on two main goals:

  1. Make data valuable and reliable at Astronomer
  2. Do it all with Airflow!

We partner with teams across the company—not just to deliver data products, but also to show off how we use Airflow and our own tooling to make it happen. Along the way, we’ve learned from our internal Airflow experts and all our customers, some of whom run the world’s largest Airflow deployments. In this blog series, we’re sharing what’s been working for us because there’s a good chance other data teams will relate to our highs and lows: unclogging pipelines, scrubbing SFDC data, and explaining why that “one simple dashboard metric” isn’t actually so simple.

Who We Are

We’re a centralized team of five, serving the data needs of the entire company. As Astronomer has grown, so has our scope, and everyone on the team has touched every corner of our data model and pipeline infrastructure. For our fellow Airflow data teams, here are the numbers: we maintain 200 DAGs, ingest data from over 25 sources, and run over a million tasks each month. We like to think we’re ‘small but mighty’.

When we’re not building new things, we are prioritizing our developer experience. Why? Because let’s face it, data engineers are always looking for efficiency gains (hey, we’re lazy in a good way). We’ve made it easy to build pipelines that come with testing and documentation ‘for free’. There is real satisfaction in making things more reliable and easier to maintain. Also, if we don’t build it into our workflow, we will never have time to do it later (read: our tech debt will always be debt).

What We Do

Our work supports every part of the company. Here’s a snapshot:

  • Reporting: We own internal dashboards for execs, finance, sales, product usage, trials, and more.
  • Product Embedded Dashboards: Our internal dashboards have become so helpful that we now maintain embedded dashboards in Astro for customers to monitor their usage and costs.
  • Operational Analytics: We integrate data across systems to support day-to-day operations.
    • Billing Data: We deliver metered metrics to billing systems to ensure accurate invoicing.
    • Reverse ETL (rETL): We push product usage data into Salesforce and Zendesk, enhancing the context for sales and support teams.
  • Data Mesh Enablement: We enable data engineers across the organization to build their own pipelines.
  • Customer Bot: A Slack-based tool that helps anyone get up-to-date information on our customers.
  • Dogfooding: We use our own products internally and provide feedback to the product team on how real data teams would use them.

But none of that matters if the foundation isn’t solid. Reliable data doesn’t just appear—it’s the result of a lot of (often invisible) work.

The Iceberg Analogy: What’s Seen vs. Unseen

"Data work is like an iceberg." It’s a favorite analogy for a reason. The dashboards and reports everyone sees? Just the tip. Beneath the surface is where most of the effort happens: infrastructure, monitoring, QA, and development.

That’s where we shine. It’s easy to forget how much unseen work goes into making data reliable until something breaks. When it does, other teams can lose trust fast, and regaining that trust takes far longer than losing it, especially with teammates who don’t see the complexity behind the scenes.

When data teams are stuck in reactive mode, focused on ‘keeping the lights on’, there’s no room for innovation. But when we’re given time to invest in the invisible layers, we not only prevent future issues, we create space for strategic, high-leverage work.

So yes, fortify the iceberg. Strong foundations enable everything else.

What’s Next: A Sneak Peek Into Upcoming Blog Posts

Data quality, visibility, and reliability are essential to scaling any organization, and Astronomer is no exception. By continuously improving our data processes and making our work more transparent, we help teams across the company make smarter decisions faster.

As we work to make Astronomer as data-driven as possible, we’ll share best practices and lessons learned along the way, in the hope that they help other data teams simplify the unseen work of their own icebergs.

Some upcoming topics we might blog about include:

  • Pipeline Reliability - How we’ve tried to reduce the every-morning stress of failed pipelines
  • Data Quality - How we’ve prioritized data quality without adding extra work for our data engineers
  • Task Groups/DAG Factories - How we make development easy so we can focus on what the business needs from the data
  • Customer Bot - How a mini-hackathon turned into a useful tool that anyone can use to get up to speed on any customer

Stay tuned!

Build, run, & observe your data workflows. All in one place.

Try Astro today and get up to $500 in free credits during your 14-day trial.