Laurent Paris, Astronomer’s senior vice president of R&D, recently spoke with analyst Eric Kavanagh on DMRadio — a weekly, data- and analytics-themed podcast — about the combination of orchestration and observability: How it improves the quality and reliability of dataflows, is a precondition for optimizing business processes, and can encourage a sense of shared responsibility among teams.
The podcast episode is below; here are some highlights:
Observability lets you optimize orchestration for collaboration and innovation across teams.
As an enterprise gets larger, Laurent says, you have growing numbers of tools and of teams — “and then you have the handoffs” between teams: “That’s where things break.”
“What is really interesting is when you layer observability on top of an orchestration layer like Airflow, you actually have interesting actionable capabilities, not just because you can control that flow and make it visible, but because you can use that understanding to optimize your orchestration layer. It’s a virtuous circle. That is the magic of combining Airflow with a layer of observability.”
As the locus of orchestration, Airflow is uniquely positioned to extract observability metadata.
“Airflow is doing the orchestration, it’s the thing that is orchestrating the data-production assembly line, and we are starting to build visibility and transparency into the orchestration process itself. So people can get that trust in how their data was produced.”
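The idea of building visibility into the orchestration process itself can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Airflow's or Astronomer's actual API: each step of a pipeline is wrapped so that every run records which task ran, what it read, what it produced, and how long it took — the metadata that lets people trust how their data was produced.

```python
import time

# Hypothetical sketch (not Airflow's API): wrap each pipeline step so
# every execution emits observability metadata -- task name, inputs,
# outputs, and duration -- into a shared run log.
run_log = []

def observed(task_name, inputs, outputs):
    """Decorator that records run metadata for each task execution."""
    def wrap(fn):
        def run(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            run_log.append({
                "task": task_name,
                "inputs": inputs,
                "outputs": outputs,
                "duration_s": round(time.time() - start, 3),
            })
            return result
        return run
    return wrap

# Illustrative task and dataset names are invented for this example.
@observed("extract_orders", inputs=[], outputs=["raw.orders"])
def extract_orders():
    return [{"id": 1, "amount": 42.0}]

@observed("clean_orders", inputs=["raw.orders"], outputs=["clean.orders"])
def clean_orders(rows):
    return [r for r in rows if r["amount"] > 0]

clean_orders(extract_orders())
print([entry["task"] for entry in run_log])
# the log now describes, step by step, how the data was produced
```

A real deployment would ship this metadata to an observability backend rather than an in-memory list, but the principle — instrument the orchestration layer, not each consumer — is the same.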
Observability metadata lets you quickly diagnose problems and go right to root causes.
“Let’s say some data feed starts to be wrong, and then all of your downstream datasets start to get corrupted. The thing is, you don’t want a bunch of alerts waking up the ten different teams who are in charge of this data, you want basically the root cause. You want to be able to say to users: ‘We are aware there’s a problem, we know which team is in charge, and they’re already working on the problem. You just have to wait until it magically gets repaired.’”
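The alert-suppression idea in that quote — page only the team that owns the root cause, not every downstream team — can be sketched with lineage edges and a set of failing datasets. The dataset and team names below are invented for illustration:

```python
# Hypothetical sketch of root-cause suppression: a failing dataset is a
# root cause only if none of its upstream datasets are also failing.
upstream = {                      # dataset -> datasets it reads from
    "raw.feed": [],
    "clean.orders": ["raw.feed"],
    "mart.revenue": ["clean.orders"],
    "dashboard.kpis": ["mart.revenue"],
}
owners = {"raw.feed": "ingest-team", "clean.orders": "core-data",
          "mart.revenue": "analytics", "dashboard.kpis": "bi-team"}

def root_causes(failing):
    """Failing datasets whose upstreams are all healthy."""
    return {d for d in failing
            if not any(u in failing for u in upstream[d])}

# A corrupted feed cascades downstream, but only one team gets paged.
failing = {"raw.feed", "clean.orders", "mart.revenue", "dashboard.kpis"}
to_page = {owners[d] for d in root_causes(failing)}
print(to_page)  # → {'ingest-team'}
```

Everyone else just sees the status Laurent describes: the problem is known, the owning team is on it.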
Observability in orchestration is a game-changer, beyond diagnostics and troubleshooting.
“Having that observability end to end, from the ETL process down to the dashboard, understanding all of the steps when the data is being transformed, understanding the quality of the data being produced at every step of the process is important. It’s like an assembly line: you want to understand how, from the raw materials, you end up with the finished product.”
Observability lets you visualize, understand, and optimize your business processes.
“The understanding of the processes and the flow of data is the precondition for basically optimizing and improving all of your processes. That’s how the business derives value — it’s a continuous improvement loop.”
Observability helps correct flawed assumptions.
“Everybody builds a mental map of dataflows in their brain, except everybody has a different conception of what the actual dataflow is. The first time you show them an objective representation of this, it’s like Google Maps. Their reaction is, ‘I didn’t realize I was also depending on this!’ I’ve always been amazed by the reaction of the head of data when he sees, for the first time, the Google Maps-style objective view of his ecosystem, because he had that set of assumptions — some are right, some are wrong.”
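That “I didn’t realize I was also depending on this” moment comes from computing transitive dependencies, which a simple traversal over lineage edges makes concrete. This is an illustrative sketch with invented dataset names, not any product's implementation:

```python
# Hypothetical lineage graph: dataset -> the datasets it reads from.
# fx.rates plays the "surprise" dependency nobody had in their mental map.
upstream = {
    "dashboard.kpis": ["mart.revenue"],
    "mart.revenue": ["clean.orders", "fx.rates"],
    "clean.orders": ["raw.feed"],
    "fx.rates": [],
    "raw.feed": [],
}

def all_upstreams(dataset, seen=None):
    """Every dataset this one transitively depends on."""
    seen = set() if seen is None else seen
    for dep in upstream.get(dataset, []):
        if dep not in seen:
            seen.add(dep)
            all_upstreams(dep, seen)
    return seen

print(sorted(all_upstreams("dashboard.kpis")))
# → ['clean.orders', 'fx.rates', 'mart.revenue', 'raw.feed']
```

The objective map and the mental map rarely agree — which is exactly the point of the quote.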
Observability gives rise to a common language across all your teams.
“You need to understand how the data is flowing across all of your teams, and if you can make that visible to everybody, it’s like everybody starts to have a common language: ‘Oh, I understand — I am this piece, and the downstream consumer is this team, and I depend on that upstream team.’ That observability is really key if we want to avoid all of the silos that happen naturally in a large enterprise.”
Observability helps shut down the blame game and cultivate a sense of shared responsibility.
“The person who is looking at the system at a high level can observe everything and can try to optimize different processes, but with observability, even the teams that are a part of those processes gain a better understanding of the consequences of their actions. If you’re the upstream guy controlling an ETL process, and you know that if that process fails, the targeted marketing and targeted ad campaign will basically stop working, you feel like you have an ownership stake in their success.”
Observability helps you be smarter about what data you collect and why.
“In the early days of the data revolution, people said, ‘Collect the data, and then we’ll figure out what to do with it.’ But when you start to see a petabyte of data getting accumulated, then you sort of say, ‘Well, we need to be a little smarter about which data matters and which doesn’t matter.’ And I think it leads to questions like, ‘What is the SLA, what is the business value associated with a piece of data that is being stored, processed, collected?’ Part of the context and the metadata around data is actually maybe speaking to the question, ‘Is this really good for the business?’”