AI Agents Have a Context Problem. It's Hiding in Your Pipelines.
There's a moment most data teams hit a few months into an AI deployment, when something quietly breaks. Not spectacularly, not in a way that triggers alerts. An agent answers a question confidently, a stakeholder acts on it, and someone eventually notices the number is wrong, or that the decision rested on a metric that means something different to that team than it does to the one that built the data pipeline.
The post-mortem almost always lands in the same place: the model wasn't the problem, the context was. This keeps happening, and the industry hasn't fully reckoned with why it's so much harder to fix than it sounds.
The Gap Isn't in the Model
Consider what a seasoned analyst brings to a simple question: "What was our churn rate last quarter?" They know "churn" has three definitions in the organization. They know which one the board uses, which one will get challenged in the Thursday business review, which table is authoritative, and which one a contractor built six months ago and no one has touched since.
An agent has none of that. It has a schema, a question, and a language model that is very good at generating plausible-sounding answers. The problem isn't that the agent is wrong. It's that correctness, not just coherence, requires context the agent can't see.
That context lives in the history of how the data was produced: in what it means, who governs it, and how it relates to everything else in the data estate. Most agents have no way to reach it. It doesn't live in a dashboard or a data catalog. It lives in the orchestrated workflows and pipelines every business already runs on, recorded with every execution and largely invisible to the agents now being asked to act on it.
Pipelines vs Agents: Why Closing the Gap Between Them Is Hard
Your data pipelines were built to be predictable. A developer writes the logic. The pipeline runs. The output is auditable. If something breaks, an engineer reads the logs and fixes it. The whole model assumes humans author the logic and humans diagnose the failures.
Agents work differently. They receive a question, reason about which data to retrieve and which tools to call, formulate a strategy, and decide whether the answer is good enough to return. The logic is inferred, not authored. The output depends on a reasoning path that can differ across invocations even with identical inputs.
This asymmetry is what makes governed context non-negotiable. A deterministic pipeline can tolerate a gap in documentation. A probabilistic agent will fill that gap with inference. Sometimes the inference is right. Often it isn't. And it will always sound confident either way. Scale that across the many agents most enterprises are now deploying, each with its own partial view of the data, and the context problem compounds.
Orchestration: Where Operational Context Lives, Evolves, and Compounds
For most enterprises, the richest source of context about their data isn't a catalog or a wiki. It's their orchestration layer.
Apache Airflow is the de facto standard for data orchestration at enterprise scale. Whenever data moves between systems, gets transformed, gets checked for quality, has features extracted, or has permissions enforced, that work flows through orchestration. The majority of enterprise data operations already run this way. Its integration footprint spans 120+ provider packages and over 2,100 integrations across every major cloud, warehouse, and SaaS tool.
That's not incidental. The orchestration layer is uniquely positioned to build operational context because it sits in the execution path across every system in your data platform. It's the one place where decisions about how data moves, transforms, and gets delivered are actually made, not reconstructed after the fact. A catalog tells you what data exists. A warehouse tells you what it contains. Only the orchestration layer was present when the decision happened.
And every one of those executions leaves a record: which table was read, when it last refreshed, whether quality checks passed, how long the run took, whether it failed two weeks ago and what resolved it.
That operational history is institutional memory encoded in execution. It tells an agent not just what the data is, but whether it can be trusted right now.
A data engineer can look at a pipeline run and immediately know whether the downstream metric is reliable. They've watched the pipeline long enough to know which upstream sources degrade on Monday mornings, which transformations are fragile, and what a suspicious run duration looks like. An agent querying the same table sees a row count and a timestamp. It has no idea whether the pipeline that produced that table ran clean this morning or silently fell apart three days ago.
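That engineer's judgment can be made available to an agent by inspecting recent runs of the producing pipeline before trusting its output. Here is a minimal sketch of that check, assuming the run records have already been fetched (for example from Airflow's REST API, `GET /api/v1/dags/{dag_id}/dagRuns`); the field names mirror that API's run records, but the thresholds are illustrative, not Airflow defaults:

```python
from datetime import datetime, timedelta, timezone

def assess_freshness(dag_runs, max_age_hours=24, window=3):
    """Decide whether a table's producing pipeline looks healthy.

    `dag_runs` is a list of dicts shaped like Airflow REST API run records,
    newest first: {"state": "success" | "failed" | ..., "end_date": ISO-8601}.
    """
    if not dag_runs:
        return {"trusted": False, "reason": "no run history"}

    latest = dag_runs[0]
    if latest["state"] != "success":
        return {"trusted": False,
                "reason": f"latest run ended in state '{latest['state']}'"}

    # Staleness: a clean run that finished days ago is still a red flag.
    end = datetime.fromisoformat(latest["end_date"])
    age = datetime.now(timezone.utc) - end
    if age > timedelta(hours=max_age_hours):
        hours = age.total_seconds() / 3600
        return {"trusted": False,
                "reason": f"latest success is {hours:.0f}h old"}

    # Flakiness: recent intermittent failures suggest a fragile pipeline.
    failures = [r for r in dag_runs[:window] if r["state"] != "success"]
    if failures:
        return {"trusted": False,
                "reason": f"{len(failures)} of last {window} runs failed"}

    return {"trusted": True, "reason": "recent runs clean and fresh"}
```

A tool layer in front of the agent can attach this verdict to every answer, so "the table exists" and "the table can be trusted right now" travel as separate signals.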
This is the context gap, and it isn't solved by better prompting or a larger context window. It's solved by making the history of data production legible to agents, and by connecting that operational signal to the business meaning, governance, and relationships that give it context.
Why Static Metadata Alone Doesn't Close It
The instinct is to treat this as a metadata problem. Better data dictionaries, a more complete catalog, descriptions on every dbt model. That work matters, but it doesn't close the gap.
Static metadata doesn't reflect runtime reality. A description written three months ago doesn't tell an agent that the SLA for that table degraded last Tuesday, or that an upstream source started sending nulls in a column that used to be fully populated. The agent needs context that is current, not just accurate at the time someone wrote it down.
The most valuable context is also the hardest to document. Why was this metric defined the way it was? What business decision drove that transformation logic? Those answers live in design docs, Slack threads, and the heads of people who may no longer be at the company. Getting that into a form an agent can use requires more than a text field in a catalog.
And governance doesn't automatically follow from description. An agent knowing what a table contains is different from an agent knowing whether it is authorized to use it for a specific task. That distinction, between describing data and governing its consumption, is where a lot of enterprise AI deployments are currently exposed.
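One way to make that distinction concrete is a policy check that sits between the agent and the data, keyed on the task, not just the asset. A toy sketch; the asset names, task names, and in-memory policy table are invented for illustration (in practice the policy would come from the governance system):

```python
# Illustrative policy: which agent tasks may consume which assets.
POLICY = {
    ("finance.revenue_daily", "board_reporting"): "allow",
    ("finance.revenue_daily", "ad_hoc_exploration"): "allow",
    ("hr.compensation", "board_reporting"): "deny",
}

def authorize(asset: str, task: str) -> bool:
    """Default-deny: an asset being *described* in the catalog does not
    mean an agent is *authorized* to consume it for this task."""
    return POLICY.get((asset, task)) == "allow"
```

The default-deny posture is the point: an asset the catalog describes perfectly is still off-limits until a rule says otherwise, which is exactly the gap between description and governed consumption.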
A catalog tells you what an asset is; orchestration tells you what state it's in right now. Agents need both, and neither substitutes for the other.
But even if you solve for all of this, there's a deeper reason this stays hard: context decays. Engineers make corrections that never propagate to the rest of the team. Conventions drift as new people join and old ones leave. A platform team codifies standards in a wiki that's outdated within a quarter. The problem isn't just that context is difficult to capture once; it's that context is alive, and keeping it current requires infrastructure most enterprises have never built. That's why teams that treat this as a documentation sprint keep ending up back at square one. The context layer isn't a one-and-done project. It's a system that has to accumulate, correct, and compound continuously, or it falls behind the organization it's supposed to serve.
What the Enterprises Getting This Right Have in Common
They've stopped treating context as a metadata project and started treating it as infrastructure. They recognize that agents need context that is governed, current, and traceable. And they've recognized that the orchestration layer, the same one already producing their data, is where operational context already lives. The infrastructure question becomes how to connect it to all other enterprise context.
Critically, they aren't choosing between a separate catalog and an orchestration layer. They're connecting them. Governed definitions, lineage, and ownership from the catalog flow into the pipelines agents consume. The orchestration layer enforces those definitions at execution time and surfaces the runtime signals (freshness, quality, history) that a catalog can't produce on its own. Each layer does what the other cannot.
That means business logic that is codified and enforced, not just described. Operational history surfaced alongside the data itself. Governance that travels with context rather than living in a separate system. And feedback from agent behavior flowing back into the pipelines that produce context, so the environment improves over time.
None of this is glamorous. It doesn't make for a good demo. But it is what separates agents that earn trust from ones that get quietly sidelined after the third confident wrong answer.
The Work Starts Before the Model
A lot of enterprise AI investment right now is flowing toward the model layer: inference, agents, MCP servers. That investment isn't wrong; it's incomplete.
The model is only as good as the context it operates within. In most enterprises, that context is fragmented, ungoverned, and invisible to agents at the point of decision. Closing that gap is a data engineering problem. It is the work of making the infrastructure that produces data legible, governed, and continuously current.
The organizations that build this foundation won't just have better AI. They'll have a context layer that compounds in value with every pipeline run, every agent interaction, and every correction. That's the moat, and it starts well before the model.
The context layer conversation is happening across the entire data stack right now, and we’re excited to be building toward it alongside partners like Atlan. See how we’re approaching it together at Atlan Activate, so you can close the gap between AI pilots and AI that ships.