Long live data engineering.
The shift from human-consumed data to agent-consumed data changes everything about how organizations should think about their data infrastructure.
The Insufficient Investment
The obvious response to AI transformation is to invest in better models, more compute, and tooling that lets knowledge workers prompt their way to productivity. This is wrong, or at least insufficient in a way that will prove expensive. The first wave of enterprise AI initiatives has a failure rate that should alarm anyone paying attention. The bottleneck isn’t intelligence; there’s plenty of that. It’s context: whether your AI systems have the architectural foundation to make decisions that compound value rather than compound errors.
Consider an AI agent handling customer support escalations. It needs customer history spanning support tickets, billing records, product usage, and previous interactions. Each lives in a different system with a different refresh schedule, a different definition of “customer,” and a different notion of what constitutes a complete record. A human support agent navigates this through judgment and institutional knowledge. An AI agent processes whatever data it receives and acts with confidence.
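The mismatch between those systems can be sketched in a few lines. This is a minimal, hypothetical example: three systems key the same person differently, and the only way to assemble a complete record is to resolve them to one canonical entity (here via email, a common if imperfect join key). All field names are illustrative.

```python
# Three systems, three notions of "customer", three different keys.
support = {"ticket_user": "T-88", "email": "ana@example.com", "open_tickets": 2}
billing = {"account_id": "B-102", "email": "ana@example.com", "balance_due": 0.0}
usage = {"device_owner": "U-7", "email": "ana@example.com", "logins_30d": 14}


def resolve(*records):
    """Merge records sharing an email into one canonical customer view."""
    canonical = {}
    for record in records:
        canonical.setdefault(record["email"], {}).update(record)
    return canonical


view = resolve(support, billing, usage)["ana@example.com"]
print(view["open_tickets"], view["balance_due"], view["logins_30d"])  # 2 0.0 14
```

A human agent does this resolution implicitly; an AI agent only gets whatever this function (or its absence) hands it.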
The error doesn’t announce itself as an error. It presents as a decision.
Error Multiplication
This distinction matters because of error multiplication.
In the BI era, bad data surfaced as weird dashboard numbers. An analyst would squint at a chart, say “that doesn’t look right,” and trace it back to a broken ETL job or a schema change that didn’t propagate. The feedback loop was slow but self-correcting. Humans were the error-detection layer.
AI agents don’t squint. They don’t have institutional intuition about what “looks right.” When context degrades, they don’t slow down; they keep making decisions, each one building on the last, each one potentially wrong in ways that compound. By the time someone notices, you’ve offered a refund for a product that was never purchased, or routed a whale account through your lowest-touch support tier, or made a hundred smaller mistakes that individually look like normal operations.
The architectural problems that were annoying in the dashboard era become existential in the agent era.
Why Data Engineers, Why Now
Data engineers have spent the last decade building systems optimized for getting data into warehouses so analysts could query it and build dashboards. Essential work, but organizationally invisible; the corporate equivalent of plumbing.
The agent era inverts this entirely.
The question is no longer “can we get this data into a dashboard?” It’s “can we get this context into a decision?” That requires solving a different class of problems: entity resolution across systems, handling late-arriving data, determining acceptable staleness thresholds for different decision types, maintaining lineage when the same entity exists in six different schemas. These are data engineering problems. Not data science problems, not analytics problems, not ML engineering problems.
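One of those problems, staleness thresholds per decision type, can be made concrete. The sketch below assumes hypothetical decision types and budgets; the point is that "fresh enough" is a property of the decision, not of the data, and an agent should refuse or escalate rather than act on stale context.

```python
from datetime import datetime, timedelta, timezone

# Illustrative staleness budgets: an agent issuing a refund can tolerate
# far less stale billing data than one merely routing a ticket.
STALENESS_BUDGET = {
    "issue_refund": timedelta(minutes=5),
    "route_ticket": timedelta(hours=1),
    "draft_reply": timedelta(hours=24),
}


def context_is_fresh(decision_type: str, last_updated: datetime) -> bool:
    """True if the source record is fresh enough for this decision type."""
    age = datetime.now(timezone.utc) - last_updated
    return age <= STALENESS_BUDGET[decision_type]


record_time = datetime.now(timezone.utc) - timedelta(minutes=30)
print(context_is_fresh("issue_refund", record_time))  # too stale for a refund
print(context_is_fresh("route_ticket", record_time))  # fine for routing
```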
And the failure modes are categorically different. When an analyst builds a flawed dashboard, someone eventually notices the numbers look wrong. When an AI agent makes decisions based on degraded context, the system continues operating, with compounding errors that may not surface until the damage is substantial.
Data engineers know how to build reliability into data systems. They understand idempotency, schema evolution, backfill strategies, data quality contracts. In a world where data systems make decisions rather than inform them, this expertise isn’t operational overhead. It’s the whole game.
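A data quality contract, for instance, can be as simple as a set of named predicates that make a batch fail loudly instead of flowing downstream into an agent's context. This is a deliberately minimal sketch with invented field names, not a reference implementation.

```python
# A data-quality contract as plain Python: each rule is a named predicate
# over a record. Field names are illustrative.
CUSTOMER_CONTRACT = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "lifetime_value is non-negative": lambda r: r.get("lifetime_value", 0) >= 0,
}


def validate(records):
    """Return (record index, rule name) for every contract violation."""
    violations = []
    for i, record in enumerate(records):
        for name, rule in CUSTOMER_CONTRACT.items():
            if not rule(record):
                violations.append((i, name))
    return violations


batch = [
    {"customer_id": "c-1", "lifetime_value": 1200.0},
    {"customer_id": "", "lifetime_value": -50.0},  # violates both rules
]
print(validate(batch))
```

In the dashboard era, a violation like this produced a weird chart; in the agent era, the contract is what stands between it and an autonomous decision.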
The Strategic Mistake
Most organizations are treating AI transformation as a layer on top of existing data infrastructure. Add an LLM here, bolt on an agent there, connect it to the data warehouse, ship the demo.
This works for POCs. It breaks when you deploy agents into production workflows where decisions have consequences and errors compound.
The organizations that get this right will do three things differently:
- Treat data engineering as a strategic investment, not a cost center. Not in the abstract “people are our most important asset” sense. In the concrete “we’re reallocating budget to rebuild our data architecture around agent consumption patterns” sense. The metric isn’t “can we reduce data engineering headcount?” It’s “what is our architectural capability, and are we investing to expand it?”
- Recognize orchestration as the context layer. The reason orchestration platforms matter in the agent era isn’t that they move data around. It’s that they encode the meaning behind how data flows through an organization — the dependencies, the business logic, the quality gates, the timing constraints. This operational context is exactly what determines whether agents make good decisions or expensive mistakes.
- Let meaning live in code. Semantic layers that live outside the codebase drift from reality immediately. Data quality rules in spreadsheets become outdated the day they’re written. Code is the only source of truth that scales and the only place where the gap between intention and execution can be kept small enough to trust.
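The third point can be illustrated with a single function: when a business definition like “active customer” exists in exactly one place in code, agents, pipelines, and dashboards all share it instead of each drifting toward its own version. The 30-day window and field name here are assumptions for the sketch.

```python
from datetime import datetime, timedelta, timezone

# The definition lives in code, once. Changing it is a reviewed, versioned,
# testable change rather than an edit to a spreadsheet nobody rereads.
ACTIVE_WINDOW = timedelta(days=30)


def is_active(customer: dict, now: datetime) -> bool:
    """A customer is active if they used the product within ACTIVE_WINDOW."""
    return now - customer["last_seen_at"] <= ACTIVE_WINDOW


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
customer = {"last_seen_at": datetime(2025, 1, 10, tzinfo=timezone.utc)}
print(is_active(customer, now))  # True: last seen 21 days ago
```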
What Comes Next
The organizations that treat data engineers as strategic assets, as the architects of their AI strategy rather than its executors, will be the ones that successfully deploy agents into production. Everyone else will keep shipping demos. Agents don’t just need data. They need context. And context layers don’t emerge from better prompts or more powerful models. They emerge from deep architectural work, maintained by people who understand that reliability isn’t a feature; it’s the foundation everything else depends on.