AI for Data Needs a Context Layer: Lessons from AI for Code

I keep getting asked when AI for Data will have its Claude Code moment.

The question is backwards. Claude Code happened because the world it walked into had been deliberately designed for twenty years before any model showed up. Git, typed interfaces, package managers, dependency graphs, code review, design documents, CI/CD. Engineers built every one of those for themselves, and AI inherited the surface.

I've spent my career building the equivalent for data. At Hortonworks I built Apache Atlas, the first serious attempt to make enterprise metadata machine-addressable. At Uber I was the chief architect for data, and I built Databook because no commercial catalog could keep up with the scale or the rate of change. Then I co-founded OpenMetadata because every one of those efforts hit the same wall: a missing layer of context underneath every tool. Every time I look at where AI for Data is today, I see the same shape of the problem we hit at Hortonworks and at Uber, only now it costs more.

The world AI for Code already lived in

Code is a designed system, and that design predates the LLM by two decades. A function has a signature. A dependency is declared. A commit has an author and a diff. A type tells you what's allowed and what isn't. Every one of those is a contract a machine can read. When Claude Code, Cursor, or Copilot opens your repo, it's reading a structure that an industry built on purpose.

That foundation is what made the productivity gains real. As my co-founder and our CTO Sriharsha Chintalapani put it in our recent launch:

"The data team has been the last persona in the enterprise to get a real AI workflow. Engineering got Claude Code, Cursor, and Copilot because code already lives inside a designed system of context. Data did not."

The data team has been watching this happen from the outside.

AI for Code vs AI for Data comparison

The world AI for Data actually walks into

From the outside, Uber looked like a world-class data company. From the inside, the data was broken. Multiple teams each kept their own definition of ARR. Finance computed it one way for the board; sales computed it another way for quotas; the warehouse team had its own version baked into a table. Nobody flagged the conflict for months. Every executive question that touched ARR returned a different number depending on who you asked, and the disagreement was invisible because everyone was right inside their own context.

That was a decade ago. Today, a data analyst asking for Q3 revenue still gets four different answers depending on which dashboard she opens. The CDO trying to deploy AI agents at scale still watches them give confidently wrong answers on the simplest business questions. Better documentation didn't close the gap. A second governance tool didn't close it. Another data quality vendor didn't close it. Those approaches treat the symptom. The disease is that the data team's "context system" is a hand-stitched assembly of dbt for metrics, a separate governance tool for policies, Confluence for documentation, a hand-rolled MCP server somewhere for the AI bridge, and Slack for the decisions nobody ever wrote down. None of those pieces share a data model, an ontology, or an audit trail.

When an AI agent walks into that landscape, it returns whichever answer it hits first. Confident, fast, but incorrect.
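To make that failure mode concrete, here is a minimal Python sketch. The three sources, their values, and the `first_hit` helper are hypothetical illustrations, not data from Uber or any real system. It only shows that when several sources define the same term and nothing marks one as governed, the answer depends on which source gets searched first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    source: str   # where the definition lives (hypothetical name)
    term: str     # business term it claims to define
    value: float  # the number this definition produces (invented)


# Three teams, three definitions of the same term. All values are made up.
DEFINITIONS = [
    MetricDefinition("finance_board_report", "ARR", 120.0),
    MetricDefinition("sales_quota_dashboard", "ARR", 134.5),
    MetricDefinition("warehouse_table", "ARR", 117.2),
]


def first_hit(term, search_order):
    """Return the first definition found, the way an ungrounded agent would."""
    by_source = {d.source: d for d in DEFINITIONS if d.term == term}
    for source in search_order:
        if source in by_source:
            return by_source[source]
    raise LookupError(f"no definition found for {term!r}")


# Same question, two search orders, two different answers.
answer_a = first_hit("ARR", ["warehouse_table", "finance_board_report"])
answer_b = first_hit("ARR", ["sales_quota_dashboard", "warehouse_table"])
print(answer_a.source, answer_a.value)
print(answer_b.source, answer_b.value)
```

Nothing in this sketch is wrong in isolation. Each definition is valid inside its own context, which is why the conflict stays invisible until two answers land side by side.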

The benchmark that should end the model debate

There's a benchmark every data leader should know about before approving the next AI initiative. Spider 1.0 is the academic text-to-SQL benchmark, with clean tables and well-documented schemas. On Spider 1.0, frontier models score above 90%.

Then someone built Spider 2.0, which uses real enterprise schemas with the ambiguity, the overlapping definitions, and the implicit business logic that actual production data carries. On Spider 2.0, GPT-4o drops to 10.1% accuracy. Sonnet 4.5 drops to 10.8% accuracy. Same models. Same intelligence. An eighty-point accuracy collapse on the same task.

Spider 2.0 benchmark accuracy collapse

When the same queries are grounded in a semantic context graph that encodes business meaning, governed metric definitions, and the relationships between concepts, accuracy on Spider 2.0 lifts back into the high seventies. Same models. Different layer underneath. The lift comes from the layer the model can reason over, not from a smarter model.

Collate enables significant accuracy improvements, as shown on the Spider 2.0-Snow text-to-SQL benchmark.

Intelligence without grounding in a context layer is guessing at scale, and no future model release fixes that.

Gartner reached the same conclusion independently. At their Data and Analytics Summit, Distinguished VP Analyst Rita Sallam stated that without context, “a clear understanding of the relationships and rules within an organization's data,” AI agents “cannot operate accurately and are far more likely to hallucinate, introduce bias, and produce unreliable results.” Traditional schema-based data models alone no longer suffice for agentic AI because they lack business context and data meaning. Gartner's prediction: by 2027, organizations that prioritize semantics in AI-ready data will increase agentic AI accuracy by up to 80% and cut costs by up to 60%. We have been making this argument for two years. It is good to have the confirmation.

We've been here before. Twice.

This is where my pattern-match gets uncomfortable, because I've watched the data industry hit the same wall three times.

Yesterday, the problem was context for people. Metadata was scattered across legacy catalogs, modern catalogs, lineage tools, DQ tools, and governance silos. Every team stitched together its own view. We spent a decade building metadata platforms to fix this. Apache Atlas was an early answer. OpenMetadata is where that arc landed. We made real progress with a unified metadata graph.

Today, the same fragmentation pattern is repeating in semantics. dbt has a semantic layer. Looker has LookML. Cube and AtScale build their own. Snowflake Cortex ships its own. Every AI tool ships its own. The same word can mean different things in five different places, business meaning is duplicated and drifting, and nobody can tell which definition the next AI agent will use. We are rebuilding the fragmented metadata problem in semantics, and the cost is higher this time because AI doesn't pause to ask a colleague what the field means.

Tomorrow, the same thing happens to memory. Every agent your team builds keeps its own preferences. Its own user history. Its own feedback loop. Nothing is shared across agents. The correction your analyst made on Tuesday doesn't carry over to the agent that ships on Friday. Six agents in, no two of them agree on what "revenue" means, and a single change to one metric forces six rebuilds. There is no compounding asset and every agent starts from zero.
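The difference between per-agent memory and shared memory can be sketched in a few lines of Python. The `Correction`, `SharedMemory`, and `Agent` classes below are hypothetical names invented for illustration, not an OpenMetadata or Collate API. The sketch assumes memory is nothing more than an append-only log of corrections keyed by business term.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    term: str
    note: str
    author: str


@dataclass
class SharedMemory:
    """An append-only log of corrections that any number of agents can read."""
    log: list = field(default_factory=list)

    def record(self, correction):
        self.log.append(correction)

    def corrections_for(self, term):
        return [c for c in self.log if c.term == term]


class Agent:
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory  # private if constructed per agent, shared if passed in

    def explain(self, term):
        notes = [c.note for c in self.memory.corrections_for(term)]
        return notes or ["no corrections known"]


# Isolated: each agent owns its memory, so Tuesday's fix stays with one agent.
tuesday_agent = Agent("tuesday", SharedMemory())
friday_agent = Agent("friday", SharedMemory())
tuesday_agent.memory.record(Correction("revenue", "exclude refunds", "analyst"))
isolated_view = friday_agent.explain("revenue")

# Shared: both agents read the same log, so the fix carries over.
shared = SharedMemory()
tuesday_shared = Agent("tuesday", shared)
friday_shared = Agent("friday", shared)
tuesday_shared.memory.record(Correction("revenue", "exclude refunds", "analyst"))
shared_view = friday_shared.explain("revenue")

print(isolated_view)
print(shared_view)
```

The only design choice that changes between the two halves is where the log lives. That is the sense in which memory becomes a compounding asset only when it sits below the agents rather than inside them.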

The picture is the same in every organization I talk to. Context is scattered across catalogs, warehouses, and lineage tools. Semantics live separately in dbt, BI tools, and analytic apps. Memory is isolated per agent — Agent 1, Agent 2, Agent N — none of them learning from each other. What enterprises actually need is one brain: shared memory that learns continuously, context that is complete and connected, and one shared meaning across every system. The question every data leader has to answer is which side they are building toward.

The lesson from yesterday is the lesson for today and the lesson for tomorrow. You solve it once, at a unified layer, and not at every tool.

Why every vendor suddenly sounds the same

If you've been to a data conference in the last twelve months, you've watched the word "context" replace the word "metadata" in nearly every vendor pitch. Atlan calls itself a context platform. Databricks positions Unity Catalog as a context layer. Snowflake Horizon, Glean, and most modern catalog and warehouse vendors have rebranded around context.

This is a healthy development for our industry. For the first time, the conversation has stopped being about whose model is bigger or faster and started being about what surrounds the model. We've collectively figured out that the bottleneck is the context layer.

The trouble is that "context," as most vendors define it, is some version of what data already describes about itself: schema, lineage, quality signals, governance metadata, freshness, ownership. That tells you what data is, while leaving out what it means and what your organization has learned about it.

AI for Data needs three pieces. Context is one of them. Semantics is the formal modeling of business meaning, so an agent reasoning about customer lifetime value arrives at the governed metric definition rather than the closest column name. Memory is the auditable record of every correction and decision your organization has accumulated, so the next agent inherits what the last one learned.

The three pieces are interdependent. Context without semantics is a richly described inventory that AI still doesn’t understand. Semantics without context is an abstraction with no data to bind to. Memory without the other two is a log of corrections nobody can act on. The picture only comes together when context, semantics, and memory are designed as one foundation.

Context, semantics, and memory designed as one foundation

A platform that ships only context is half the problem solved with a new name. A context-only platform breaks at the third or fourth AI agent reasoning over conflicting definitions. The ones that hold up treat context, semantics, and memory as three required pieces of the same foundation, designed to work together from the start.
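The gap between "closest column name" and "governed metric definition" can be sketched with the Python standard library. Every column name, the metric entry, and the `resolve` helper here are hypothetical. The fuzzy match uses `difflib.get_close_matches` as a stand-in for whatever name matching an ungrounded agent might do, which is an assumption rather than a description of any particular product.

```python
import difflib

# Context: what the data describes about itself (hypothetical column names).
COLUMNS = [
    "customer_lifetime_days",
    "customer_value_score",
    "ltv_usd_v2_deprecated",
]

# Semantics: governed business definitions bound to physical columns.
GOVERNED_METRICS = {
    "customer lifetime value": {
        "column": "ltv_usd_governed",
        "definition": "net revenue per customer over 36 months",
        "owner": "finance",
    }
}


def closest_column(term):
    """Pick the column whose name looks most like the question's wording."""
    normalized = term.lower().replace(" ", "_")
    matches = difflib.get_close_matches(normalized, COLUMNS, n=1, cutoff=0.0)
    return matches[0]


def resolve(term):
    """Prefer the governed definition; fall back to name matching, flagged."""
    metric = GOVERNED_METRICS.get(term.lower())
    if metric is not None:
        return metric["column"], True
    return closest_column(term), False


naive_answer = closest_column("customer lifetime value")
grounded_answer, is_governed = resolve("customer lifetime value")
print(naive_answer)
print(grounded_answer, is_governed)
```

The name match lands on a column that merely resembles the phrase, while the governed lookup returns the column the business has agreed on. The second return value matters as much as the first: an agent that knows it fell back to a guess can say so.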

Build the open context layer. Then build on top of it.

The architecture we've been working toward — across Apache Atlas, Databook, and OpenMetadata — comes down to two halves, and I want to name them clearly, because we are deliberately building them as two distinct things.

The open context layer is the foundation. It connects every data source in your estate into one machine-readable graph. It formalizes what your business actually means across metrics, entities, hierarchies, and relationships, so an agent reasoning about "revenue" arrives at the governed definition rather than the closest-matching column name. And it captures every decision your team makes about that meaning, including corrections, classifications, and approvals, as a permanent, reusable object. Three primitives, designed together: context, semantics, and memory. The whole thing sits on open standards (RDF, OWL, and JSON-LD from the web, plus MCP), so any model and any tool can read it without a proprietary adapter. We're building this in OpenMetadata, in the open, because the substrate underneath enterprise AI cannot belong to one vendor.
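As a rough illustration of what "machine-readable on open standards" can look like, here is a governed metric expressed as a JSON-LD document and round-tripped with Python's `json` module. The `@context`, `@id`, and `@type` keywords are standard JSON-LD. The `ex:` vocabulary, the `https://example.org/vocab#` namespace, and every identifier under it are placeholders invented for this sketch, not OpenMetadata's actual schema.

```python
import json

# A governed metric as a JSON-LD node. The "ex" vocabulary is hypothetical.
metric = {
    "@context": {
        "ex": "https://example.org/vocab#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "label": "rdfs:label",
        "definedBy": {"@id": "ex:definedBy", "@type": "@id"},
        "computedFrom": {"@id": "ex:computedFrom", "@type": "@id"},
    },
    "@id": "ex:metric/revenue",
    "@type": "ex:GovernedMetric",
    "label": "Revenue",
    "definedBy": "ex:team/finance",
    "computedFrom": "ex:table/finance.revenue_daily",
}

# Any tool that speaks JSON can read it; JSON-LD-aware tools can also
# interpret the links as graph edges.
serialized = json.dumps(metric, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(restored["@type"], restored["computedFrom"])
```

The point of the sketch is the shape, not the vocabulary: the definition, its owner, and its physical source are links in one document rather than prose in three tools.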

AI for Data is the platform that runs on top of that foundation. Its key attributes:

Conversation as the primary interface instead of navigation.
Agents that classify, document, and govern at scale, instead of stewards manually tagging assets.
Analysts asking questions in natural language and getting answers grounded in the governed graph instead of in the closest-matching column name.
Memory that compounds across humans and agents, so the correction your analyst made on Tuesday flows into the agent that ships on Friday.

This is the layer data and analytics teams use for their day-to-day work. This is the productivity gain, the thing that looks like the Claude Code moment everyone is waiting for. We're putting it to work in Collate, because at some point the open context layer has to translate into the experience the data team actually has.

The order is what matters most. Code did the substrate first, painstakingly, over two decades. Then the AI workflow, in the last three years, on top of it. That sequence is what made Claude Code, Cursor, and Copilot work. The eighty-point Spider 2.0 collapse is what happens when AI for Data is built without an open context layer underneath.

Why this has to be open

The infrastructure choice that defines the next decade of enterprise AI is whether the open context layer is open or proprietary. I made my call early. OpenMetadata is Apache 2.0. Today it has more than 13,000 practitioners, 3,000 enterprise deployments, 450 contributors, and millions of downloads.

Your AI strategy should not be held hostage to one vendor's solution or one model vendor's runtime. The LLMs and the clouds are choices you should make on cost and capability, year to year. The open context layer is the constant underneath. Built right, it travels with you across every model release and every cloud migration. Sapir Hirshberg, Senior Data Product Manager at Wix, made the point well:

“OpenMetadata gives us a trusted foundation for AI-driven decision-making, letting our teams innovate faster and more confidently across the business.”

What leading AI innovators had to build to make AI work

If you want a builder-to-builder forcing function, look at what OpenAI did internally.

A team there built Kepler, an internal data agent that now serves more than 4,000 employees across 600 petabytes of data, 70,000 datasets, and 15 different data tools. It runs on GPT-5.2, the same model their customers use. What makes Kepler work is the six layers of context the team wrapped around the model: schema metadata and lineage, curated descriptions from domain experts, code-derived definitions extracted from pipeline source code, institutional knowledge from Slack and Docs, a memory layer that captures corrections and applies them to future answers, and live runtime validation. OpenAI's context layer runs on OpenMetadata.

How OpenAI's Kepler data agent works
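To show the general idea of wrapping layers of context around a model, here is a small Python sketch. It is not OpenAI's implementation, and nothing about Kepler's internals is assumed beyond the layer names listed above. The `ContextLayer` class, the `build_grounded_prompt` function, and all the example facts are hypothetical. The sixth layer, live runtime validation, is left out because it would run after an answer is generated rather than before.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayer:
    name: str
    facts: list = field(default_factory=list)


def build_grounded_prompt(question, layers):
    """Concatenate non-empty context layers ahead of the user's question."""
    sections = []
    for layer in layers:
        if not layer.facts:
            continue  # skip empty layers instead of padding the prompt
        lines = "\n".join(f"- {fact}" for fact in layer.facts)
        sections.append(f"[{layer.name}]\n{lines}")
    sections.append(f"Question: {question}")
    return "\n\n".join(sections)


# Invented example facts, one list per layer.
layers = [
    ContextLayer("schema metadata and lineage",
                 ["orders.amount_usd feeds finance.revenue_daily"]),
    ContextLayer("curated descriptions",
                 ["revenue_daily is net of refunds"]),
    ContextLayer("code-derived definitions",
                 ["revenue = SUM(amount_usd) - SUM(refund_usd)"]),
    ContextLayer("institutional knowledge", []),
    ContextLayer("memory",
                 ["analyst correction: exclude test accounts"]),
]

prompt = build_grounded_prompt("What was Q3 revenue?", layers)
print(prompt)
```

Even in this toy form, the model's input changes from a bare question to a question plus the definitions and corrections it needs, which is the part no model upgrade supplies on its own.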

If the company that pioneered the world's best models had to build a context system around the model to get it to work on their own data, no enterprise gets to skip that step.

As Bonnie Xu of OpenAI shared with us, “Memory and a self-learning process are crucial to the agent continuously improving.”

Also consider Anthropic’s work in this area. They published how they use Claude for self-service analytics internally. Ninety-five percent of business analytics queries automated, at roughly 95% accuracy. Their diagnosis of why agentic analytics fails matches OpenAI's exactly: entity ambiguity, data staleness, and retrieval failure. Their fix is also the same: canonical governed datasets, a semantic layer of compiled metrics, and structured procedural knowledge layered on top.

The number worth sitting with: without that structured context layer, Claude's analytics accuracy on their own internal data was 21%. With it, consistently above 95%. They also tested the obvious shortcut, giving the agent grep access to their entire SQL corpus, every dashboard query, every transformation. Accuracy moved less than one point. The bottleneck was never access to prior queries. It was the structure that let the agent map a question to the right governed answer.

Two of the most sophisticated AI builders in the world, working independently, arrived at the same architecture. But instead of building it yourself, use Collate to get the results you expect without starting from scratch.

The pattern doesn't change. The order does.

Every successful enterprise AI deployment I've seen built the open context layer first, and then built AI for Data on top of it. Every stalled one tried to do them at the same time, on a foundation that wasn't ready, or skipped the substrate entirely and pointed a model at raw schema. The order code followed, substrate first and workflow second, is not optional. It's what makes the workflow work.

Code took twenty years to build its foundation. We do not have twenty years. But we do know the shape of the thing we have to build, and we know the order in which to build it. The open context layer is step one. AI for Data is step two. Every team that reversed that order is living with the consequences. Every team that got the order right is shipping.

See how Collate puts the open context layer to work across your data estate. Schedule a demo: https://www.getcollate.io/contact-sales
