Stop Letting AI Guess: The Case for Semantic Intelligence

Mar 16, 2026
James Nguyen

AI doesn't know what your data means. It infers. Anyone who has used OpenAI or Claude has experienced hallucinations and rolled their eyes when they realize that although LLMs aim to please (i.e., are sycophantic), they are often wrong. And when the inference is wrong, when "revenue" means something different to finance than it does to sales, or a vector database and a relational warehouse are drawing from contradictory definitions, the model doesn't flag the uncertainty. It returns a confidently stated answer that happens to be wrong.

That tension was the starting point for our March 2026 webinar, From Data Governance to Semantic Intelligence, where we gathered data leaders and practitioners from across the globe to ask an important question: are AI initiatives connected to the metadata and governance infrastructure that would make them more trustworthy at scale?

The answer, from a room full of practitioners who are living this problem, wasn't a comfortable one. Here's what we learned, and why the gap is smaller than it looks if you start with meaning instead of access.

Few (If Any) Are Where They Need to Be: What the Poll Data Revealed

As the session began, we ran a live poll with one question: How integrated are your AI initiatives with your metadata and governance layer? The options ranged from "not integrated at all" to "fully orchestrated," with AI agents pulling context, semantics, and policies from a unified governance layer.

The poll results were not surprising, given that we're still in the early innings of AI adoption in large enterprises, but they were concerning.

100% of respondents were in the bottom two tiers. Not a single attendee, including data leaders from Fortune 500 companies across energy, manufacturing, and food services, reported that their AI was even moderately integrated with their metadata and governance layer.

This isn't a criticism of the audience. They're precisely the ones closest to the problem. One attendee, a product manager at a healthcare data company, put the timeline pressure plainly in the chat:

"It takes at least 3–5 years to get to this stage for a large org without agents. I'm not sure about the timeline with agents."

Whether agents can compress that timeline (while increasing AI and data trust) is exactly the question the rest of the session was built to answer.

Why AI Governance Has Officially Outgrown Human Capacity

Mike Ferguson, CEO of Intelligent Business Strategies and conference chair for Big Data LDN, opened with a diagnosis that anyone running a distributed data estate will recognize.

Data complexity has exploded. What used to be a handful of databases and a data warehouse is now a fully distributed estate: SaaS vendors on third-party clouds, multi-cloud infrastructure, on-premises systems, and edge data across every format, all growing simultaneously. Organizations responded by buying best-of-breed tools for every governance discipline. The result is siloed metadata repositories, policy drift across systems, and an army of administrators manually enforcing standards they can no longer see end-to-end.

"We end up doing firefighting — far more reactive than proactive." — Mike Ferguson, CEO, Intelligent Business Strategies

Now the same fragmentation pattern is repeating with AI. More than 50 agent-building tools are proliferating across enterprise environments. Every platform vendor ships its own semantic layer, its own knowledge graph, its own understanding of what your business data means, with no interoperability guarantees between any of them. Mike's warning, drawn from a client conversation years ago, has aged into a mandate:

"Everyone's blindly integrating data with no attempt to share what they create. We can't do that again."

His prescription: governance that is always on, with continuous monitoring, continuous triage, and continuous action across a data estate too large and dynamic for weekly dashboards or quarterly audits to meaningfully cover. At that scale, governance requires AI agents, because the complexity has surpassed what people alone can manage.

"I need to continually govern data security, privacy, data quality, retention, sharing, and usage. It's not periodically, once a day or once a week. Always on is what I really want — and I need agents to help me do that. We are way beyond the point where humans can handle this."

AI Agents Don't Call Anyone

For the past decade, the primary consumer of organizational data was a human analyst: someone with domain context, business intuition, and the ability to track down Harsha in Slack to ask what a field actually means. That era is giving way to something fundamentally different.

As Collate CMO Steve Wooledge put it, agents are becoming the dominant consumer of data, and they can't call anyone (though some OpenClaw.ai advocates may debate this). They infer. And without sufficient context about what data means, including its lineage, business definitions, governance policies, and relationships to other data, they'll infer wrong. Confidently.

"Every frontier model has access to the same training dataset. What we need to make sure is that we're training our agents on our IP — the data that's unique to our specific business." — Steve Wooledge, CMO, Collate

Semantic intelligence is the mechanism for that.

"Semantic Intelligence unifies definitions, rules, relationships, and context so both people and AI interpret data the same way. It is the foundation that makes AI trustworthy, explainable, and safe."

That means capturing not just technical metadata (what the data is, where it came from, who owns it), but shared meaning: common definitions, controlled vocabularies, RDF ontologies, rules, and constraints, exposed in a machine-readable form that agents can reliably act on. Collate states the vision as: "Make data people and AI-ready, by turning metadata into shared meaning."
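To make "machine-readable shared meaning" concrete, here is a minimal sketch of what a glossary term could look like as a structured record. The field names and values are invented for illustration; they are not the actual OpenMetadata schema.

```python
import json

# Hypothetical example of shared meaning in machine-readable form.
# Field names are illustrative, not the actual OpenMetadata specification.
revenue_term = {
    "name": "Revenue",
    "definition": "Recognized income from goods and services, net of refunds.",
    "synonyms": ["net revenue", "recognized revenue"],
    "owner": "finance",
    "classification": ["Financial"],
    "constraints": {"currency": "USD", "aggregation": "sum"},
}

# Serialize so humans and agents consume the exact same definition.
payload = json.dumps(revenue_term, indent=2)
print(payload)
```

The point of the sketch: once "revenue" carries one definition, one owner, and explicit constraints, an agent no longer has to infer which of several competing meanings applies.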

What 7 Agents Running on Autopilot Looks Like

Harsha Chintalapani, CTO and co-founder of Collate, walked the audience through a live Collate platform demo that brought what's possible to life.

The moment you connect a data source, whether Snowflake, ClickHouse, BigQuery, or any of 120+ native connectors, seven agents activate in parallel, automatically:

  1. Metadata Ingestion Agent — pulls technical metadata: tables, columns, schemas

  2. Usage and Lineage Agent — analyzes query logs, maps table relationships, surfaces your most-used assets

  3. Profile Agent — checks data quality: nulls, primary key violations, missing dimensions

  4. Auto-Classification Agent — tags sensitive data (PII and beyond), learns custom classification rules

  5. Tiering Agent — uses usage and lineage signals to rank your most critical assets

  6. Documentation Agent — auto-generates documentation from query analysis, schema structure, and sample data

  7. Quality Agent — attaches appropriate data quality tests based on profile findings

The agents collectively build and maintain a Semantic Metadata Graph — storing every operation as RDF triples in real time, creating a live SPARQL-queryable map of your entire data estate. No manual setup or consultant engagement required. Connect a service and your data estate is governed on autopilot.

"Any metadata operation that is happening is updating the knowledge graph in real time. And what does that give? It gives a SPARQL endpoint — an ability to reason through the relations. The same tooling is now available to LLMs to query those metadata relationships directly." — Harsha Chintalapani, CTO & Co-Founder, Collate

OpenMetadata ships 700+ specifications in JSON Schema and RDF/JSON-LD, covering tables, columns, users, teams, pipelines, dashboards, agents, and more. These are openly available, standardized, and serve as the foundation for cross-system semantic consistency.

Can Metadata Actually Enforce Policies, Not Just Describe Them?

One of the most technically engaged attendees was a VP of Data Enablement at a Fortune 500 hospitality company, who asked:

"Can we articulate metadata-powered actionable/enforceable policies within Collate/OpenMetadata?"

The short answer is yes, and enforcement goes back to the source.

Tags and governance policies applied within Collate automatically sync back to the originating systems: Snowflake, BigQuery, ClickHouse, Redshift, and more. Governance is centralized in Collate but enforced natively where data lives. This is a key distinction from catalog tools that are merely read-only observers, able to describe what a policy should be but unable to drive enforcement.

"One thing unique about the Collate platform is we automatically sync back the data to your source system. A tag or a policy you've applied in Collate goes back to your Snowflake, your BigQuery, your ClickHouse, your Redshift — so that you can enforce policies and centralize governance." — Harsha Chintalapani

For governance actions that shouldn't be fully automated, glossary approval workflows add a human-in-the-loop layer. When an AI agent proposes a new business term definition, a designated reviewer approves it before it joins the shared ontology. The agent handles the drafting, but a human has to sign off.
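The human-in-the-loop pattern boils down to a small state machine: an agent's draft cannot reach the shared ontology without passing through a review state. The states and transitions below are illustrative, not the platform's actual workflow engine.

```python
# Minimal sketch of a human-in-the-loop approval flow for glossary terms.
# State and action names are invented for this illustration.
TRANSITIONS = {
    ("draft", "submit"): "in_review",
    ("in_review", "approve"): "approved",
    ("in_review", "reject"): "draft",
}

def advance(state: str, action: str) -> str:
    """Move a proposed term through the review workflow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from state {state!r}")

state = "draft"                    # the agent drafts the definition
state = advance(state, "submit")   # sent to the designated reviewer
state = advance(state, "approve")  # human signs off
print(state)  # approved
```

Note there is no transition from "draft" straight to "approved": the structure itself guarantees the human checkpoint.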

Can Metadata Align Relational and Vector Databases?

Eric Kavanagh, CEO of The Bloor Group, asked one of the most architecturally important questions of the session:

"Can metadata be used to align relational and vector databases?"

RAG pipelines run on vector databases. Analytical workloads still run on relational systems. Left ungoverned, these become two parallel data universes with no shared understanding of what their content means, which is the exact fragmentation problem that's plagued structured data for years, now repeating at AI speed.

The Collate approach addresses this through the knowledge graph, a common semantic layer that sits above storage. Whether data lives in Snowflake, a vector store, a document database, or a BI dashboard, the metadata platform maintains a unified ontological view of what that data represents, how it relates to other data, and what governance policies apply.

In the live demo, Harsha illustrated this directly. When a data scientist queried the conversational interface for customer data, the system didn't simply return a table name. It surfaced the concept, its lineage, its business glossary definition, its quality profile, and its governance tier, then generated a storage-native query specific to where that data actually lives.

"Because we know where this table is coming from, what data it contains, the data quality and the governance — we can let the LLM create a platform-specific query. The LLM is aware this is a Snowflake table and creates code compatible with Snowflake rather than some generic SQL that may not work." — Harsha Chintalapani

The metadata layer acts as translator, and the semantic layer provides the common language across storage paradigms. Together, they make complex, heterogeneous estates governable.

Open Standards and Where OpenMetadata Fits

The same attendee pushed further in Q&A, asking whether OpenMetadata expresses entities through formal ontological specifications, specifically referencing OWL/RDF and the Linux Foundation's Open Data Product Standard and Open Data Contract Standard.

Harsha's answer confirmed a standards-grounded architecture:

  • All 700+ specifications are available in JSON Schema and RDF/JSON-LD

  • The knowledge graph is stored as RDF triples, queryable via SPARQL

  • The ontology builds on DCAT (W3C Data Catalog Vocabulary), adding domain-specific nuance: tables are structured schemas, pipelines are data creators, dashboards are analytical views consumed by users

  • Glossary term relationships can be defined with custom RDF predicates, cardinality, and associative, hierarchical, or inverse relationship types

"We define around 700 specifications across the data infrastructure — covering your tables, columns, all the way to users, teams, how they're operating on data, including agents. All of these specifications are available in JSON Schema and RDF/JSON-LD. You can interact with the metadata through SPARQL to traverse relationships — just like I showed you, where you can see that a table belongs to a database, and those relationships are well-defined." — Harsha Chintalapani

On the Linux Foundation Open Data Product and Contract Standards: the spirit is aligned. Data products in Collate carry metadata contracts governing usage, sharing, and quality. Harsha noted the team will publish more detail on formal standard alignment.

How the AI SDK Opens the Platform Architecture

One of the most forward-looking moments of the demo came at the end, when Harsha introduced the Collate AI SDK.

Think of it as an API layer for agents. The same agents running inside Collate for documentation, quality, classification, and governance triage can be called by external agents across your organization in natural language. Any team building an AI application can integrate Collate's semantic intelligence capabilities without rebuilding them from scratch.

"You can call the agents that you've built in AI Studio in natural language, delegate certain roles and responsibilities, and get back a response. You're using the libraries we've built to make calls to AI agents living in Collate — so you can integrate this across your organization, not just within Collate itself." — Harsha Chintalapani

The question from the chat, "Can we chat with all of this via MCP?", got a strong confirmation. OpenMetadata ships an MCP (Model Context Protocol) server in open source, and enterprise customers are already using it for agent integration.
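For readers unfamiliar with MCP: the protocol runs on JSON-RPC 2.0, with tools invoked via the `tools/call` method. The request below shows that shape; the tool name and arguments are hypothetical, not OpenMetadata's actual tool list.

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0 with method "tools/call").
# The tool name and arguments are hypothetical examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_metadata",  # hypothetical tool name
        "arguments": {"query": "tables tagged PII in the sales domain"},
    },
}
print(json.dumps(request))
```

Because MCP is an open, model-agnostic protocol, any MCP-capable client or agent framework can talk to the same metadata server without custom integration work.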

Where to Go From Here

The poll said it plainly: most organizations have AI initiatives that can see their data but don't fully understand it. That's one step above flying blind, and potentially worse when AI confidently hands back a wrong answer. The questions from this audience made clear that they already know this, and that they're looking for a path forward.

Shipping trustworthy, production-ready AI isn't a tooling problem. Most organizations already have enough tools. It's a problem of aligning data with business meaning. Getting shared context defined, governed, and machine-readable is the work that unlocks everything else, including agents that earn trust rather than just generating output.


Watch the full webinar recording here or request a demo to see the semantic intelligence platform in action for your own data estate. For a deeper technical treatment of the architecture, download Mike Ferguson's Semantic Intelligence White Paper.
