How much did ontology grounding improve Cortex agent accuracy?

In Snowflake's biomedical benchmark, a baseline semantic-view agent scored 50% success. Adding a knowledge graph raised it to 60%, GraphRAG to 70%, and GraphRAG with curated term mappings to roughly 78%.

What is the difference between a semantic layer and an ontology?

A semantic layer maps tables, columns, and joins to business entities and metrics. An ontology adds a deeper layer of meaning: class inheritance, transitive and inverse relationships, equivalence mappings, and synonyms that a semantic layer alone may not capture.

What is GraphRAG and why did it outperform a knowledge graph in the benchmark?

GraphRAG precomputes a denormalized profile for each concept (its name, definition, synonyms, and aggregated descendant data) and indexes those profiles for search. In Snowflake's test it beat the seven-tool knowledge graph agent with only two tools, partly because a smaller decision space produced fewer orchestration errors.

How does SemanticOS relate to ontology-grounded agents?

SemanticOS is a knowledge-graph and AI-search layer that connects fragmented enterprise tools so people and AI agents can reason over institutional knowledge. It applies the same principle Snowflake demonstrates: ground agents in typed relationships, not raw schemas.

Ontology-Grounded Reasoning for Snowflake Cortex Agents

Q: What is ontology-grounded reasoning in Snowflake Cortex agents?

Ontology-grounded reasoning means a Snowflake Cortex agent answers questions using a domain model of typed entities and relationships (class hierarchies, synonyms, and constraints) layered on top of raw tables, instead of relying on column names and joins alone.

TL;DR: Snowflake tested whether grounding Cortex agents in an ontology (typed entities, class hierarchies, and synonyms) beats letting them reason over raw tables and joins. It does. On a 22-question biomedical benchmark, a plain semantic-view agent answered 50% of questions correctly; adding a knowledge graph pushed that to 60%, GraphRAG to 70%, and GraphRAG plus curated term mappings to roughly 78% (Snowflake, 2026). The lesson for anyone building enterprise AI: structure your knowledge first, then point the agent at it.

When a data platform the size of Snowflake publishes a benchmark showing its own agents answer business questions better when grounded in an ontology, the message is hard to miss. The schema you already have is not enough. Ontology-grounded reasoning in Snowflake Cortex agents works because most of the meaning in enterprise data never appears in a column name.

This post walks through what Snowflake built, the numbers it reported, and why the result matters well beyond healthcare data.

Why do raw schemas fail AI agents?

Most enterprise AI still operates over relational abstractions: tables, columns, keys, and joins (Snowflake, 2026). That works for “sum revenue by region.” It breaks the moment a question depends on meaning the schema does not encode.

An ontology is a formal model of a domain: classes, typed relationships, constraints, and canonical identifiers. Healthcare uses SNOMED CT and the Gene Ontology; supply chain uses GS1; financial services uses FIBO (Snowflake, 2026). These assets define things a table cannot: that a capacitor is a kind of electronic component, that two terms mean the same thing, that one disease category contains dozens of subtypes.

Snowflake frames the gap with a plain example. Ask “show total spend across all electronic components,” where “electronic component” sits atop a taxonomy of capacitors, resistors, ICs, and dozens of subcategories. A raw schema has no idea those rows roll up to one concept. Someone has to teach the agent the hierarchy, or it answers the wrong question.
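The rollup problem can be sketched in a few lines of Python. This is a minimal illustration, not Snowflake's implementation: the taxonomy, the spend rows, and the function names are all hypothetical. It compares a literal match on the category label with a sum over every category that descends from "electronic component".

```python
from collections import defaultdict

# Hypothetical taxonomy: child category -> parent category.
PARENT = {
    "capacitor": "electronic component",
    "resistor": "electronic component",
    "integrated circuit": "electronic component",
    "ceramic capacitor": "capacitor",
    "office chair": "furniture",
}

# Hypothetical spend rows as a raw table might store them: (category, amount).
SPEND = [
    ("ceramic capacitor", 1200.0),
    ("resistor", 300.0),
    ("integrated circuit", 4500.0),
    ("office chair", 800.0),
]


def descendants(root, parent_of):
    """Return root plus every category that rolls up to it."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    found, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def total_spend(root, parent_of, rows):
    """Sum spend over the root concept and all of its descendants."""
    members = descendants(root, parent_of)
    return sum(amount for category, amount in rows if category in members)


# A literal match finds no rows, because no row is labeled with the parent concept.
literal = sum(amount for category, amount in SPEND if category == "electronic component")
grounded = total_spend("electronic component", PARENT, SPEND)
print(literal, grounded)
```

The literal filter returns nothing because no row carries the parent label, while the grounded version picks up every subcategory. That is the hierarchy the agent has to be taught.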

What Snowflake actually tested

Snowflake built a deliberately hard benchmark on public biomedical data. It combined the Cell Ontology (33,651 terms connected by roughly 50,000 hierarchical and relational edges) with the PRISM drug-repurposing dataset: 4,518 drugs tested across 578 cancer cell lines, producing more than 2.6 million viability measurements (Snowflake, 2026).

The hard part is a mismatch the schema cannot resolve on its own. PRISM labels data by tissue (lung, breast); the ontology organizes cells by lineage (epithelial, stromal). To answer a question like “show drug efficacy for PD-1 inhibitors across epithelial-derived cancer cell lines,” an agent has to bridge those two vocabularies. Snowflake wrote 22 questions targeting exactly this kind of reasoning and ran each agent configuration five times to measure consistency (Snowflake, 2026).

Four setups were compared, each adding structure on top of the last.

The baseline: semantic view alone

The control was a Cortex Analyst agent over a semantic view: Snowflake’s governed layer that maps physical tables to entities, relations, facts, dimensions, and metrics. It gives the agent a domain-friendly model instead of raw schemas (Snowflake, 2026). It scored 0.93 out of 2.0, with a 50% success rate.

Knowledge graph plus recursive CTEs

The second setup added a knowledge graph: a representation of knowledge as interconnected entities and typed relationships (Snowflake, 2026). Snowflake stored it as two ordinary tables, KG_NODE for entities and KG_EDGE for relationships such as “treats,” “targets,” or “expressed_in.” No separate graph database was needed.

To walk those relationships, the agent used recursive common table expressions (CTEs): SQL that joins an edge table to itself repeatedly, expanding outward hop by hop until it runs out of new nodes. Unlike a fixed join, the path length stays data-driven, so the agent can find a connection without knowing in advance whether it is two hops away or five (Snowflake, 2026). This agent had seven tools and scored 1.14 out of 2.0, a 60% success rate, roughly a 10-point lift over baseline.
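A recursive CTE over two such tables might look like the sketch below. It is an illustration under several assumptions: SQLite (through Python's built-in sqlite3 module) stands in for Snowflake so the example runs locally, the column names (node_id, name, src, dst, rel) are invented because the source gives only the table names, and the rows are toy data rather than real Cell Ontology content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE KG_NODE (node_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE KG_EDGE (src TEXT, dst TEXT, rel TEXT);

-- Toy nodes; the IDs and the hierarchy below are illustrative only.
INSERT INTO KG_NODE VALUES
  ('c1', 'epithelial cell'),
  ('c2', 'squamous epithelial cell'),
  ('c3', 'keratinocyte'),
  ('c4', 'stromal cell');

-- Assumed convention: is_a edges point from the child to its parent.
INSERT INTO KG_EDGE VALUES
  ('c2', 'c1', 'is_a'),
  ('c3', 'c2', 'is_a');
""")

# The anchor row is the starting concept. The recursive member joins the edge
# table back onto the rows found so far, one hop per iteration. UNION (not
# UNION ALL) drops nodes already seen, so the walk stops once no new nodes appear.
DESCENDANTS_SQL = """
WITH RECURSIVE walk(node_id) AS (
    SELECT ?
    UNION
    SELECT e.src
    FROM KG_EDGE AS e
    JOIN walk AS w ON e.dst = w.node_id
    WHERE e.rel = 'is_a'
)
SELECT n.name
FROM walk AS w
JOIN KG_NODE AS n ON n.node_id = w.node_id
ORDER BY n.name;
"""

names = [row[0] for row in conn.execute(DESCENDANTS_SQL, ("c1",))]
print(names)
```

Nothing in the query fixes the depth. The same statement finds a descendant whether it sits one hop or five hops below the starting concept, which is the data-driven path length described above.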

Its weakness was orchestration. Seven tools meant the agent had to pick the right sequence, and the stored procedures needed exact concept names with no synonym resolution (Snowflake, 2026).

Flattened GraphRAG

The third setup, GraphRAG, took a different route. Instead of traversing the graph at query time, it precomputes one denormalized profile per concept (name, definition, synonyms, immediate neighbors, and data categories aggregated from all descendants) and indexes those profiles for hybrid keyword and vector search (Snowflake, 2026). At runtime the agent retrieves the right profile and passes that grounding into SQL generation.

This agent used just two tools, one for search and one for SQL. It scored 1.38 out of 2.0, a 70% success rate, and showed the lowest run-to-run variance of any configuration (Snowflake, 2026). Synonym resolution came along for free: “flat epithelial” resolves to “squamous epithelial cell” through semantic search.
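The precompute-then-search idea can be sketched as follows. This is a toy version under stated assumptions: the concepts, definitions, and tissue assignments are hypothetical, and a simple token-overlap score stands in for the hybrid keyword and vector search Snowflake describes. The profile builder rolls data categories up from all descendants, so the parent concept already knows which tissue labels its subtypes cover.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    definition: str
    synonyms: list
    parent: Optional[str] = None
    data_categories: set = field(default_factory=set)


# Hypothetical concepts; tissue labels play the role of "data categories".
CONCEPTS = {c.name: c for c in [
    Concept("epithelial cell", "cell that lines a body surface or cavity", ["epitheliocyte"]),
    Concept("squamous epithelial cell", "flat, scale-like epithelial cell", [],
            parent="epithelial cell", data_categories={"lung"}),
    Concept("glandular epithelial cell", "epithelial cell specialized for secretion", [],
            parent="epithelial cell", data_categories={"breast"}),
]}


def build_profiles(concepts):
    """Precompute one denormalized profile per concept."""
    children = {}
    for c in concepts.values():
        if c.parent is not None:
            children.setdefault(c.parent, []).append(c.name)

    def rolled_up(name):
        # A concept's own categories plus those of every descendant.
        cats = set(concepts[name].data_categories)
        for child in children.get(name, []):
            cats |= rolled_up(child)
        return cats

    return {
        c.name: {
            "name": c.name,
            "definition": c.definition,
            "synonyms": list(c.synonyms),
            "neighbors": sorted(children.get(c.name, []) + ([c.parent] if c.parent else [])),
            "data_categories": sorted(rolled_up(c.name)),
        }
        for c in concepts.values()
    }


def search(profiles, query):
    """Token-overlap search; a stand-in for hybrid keyword and vector search."""
    wanted = set(query.lower().split())

    def score(profile):
        text = " ".join([profile["name"], profile["definition"], *profile["synonyms"]])
        return len(wanted & set(text.lower().replace(",", " ").split()))

    best = max(profiles.values(), key=score)
    return best if score(best) > 0 else None


PROFILES = build_profiles(CONCEPTS)
hit = search(PROFILES, "flat epithelial")
print(hit["name"], PROFILES["epithelial cell"]["data_categories"])
```

At query time the agent only needs to retrieve a profile and hand it to SQL generation. The traversal work was done once, when the profiles were built.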

Curated term mappings on top

The final setup embedded a thin layer of hand-curated mappings directly in the agent’s prompt: eight authoritative cell-type-to-tissue mappings plus composite-term definitions that act as overrides when they apply (Snowflake, 2026). It scored 1.55 out of 2.0, roughly 78% success, the highest of the four. The cost: that static knowledge has to be maintained by hand, and it only covers the terms someone wrote down.
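One way to picture such a layer is a small block of text rendered into the agent's instructions. The sketch below is hypothetical throughout: the two mappings, the composite-term definition, the wording of the override rule, and the function names are illustrative, not Snowflake's actual eight mappings or prompt.

```python
# Hypothetical curated mappings: cell type -> tissue labels used in the dataset.
CURATED_MAPPINGS = {
    "keratinocyte": ["skin"],
    "hepatocyte": ["liver"],
}

# Hypothetical composite-term definition.
COMPOSITE_TERMS = {
    "epithelial-derived": "any cell line whose cell type descends from 'epithelial cell'",
}


def render_grounding_section(mappings, composite_terms):
    """Render the curated knowledge as a plain-text block for the prompt."""
    lines = ["Authoritative mappings (these override retrieved context when they apply):"]
    for cell_type in sorted(mappings):
        lines.append(f"- {cell_type} -> {', '.join(mappings[cell_type])}")
    lines.append("Composite terms:")
    for term in sorted(composite_terms):
        lines.append(f"- {term}: {composite_terms[term]}")
    return "\n".join(lines)


def build_prompt(base_instructions, question, mappings, composite_terms):
    """Assemble instructions, curated grounding, and the user question."""
    return "\n\n".join([
        base_instructions,
        render_grounding_section(mappings, composite_terms),
        f"Question: {question}",
    ])


section = render_grounding_section(CURATED_MAPPINGS, COMPOSITE_TERMS)
prompt = build_prompt(
    "Answer using the search and SQL tools.",
    "Which compounds show efficacy across epithelial-derived cell lines?",
    CURATED_MAPPINGS,
    COMPOSITE_TERMS,
)
print(prompt)
```

The maintenance cost is visible in the code: every mapping is a literal someone typed, and any term missing from the dictionaries falls back to whatever retrieval finds.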

What ontology-grounded reasoning in Snowflake Cortex agents proved

Here is the full picture from Snowflake’s benchmark (Snowflake, 2026):

Agent setup	Tools	Mean score (/2.0)	Success rate
Semantic view (baseline)	1	0.93	50.0%
+ Knowledge graph	7	1.14	60.0%
+ GraphRAG	2	1.38	70.0%
+ GraphRAG and term mappings	2	1.55	78.2%

Three findings travel well beyond biomedical data.

First, fix the data layer before adding rules. GraphRAG with aggregated descendant attributes captured roughly 80% of the total gain over baseline with no static mappings at all (Snowflake, 2026). Precomputing the right structure did more than any clever prompt.

Second, fewer tools can win. The two-tool GraphRAG agent outscored the seven-tool knowledge graph agent (1.38 versus 1.14) and was more consistent. More tools give more raw capability but also more surface area for the agent to choose wrong (Snowflake, 2026).

Third, the gains correlated more with structural context than with extra computational reasoning (Snowflake, 2026). The model did not get smarter. The data got better organized. That is the whole argument for grounding agents in ontologies and typed relationships rather than handing them a pile of tables.

A concrete example

Picture Vantage Health, a mid-size health-analytics company whose research team runs a Cortex-style agent over screening data. An analyst asks: “Which compounds show efficacy across epithelial-derived cell lines?”

With raw tables, the agent has no concept of “epithelial-derived.” It matches a literal column, misses the cell types filed under tissue labels, and returns a confident, incomplete answer. The analyst does not catch it because the SQL looks reasonable.

Now give the same agent a knowledge graph plus a precomputed concept profile. “Epithelial cell” expands to its 693 descendants across more than ten hierarchy levels, the lineage-to-tissue gap is bridged by aggregated data already baked into the profile, and the query returns the full cohort (Snowflake, 2026). Same model, same question, far better answer, because the agent reasoned over relationships instead of column names.

That is the pattern SemanticOS is built around. Enterprise knowledge is scattered across tools that do not share context, and the fix is a connective layer (a knowledge graph plus AI search) that links entities so a person or an agent can traverse them in one query. Snowflake’s benchmark is a clean, measured argument for why that layer earns its keep.

Key takeaways

Snowflake’s benchmark shows ontology-grounded Cortex agents climbing from a 50% baseline to roughly 78% success as structure is added (Snowflake, 2026).
A semantic layer maps tables to entities; an ontology adds hierarchy, synonyms, and typed relationships the schema cannot express on its own.
GraphRAG, which precomputes enriched concept profiles, beat a seven-tool knowledge graph with only two tools and the lowest variance.
Most of the accuracy gain came from better-structured data, not more reasoning, so fix the data layer first.
The takeaway generalizes: ground AI agents in typed relationships, not raw schemas, whether the domain is drug discovery or fragmented enterprise knowledge.

Sources

Related reading

Neo4j NODES 2025 Recap: Graph + GenAI in Production

Knowledge Graph 230% ROI: The IDC Neo4j Study

Scaling Semantic Layer Rollout with Cortex Code Agent SDK