# Google Cloud Knowledge Catalog Google Cloud Knowledge Catalog (formerly Dataplex) is an AI-powered data catalog and metadata platform that Google pitches as a "universal context engine" for the enterprise. The job it is built for: ground AI agents in trusted business data so they stop hallucinating SQL joins and guessing at semantics. It launched on 2026-04-22, and it is the paid serving layer that ingests the [[Open Knowledge Format (OKF)]] and exposes it to agents. If OKF is the file format, Knowledge Catalog is the warehouse, the search engine, and the access-control guard around those files. ## Positioning - A rebrand and evolution of **Dataplex** (previously "Dataplex Universal Catalog"). The API, client library, CLI, and IAM names stay the same, so it is backward-compatible for existing users - Google's critique of legacy catalogs: they were built as manual inventories for technical users, focused on table structures rather than the deeper context AI agents need - The pitch: when agents lack business semantics and data relationships, you get hallucinations, high latency, and stale insights. Knowledge Catalog is meant to be the trusted context layer that prevents that ## Three pillars **1. Aggregation (unified context)** - Native metadata from BigQuery, AlloyDB, Spanner, Cloud SQL (GA); Firestore and Looker (Preview) - Third-party catalogs: Atlan, Collibra, Datahub, Ab Initio, Anomalo (GA) - Enterprise federation across Palantir, Salesforce Data360, SAP, ServiceNow, Workday (Preview) - BigQuery measures and LookML folded into one governed semantic foundation (Preview); self-contained data products with built-in SLAs and governance (GA) **2. Enrichment (AI-generated meaning)** - Smart Storage and an Object Context API that tag and embed files in Cloud Storage on arrival (Preview) - Deep multimodal extraction via [[Gemini]], pulling entities and relationships out of unstructured content (Preview) - Automated curation: natural-language descriptions, business glossaries, inferred relationships (Preview) - Verified queries and semantic guardrails to stop hallucinated joins (Preview) **3. Search (high-precision retrieval for agents)** - Hybrid semantic search at sub-second latency, built on Google Search technology (GA) - Access-control-aware: results respect source-system permissions, so an agent cannot retrieve what the user cannot see - A context-evaluation framework, to make context construction measurable rather than a guessing game ## Relationship to OKF - Knowledge Catalog was "updated to ingest Open Knowledge Format and serve it to our agents." OKF is the portable interchange format; the Catalog stores, enriches, governs, and serves it - The bridge is `kcmd`, a TypeScript CLI and MCP server that syncs local OKF files to and from Knowledge Catalog - Read the two together as the classic open-format-plus-paid-platform play: the [[Open Knowledge Format (OKF)]] is free and Apache 2.0; the serving layer that makes it useful at enterprise scale is the Google Cloud product ## AI, agents, and MCP - Grounding agents is the whole point. The [[Large Language Models (LLMs)|LLM]] gets trusted enterprise context instead of guessing - The **Gemini Enterprise Deep Research Agent** (Preview) is natively powered by Knowledge Catalog, synthesizing live business data, internal documents, and web research with citations - Two [[Model Context Protocol (MCP)]] paths: a remote server at `https://dataplex.googleapis.com/mcp` and a local MCP Toolbox binary for IDEs. Gemini Code Assist and Gemini CLI bundle the capability - MCP tools include `search_entries`, `lookup_entry`, `search_aspect_types`, and `lookup_context` (Preview) - Documented MCP clients include Gemini CLI, Gemini Code Assist, [[Claude Code]], Claude Desktop, Cline, Cursor, VS Code Copilot, and Windsurf. So yes, you can point Claude Code at it ## Availability and pricing - Generally available: broad metadata aggregation (BigQuery, AlloyDB, Spanner, Cloud SQL), data products, high-precision semantic search, the main third-party integrations - In Preview: enterprise federation (SAP, Workday, Palantir, Salesforce, ServiceNow), BigQuery measures, Smart Storage / Object Context API, multimodal extraction, automated curation, verified queries, the Deep Research Agent, Firestore and Looker connectors, and the `lookup_context` MCP tool - Knowledge-Catalog-specific pricing was not published at launch. Related Gemini Enterprise Agent Platform pricing exists (for example Memory Bank storage at $0.30 per GiB-month), with billing phasing in over mid-to-late 2026 ## Stated differentiators - Context-first architecture aimed at agents, not just technical users - Continuous enrichment that learns from schemas, query logs, and BI models instead of relying on manual curation - Multimodal intelligence via Gemini over unstructured content - Federation across legacy systems (SAP, Workday) and modern data platforms - Security-native: access control embedded in search results - Measurable context quality, plus sub-second search at enterprise scale Bloomberg Media is the launch customer reference: William Anderson (CTO, Bloomberg Media) credits it with launching their Data Access AI Agent by "grounding our AI in trusted institutional context." ## My take The technically interesting move is decoupling the format from the platform. Give away [[Open Knowledge Format (OKF)]] so the whole industry can write portable knowledge bundles, then sell the layer that enriches, governs, and serves them at scale. For anyone not already deep in Google Cloud, the more durable takeaway is OKF itself: the format is free and runs anywhere, so you can adopt the open standard without buying the catalog. Keep the knowledge portable; rent the serving layer only if and when you need it. ## References - Introducing the Knowledge Catalog: https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog - Product page: https://cloud.google.com/products/knowledge-catalog - How OKF can improve data sharing: https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/ - Repo (tools and samples): https://github.com/GoogleCloudPlatform/knowledge-catalog - Google Research announcement: https://x.com/GoogleResearch/status/2065475343205740911 ## Related - [[Open Knowledge Format (OKF)]] - [[Model Context Protocol (MCP)]] - [[AI Agents]] - [[Large Language Models (LLMs)]] - [[Gemini]] - [[Vertex AI]] - [[Claude Code]] - [[Open source]]