# Open Knowledge Format (OKF)

The Open Knowledge Format (OKF) is an open specification from Google Cloud for representing the metadata, context, and curated knowledge that AI systems need, in a portable, vendor-neutral, human- and agent-readable form. Version 0.1 shipped on 2026-06-12, authored by Sam McVeety and Amir Hormati of the Google Cloud Data team.

The interesting part for me: it is basically the [[Obsidian]] vault plus `AGENTS.md` pattern turned into a standard. Plain [[Markdown]] files with YAML frontmatter, living in [[Git]], readable by a human with `cat` and by an [[Large Language Models (LLMs)|LLM]] verbatim. If you already keep a knowledge base this way, you already speak most of OKF.

## Why it matters

- Every agent builder is solving the same context-assembly problem from scratch, and every catalog vendor is reinventing the same data model. The knowledge itself stays locked behind whichever tool created it
- OKF formalizes the "LLM wiki" pattern that emerged organically: [[Andrej Karpathy]]'s LLM-wiki gist (April 2026, ~16M views), the `AGENTS.md` / `CLAUDE.md` convention used across 60,000+ open-source projects, and Obsidian vaults wired to coding agents
- It picks the lowest-friction substrate that already exists (Markdown + frontmatter + Git) instead of inventing a new one. No JSON schema registry, no protobuf, no required SDK

## What it is

- An **open spec**, not a product or platform. Apache 2.0, published in the `GoogleCloudPlatform/knowledge-catalog` repo
- A **format**, deliberately. "Not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK"
- v0.1 is small on purpose: ~451 lines, fits on one page. Explicitly "a starting point, not a finished standard," versioned for backward-compatible growth

## File format and structure

- **A bundle is a directory** of UTF-8 Markdown files.
  Ship it as a tarball, host it in any repo, mount it from any filesystem
- Example layout:

  ```
  sales/
  ├── index.md          # directory listing (reserved)
  ├── log.md            # chronological history (reserved)
  ├── tables/
  │   ├── index.md
  │   ├── orders.md
  │   └── customers.md
  └── metrics/
      └── weekly_active_users.md
  ```

- **Reserved filenames**: `index.md` (a directory listing, for progressive disclosure) and `log.md` (chronological update history). Neither is a concept document
- **Each concept** is one Markdown file with YAML frontmatter. The only **required** field is `type` (a producer-defined string consumers use for routing, filtering, presentation). Consumers must tolerate unknown types and preserve unknown fields
- Recommended optional fields: `title`, `description`, `resource` (canonical URI for the underlying asset), `tags`, `timestamp` (ISO 8601)
- Conventional body sections when applicable: `# Schema`, `# Examples`, `# Citations`
- **Relationships are Markdown links.** A link from concept A to B asserts a directed, untyped edge, so a bundle is a graph, not just a folder tree. Broken links are tolerated
- **Conformance (v0.1)**: every non-reserved `.md` file has parseable YAML frontmatter; every frontmatter block has a non-empty `type`; reserved filenames follow their structures when present

Example concept document:

```markdown
---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column | Type | Description |
|--------|------|-------------|
| `order_id` | STRING | Globally unique order identifier. |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |
```

## Design principles

- **Minimally opinionated**: OKF requires exactly one thing of every concept, a `type` field.
  Everything else is up to the producer
- **Producer / consumer independence**: who writes the knowledge is cleanly separated from who consumes it
- **Format, not platform**: no lock-in, no SDK, no account

## Why it improves data sharing

- **Human- and agent-readable**: no SDK or query language between the reader and the content
- **Version-controllable out of the box**: bundles live in Git, so pull requests, diffs, blame, and review just work. Knowledge curation becomes normal software engineering
- **Portable and lock-in free**: a bundle is a directory
- **Structured where it helps, prose where it matters**: frontmatter for the few fields you query or index on, Markdown body for the schemas, prose, and example queries humans and LLMs actually read
- **Composes with existing tools**: Notion, Obsidian, MkDocs, Hugo, and Jekyll already speak Markdown plus YAML frontmatter
- **Progressive disclosure built in**: auto-generated `index.md` files let an agent walk the hierarchy one level at a time instead of loading the whole bundle into context
- **Graph-shaped**, via normal Markdown links between concepts

## Relationship to other standards

- **Complements [[Model Context Protocol (MCP)]]** rather than competing with it. MCP governs an agent's access to tools and data; OKF describes the knowledge itself. An MCP server can expose an OKF bundle as a knowledge source
- **Does not replace domain schemas** like Avro, Protobuf, or OpenAPI. OKF references them rather than subsuming them
- It is positioned as formalizing an emerging practice, not as a rival to older open-data standards (schema.org and DCAT are not addressed)

## Tooling and repo

- Repo: `GoogleCloudPlatform/knowledge-catalog` (Apache 2.0, primary language Python, created 2026-05-04, ~3.3K stars at first read).
  README states it is "not an official Google product"
- The `okf/` directory holds `SPEC.md`, three sample bundles (GA4 e-commerce, Stack Overflow, Bitcoin public datasets), and reference code
- Reference implementations shipped with it:
  - **Enrichment agent**: walks BigQuery datasets and drafts OKF concept docs with citations, schemas, and join paths. Built on Google's Agent Development Kit (ADK) with [[Gemini]] as the model
  - **Static HTML visualizer** (`viz.html`): a self-contained graph view using Cytoscape.js and marked.js, no backend
  - **`kcmd`**: a TypeScript CLI plus MCP server for bidirectional sync between local OKF files and [[Google Cloud Knowledge Catalog]]

## Adoption and caveats

- At launch (2026-06-12) every producer and consumer was built by Google. A v0.1 spec from a single vendor is an invitation, not yet a standard
- Whether OKF becomes a common format depends on producers outside Google adopting it. Catalog vendors like Atlan, Alation, and Collate are the ones to watch
- Worth noting the commercial shape: OKF gives away the cheap part (a file any editor can open) and points the demand it creates toward the part Google sells, [[Google Cloud Knowledge Catalog]]. That is a smart play, not a knock; open formats with a paid serving layer have a long track record

## My take

This validates the approach I have used for years: Markdown notes, YAML frontmatter, Git underneath, and an `AGENTS.md` / `CLAUDE.md` at the root telling agents how to behave. OKF is, more or less, the standardized version of an Obsidian vault used as an LLM wiki. If it gets traction, the knowledge bases people already keep in [[Obsidian]] become directly consumable by any agent that speaks OKF, with no export step. That is worth keeping an eye on, and worth structuring my own knowledge so it would convert cleanly.
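To get a feel for how cleanly an existing vault would convert, the v0.1 conformance rules listed under "File format and structure" can be sketched as a small check. This is a rough sketch under stated assumptions, not the reference tooling: `check_file` and `check_bundle` are hypothetical names of my own, the frontmatter is detected with a regex and a line-based lookup for a top-level `type:` key (a real check should parse the block with a proper YAML library), and the rule about reserved files following their structures is not covered.

```python
import re
from pathlib import Path
from typing import Optional

# Reserved filenames are not concept documents, so they are skipped.
RESERVED = {"index.md", "log.md"}

# Frontmatter: a leading '---' line, the YAML block, then a closing '---' line.
FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)

# Simplification (assumption): only a top-level scalar `type:` line is recognized.
TYPE_LINE = re.compile(r"^type:[ \t]*(.*)$", re.MULTILINE)


def check_file(path: Path) -> Optional[str]:
    """Return a problem description for one concept file, or None if it passes."""
    match = FRONTMATTER.match(path.read_text(encoding="utf-8"))
    if match is None:
        return "missing frontmatter block"
    type_match = TYPE_LINE.search(match.group(1))
    # Treat a quoted empty string ('' or "") as empty too.
    if type_match is None or not type_match.group(1).strip().strip("'\""):
        return "missing or empty `type`"
    return None


def check_bundle(root: Path) -> dict:
    """Map relative path -> problem for every non-reserved .md file that fails."""
    problems = {}
    for path in sorted(root.rglob("*.md")):
        if path.name in RESERVED:
            continue
        problem = check_file(path)
        if problem is not None:
            problems[path.relative_to(root).as_posix()] = problem
    return problems
```

Calling `check_bundle(Path("path/to/vault"))` returns an empty dict when every concept file carries a non-empty `type`, and otherwise lists the files that would need frontmatter added before the vault counts as a bundle.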
## References

- Introducing the Knowledge Catalog: https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog
- How OKF can improve data sharing: https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/
- OKF SPEC.md: https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md
- OKF directory: https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf
- Repo root: https://github.com/GoogleCloudPlatform/knowledge-catalog
- Google Research announcement: https://x.com/GoogleResearch/status/2065475343205740911

## Related

- [[Google Cloud Knowledge Catalog]]
- [[Model Context Protocol (MCP)]]
- [[AI Agents]]
- [[Large Language Models (LLMs)]]
- [[Obsidian]]
- [[Markdown]]
- [[Git]]
- [[Andrej Karpathy]]
- [[Open source]]