# Typed Markdown Collections Specification The **mdbase-spec** (also known as mdbase) is a specification for treating folders of markdown files with YAML frontmatter as typed, queryable data collections. It formalizes the "vault-as-database" model where a collection of markdown notes functions as a structured database with schemas, queries, and expressions defined entirely in plain text. ## Motivation Tools like Obsidian have demonstrated the power of building on folders of markdown files with YAML frontmatter. Once a vault contains not just prose but structured data distributed across hundreds of files—meetings, journal entries, contacts, literature notes—it functions as a database. However, the tooling for managing different types of notes is limited. To enforce that all meeting notes have specific fields (date, attendees, status), options are sparse: templates help at creation time but do nothing after, and Linter plugin rules require technical configuration. There is no schema file that lives alongside notes declaring what a "meeting note" looks like. The result is that each tool interacting with a vault—Obsidian itself, VSCode, Templater, AI agents—must independently figure out types, or users must encode the same structural assumptions in templates, Dataview queries, Linter configs, and agent prompts, hoping they stay in sync. ### Why a Specification A specification is the right artifact here rather than a library or plugin. With AI agents capable of producing working implementations from a spec, what becomes expensive and worth human effort is getting the specification right—the thing that determines whether independently-built tools can interoperate. It is now easy for AI agents to write one-off code for reading and writing frontmatter, but each will make slightly different assumptions about types, validation, and link resolution. A spec prevents that fragmentation. The implementations don't need to be hand-written, but they do need to agree on what a "meeting note" means and how links are resolved. ## Design Principles The specification is built on five core principles: 1. **Files as source of truth** — The filesystem is primary; indexes and caches are derived 2. **Human-readable format** — No proprietary syntax; plain text editors suffice for modifications 3. **Progressive strictness** — Works without schema; validation is optional and incremental 4. **Portability** — Collections function across any compliant tools 5. **Git-friendly** — All state uses text files suitable for version control ## Core Concepts ### Type Definitions Types are defined as markdown files in a `_types/` folder (configurable), where the frontmatter declares the schema and the body documents the type in prose. A template becomes a consequence of a type definition rather than a substitute for one. Example type definition for a meeting note: ```yaml --- name: meeting fields: date: type: date required: true attendees: type: list items: type: link target: person status: type: enum values: [scheduled, completed, cancelled] default: scheduled --- A meeting note. The `attendees` field links to person-type notes. ``` ### Type Matching Files are matched to types through several mechanisms (in precedence order): - **Explicit declaration**: `type: task` or `types: [task, urgent]` in frontmatter - **Path globs**: Match files by path pattern (e.g., `tasks/**/*.md`) - **Field presence**: Match when specific fields exist - **Conditions**: Match based on field value conditions using operators like `exists`, `eq`, `gte`, `contains`, `matches` A single file can match multiple types; field constraints merge by taking the most restrictive intersection. ### Field Types **Primitive types**: `string`, `integer`, `number`, `boolean`, `date`, `datetime`, `time`, `enum`, `any` **Composite types**: `list` (ordered collection with typed items), `object` (nested structure with field definitions) **Reference type**: `link` (points to other files with optional target type constraint) All types support common options: `required`, `default`, `description`, `deprecated`, `unique` (cross-file uniqueness checking). **Special capabilities**: - `generated` fields auto-populate via strategies like `ulid`, `uuid`, `now`, `now_on_write`, or derivation - `computed` fields evaluate expressions at read-time and are not persisted ### Expression Language The expression language supports filtering, sorting, computed fields, and link traversal. It is designed for compatibility with Obsidian Bases syntax. **Operators**: `==`, `!=`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `!` **Methods**: `.contains()`, `.startsWith()`, `.endsWith()`, `.length`, `.isEmpty()` **Date functions**: `today()`, `now()`, arithmetic with duration strings like `"7d"` **Link traversal**: `assignee.asFile().team` (navigate through linked records) ### Link Resolution The specification handles three link formats: - Wikilinks: `[[target]]`, `[[./relative]]`, `[[target|alias]]` - Markdown links: `[text](path.md)` - Bare paths Links are resolved relative to the containing file or collection root, with support for anchors referencing headings or blocks within targets. ## Conformance Levels The specification defines six progressive conformance levels, allowing implementations to declare partial support: | Level | Name | Capabilities | |-------|------|--------------| | 1 | Basic CRUD | File discovery, YAML parsing, create/read/update/delete with basic validation | | 2 | Type Matching | Explicit type declarations and match rules for automatic type association | | 3 | Querying | Expression-based filtering, sorting, pagination, and computed fields | | 4 | Links | Link parsing, resolution across the collection, and reference validation | | 5 | Reference Updates | Automatic link updating when files are renamed | | 6 | Caching & Advanced | Indexing, backlink computation, batch operations, watch mode | This matters because tools that might conform vary in ambition: an Obsidian plugin, an LSP server, a simple CLI, or an AI agent needing structured persistence have different needs. ## Implementations Several implementations exist at various stages of development: - **mdbase** (TypeScript) — Reference implementation with full query support, SQLite-backed caching, and conformance test runner - **mdbase-rs** (Rust) — Rust implementation - **mdbase-lsp** (Rust) — [[Language Server Protocol (LSP)]] server for markdown vaults - **mdbase-cli** (TypeScript) — CLI tool with support for Obsidian `.base` files ## References - Official website: https://mdbase.dev - Specification: https://mdbase.dev/spec.html - Specification source code: https://github.com/callumalpass/mdbase-spec - TypeScript implementation: https://github.com/callumalpass/mdbase - Rust LSP server: https://github.com/callumalpass/mdbase-lsp - Rust CLI: https://github.com/callumalpass/mdbase-rs - Node CLI: https://github.com/callumalpass/mdbase-cli - Announcement post: https://www.reddit.com/r/ObsidianMD/comments/1qs1juw/a_cli_tool_for_reading_bases_files_mdbasespec_a ## Related - [[callumalpass]] - [[Language Server Protocol (LSP)]] - [[Agentic Knowledge Management (AKM)]] - [[Portent]] (adjacent open spec — content model rather than schema)