# CocoIndexCode

**CocoIndexCode is a CLI tool that gives AI coding agents semantic, AST-aware search over a codebase, and claims to cut their token use by ~70%.** It's the code-intelligence product built on top of [[CocoIndex]]. Apache 2.0.

Why it exists: a coding agent that doesn't know exact symbol names ends up loading whole files or grepping broadly, burning tokens on noise. CocoIndexCode lets the agent ask in plain language ("authentication logic") and get back the precise chunks that match by meaning, not string.

## How it works

- **AST indexing.** Code is parsed with [[Tree-sitter]] across 28+ languages (Python, JS/TS, Rust, Go, Java, C/C++, C#, SQL, and more), so chunks follow real structure instead of arbitrary line windows. Custom chunkers allow per-language strategies.
- **Semantic search.** Queries become [[Embeddings]] and match indexed chunks by similarity. Returns fuzzy/conceptual hits, ranked, not lexical matches.
- **Background daemon.** A long-lived daemon keeps the embedding model loaded and the index warm; it auto-starts, upgrades transparently, and shuts down in under a second.
- **No database setup.** Uses embedded LMDB for the index. Nothing to provision.
- **Pluggable embeddings.** Local [[SentenceTransformers]] (no API key needed) or 100+ cloud providers via [[LiteLLM]]. Models with asymmetric retrieval (e.g. Cohere v3) can use different indexing vs query parameters.

## How an agent uses it

- **Skill integration.** The agent invokes search automatically when useful; a user can force it with `/ccc`.
- **MCP server.** Exposes a `search` tool over [[Model Context Protocol (MCP)]] with natural-language queries, language filters, and pagination. Works with Claude, Cursor, Codex.
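To make the index-then-search flow concrete, here is a minimal sketch of the two ideas from "How it works": chunking along AST boundaries and ranking chunks by vector similarity. It is not CocoIndexCode's implementation. Python's built-in `ast` module stands in for Tree-sitter, and a bag-of-words counter stands in for a real embedding model, so this toy only matches on shared words, whereas a real model would match on meaning. The names `chunk_by_function`, `embed`, `cosine`, and `search` are hypothetical.

```python
import ast
import math
import re
from collections import Counter

# Tiny sample "codebase" to index (illustrative only).
SOURCE = '''
def check_password(user, password):
    """Verify login credentials for authentication."""
    return user.password_hash == hash_password(password)

def render_invoice(order):
    """Format an order as a printable invoice."""
    return f"Invoice for {order.id}"
'''


def chunk_by_function(source):
    """Return one (name, text) chunk per top-level function or class.

    Assumption: stdlib `ast` stands in for Tree-sitter here.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, top_k=1):
    """Rank chunks by similarity to the query and return the top names."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in chunks]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


chunks = chunk_by_function(SOURCE)
results = search("authentication logic", chunks)
print(results)  # prints ['check_password']
```

Because each chunk is a whole function, the hit can be handed to an agent as a self-contained unit instead of an arbitrary window of lines, which is the property the AST indexing bullet describes.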
## Usage

```bash
pipx install 'cocoindex-code[full]'   # [full] bundles local embeddings
ccc init                              # write settings
ccc index                             # build the index
ccc search "authentication logic"     # semantic search
```

Config splits in two: global (`~/.cocoindex_code/global_settings.yml`) for provider/model/device/keys, project (`.cocoindex_code/settings.yml`) for include/exclude patterns and language overrides.

Docker ships `:latest` (~450 MB, cloud embeddings) and `:full` (~5 GB, local embeddings); keep a persistent container to hold the model in memory across sessions.

The pitch: 1-minute setup, zero config needed.

## References

- https://github.com/cocoindex-io/cocoindex-code
- https://github.com/cocoindex-io
- https://cocoindex.io/

## Related

- [[CocoIndex]]
- [[CodeGraph]]
- [[Semantic Search]]
- [[Embeddings]]
- [[Tree-sitter]]
- [[AI Retrieval Patterns]]
- [[qmd]]
- [[Model Context Protocol (MCP)]]
- [[LiteLLM]]