# Discrawl
Discrawl is a CLI tool that mirrors [[Discord]] guilds into a local [[SQLite]] database so you can grep, query, and run analytics on community memory without depending on Discord's native search.
It treats a Discord server as a knowledge base. Channels, threads, members, message history, and attachments are pulled in via the bot API and indexed for fast full-text retrieval. Discord's built-in search is shallow and stateless; Discrawl turns the conversation log into a structured, queryable artifact you actually own.
## Why it matters
Discord communities accumulate enormous amounts of context: design decisions, support threads, links, code snippets, micro-tutorials. In practice that memory is hard to recover. Search is weak, history is paginated, exports are limited, and the data lives on Discord's servers rather than yours. For anyone running a community (e.g. [[Knowii Community]]), this is organizational memory locked behind a vendor.
Discrawl flips the relationship: the canonical store is local. Discord becomes the input stream, not the database.
## How it works
- **Bot-based sync.** Authenticate a Discord bot, run `discrawl init`, then `discrawl sync --full` to backfill channels, threads, members, messages, and attachments into SQLite.
- **FTS5 indexing.** Built-in [SQLite FTS5](https://sqlite.org/fts5.html) tables back fast full-text keyword search via `discrawl search "query"`.
- **Optional [[Semantic Search]].** Pluggable [[Embeddings]] backends (OpenAI or local Ollama) enable semantic recall on top of the FTS index.
- **Desktop cache import.** Reads the local Discord Desktop client cache, so DM history can be archived without holding a user token.
- **Live tailing.** Subscribes to the Discord gateway to keep the local copy current, with periodic repair passes for missed events.
- **Git-backed snapshots.** Archives can be committed and shared as offline, version-controlled artifacts.
- **Read-only SQL access.** Once data is in SQLite, anything that speaks SQL (BI tools, scripts, notebooks) can analyze it.
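The FTS5 and SQL-access bullets can be illustrated with Python's standard `sqlite3` module. This is a minimal sketch, not Discrawl's code. The `messages` table, the `messages_fts` index, and the `open_archive`/`search` helpers are hypothetical. It also assumes the SQLite library bundled with your Python was compiled with FTS5, which is true of most distributions.

```python
import sqlite3

# Hypothetical schema for illustration only; Discrawl's real tables differ.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,      -- Discord snowflake ID
    channel_id INTEGER NOT NULL,
    author TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL     -- ISO-8601 timestamp
);
-- External-content FTS5 index over the message text.
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
-- Keep the index in sync on insert (update/delete triggers omitted for brevity).
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""


def open_archive(path=":memory:"):
    """Open (or create) an archive database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search(conn, query, limit=10):
    """Full-text search ranked by FTS5's bm25() (more negative = more relevant)."""
    return conn.execute(
        """
        SELECT m.id, m.author, m.content
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        ORDER BY bm25(messages_fts)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = open_archive()
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        [
            (1, 100, "ana", "How do I rotate the bot token?", "2024-05-01T10:00:00Z"),
            (2, 100, "ben", "Token rotation is under the Bot tab", "2024-05-01T10:05:00Z"),
            (3, 200, "cy", "Release notes for v2 are up", "2024-05-02T09:00:00Z"),
        ],
    )
    for row in search(conn, "token"):
        print(row)
```

With the sample rows, `search(conn, "token")` matches messages 1 and 2, `search(conn, '"bot token"')` matches only the exact phrase in message 1, and `rotat*` is an FTS5 prefix query matching both "rotate" and "rotation". The index is an ordinary SQLite table, so the same `MATCH` query works from the `sqlite3` shell or any other SQL client.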
## Design choices worth noting
- **Local-first.** No SaaS layer. The archive is a file on disk you can `grep`, back up, or commit.
- **Multiple ingest paths.** Bot API, desktop cache, and Git subscriptions cover the cases where one path alone is insufficient (notably DMs, which the bot API can't reach).
- **Composable.** SQLite + FTS5 + optional embeddings is a small, boring, durable stack: no proprietary index format, no required cloud service.
- **Bot-scoped by default.** The primary path uses a bot token rather than scraping a user account, so it goes through Discord's sanctioned bot API and only sees the channels the bot is authorized in.
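The "no proprietary index format" point in the Composable bullet can be made concrete. This is a minimal sketch, assuming embeddings are stored as float32 BLOBs in a hypothetical `embeddings` table and searched by brute-force cosine similarity. This note does not describe Discrawl's own vector storage or ranking, and the helper names are illustrative. The toy 3-dimensional vectors stand in for real output from an OpenAI or Ollama embedding model.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a vector as little-endian float32 bytes for a BLOB column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): 4 bytes per float32 component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(conn, query_vec, k=3):
    """Brute-force semantic lookup: score every stored vector against the query."""
    scored = [
        (cosine(query_vec, unpack(blob)), message_id)
        for message_id, blob in conn.execute("SELECT message_id, vec FROM embeddings")
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [message_id for _, message_id in scored[:k]]


# Hypothetical table: one vector per message, stored as a plain BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (message_id INTEGER PRIMARY KEY, vec BLOB NOT NULL)")
# Toy vectors standing in for real embedding-model output.
conn.executemany(
    "INSERT INTO embeddings VALUES (?, ?)",
    [
        (1, pack([0.9, 0.1, 0.0])),
        (2, pack([0.8, 0.2, 0.1])),
        (3, pack([0.0, 0.1, 0.9])),
    ],
)
print(nearest(conn, [1.0, 0.0, 0.0], k=2))  # → [1, 2]
```

A linear scan is simple and keeps everything in one SQLite file, which suits a single community archive of modest size. At larger scales a dedicated vector index would be the usual next step.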
## Where it fits
For community operators, Discrawl is the missing layer between Discord (the chat surface) and a proper knowledge base. Query its FTS index to feed a support-analytics dashboard, use the embeddings to ground a RAG assistant in community Q&A, or just open the archive with `sqlite3` and find the answer Discord's search would not surface.
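The support-analytics use case can be illustrated with a daily activity report. This is a minimal sketch written against a hypothetical `messages(id, channel_id, author, content, created_at)` table with ISO-8601 UTC timestamps. The real Discrawl schema may name things differently, and `daily_activity` is an illustrative helper. The SQL itself could be pasted into the `sqlite3` shell or a BI tool unchanged.

```python
import sqlite3


def daily_activity(conn):
    """Messages and distinct authors per channel per UTC day."""
    return conn.execute(
        """
        SELECT channel_id,
               date(created_at) AS day,
               COUNT(*) AS n_messages,
               COUNT(DISTINCT author) AS n_authors
        FROM messages
        GROUP BY channel_id, day
        ORDER BY day, channel_id
        """
    ).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, channel_id INTEGER, "
    "author TEXT, content TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "ana", "How do I rotate the bot token?", "2024-05-01T10:00:00Z"),
        (2, 100, "ben", "Check the bot settings page", "2024-05-01T10:05:00Z"),
        (3, 100, "ana", "Thanks, that worked", "2024-05-01T11:00:00Z"),
        (4, 200, "cy", "Release notes for v2 are up", "2024-05-02T09:00:00Z"),
    ],
)
for row in daily_activity(conn):
    print(row)
# (100, '2024-05-01', 3, 2)
# (200, '2024-05-02', 1, 1)
```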
## References
- Project site: <https://discrawl.sh/>
- Source: <https://github.com/openclaw/discrawl>
- SQLite FTS5: <https://sqlite.org/fts5.html>
- Discord bot API docs: <https://discord.com/developers/docs/intro>
## Related
- [[Discord]]
- [[Gitcrawl]]
- [[SQLite]]
- [[Semantic Search]]
- [[Embeddings]]
- [[Knowii Community]]