# Gitcrawl

Gitcrawl is a local-first [[GitHub]] triage tool and a drop-in caching shim for the [[GitHub CLI]]. It mirrors issues, PRs, and adjacent metadata into a local [[SQLite]] database, then transparently serves read-only `gh` calls from that cache while letting mutating commands hit GitHub live.

The pitch is two-in-one: it solves API rate-limit exhaustion *and* makes triage tractable on repos with hundreds or thousands of open issues, without giving up the muscle memory of `gh`.

## The problem

GitHub's API rate limits punish exactly the workflows that need it most: monorepo triage, automated agents, dashboards, large-scale issue reviews. The web UI's search and filters are too coarse for serious triage. And every script that calls `gh issue list` or `gh pr view` is hitting the live API, redundantly, every time.

Gitcrawl flips this. The cache becomes the primary store; GitHub becomes the upstream replication source.

## How it works

- **SQLite-backed mirror.** Stores issues, PRs, comments, reviews, files, commits, checks, and workflow runs in `~/.config/gitcrawl/gitcrawl.db`.
- **`gh`-compatible shim.** Intercepts read-only `gh` commands and serves them from the local DB. Mutating commands (create, comment, merge) pass through to live GitHub unchanged. Drop-in: existing scripts don't need rewriting.
- **Cache visibility.** `gh xcache stats` exposes hit rate, freshness, and size of the local store.
- **Semantic clustering.** Groups related issues and PRs using [[Embeddings]] (OpenAI), but clusters are anchored to deterministic GitHub references (linked issues, PRs, commits) so weak similarity scores can't fuse unrelated tickets into mega-clusters.
- **TUI + JSON output.** Keyboard/mouse-driven terminal UI for human triage; structured JSON output for agents and automation pipelines.

## Design choices worth noting

- **Local-first, deliberately CLI-only.** No hosted service, no HTTP API, no web UI. The tool stays small and the data stays on your machine.
- **Drop-in over reinvention.** Rather than building a new triage CLI and asking users to relearn it, gitcrawl wraps `gh`. The cost of adoption is near zero.
- **Hybrid clustering.** Pure embedding clustering tends to glue everything together once a similarity threshold is crossed. Anchoring clusters to explicit GitHub cross-references trades recall for precision, which is usually the right call in triage, where false positives waste maintainer time.
- **Two layers, one promise.** The triage UI and the API cache could have shipped as separate tools. Bundling them means the cache is always populated by real workflows, not a synthetic sync job.

## Where it fits

For maintainers of busy repos, gitcrawl is the missing layer between `gh` (the command) and a real triage workflow. For agentic systems that read GitHub heavily, it's a transparent rate-limit shield: point your scripts at `gh` as usual, and gitcrawl absorbs the read traffic locally.

Same family as [[Discrawl]] (Discord → SQLite + FTS5): the broader pattern is *vendor chat or vendor issue tracker → local queryable mirror*.

## References

- Project site: <https://gitcrawl.sh/>
- Source: <https://github.com/openclaw/gitcrawl>
- License: MIT
- GitHub CLI: <https://cli.github.com/>

## Related

- [[Discrawl]]
- [[GitHub]]
- [[GitHub CLI]]
- [[SQLite]]
- [[Embeddings]]
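
## Appendix: hybrid clustering sketch

The "Hybrid clustering" design choice above can be illustrated with a small sketch. This is not gitcrawl's actual algorithm, which this note does not describe in detail. It is a minimal illustration, assuming clusters are built with union-find and that two items may only merge when they are similar *and* share at least one explicit reference. The function `cluster`, its parameters, the threshold, and the sample data (`ITEMS`, `SIMILARITY`, `REFERENCES`) are all hypothetical.

```python
from itertools import combinations


def cluster(items, similarity, references, threshold=0.8, anchored=True):
    """Group item ids with union-find (hypothetical sketch, not gitcrawl's code).

    items:      list of item ids (issues or PRs)
    similarity: dict mapping frozenset({a, b}) -> embedding similarity score
    references: dict mapping item id -> set of explicit reference keys
    anchored:   if True, similar items merge only when they share a reference
    """
    parent = {i: i for i in items}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        similar = similarity.get(frozenset((a, b)), 0.0) >= threshold
        shares_anchor = bool(references.get(a, set()) & references.get(b, set()))
        if similar and (shares_anchor or not anchored):
            parent[find(a)] = find(b)

    groups = {}
    for i in items:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())


# Made-up sample data: a chain of pairwise-similar items.
ITEMS = [101, 102, 103, 104]
SIMILARITY = {
    frozenset((101, 102)): 0.90,
    frozenset((102, 103)): 0.85,  # similar wording, but no shared reference
    frozenset((103, 104)): 0.82,
}
REFERENCES = {
    101: {"pr:7"},
    102: {"pr:7"},
    103: {"commit:abc"},
    104: {"commit:abc"},
}

print(cluster(ITEMS, SIMILARITY, REFERENCES, anchored=False))
# [[101, 102, 103, 104]]
print(cluster(ITEMS, SIMILARITY, REFERENCES, anchored=True))
# [[101, 102], [103, 104]]
```

With similarity alone, the chain 101 ~ 102 ~ 103 ~ 104 collapses into one mega-cluster. Requiring a shared reference breaks the chain at 102 ~ 103, which is the recall-for-precision trade described above.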