# summarize (CLI)

summarize is an open-source [[Command Line Interface (CLI)]] tool and browser extension by [[Peter Steinberger]] that turns URLs, files, and media into clean summaries using [[Large Language Models (LLMs)]]. Written in [[TypeScript]], it bundles content extraction, transcription, and LLM summarization into a single command.

## What It Does

Three-stage pipeline:

1. **Fetch + Extract**: pulls source content, cleans HTML via Mozilla Readability (the same library [[Defuddle]] builds on), with Firecrawl as fallback for sites that block scraping
2. **Transcribe** (media only): prefers published transcripts, then falls back through whisper.cpp, Groq, AssemblyAI, Gemini, OpenAI
3. **Summarize**: sends to an LLM, streams Markdown output with token counts and cost estimates

## Supported Inputs

- **Web pages**: HTML cleaned via Readability
- **PDFs**: local and remote
- **Images**: JPEG, PNG, WebP, GIF
- **Audio/Video**: MP3, WAV, M4A, MP4, MOV, WEBM (auto-transcribed)
- **YouTube**: transcript-first, then yt-dlp + Whisper fallback
- **Podcasts**: Apple Podcasts, Spotify, RSS feeds (Podcasting 2.0 transcripts)
- **Text files**: .txt, .md, .json, .yaml, .xml
- **Stdin**: piped content including binary (50MB limit)

## Key Features

- **Provider-agnostic**: works with OpenAI, Anthropic, Google Gemini, xAI, NVIDIA, Z.AI, [[OpenRouter]] (including free models), GitHub Models, and local OpenAI-compatible endpoints
- **Coding CLI backends**: can delegate to Claude CLI, Codex, Gemini CLI, Cursor Agent, OpenClaw, OpenCode
- **Auto model selection**: picks the best model based on input type and prompt size; retries with alternatives on failure.
  Built-in `--model free` preset uses OpenRouter free models, refreshable via `summarize refresh-free`
- **Chrome Side Panel + Firefox Sidebar**: browser extension with streaming summaries, chat mode (conversational follow-ups with full transcript context), and hover tooltip link previews (experimental)
- **Video slides**: `--slides` extracts slide screenshots from YouTube or local videos via ffmpeg, adds OCR via Tesseract, aligns with transcript timestamps, and renders clickable `[mm:ss]` seek links
- **Configurable output**: presets (short/medium/long/xl/xxl) or character targets; `--lang` for output language
- **Smart defaults**: skips LLM call if content is already shorter than requested length
- **SQLite caching**: summary, transcript, and media download caches with configurable TTL/size caps
- **Extract-only mode**: just cleaned content as Markdown, no summarization (similar to [[Defuddle]])
- **Custom prompts**: `--prompt` or `--prompt-file` to replace default summary instructions
- **Metrics and cost tracking**: finish line shows token counts, timing, transcript stats, and cost estimates (including Whisper transcription costs)
- **Themed terminal output**: 24-bit ANSI Markdown rendering with `--theme` support; `--plain` for raw output
- **X/Twitter support**: extracts tweet text and auto-transcribes tweet videos via yt-dlp with browser cookie support

## Installation

```bash
# npm (requires Node 22+)
npm i -g @steipete/summarize

# npx (no install)
npx -y @steipete/summarize "URL"

# Homebrew
brew install summarize
```

Also available as a library: `npm i @steipete/summarize-core` (ESM-only since v0.8.0)

## Architecture

The tool runs a localhost daemon (`127.0.0.1:8787`) that bridges the browser extension to the CLI. Autostart is supported via macOS LaunchAgent, Linux systemd user service, or Windows Scheduled Task. The daemon streams results over SSE with token-protected auth. Configuration lives in `~/.summarize/config.json` (JSON5-lenient).
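
To make the SSE streaming mentioned above more concrete, here is a minimal TypeScript sketch of how a client could split a `text/event-stream` body into events. Only the framing (`event:` and `data:` fields, blank-line separators, `:` comment lines) is standard SSE. The function name `parseSseEvents`, the event names `chunk` and `done`, and the sample payloads are assumptions for illustration and are not taken from the summarize daemon. Endpoints and the token mechanism are not described in this note, so the sketch leaves them out.

```typescript
// Minimal Server-Sent Events (SSE) parser. The wire format is the standard
// text/event-stream framing; the event names and payloads below are
// hypothetical and not taken from the summarize daemon.
interface SseEvent {
  event: string; // defaults to "message" when no "event:" field is present
  data: string; // multiple "data:" lines are joined with "\n"
}

function parseSseEvents(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line; normalize CRLF first.
  const blocks = raw.replace(/\r\n/g, "\n").split("\n\n");
  for (const block of blocks) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // An event with no data lines is not dispatched.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}

// Hypothetical stream: two text chunks, a keep-alive comment, a final event.
const sample =
  "event: chunk\ndata: Hello\n\n" +
  "event: chunk\ndata: world\n\n" +
  ": keep-alive comment\n\n" +
  'event: done\ndata: {"tokens": 42}\n\n';

const parsed = parseSseEvents(sample);
const text = parsed
  .filter((e) => e.event === "chunk")
  .map((e) => e.data)
  .join(" ");
console.log(text); // prints "Hello world"
```

A real client would feed network chunks into a buffer and parse only complete blocks, since a chunk boundary can fall in the middle of an event.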
## Compared to Defuddle

[[Defuddle]] is extraction-only (HTML to clean Markdown). summarize wraps extraction + transcription + LLM summarization into a full pipeline. It actually uses Mozilla Readability internally, the same core engine Defuddle improves upon.

## References

- https://github.com/steipete/summarize
- https://summarize.sh/
- https://www.npmjs.com/package/@steipete/summarize

## Related

- [[Defuddle]]
- [[Peter Steinberger]]
- [[Large Language Models (LLMs)]]
- [[OpenRouter]]
- [[Command Line Interface (CLI)]]
- [[Obsidian Web Clipper]]