# summarize (CLI)
summarize is an open-source [[Command Line Interface (CLI)]] tool and browser extension by [[Peter Steinberger]] that turns URLs, files, and media into clean summaries using [[Large Language Models (LLMs)]]. Written in [[TypeScript]], it bundles content extraction, transcription, and LLM summarization into a single command.
## What It Does
summarize runs a three-stage pipeline:
1. **Fetch + Extract**: pulls the source content and cleans HTML via Mozilla Readability (the same library [[Defuddle]] builds on), with Firecrawl as a fallback for sites that block scraping
2. **Transcribe** (media only): prefers published transcripts, then falls back through whisper.cpp, Groq, AssemblyAI, Gemini, and OpenAI
3. **Summarize**: sends the extracted text to an LLM and streams Markdown output with token counts and cost estimates
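
The fallback behaviour in the transcription stage can be pictured as an ordered list of sources tried one after another. The following is a minimal sketch of that idea, not the tool's actual code: `TranscriptSource`, `transcribeWithFallback`, and the stub sources are hypothetical names, and the sources are modelled as synchronous functions only to keep the example short (real providers would be asynchronous network or subprocess calls).

```typescript
// Hypothetical shape for a transcript source; not the tool's real API.
type TranscriptSource = {
  name: string;
  // Returns transcript text; throws or returns "" when nothing is available.
  fetch: (mediaUrl: string) => string;
};

type TranscriptResult = { source: string; text: string; skipped: string[] };

// Try each source in order and return the first non-empty transcript.
function transcribeWithFallback(
  mediaUrl: string,
  sources: TranscriptSource[],
): TranscriptResult {
  const skipped: string[] = [];
  for (const source of sources) {
    try {
      const text = source.fetch(mediaUrl).trim();
      if (text.length > 0) return { source: source.name, text, skipped };
      skipped.push(`${source.name}: empty transcript`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      skipped.push(`${source.name}: ${reason}`);
    }
  }
  throw new Error(`No transcript available. Tried: ${skipped.join("; ")}`);
}

// Stub sources standing in for the real providers (assumed behaviour).
const sources: TranscriptSource[] = [
  { name: "published-transcript", fetch: () => "" },
  { name: "whisper.cpp", fetch: () => { throw new Error("binary not installed"); } },
  { name: "groq", fetch: (url) => `transcript of ${url}` },
];

const result = transcribeWithFallback("https://example.com/talk.mp4", sources);
console.log(result.source, result.skipped);
```

Recording why each earlier source was skipped keeps the failure path debuggable, which matters when a chain is this long.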
## Supported Inputs
- **Web pages**: HTML cleaned via Readability
- **PDFs**: local and remote
- **Images**: JPEG, PNG, WebP, GIF
- **Audio/Video**: MP3, WAV, M4A, MP4, MOV, WEBM (auto-transcribed)
- **YouTube**: transcript-first, then yt-dlp + Whisper fallback
- **Podcasts**: Apple Podcasts, Spotify, RSS feeds (Podcasting 2.0 transcripts)
- **Text files**: .txt, .md, .json, .yaml, .xml
- **Stdin**: piped content including binary (50MB limit)
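
Routing an input to the right handler is mostly a matter of looking at the host and the file extension. The sketch below illustrates one way this could work for the types listed above; `classifyInput`, `InputKind`, and the host list are assumptions made for illustration, and the real tool likely also inspects content types and file contents.

```typescript
type InputKind = "youtube" | "web" | "pdf" | "image" | "media" | "text" | "unknown";

// Extension table taken from the supported-input list above.
const EXTENSIONS: Record<string, InputKind | undefined> = {
  pdf: "pdf",
  jpg: "image", jpeg: "image", png: "image", webp: "image", gif: "image",
  mp3: "media", wav: "media", m4a: "media", mp4: "media", mov: "media", webm: "media",
  txt: "text", md: "text", json: "text", yaml: "text", xml: "text",
};

// Hypothetical classifier: URL host first, then extension, then a default.
function classifyInput(input: string): InputKind {
  let pathname = input;
  let isUrl = false;
  try {
    const url = new URL(input);
    if (url.protocol === "http:" || url.protocol === "https:") {
      isUrl = true;
      const host = url.hostname.replace(/^www\./, "");
      if (host === "youtube.com" || host === "m.youtube.com" || host === "youtu.be") {
        return "youtube";
      }
      pathname = url.pathname;
    }
  } catch {
    // Not a parseable URL: treat the input as a local path.
  }
  const ext = /\.([a-z0-9]+)$/i.exec(pathname)?.[1]?.toLowerCase() ?? "";
  return EXTENSIONS[ext] ?? (isUrl ? "web" : "unknown");
}

console.log(classifyInput("https://example.com/paper.pdf"));
console.log(classifyInput("./talk.MP4"));
```

Any URL without a recognised extension falls through to `"web"`, which matches the idea that ordinary pages go through HTML extraction.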
## Key Features
- **Provider-agnostic**: works with OpenAI, Anthropic, Google Gemini, xAI, NVIDIA, Z.AI, [[OpenRouter]] (including free models), GitHub Models, and local OpenAI-compatible endpoints
- **Coding CLI backends**: can delegate to Claude CLI, Codex, Gemini CLI, Cursor Agent, OpenClaw, OpenCode
- **Auto model selection**: picks the best model based on input type and prompt size; retries with alternatives on failure. Built-in `--model free` preset uses OpenRouter free models, refreshable via `summarize refresh-free`
- **Chrome Side Panel + Firefox Sidebar**: browser extension with streaming summaries, chat mode (conversational follow-ups with full transcript context), and hover tooltip link previews (experimental)
- **Video slides**: `--slides` extracts slide screenshots from YouTube or local videos via ffmpeg, adds OCR via Tesseract, aligns with transcript timestamps, and renders clickable `[mm:ss]` seek links
- **Configurable output**: presets (short/medium/long/xl/xxl) or character targets; `--lang` for output language
- **Smart defaults**: skips the LLM call if the content is already shorter than the requested length
- **SQLite caching**: summary, transcript, and media download caches with configurable TTL/size caps
- **Extract-only mode**: returns only the cleaned content as Markdown, with no summarization (similar to [[Defuddle]])
- **Custom prompts**: `--prompt` or `--prompt-file` to replace default summary instructions
- **Metrics and cost tracking**: finish line shows token counts, timing, transcript stats, and cost estimates (including Whisper transcription costs)
- **Themed terminal output**: 24-bit ANSI Markdown rendering with `--theme` support; `--plain` for raw output
- **X/Twitter support**: extracts tweet text and auto-transcribes tweet videos via yt-dlp with browser cookie support
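
The "smart defaults" and "configurable output" features combine naturally: once a preset or character target is resolved to a number, the tool can decide whether an LLM call is needed at all. The following sketch shows that decision under stated assumptions. The character counts in `PRESET_TARGET_CHARS` are placeholder values invented for illustration (the tool's real preset sizes are not given here), and `planSummary` is a hypothetical name.

```typescript
// Placeholder sizes for illustration only; the real presets may differ.
const PRESET_TARGET_CHARS = {
  short: 900,
  medium: 1800,
  long: 4200,
  xl: 9000,
  xxl: 17000,
} as const;

type Preset = keyof typeof PRESET_TARGET_CHARS;
type Plan = { action: "passthrough" | "summarize"; targetChars: number };

// A length is either a named preset or an explicit character target.
function resolveTargetChars(length: Preset | number): number {
  return typeof length === "number" ? length : PRESET_TARGET_CHARS[length];
}

// Skip the LLM call when the content already fits the requested length.
function planSummary(content: string, length: Preset | number): Plan {
  const targetChars = resolveTargetChars(length);
  const fits = content.trim().length <= targetChars;
  return { action: fits ? "passthrough" : "summarize", targetChars };
}

console.log(planSummary("A two-line changelog entry.", "short"));
console.log(planSummary("x".repeat(5000), "short"));
```

Returning a plan rather than calling the model directly keeps the check cheap to test and makes the skipped call visible to whatever reports token counts and cost.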
## Installation
```bash
# npm (requires Node 22+)
npm i -g @steipete/summarize
# npx (no install)
npx -y @steipete/summarize "URL"
# Homebrew
brew install summarize
```
The core is also available as a library: `npm i @steipete/summarize-core` (ESM-only since v0.8.0).
## Architecture
The tool runs a localhost daemon (`127.0.0.1:8787`) that bridges the browser extension to the CLI. Autostart is supported via a macOS LaunchAgent, a Linux systemd user service, or a Windows Scheduled Task. The daemon streams results over Server-Sent Events (SSE), and requests are authenticated with a token. Configuration lives in `~/.summarize/config.json` (JSON5-lenient).
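
To make the streaming part concrete, here is a small sketch of how a client might decode SSE text received from such a daemon. It follows the standard SSE wire format (`event:` and `data:` fields, events separated by a blank line, lines starting with `:` treated as comments). The event name `chunk` and the function `parseSseChunk` are hypothetical; the daemon's actual event names, endpoint paths, and token header are not documented here. The parser also assumes it receives whole events, so a real streaming client would need to buffer partial ones.

```typescript
type SseEvent = { event: string; data: string };

// Decode a block of SSE text into events. Assumes complete events only.
function parseSseChunk(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") eventName = value;
      else if (field === "data") dataLines.push(value);
    }
    // Multiple data lines in one event are joined with newlines.
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return events;
}

const sample =
  "event: chunk\ndata: Hello\n\n" +
  ": keep-alive\n\n" +
  "data: line one\ndata: line two\n\n";

const parsed = parseSseChunk(sample);
console.log(parsed);
```

Because the summary arrives as many small events, a side panel can append each `data` payload to the rendered Markdown as it comes in rather than waiting for the full response.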
## Compared to Defuddle
[[Defuddle]] is extraction-only (HTML to clean Markdown). summarize wraps extraction, transcription, and LLM summarization into a full pipeline. Internally it uses Mozilla Readability, the same core engine Defuddle improves upon.
## References
- https://github.com/steipete/summarize
- https://summarize.sh/
- https://www.npmjs.com/package/@steipete/summarize
## Related
- [[Defuddle]]
- [[Peter Steinberger]]
- [[Large Language Models (LLMs)]]
- [[OpenRouter]]
- [[Command Line Interface (CLI)]]
- [[Obsidian Web Clipper]]