# Defuddle

Defuddle is an open-source content extraction library by [[Steph Ango]] (kepano). It strips clutter such as ads, navigation, sidebars, and comments from web pages, and returns clean HTML or Markdown along with extracted metadata. It was originally built for the [[Obsidian Web Clipper]] browser extension, and it works in browsers, in Node.js, and on the command line.

## What It Does

- Extracts the primary content of a web page, removing noise (headers, footers, sidebars, comments)
- Outputs clean **HTML or Markdown**
- Extracts metadata: author, publication date, description, schema.org data
- Standardizes HTML elements: headings, code blocks, footnotes, math notation, callouts
- Uses mobile styles to infer which elements are non-essential (more forgiving than alternatives like Readability)

## Usage

Available as:

- **Browser library**: drop-in script
- **Node.js module**: works with JSDOM or linkedom for server-side use
- **CLI**: `defuddle <url>` with flags for Markdown output, JSON metadata, and debug mode

```bash
# Example CLI usage
npx defuddle https://example.com --markdown
```

Install via npm:

```bash
npm install defuddle
```

## Context

Created by Steph Ango as the extraction engine powering [[Obsidian Web Clipper]]. It is designed as a more lenient alternative to Mozilla Readability: it errs on the side of keeping more content rather than stripping too aggressively.

## References

- Official website: https://defuddle.md
- Pricing: https://defuddle.md/pricing
- Documentation: https://defuddle.md/docs
- Playground: https://defuddle.md/playground
- NPM package: https://www.npmjs.com/package/defuddle
- Source code: https://github.com/kepano/defuddle
- Terms of service: https://defuddle.md/terms
- Privacy policy: https://defuddle.md/privacy

## Related

- [[Steph Ango]]
- [[summarize (CLI)]]
- [[Obsidian Web Clipper]]
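
## Sketch: Lenient vs. Aggressive Filtering

This note describes Defuddle as erring toward keeping content, unlike Mozilla Readability, which strips more aggressively. The TypeScript below is a toy illustration of that trade-off only. It is not Defuddle's algorithm or API. The `Block` type, the clutter tag list, and the thresholds (0.2 and 0.5 link density, 80-character minimum) are all hypothetical. The real library works on a live DOM and also consults mobile styles.

```typescript
// Toy model of a page block. Hypothetical: Defuddle itself works on a real DOM.
interface Block {
  tag: string;      // element name, e.g. "p", "nav", "aside"
  text: string;     // visible text of the block
  linkText: string; // the part of `text` that sits inside links
}

// Hypothetical set of tags treated as obvious clutter in this sketch.
const CLUTTER_TAGS = new Set<string>(["nav", "header", "footer", "aside", "form"]);

// Share of the block's text that is link text (0 for an empty block).
function linkDensity(b: Block): number {
  return b.text.length === 0 ? 0 : b.linkText.length / b.text.length;
}

// Aggressive policy: drop anything that looks even slightly like clutter,
// including short blocks and moderately link-heavy ones.
function aggressiveKeep(b: Block): boolean {
  return !CLUTTER_TAGS.has(b.tag) && linkDensity(b) < 0.2 && b.text.length >= 80;
}

// Lenient policy: drop a block only on strong evidence
// (a clutter tag, or mostly link text); short blocks survive.
function lenientKeep(b: Block): boolean {
  return !CLUTTER_TAGS.has(b.tag) && linkDensity(b) < 0.5;
}

// Keep the blocks the policy accepts and return their text in order.
function extract(blocks: Block[], keep: (b: Block) => boolean): string[] {
  return blocks.filter(keep).map((b) => b.text);
}

const page: Block[] = [
  { tag: "nav", text: "Home About Blog", linkText: "Home About Blog" },
  { tag: "p", text: "Defuddle keeps the main article text. ".repeat(3), linkText: "" },
  { tag: "p", text: "Short caption.", linkText: "" },
  { tag: "div", text: "See also: Foo Bar Baz", linkText: "Foo Bar Baz" },
];

console.log(extract(page, aggressiveKeep).length); // 1 (only the long paragraph)
console.log(extract(page, lenientKeep).length);    // 2 (long paragraph + short caption)
```

The lenient policy lets the short caption through because it never treats brevity alone as evidence of clutter. The cost is that borderline link-heavy blocks just under the threshold can slip in as well. The heuristics Defuddle actually uses are different and more involved.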