# sag

sag is a modern [[Text-to-Speech (TTS)]] CLI by [[Peter Steinberger]] that mimics the native macOS `say` command (same flag shape, same muscle memory) but routes synthesis through [[ElevenLabs]] for far more natural voices.

`sag "Hello world"` plays speech through the speakers. `sag -v Roger -o out.mp3 "Hello"` renders to a file with a chosen voice. Anywhere you would have piped to `say`, you can pipe to `sag` instead.

## Why it matters

The macOS `say` voices have aged badly compared to modern neural TTS. ElevenLabs produces voices that are convincingly human, multilingual, and expressive, but its native interface is an HTTP API. That is fine for a service and awkward for a one-off "speak this notification". sag closes the gap by wrapping the API in a `say`-compatible CLI with sane defaults and streaming playback.

This matters most for agentic workflows ([[OpenClaw]], CI scripts, CLI assistants). A drop-in `say` that sounds good unblocks a category of voice-out use cases that were previously embarrassing to ship.

## Key features

- **`say`-compatible flags.** `-v` voice, `-r` rate, `-o` output. Existing scripts barely change.
- **Streaming playback by default.** Audio plays as it is generated, which keeps latency minimal.
- **Voice discovery.** `sag voices` lists, searches, and filters available ElevenLabs voices.
- **Multiple output formats.** MP3 and WAV/PCM; the format is inferred from the file extension.
- **All ElevenLabs models.** `eleven_v3` (most expressive, default), `eleven_multilingual_v2` (stable), `eleven_flash_v2_5` (ultra-low latency), `eleven_turbo_v2_5`.
- **Full voice parameter control.** Stability, similarity, style, speaker boost, seed, language, text normalization.
- **Cross-platform.** Linux, macOS, Windows. Plays via `afplay` on macOS and `oto` elsewhere.

## Design choices worth noting

- **No default internal timeout.** External orchestrators (agents, CI) own the timeout policy, so speech is never cut short by a limit they did not set.
- **SSML for v2/v2.5, audio tags for v3.** v3 uses prompt tags like `[whispers]` rather than SSML `<break>`. sag exposes both and lets you pick the one that matches the model.
- **Streaming-first.** The default optimizes for "speak now", not "generate a perfect file". File output is one flag away.
- **Familiar shape.** Mirroring `say` is a deliberate choice to lower adoption cost: anyone who already knows `say` already knows sag.

## Where it fits

Anywhere a script wants to *say* something in a voice that doesn't sound like 2010: notification hooks, CI announcements, agent voice-out, accessibility utilities, and podcast-style content generated from text. Pair it with [[OpenClaw]] for voice replies, or pipe long content through it for "listen instead of read" workflows.

Requires an ElevenLabs API key (`ELEVENLABS_API_KEY`).

## References

- Project site: <https://sag.sh/>
- Source: <https://github.com/steipete/sag>
- License: MIT
- ElevenLabs API: <https://elevenlabs.io/docs/api-reference>

## Related

- [[Peter Steinberger]]
- [[ElevenLabs]]
- [[Text-to-Speech (TTS)]]
- [[Go]]
- [[OpenClaw]]
- [[Spogo]]
- [[Birdclaw]]
- [[WaCLI]]
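sag's drop-in role can be made concrete with a small notification-hook helper. It chooses between sag and `say` depending on whether an ElevenLabs key is configured. This is a minimal Go sketch, and several parts are assumptions: `speakCommand` is a hypothetical helper, not part of sag. It forwards the `-v` flag only to sag, because ElevenLabs voice names such as `Roger` are not macOS voices.

```go
package main

import (
	"fmt"
	"os"
)

// speakCommand is a hypothetical helper that picks which binary to run.
// It prefers sag when an ElevenLabs key is configured and otherwise falls
// back to the built-in macOS `say`. The voice flag is only forwarded to sag,
// because ElevenLabs voice names (e.g. "Roger") are not macOS voices.
func speakCommand(text, voice string, haveKey bool) (string, []string) {
	if !haveKey {
		return "say", []string{text}
	}
	args := []string{}
	if voice != "" {
		args = append(args, "-v", voice)
	}
	args = append(args, text)
	return "sag", args
}

func main() {
	// Treat an empty ELEVENLABS_API_KEY the same as an unset one.
	haveKey := os.Getenv("ELEVENLABS_API_KEY") != ""
	name, args := speakCommand("Build finished", "Roger", haveKey)
	fmt.Println(name, args)
}
```

To actually speak, pass `name` and `args` to `exec.Command`. Because the flag shapes match, the hook needs no other branching.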
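Voice discovery ("lists, searches, and filters") amounts to fetching the voice catalogue and filtering it locally. This Go sketch assumes the JSON shape returned by ElevenLabs `GET /v1/voices`: a `voices` array whose entries carry `voice_id`, `name`, and a `labels` map. The sample data, IDs, and the `filterVoices` helper are made up for illustration. sag's real search logic may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Voice holds the subset of fields this sketch reads from the API response.
type Voice struct {
	VoiceID string            `json:"voice_id"`
	Name    string            `json:"name"`
	Labels  map[string]string `json:"labels"`
}

type voicesResponse struct {
	Voices []Voice `json:"voices"`
}

// filterVoices keeps voices whose name contains query (case-insensitive;
// an empty query matches all) and whose labels match every entry in want.
func filterVoices(voices []Voice, query string, want map[string]string) []Voice {
	q := strings.ToLower(query)
	var out []Voice
	for _, v := range voices {
		if q != "" && !strings.Contains(strings.ToLower(v.Name), q) {
			continue
		}
		ok := true
		for k, val := range want {
			if !strings.EqualFold(v.Labels[k], val) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Made-up sample in the shape of the API's JSON; IDs are placeholders.
	raw := `{"voices":[
	  {"voice_id":"id-1","name":"Roger","labels":{"gender":"male","accent":"american"}},
	  {"voice_id":"id-2","name":"Sarah","labels":{"gender":"female","accent":"american"}}
	]}`
	var resp voicesResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		panic(err)
	}
	for _, v := range filterVoices(resp.Voices, "", map[string]string{"gender": "female"}) {
		fmt.Println(v.Name, v.VoiceID)
	}
}
```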
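"Format inferred from extension" can be sketched as a lookup from file extension to an ElevenLabs `output_format` value. The exact mapping below is an assumption for illustration, and `outputFormatFor` is a hypothetical name. `mp3_44100_128` and `pcm_44100` are real ElevenLabs format identifiers. The sketch also assumes that a `.wav` file is produced by wrapping raw PCM in a WAV header.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outputFormatFor maps an output path to an ElevenLabs output_format value.
// The mapping is illustrative: MP3 (or no extension) requests 128 kbps MP3,
// while .wav and .pcm both request raw 44.1 kHz PCM, which a .wav writer
// would still need to wrap in a RIFF/WAV header.
func outputFormatFor(path string) (string, error) {
	ext := strings.ToLower(filepath.Ext(path))
	switch ext {
	case ".mp3", "":
		return "mp3_44100_128", nil
	case ".wav", ".pcm":
		return "pcm_44100", nil
	default:
		return "", fmt.Errorf("unsupported output extension %q", ext)
	}
}

func main() {
	for _, p := range []string{"out.mp3", "clip.WAV", "notes.ogg"} {
		format, err := outputFormatFor(p)
		fmt.Println(p, format, err)
	}
}
```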
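The SSML-versus-audio-tags split can be expressed as a choice of markup family by model ID. This Go sketch assumes that any model ID beginning with `eleven_v3` takes bracketed audio tags and that other models take SSML `<break time="..." />`. `markupFor` and `withLeadIn` are hypothetical names. The two cues it emits (a whisper tag and a short pause) are not equivalent effects; they only illustrate each syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// Markup names the control syntax a model family understands.
type Markup string

const (
	SSML      Markup = "ssml"       // v2 / v2.5 models: e.g. <break time="0.5s" />
	AudioTags Markup = "audio-tags" // v3: bracketed prompt tags like [whispers]
)

// markupFor assumes any model ID starting with "eleven_v3" is a v3 model
// and that every other model accepts SSML-style break tags.
func markupFor(modelID string) Markup {
	if strings.HasPrefix(modelID, "eleven_v3") {
		return AudioTags
	}
	return SSML
}

// withLeadIn prepends one example cue in the model's own syntax:
// a whisper tag for v3, or a half-second SSML pause otherwise.
func withLeadIn(modelID, text string) string {
	if markupFor(modelID) == AudioTags {
		return "[whispers] " + text
	}
	return `<break time="0.5s" /> ` + text
}

func main() {
	for _, m := range []string{"eleven_v3", "eleven_multilingual_v2"} {
		fmt.Printf("%s (%s): %s\n", m, markupFor(m), withLeadIn(m, "the build is green"))
	}
}
```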