# sag
sag is a modern [[Text-to-Speech (TTS)]] CLI by [[Peter Steinberger]] that mimics the macOS native `say` command — same flag shape, same muscle memory — but routes synthesis through [[ElevenLabs]] for dramatically better voices.
`sag "Hello world"` plays speech to the speakers. `sag -v Roger -o out.mp3 "Hello"` renders to a file with a chosen voice. Anywhere you would have piped to `say`, you can pipe to `sag` instead.
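For scripts that call sag from another program, the two invocations above can be assembled as an argument list. This is a minimal sketch in Python: `build_sag_command` is a hypothetical helper, not part of sag, and it uses only the flags this page names (`-v`, `-r`, `-o`). The rate value is passed through as given, since its unit is not specified here.

```python
import shlex
from typing import List, Optional


def build_sag_command(
    text: str,
    voice: Optional[str] = None,
    output: Optional[str] = None,
    rate: Optional[int] = None,
) -> List[str]:
    """Build an argv list for sag using only the say-style flags named on this page."""
    cmd = ["sag"]
    if voice is not None:
        cmd += ["-v", voice]      # voice name, e.g. "Roger"
    if rate is not None:
        cmd += ["-r", str(rate)]  # speaking rate, passed through unchanged
    if output is not None:
        cmd += ["-o", output]     # render to a file instead of the speakers
    cmd.append(text)              # the text to speak goes last
    return cmd


if __name__ == "__main__":
    # Show the command line that would be run, without running it.
    print(shlex.join(build_sag_command("Hello", voice="Roger", output="out.mp3")))
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems when the text contains spaces or quotes.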
## Why it matters
The macOS `say` voices have aged badly compared to modern neural TTS. ElevenLabs produces voices that are convincingly human, multilingual, and expressive. But its native interface is an HTTP API — fine for a service, awful for a one-off "speak this notification". sag closes that gap by wrapping the API in a `say`-compatible CLI, with sane defaults and streaming playback.
For agentic workflows in particular ([[OpenClaw]], CI scripts, CLI assistants), having a drop-in `say` that sounds good unblocks a category of voice-out use cases that were previously embarrassing to ship.
## Key features
- **`say`-compatible flags.** `-v` voice, `-r` rate, `-o` output. Existing scripts barely change.
- **Streaming playback by default.** Audio starts playing while the rest is still being generated, which keeps latency low.
- **Voice discovery.** `sag voices` lists, searches, and filters available ElevenLabs voices.
- **Multiple output formats.** MP3 and WAV/PCM; the format is inferred from the output file's extension.
- **All ElevenLabs models.** `eleven_v3` (most expressive, default), `eleven_multilingual_v2` (stable), `eleven_flash_v2_5` (ultra-low latency), `eleven_turbo_v2_5`.
- **Full voice parameter control.** Stability, similarity, style, speaker boost, seed, language, text normalization.
- **Cross-platform.** Linux, macOS, Windows. Plays via `afplay` on macOS and the `oto` audio library elsewhere.
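To make "format inferred from extension" concrete, here is a rough Python sketch of the idea. It is an illustration only, not sag's actual code: `infer_format` and the mapping table are hypothetical, and how sag treats extensions other than `.mp3` and `.wav` is not covered on this page, so the sketch simply rejects them.

```python
from pathlib import Path

# Hypothetical mapping for illustration; only the formats named above are listed.
_FORMATS = {".mp3": "mp3", ".wav": "wav"}


def infer_format(output_path: str) -> str:
    """Pick an output format from the file extension, ignoring case."""
    ext = Path(output_path).suffix.lower()
    if ext not in _FORMATS:
        raise ValueError(f"cannot infer audio format from extension {ext!r}")
    return _FORMATS[ext]
```

With this approach the caller never names a format explicitly: `-o out.mp3` and `-o out.wav` are enough to select it.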
## Design choices worth noting
- **No default internal timeout.** Lets external orchestrators (agents, CI) own timeout policy without surprise truncation.
- **SSML for v2/v2.5, audio tags for v3.** v3 uses prompt tags like `[whispers]` rather than SSML `<break>`. sag exposes both, so you can pick the one that matches the model.
- **Streaming-first.** The default optimizes for "speak now," not "generate a perfect file." File output is a flag away.
- **Familiar shape.** Mirroring `say` is a deliberate adoption-cost decision. Anyone who already knows `say` has nothing new to learn.
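Two of the choices above affect how a calling script is written: the caller owns the timeout, and the markup depends on the model. The Python sketch below shows one way a wrapper might handle both. `markup_style` and `run_with_deadline` are hypothetical helpers, and treating every model other than `eleven_v3` as SSML-based is an assumption drawn from the v2/v2.5 versus v3 split described above.

```python
import subprocess
import sys
from typing import Sequence


def markup_style(model: str) -> str:
    """Return which markup family to write for a given model ID."""
    # v3 takes audio tags; anything else is assumed to be a v2 / v2.5 model using SSML.
    return "audio-tags" if model == "eleven_v3" else "ssml"


def run_with_deadline(cmd: Sequence[str], seconds: float) -> bool:
    """Run cmd and return True if it finished, False if the caller's deadline passed.

    subprocess.run kills the child process when the timeout expires.
    A non-zero exit status still raises CalledProcessError.
    """
    try:
        subprocess.run(cmd, check=True, timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    samples = {
        "audio-tags": "[whispers] The build passed.",
        "ssml": 'The build passed. <break time="1s"/> Deploying now.',
    }
    print(samples[markup_style("eleven_v3")])
    # Stand-in for a long-running sag call, so the demo runs without sag installed.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_deadline(slow, 0.5))
```

Because sag imposes no timeout of its own, the deadline passed to `run_with_deadline` is the only one in play, which keeps truncation behavior predictable for the orchestrator.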
## Where it fits
Anywhere a script wants to *say* something with a voice that doesn't sound like 2010: notification hooks, CI announcements, agent voice-out, accessibility utilities, podcast-style content from text. Pair with [[OpenClaw]] for voice replies, or pipe long content through it for "listen instead of read" workflows. Requires an ElevenLabs API key in the `ELEVENLABS_API_KEY` environment variable.
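A notification hook or agent can check for the API key before trying to speak, then hand long content to sag on standard input. The Python sketch below assumes that sag reads text from stdin when no text argument is given, as `say` does. `has_api_key` and `speak_from_stdin` are hypothetical names, and the `runner` parameter exists only so the call can be exercised without sag installed.

```python
import os
import subprocess
from typing import Callable, Mapping

API_KEY_VAR = "ELEVENLABS_API_KEY"


def has_api_key(env: Mapping[str, str]) -> bool:
    """True if the environment mapping holds a non-blank ElevenLabs API key."""
    return bool(env.get(API_KEY_VAR, "").strip())


def speak_from_stdin(text: str, runner: Callable = subprocess.run):
    """Pipe text to sag on stdin, failing early if the API key is missing."""
    if not has_api_key(os.environ):
        raise RuntimeError(f"{API_KEY_VAR} is not set")
    # Assumption: with no text argument, sag speaks whatever arrives on stdin.
    return runner(["sag"], input=text, text=True, check=True)
```

Failing early gives the script a clear error message instead of whatever sag or the API would report for a missing key.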
## References
- Project site: <https://sag.sh/>
- Source: <https://github.com/steipete/sag>
- License: MIT
- ElevenLabs API: <https://elevenlabs.io/docs/api-reference>
## Related
- [[Peter Steinberger]]
- [[ElevenLabs]]
- [[Text-to-Speech (TTS)]]
- [[Go]]
- [[OpenClaw]]
- [[Spogo]]
- [[Birdclaw]]
- [[WaCLI]]