# VoiceBox
VoiceBox is an open-source, local-first desktop app for voice cloning and text-to-speech synthesis. It is a self-hosted alternative to ElevenLabs: everything runs on your machine, with no cloud uploads and no subscriptions. Voice cloning is powered by Qwen3-TTS. The project is licensed under MIT.
## Key Features
- **Voice cloning**: Clone voices from audio samples using Qwen3-TTS
- **Fully local**: All processing happens on-device, so audio and text never leave your machine
- **Multi-track timeline editor**: Compose narratives with trimming and inline editing
- **Batch generation**: Generate speech from multiple text segments with caching
- **Automatic transcription**: Built-in Whisper integration
- **System audio capture**: Record system audio on macOS and Windows
- **Voice profile management**: Import/export voice profiles
- **REST API**: Programmatic access with auto-generated OpenAPI docs
- **Multi-language**: English, Chinese, and more
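
The batch generation feature above mentions caching. As a rough illustration of the idea (not VoiceBox's actual implementation), the sketch below keys each segment by a hash of the voice profile and the whitespace-normalized text, so repeated segments are synthesized only once. The names `cache_key`, `BatchSynthesizer`, and `fake_synthesize` are hypothetical, and the real model call is replaced by a stand-in.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(text: str, voice_id: str) -> str:
    """Derive a stable key from the voice profile and whitespace-normalized text."""
    normalized = " ".join(text.split())
    payload = f"{voice_id}\x00{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class BatchSynthesizer:
    """Wraps a synthesis function and reuses audio for repeated segments."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        # Hypothetical signature: (text, voice_id) -> audio bytes
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}
        self.misses = 0  # number of times the underlying model was called

    def generate(self, segments: List[str], voice_id: str) -> List[bytes]:
        results = []
        for segment in segments:
            key = cache_key(segment, voice_id)
            if key not in self._cache:
                self.misses += 1
                self._cache[key] = self._synthesize(segment, voice_id)
            results.append(self._cache[key])
        return results


def fake_synthesize(text: str, voice_id: str) -> bytes:
    # Stand-in for a real TTS model call; returns placeholder bytes, not audio.
    return f"{voice_id}:{text}".encode("utf-8")


if __name__ == "__main__":
    batch = BatchSynthesizer(fake_synthesize)
    audio = batch.generate(["Hello there.", "Goodbye.", "Hello   there."], "narrator")
    print(len(audio), "segments,", batch.misses, "model calls")
```

Including the voice profile in the key keeps the same sentence spoken by two different voices from colliding. A persistent cache (for example, on disk or in SQLite) would follow the same keying scheme.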
## Tech Stack
- **Desktop**: [[Tauri]] ([[Rust]]), chosen for a bundle roughly 10x smaller than Electron and for native performance
- **Frontend**: [[React]] + [[TypeScript]] + [[Tailwind CSS]]
- **Backend**: FastAPI ([[Python]])
- **Inference**: MLX (Apple Silicon) / PyTorch (other platforms)
- **Database**: [[SQLite]]
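
The inference split listed above (MLX on Apple Silicon, PyTorch elsewhere) implies some form of platform check at startup. A minimal sketch of such a check, using only the standard library, might look like the following. The function names and the returned strings are assumptions for illustration; how VoiceBox actually selects its backend is not described here.

```python
import platform


def pick_backend(system: str, machine: str) -> str:
    """Return "mlx" for Apple Silicon Macs and "pytorch" for everything else."""
    # macOS reports "Darwin"; Apple Silicon reports "arm64".
    # A Python interpreter running under Rosetta reports "x86_64" and
    # would therefore fall through to PyTorch in this sketch.
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"


def detect_backend() -> str:
    """Pick a backend for the machine this process is running on."""
    return pick_backend(platform.system(), platform.machine())


if __name__ == "__main__":
    print("Selected inference backend:", detect_backend())
```

Keeping the decision in a pure function (`pick_backend`) makes it easy to test every platform combination without needing the hardware.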
## Installation
Downloads are available for macOS (Apple Silicon and Intel) and Windows. Linux builds are planned.
## Roadmap
Planned features include real-time synthesis, conversation mode, and additional voice models (XTTS, Bark).
## References
- GitHub: https://github.com/jamiepine/voicebox
## Related
- [[Text-to-Speech (TTS)]]
- [[Voice Cloning]]
- [[Artificial Intelligence (AI)]]
- [[Tauri]]
- [[ebook2audiobook]]