# VoiceBox

VoiceBox is an open-source, local-first desktop app for voice cloning and text-to-speech synthesis. Think of it as a self-hosted alternative to ElevenLabs: everything runs on your machine, with no cloud uploads and no subscriptions. Voice cloning is powered by Qwen3-TTS. VoiceBox is licensed under the MIT License.

## Key Features

- **Voice cloning**: Clone voices from audio samples using Qwen3-TTS
- **Fully local**: All processing happens on-device, for complete privacy
- **Multi-track timeline editor**: Compose narratives with trimming and inline editing
- **Batch generation**: Generate speech from multiple text segments, with caching
- **Automatic transcription**: Built-in Whisper integration
- **System audio capture**: Record system audio on macOS and Windows
- **Voice profile management**: Import and export voice profiles
- **REST API**: Programmatic access with auto-generated OpenAPI docs
- **Multi-language**: English, Chinese, and more

## Tech Stack

- **Desktop**: [[Tauri]] ([[Rust]]), with a 10x smaller bundle than Electron and native performance
- **Frontend**: [[React]] + [[TypeScript]] + [[Tailwind CSS]]
- **Backend**: FastAPI ([[Python]])
- **Inference**: MLX (Apple Silicon) / PyTorch (other platforms)
- **Database**: [[SQLite]]

## Installation

Downloads are available for macOS (Apple Silicon and Intel) and Windows. Linux builds are planned.

## Roadmap

Planned additions include real-time synthesis, a conversation mode, and additional voice models (XTTS, Bark).

## References

- GitHub: https://github.com/jamiepine/voicebox

## Related

- [[Text-to-Speech (TTS)]]
- [[Voice Cloning]]
- [[Artificial Intelligence (AI)]]
- [[Tauri]]
- [[ebook2audiobook]]