# Voice Clone Studio
Voice Clone Studio is a modular [[Gradio]]-based web UI for voice cloning, voice design, and multi-speaker conversation. It is powered by [[Qwen3-TTS]], [[VibeVoice]], and [[LuxTTS]], and supports both [[Whisper]] and VibeVoice ASR for automatic transcription. It is written in [[Python]] and licensed under Apache 2.0.
## Features
- **Voice Clone**: Clone voices from short audio samples (3-10 seconds). Supports multiple engines (Qwen3-TTS 0.6B/1.7B, VibeVoice 1.5B/Large), voice prompt caching, seed control, and 40+ emotion presets
- **Conversation**: Create multi-speaker dialogues. Qwen mode offers 9 preset voices; VibeVoice mode supports up to 4 custom speakers and up to 90 minutes of continuous speech. Useful for podcasts, audiobooks, and long-form narratives
- **Voice Presets**: Generate speech using pre-built premium voices with optional style instructions
- **Voice Design**: Create voices from natural language descriptions (age, gender, emotion, accent) without any reference audio
- **Train Custom Voices**: Fine-tune custom voice models with your own training data. Covers the complete pipeline, from dataset management through training to evaluation
- **Prep Audio**: Trim, normalize, convert to mono, denoise (via DeepFilterNet), and transcribe audio
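The Prep Audio steps can be illustrated with a minimal pure-Python sketch. This is not the project's implementation: the function names (`to_mono`, `trim_silence`, `peak_normalize`, `prep`), the silence threshold, and the target peak are all assumptions chosen for illustration, and trimming is modeled here as removing leading and trailing silence. Denoising and transcription are left out.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into one mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples quieter than `threshold` (assumed value)."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]: loud[-1] + 1])


def peak_normalize(samples: Sequence[float], target_peak: float = 0.95) -> List[float]:
    """Scale samples so the largest magnitude equals `target_peak` (assumed value)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def prep(frames: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical pipeline: mono conversion, then trimming, then normalization."""
    return peak_normalize(trim_silence(to_mono(frames)))
```

Normalizing after trimming means the gain is computed from the retained audio only, so a loud click in the discarded region does not reduce the level of the sample that is kept.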
## Models
| Model | Sizes | Use Case |
|-------|-------|----------|
| Qwen3-TTS Base | 0.6B, 1.7B | Voice cloning from samples |
| Qwen3-TTS CustomVoice | 0.6B, 1.7B | Premium speakers with style control |
| Qwen3-TTS VoiceDesign | 1.7B | Voice design from descriptions |
| VibeVoice-TTS | 1.5B, Large, Large-4bit | Voice cloning and long-form multi-speaker |
| VibeVoice-ASR | Large | Audio transcription |
| Whisper | Medium | Audio transcription |
Models are automatically downloaded from Hugging Face on first use.
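A download-on-first-use pattern might look like the sketch below. It is an assumption about how such a step could be structured, not the project's code: `ensure_model`, the folder naming scheme, and the injectable `download` callable are hypothetical. The default downloader uses `snapshot_download` from the `huggingface_hub` package, imported lazily so the helper itself has no hard dependency.

```python
from pathlib import Path
from typing import Callable


def _hub_download(repo_id: str, target: Path) -> None:
    """Default downloader; requires the `huggingface_hub` package."""
    from huggingface_hub import snapshot_download  # lazy import

    snapshot_download(repo_id=repo_id, local_dir=str(target))


def ensure_model(
    repo_id: str,
    models_dir: Path,
    download: Callable[[str, Path], None] = _hub_download,
) -> Path:
    """Return the local folder for `repo_id`, downloading only if it is missing or empty."""
    # Hypothetical layout: one folder per repo, with "/" replaced to keep it flat.
    target = models_dir / repo_id.replace("/", "--")
    if target.is_dir() and any(target.iterdir()):
        return target  # already present: skip the download
    target.mkdir(parents=True, exist_ok=True)
    download(repo_id, target)
    return target
```

Passing the downloader as a parameter keeps the caching logic testable offline; a call such as `ensure_model("example-org/example-tts", Path("models"))` uses a placeholder repository ID, not a real model name.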
## Requirements
- Python 3.10+
- CUDA-compatible GPU (8 GB+ VRAM recommended)
- SoX and FFmpeg for audio processing
- Flash Attention 2 (optional, for faster generation)
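The requirements above can be checked before launch with a short preflight script. The sketch below is illustrative and not part of the project: `check_requirements` and `cuda_available` are hypothetical helper names, and the script assumes the command-line tools are installed as `sox` and `ffmpeg` and that Flash Attention 2 is importable as `flash_attn`. It does not check VRAM size.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which requirements appear to be satisfied on this machine."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "sox": shutil.which("sox") is not None,        # executable found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,  # executable found on PATH
        "torch": importlib.util.find_spec("torch") is not None,
        "flash_attn (optional)": importlib.util.find_spec("flash_attn") is not None,
    }


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported only when present

    return torch.cuda.is_available()
```

Printing the items of `check_requirements()` gives a quick pass/fail list; `cuda_available()` is kept separate because importing PyTorch is comparatively slow.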
## References
- Source code: https://github.com/FranckyB/Voice-Clone-Studio
- Qwen3-TTS: https://github.com/QwenLM/Qwen3-TTS
- VibeVoice: https://github.com/microsoft/VibeVoice
- LuxTTS: https://github.com/ysharma3501/LuxTTS
## Related
- [[Qwen3-TTS]]
- [[VibeVoice]]
- [[Text-to-Speech (TTS)]]
- [[Voice Cloning]]
- [[Gradio]]