# Voice Clone Studio

Voice Clone Studio is a modular [[Gradio]]-based web UI for voice cloning, voice design, and multi-speaker conversation. It is powered by [[Qwen3-TTS]], [[VibeVoice]], and [[LuxTTS]], and supports both [[Whisper]] and VibeVoice ASR for automatic transcription. Written in [[Python]], licensed under Apache 2.0.

## Features

- **Voice Clone**: Clone voices from short audio samples (3-10 seconds). Supports multiple engines (Qwen3-TTS 0.6B/1.7B, VibeVoice 1.5B/Large), voice prompt caching, seed control, 40+ emotion presets
- **Conversation**: Create multi-speaker dialogues. Qwen mode offers 9 preset voices; VibeVoice mode supports up to 4 custom speakers and up to 90 minutes of continuous speech. Useful for podcasts, audiobooks, and long-form narratives
- **Voice Presets**: Generate speech using pre-built premium voices with optional style instructions
- **Voice Design**: Create voices from natural language descriptions (age, gender, emotion, accent) without any reference audio
- **Train Custom Voices**: Fine-tune custom voice models with your own training data. Complete pipeline from dataset management to training and evaluation
- **Prep Audio**: Trim, normalize, convert to mono, denoise (via DeepFilterNet), and transcribe audio

## Models

| Model | Sizes | Use Case |
|-------|-------|----------|
| Qwen3-TTS Base | 0.6B, 1.7B | Voice cloning from samples |
| Qwen3-TTS CustomVoice | 0.6B, 1.7B | Premium speakers with style control |
| Qwen3-TTS VoiceDesign | 1.7B | Voice design from descriptions |
| VibeVoice-TTS | 1.5B, Large, Large-4bit | Voice cloning and long-form multi-speaker |
| VibeVoice-ASR | Large | Audio transcription |
| Whisper | Medium | Audio transcription |

Models are automatically downloaded from HuggingFace on first use.

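
The "download on first use" behavior can be pictured as a check-then-fetch step. The following is a minimal sketch, not the project's actual code: `ensure_model`, the cache layout, and the `example-org/example-model` repo ID are hypothetical names chosen for illustration. It assumes the `huggingface_hub` package and its `snapshot_download` function for the real fetch; whether Voice Clone Studio uses that call internally is not stated here.

```python
from pathlib import Path
from typing import Callable, Optional


def ensure_model(
    repo_id: str,
    cache_root: Path,
    download: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Return a local directory for repo_id, fetching it only if it is missing.

    `download` is an optional hook (hypothetical) that lets callers swap in
    their own fetch function, which is handy for offline testing.
    """
    # Hypothetical cache layout: one folder per repo, "/" replaced by "--".
    target = cache_root / repo_id.replace("/", "--")

    # A non-empty folder counts as "already downloaded".
    if target.is_dir() and any(target.iterdir()):
        return target

    target.mkdir(parents=True, exist_ok=True)
    if download is None:
        # Imported lazily so the sketch loads even without huggingface_hub.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=repo_id, local_dir=str(target))
    else:
        download(repo_id, target)
    return target
```

Calling `ensure_model("example-org/example-model", Path("models"))` a second time returns the same folder without touching the network, which is the property that makes the first launch slow and later launches fast.
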
## Requirements

- Python 3.10+
- CUDA-compatible GPU (8GB+ VRAM recommended)
- SOX and FFMPEG for audio processing
- Flash Attention 2 (optional, for faster generation)

## References

- Source code: https://github.com/FranckyB/Voice-Clone-Studio
- Qwen3-TTS: https://github.com/QwenLM/Qwen3-TTS
- VibeVoice: https://github.com/microsoft/VibeVoice
- LuxTTS: https://github.com/ysharma3501/LuxTTS

## Related

- [[Qwen3-TTS]]
- [[VibeVoice]]
- [[Text-to-Speech (TTS)]]
- [[Voice Cloning]]
- [[Gradio]]