# Qwen3-TTS
Qwen3-TTS is an open-source [[Text-to-Speech (TTS)]] model series developed by the Qwen team at Alibaba Cloud. It was released in January 2026 under the Apache 2.0 license and trained on more than 5 million hours of speech data across 10 languages.
It uses a Dual-Track hybrid streaming architecture with a discrete multi-codebook language model (LM), so a single model supports both streaming and non-streaming generation. End-to-end synthesis latency can be as low as 97 ms.
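The idea of serving both modes from one model can be illustrated with a toy pipeline. This is a minimal sketch, not Qwen3-TTS code. It assumes that each LM decoding step emits one frame holding one token index per codebook, and that streaming means decoding small groups of frames to audio as soon as they are ready. The toy LM, toy decoder, `NUM_CODEBOOKS`, `SAMPLE_RATE`, and chunk size are all hypothetical. Reading the "12Hz" in the variant names as the codec frame rate is also an assumption. `time_to_first_chunk` shows how a first-audio latency figure such as 97 ms could be measured.

```python
import time
from typing import Iterator, List

Frame = List[int]  # one codec frame: one token index per codebook

NUM_CODEBOOKS = 16    # hypothetical number of codebooks per frame
FRAME_RATE_HZ = 12    # assumption: frame rate suggested by the "12Hz" variant names
SAMPLE_RATE = 24_000  # hypothetical output sample rate


def toy_lm_frames(num_frames: int) -> Iterator[Frame]:
    """Hypothetical stand-in for a multi-codebook LM: one frame per decoding step."""
    for t in range(num_frames):
        yield [(t + k) % 1024 for k in range(NUM_CODEBOOKS)]


def toy_decode(frames: List[Frame]) -> List[float]:
    """Hypothetical codec decoder: returns silence of the matching duration."""
    samples_per_frame = SAMPLE_RATE // FRAME_RATE_HZ
    return [0.0] * (samples_per_frame * len(frames))


def synthesize_streaming(frames: Iterator[Frame], chunk_frames: int = 2) -> Iterator[List[float]]:
    """Streaming mode: decode and yield audio every `chunk_frames` frames."""
    buffer: List[Frame] = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == chunk_frames:
            yield toy_decode(buffer)
            buffer = []
    if buffer:  # flush the final partial chunk
        yield toy_decode(buffer)


def synthesize_full(frames: Iterator[Frame]) -> List[float]:
    """Non-streaming mode: drain the same pipeline and return the whole clip."""
    audio: List[float] = []
    for chunk in synthesize_streaming(frames):
        audio.extend(chunk)
    return audio


def time_to_first_chunk(chunks: Iterator[List[float]]) -> float:
    """Seconds until the first audio chunk is available (generators run lazily)."""
    start = time.perf_counter()
    next(chunks)
    return time.perf_counter() - start


if __name__ == "__main__":
    latency = time_to_first_chunk(synthesize_streaming(toy_lm_frames(120)))
    full = synthesize_full(toy_lm_frames(120))
    print(f"first chunk after {latency * 1000:.3f} ms; full clip {len(full) / SAMPLE_RATE:.1f} s")
```

In this toy, the chunked and full outputs are identical. A real neural codec decoder usually carries context across chunk boundaries, and this sketch ignores that.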
## Model variants
- **Qwen3-TTS-12Hz-1.7B**: Flagship model, best quality and control
- **Qwen3-TTS-12Hz-0.6B**: Lightweight version balancing efficiency with quality
- **Qwen3-TTS CustomVoice**: Premium pre-built speakers with style instructions
- **Qwen3-TTS VoiceDesign (1.7B)**: Creates voices from natural language descriptions, no reference audio needed
## Supported languages
Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian. Multiple dialectal voice profiles are also included.
## Key features
- [[Voice Cloning]] from 3 seconds of reference audio (reported speaker similarity score: 0.95)
- Voice design from natural language descriptions
- 40+ emotion presets
- Streaming and non-streaming generation
- End-to-end latency as low as 97 ms
- 9 pre-built premium speakers (Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee)
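Voice-cloning similarity scores are commonly computed as the cosine similarity between speaker embeddings extracted from the reference clip and the cloned output. The source does not say which metric or embedding model produced the 0.95 figure. The sketch below therefore assumes cosine similarity, and the embedding vectors are made-up placeholders, not real Qwen3-TTS outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two speaker-embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical embeddings; a real evaluation would use a speaker-verification model.
    reference_embedding = [0.12, -0.40, 0.88, 0.05]
    cloned_embedding = [0.10, -0.35, 0.90, 0.10]
    print(round(cosine_similarity(reference_embedding, cloned_embedding), 3))
```

Scores near 1 mean the embeddings point in nearly the same direction, which is read as a close voice match.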
## References
- Announcement: https://qwen.ai/blog?id=qwen3tts-0115
- Source code: https://github.com/QwenLM/Qwen3-TTS
- Demo: https://huggingface.co/spaces/Qwen/Qwen3-TTS
## Related
- [[Text-to-Speech (TTS)]]
- [[Voice Cloning]]
- [[Voice Clone Studio]]
- [[VibeVoice]]
- [[Qwen]]