# Text-to-Speech (TTS)
Text-to-Speech (TTS) is technology that converts written text into spoken audio using speech synthesis models. Modern TTS systems use deep learning to produce highly natural-sounding speech.
## How it works
Modern neural TTS typically follows a pipeline:
1. **Text input**: The system receives written text
2. **Linguistic analysis**: Grammar, sentence structure, and phonetics are analyzed
3. **Prosody generation**: Rhythm, pitch, and intonation are determined for natural delivery
4. **Audio waveform generation**: A neural vocoder converts the intermediate representation (typically a mel-spectrogram) into audio
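
The data flow between these stages can be illustrated with a toy sketch. It is not a neural system: word splitting stands in for linguistic analysis, a hand-written rule stands in for prosody prediction, and sine tones stand in for the vocoder. All function names, the 16 kHz sample rate, and the duration and pitch rules are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def analyze(text):
    """Stage 2 stand-in: lowercase the text and split it into word tokens."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return cleaned.split()


def add_prosody(tokens):
    """Stage 3 stand-in: give each token a duration (ms) and a pitch (Hz).

    Pitch drifts downward across the utterance, loosely imitating
    declarative intonation. Real systems predict these values with a model.
    """
    plan = []
    for i, token in enumerate(tokens):
        duration_ms = 80 * len(token)            # longer words last longer
        pitch_hz = max(80.0, 220.0 - 10.0 * i)   # simple downward drift with a floor
        plan.append((token, duration_ms, pitch_hz))
    return plan


def vocode(plan):
    """Stage 4 stand-in: render each token as a sine tone (not a neural vocoder)."""
    samples = []
    for _token, duration_ms, pitch_hz in plan:
        n = duration_ms * SAMPLE_RATE // 1000
        samples.extend(
            math.sin(2 * math.pi * pitch_hz * t / SAMPLE_RATE) for t in range(n)
        )
    return samples


def synthesize(text):
    """Run the stages in order: text -> tokens -> prosody plan -> waveform."""
    return vocode(add_prosody(analyze(text)))
```

Calling `synthesize("Hello, world!")` returns a list of floats in the range -1 to 1, one per audio sample. In a real neural pipeline each stand-in would be replaced by a learned model, but the hand-off between stages has the same shape.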
## Approaches
- **Concatenative synthesis**: Stitches together pre-recorded speech segments. Can sound natural, but is inflexible: new voices or speaking styles require new recordings
- **Parametric synthesis**: Generates speech from statistical models of acoustic features
- **Neural synthesis**: Uses deep learning for end-to-end generation. Current state of the art
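
As a minimal sketch of the concatenative idea, the code below joins pre-recorded units from a lookup table and crossfades each seam to reduce audible clicks. The inventory format, the unit names, and the linear crossfade are assumptions for illustration; production systems select units with far more elaborate cost functions.

```python
def crossfade_concat(units, overlap):
    """Join waveform units, linearly crossfading up to `overlap` samples per seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never fade over more samples than either side actually has.
        k = max(0, min(overlap, len(out), len(unit)))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming unit, rising across the seam
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def synthesize_units(names, inventory, overlap):
    """Look up each named unit in a (hypothetical) inventory and stitch them."""
    return crossfade_concat([inventory[name] for name in names], overlap)
```

Because the output can only be built from units already in the inventory, the sketch also shows where the inflexibility comes from: a missing unit raises a `KeyError` instead of being synthesized.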
## Key neural TTS models (historical)
- **WaveNet** (Google DeepMind): Directly generates raw audio waveforms with an autoregressive deep generative model, one sample at a time
- **Tacotron 2** (Google): Converts text to mel-spectrograms using an encoder-decoder architecture with attention
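- **FastSpeech** (Zhejiang University and Microsoft Research, 2019): Non-autoregressive approach that generates mel-spectrograms in parallel, addressing the slow inference of autoregressive models such as Tacotron 2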
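
The speed difference between autoregressive and non-autoregressive generation can be sketched in a few lines. This is a toy contrast, not any model's actual implementation: `step` and `frame_fn` are hypothetical stand-ins for learned networks, and `length_regulate` only imitates the spirit of FastSpeech's duration-based expansion of phoneme features.

```python
def generate_autoregressive(n_frames, step):
    """Each frame is computed from the previous one, so frames must be produced in order."""
    if n_frames <= 0:
        return []
    frames = [0.0]  # assumed start value
    for _ in range(n_frames - 1):
        frames.append(step(frames[-1]))
    return frames


def length_regulate(phoneme_features, durations):
    """Repeat each phoneme feature `d` times, where `d` is its predicted frame count."""
    if len(phoneme_features) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    expanded = []
    for feature, d in zip(phoneme_features, durations):
        expanded.extend([feature] * d)
    return expanded


def generate_parallel(phoneme_features, durations, frame_fn):
    """Each output frame depends only on its own input, so frames could run concurrently."""
    return [frame_fn(f) for f in length_regulate(phoneme_features, durations)]
```

In the first function the loop carries state from one iteration to the next, which is what limits inference speed. In the last function no iteration reads another's result, so the work could be batched on parallel hardware.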
## Modern open-source TTS models
- [[Qwen3-TTS]] (Alibaba)
- [[VibeVoice]] (Microsoft)
- [[LuxTTS]]
- Chatterbox ([[Resemble.AI]], MIT license)
- Coqui TTS
- Bark (Suno)
- Fish Speech
## Notable proprietary TTS models
- [[Gemini 3.1 Flash TTS]] (Google): controllable, expressive, inline audio tags
## Applications
- Voice assistants
- Accessibility (screen readers)
- E-learning and audiobooks
- Podcasts and media production
- Multilingual content delivery
- [[Voice Cloning]]
## References
- Wikipedia: https://en.wikipedia.org/wiki/Speech_synthesis
## Related
- [[Voice Cloning]]
- [[Gemini 3.1 Flash TTS]]
- [[Qwen3-TTS]]
- [[VibeVoice]]
- [[Voice Clone Studio]]
- [[Speech-to-Text (STT)]]
- [[Resemble.AI]]