# Voxtral Transcribe 2
Voxtral Transcribe 2 is a family of speech-to-text models by [[Mistral AI]], released in February 2026. The family includes two variants: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. It supports 13 languages and succeeds the original Voxtral (July 2025).
## Model variants
- **Voxtral Mini Transcribe V2**: Batch transcription with speaker diarization, context biasing, and word-level timestamps. Processes up to 3 hours of audio per request
- **Voxtral Realtime**: 4B-parameter streaming model for live transcription. Latency is configurable down to below 200 ms. Open weights under Apache 2.0
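
The 3-hour per-request limit means longer recordings have to be split across several requests. The following is a minimal sketch of that planning step, assuming the limit applies to the audio duration of each request. The helper name `plan_requests` is hypothetical and not part of any Mistral SDK. It only computes offsets; it does not call the API.

```python
# Per-request limit stated above: 3 hours of audio, expressed in seconds.
MAX_REQUEST_SECONDS = 3 * 60 * 60


def plan_requests(total_seconds, limit=MAX_REQUEST_SECONDS):
    """Return (start, end) offsets in seconds, each span no longer than limit.

    Hypothetical helper for illustration only.
    """
    if total_seconds <= 0:
        return []
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + limit, total_seconds)
        spans.append((start, end))
        start = end
    return spans


# A 7.5-hour recording needs three requests: 3 h, 3 h, and 1.5 h.
for start, end in plan_requests(7.5 * 3600):
    print(f"{start / 3600:.1f} h -> {end / 3600:.1f} h")
```

Cutting at fixed offsets can split a word in half; a real pipeline would more likely cut at a nearby silence.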
## Key features
- Speaker diarization with precise start/end times
- Context biasing (up to 100 words/phrases to guide spelling)
- Word-level timestamps
- Noise robustness in challenging acoustic environments
- On-device deployment (enabled by the 4B-parameter footprint of Voxtral Realtime)
- GDPR and HIPAA-compliant deployment options
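
Speaker diarization and word-level timestamps are typically combined to produce a speaker-labeled transcript. The sketch below shows one way to do that, assuming the output has already been parsed into words and speaker segments with start and end times in seconds. The `Word` and `SpeakerSegment` types and the `label_words` function are hypothetical; the actual response format is not described here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class SpeakerSegment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_words(words, segments):
    """Pair each word with the speaker whose segment overlaps it the most.

    Words that overlap no segment are labeled None.
    """
    labeled = []
    for w in words:
        best, best_overlap = None, 0.0
        for s in segments:
            overlap = min(w.end, s.end) - max(w.start, s.start)
            if overlap > best_overlap:
                best, best_overlap = s.speaker, overlap
        labeled.append((best, w.text))
    return labeled


# Illustrative data, not real model output.
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
segments = [SpeakerSegment("A", 0.0, 1.0), SpeakerSegment("B", 1.0, 2.0)]
print(label_words(words, segments))
```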
## Supported languages
English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, Dutch (13 languages)
## Performance
- Mini: ~4% word error rate (WER) on FLEURS; reported to outperform GPT-4o mini Transcribe, Gemini 2.5 Flash, AssemblyAI Universal, and Deepgram Nova
- Realtime: matches Mini quality at a 2.4 s delay; within 1-2% WER of Mini at a 480 ms delay
- Mini processes audio ~3x faster than ElevenLabs Scribe v2
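
For context, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. A WER of about 4% therefore corresponds to roughly one error per 25 reference words. The following is a minimal sketch of the standard calculation, assuming whitespace tokenization and no text normalization; benchmark pipelines usually normalize casing and punctuation first, and the exact setup behind the figures above is not stated here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("jumps" -> "jumped") and one deletion ("the"): 2 errors / 9 words.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"{word_error_rate(reference, hypothesis):.3f}")
```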
## Pricing
- Mini: $0.003/minute
- Realtime: $0.006/minute
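
As a worked example, the per-minute prices above can be turned into a cost estimate. This is a rough sketch that assumes billing is linear in audio duration, with no rounding to whole minutes, no free tier, and no volume discounts; the function name `transcription_cost` is illustrative only.

```python
# Per-minute list prices from the section above, in USD.
PRICE_PER_MINUTE = {"mini": 0.003, "realtime": 0.006}


def transcription_cost(minutes, variant):
    """Estimated cost in USD, assuming strictly per-minute linear pricing."""
    return round(minutes * PRICE_PER_MINUTE[variant], 6)


# A maximum-length 3-hour request (180 minutes):
print(transcription_cost(180, "mini"))      # 180 * 0.003 = 0.54
print(transcription_cost(180, "realtime"))  # 180 * 0.006 = 1.08
```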
## References
- Announcement: https://mistral.ai/news/voxtral-transcribe-2
- Hugging Face: https://huggingface.co/mistralai
## Related
- [[Whisper]]
- [[Parakeet V3]]
- [[Speech-to-Text (STT)]]
- [[Mistral AI]]