# Voxtral Transcribe 2

Voxtral Transcribe 2 is a family of speech-to-text models by [[Mistral AI]], released in February 2026. Includes two variants: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Supports 13 languages. Successor to the original Voxtral (July 2025).

## Model variants

- **Voxtral Mini Transcribe V2**: Batch transcription with speaker diarization, context biasing, and word-level timestamps. Processes audio up to 3 hours per request
- **Voxtral Realtime**: 4B parameter streaming model for live transcription. Latency configurable down to sub-200ms. Open weights under Apache 2.0

## Key features

- Speaker diarization with precise start/end times
- Context biasing (up to 100 words/phrases to guide spelling)
- Word-level timestamps
- Noise robustness in challenging acoustic environments
- On-device deployment (4B parameter footprint)
- GDPR and HIPAA-compliant deployment options

## Supported languages

English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, Dutch (13 languages)

## Performance

- Mini: ~4% WER on FLEURS, outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, AssemblyAI Universal, Deepgram Nova
- Realtime: matches Mini quality at 2.4s delay; within 1-2% WER at 480ms delay
- Mini processes audio ~3x faster than ElevenLabs Scribe v2

## Pricing

- Mini: $0.003/minute
- Realtime: $0.006/minute

## References

- Announcement: https://mistral.ai/news/voxtral-transcribe-2
- HuggingFace: https://huggingface.co/mistralai

## Related

- [[Whisper]]
- [[Parakeet V3]]
- [[Speech-to-Text (STT)]]
- [[Mistral AI]]
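
## Example: planning a batch job

The limits and prices listed in this note (3 hours of audio per Mini request, up to 100 context-biasing terms, $0.003 and $0.006 per minute) can be turned into a small planning helper. The sketch below is illustrative only: it does not call any Mistral API, the function names are hypothetical, and it assumes billing is linear in audio duration with no rounding or minimum charge, which the note does not confirm.

```python
import math

# Figures copied from this note; treat them as a snapshot, not a live price list.
PRICE_PER_MINUTE_USD = {"mini": 0.003, "realtime": 0.006}
MAX_REQUEST_SECONDS = 3 * 60 * 60  # Mini: up to 3 hours of audio per request
MAX_CONTEXT_TERMS = 100            # context biasing: up to 100 words/phrases


def estimate_cost_usd(duration_seconds: float, variant: str = "mini") -> float:
    """Estimate cost, ASSUMING linear per-minute billing without rounding."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD[variant]


def requests_needed(duration_seconds: float) -> int:
    """Number of Mini requests needed if audio is split at the 3-hour limit."""
    return max(1, math.ceil(duration_seconds / MAX_REQUEST_SECONDS))


def trim_context_terms(terms: list[str]) -> list[str]:
    """Drop blanks and case-insensitive duplicates, then keep the first 100."""
    seen: set[str] = set()
    cleaned_terms: list[str] = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            cleaned_terms.append(cleaned)
    return cleaned_terms[:MAX_CONTEXT_TERMS]


if __name__ == "__main__":
    five_hours = 5 * 60 * 60
    print(f"Mini cost:     ${estimate_cost_usd(five_hours, 'mini'):.2f}")
    print(f"Realtime cost: ${estimate_cost_usd(five_hours, 'realtime'):.2f}")
    print(f"Mini requests: {requests_needed(five_hours)}")
```

Under these assumptions, a 5-hour recording needs two Mini requests, and Realtime costs twice as much as Mini for the same audio. Keeping the limits as named constants makes it easy to update them if the published figures change.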