Parakeet V3 - DeveloPassion

# Parakeet V3

Parakeet V3 (parakeet-tdt-0.6b-v3) is a 600-million-parameter multilingual automatic speech recognition (ASR) model from NVIDIA, part of the NeMo Parakeet series. It is licensed under CC-BY-4.0. It extends v2 by expanding language support from English only to 25 European languages, with automatic language detection.

## Key features

- 25 European languages with automatic language detection (no language prompt needed)
- Speeds exceeding 2000x real time (RTFx) on the Open ASR Leaderboard, placing it among the fastest ASR models available
- Long-audio support: up to 24 minutes with full attention (on an A100 80GB GPU), or up to 3 hours with local attention
- Transducer decoder (TDT, Token-and-Duration Transducer, an extension of RNN-Transducer) that enables streaming recognition with low latency
- Trained on the Granary multilingual corpus

## Model variants

- **parakeet-tdt-0.6b-v3**: 600M parameters, 25 languages
- **parakeet-tdt-1.1b**: 1.1B parameters, English only, higher accuracy

## How it compares to Whisper

Parakeet V3 is significantly faster than [[Whisper]] variants while achieving competitive or superior accuracy on English benchmarks. Its transducer architecture enables streaming use cases that Whisper's encoder-decoder design does not natively handle.

## References

- Hugging Face model card: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
- NVIDIA blog: https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/

## Related

- [[Whisper]]
- [[Speech-to-Text (STT)]]
- [[NeMo]]
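## Worked example: speed and long-audio arithmetic

The speed and long-audio figures in this note can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only. It assumes an 80 ms encoder frame (a 10 ms feature hop with 8x subsampling) and a hypothetical local-attention window of 256 frames on each side. Neither value is taken from this note. Attention-score counts are also only a rough proxy for memory use. RTFx is treated as audio duration divided by processing time.

```python
# Back-of-the-envelope checks for the speed and long-audio claims.
# ASSUMPTIONS (not from the model card): 10 ms feature hop with 8x
# subsampling, i.e. one encoder frame per 80 ms, and a hypothetical
# local-attention window of 256 frames on each side.

FRAME_SECONDS = 0.010 * 8   # assumed duration of one encoder frame
LOCAL_WINDOW = (256, 256)   # assumed (left, right) context in frames


def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """RTFx = audio duration / processing time, so time = duration / RTFx."""
    return audio_seconds / rtfx


def encoder_frames(audio_seconds: float) -> int:
    """Encoder frames for a clip; round() absorbs float error in the division."""
    return round(audio_seconds / FRAME_SECONDS)


def full_attention_scores(frames: int) -> int:
    """Full self-attention scores every frame pair: T * T per head and layer."""
    return frames * frames


def local_attention_scores(frames: int, window: tuple[int, int] = LOCAL_WINDOW) -> int:
    """Upper bound for local attention: each frame sees left + right + 1 frames."""
    left, right = window
    return frames * (left + right + 1)


if __name__ == "__main__":
    print(f"1 h at 2000x RTFx: {processing_seconds(3600, 2000):.1f} s")
    for label, seconds in [("24 min", 24 * 60), ("3 h", 3 * 3600)]:
        t = encoder_frames(seconds)
        print(
            f"{label}: {t:,} frames, "
            f"full = {full_attention_scores(t):,}, "
            f"local <= {local_attention_scores(t):,}"
        )
```

Under these assumptions, one hour of audio at 2000x RTFx takes about 1.8 s to process. A 3-hour clip with local attention needs at most about 69 million attention scores per head and layer. A 24-minute clip with full attention needs about 324 million. This is consistent with local attention extending the length limit, although real memory use also depends on activations, batch size, and implementation details.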