# Moonshine

Moonshine is a family of speech-to-text models from Useful Sensors (co-founded by [[Pete Warden]]), optimized for fast, accurate ASR on resource-constrained devices. The models run 5x-15x faster than [[Whisper]] while matching or exceeding its accuracy, and are designed for real-time, on-device use cases. The English models are released under the MIT license.

## Model variants

- **Moonshine Tiny**: 27M parameters (~190MB)
- **Moonshine Base**: 62M parameters (~400MB)
- Non-English variants are available for Arabic, Chinese, Japanese, Korean, Spanish, Ukrainian, and Vietnamese

## Key features

- Variable-length input windows (vs Whisper's fixed 30-second chunks), which eliminate zero-padding overhead
- Runs in ~8MB of RAM for short utterances, making it viable on embedded hardware such as the Raspberry Pi
- Fully on-device; no network connectivity required
- Backends: Hugging Face Transformers, ONNX (recommended for edge), Keras (deprecated)
- JavaScript implementation (MoonshineJS) for in-browser deployment

## Performance

- Moonshine Tiny: 12.66% WER on English benchmarks (vs 12.81% for Whisper Tiny), with 5x less compute on 10-second clips
- 1.7x speedup over Whisper through architectural optimizations
- Compute scales with the length of the input audio rather than with a fixed window

## Licensing

- English models and code: MIT
- Non-English models: Moonshine AI Community License (free for researchers, developers, and businesses with under $1M in annual revenue)

## References

- Source code: https://github.com/usefulsensors/moonshine
- HuggingFace: https://huggingface.co/UsefulSensors/moonshine
- Announcement: https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/
- Paper: https://arxiv.org/abs/2410.15608

## Related

- [[Whisper]]
- [[Parakeet V3]]
- [[Speech-to-Text (STT)]]
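## Sketch: fixed vs variable input windows

The note says Moonshine uses variable-length input windows, while Whisper pads audio to fixed 30-second chunks. The sketch below is a toy model of that difference under a loudly simplifying assumption: compute is treated as proportional to the seconds of audio the encoder sees, and longer clips are assumed to be split into back-to-back 30 s chunks with only the last one zero-padded. It ignores attention's non-linear cost, decoder cost, and model size, so it counts only the padding effect and does not reproduce the measured compute figures under Performance. All function names are hypothetical.

```python
import math

# Assumption: Whisper-style models consume audio in fixed 30-second windows.
FIXED_WINDOW_S = 30.0


def fixed_window_seconds(clip_s: float) -> float:
    """Seconds of input a fixed-window model processes for a clip.

    Simplification: the clip is cut into consecutive 30 s chunks and the
    final chunk is zero-padded up to the full window length.
    """
    if clip_s <= 0:
        return 0.0
    return math.ceil(clip_s / FIXED_WINDOW_S) * FIXED_WINDOW_S


def variable_window_seconds(clip_s: float) -> float:
    """Seconds processed by a variable-length model: just the clip itself."""
    return max(clip_s, 0.0)


def padding_fraction(clip_s: float) -> float:
    """Share of the fixed-window input that is zero-padding."""
    total = fixed_window_seconds(clip_s)
    if total == 0:
        return 0.0
    return (total - variable_window_seconds(clip_s)) / total


if __name__ == "__main__":
    for clip in (1.0, 10.0, 30.0, 45.0):
        fixed = fixed_window_seconds(clip)
        ratio = fixed / variable_window_seconds(clip)
        print(
            f"{clip:5.1f} s clip: fixed window processes {fixed:5.1f} s "
            f"({ratio:.2f}x the audio, {padding_fraction(clip):.0%} padding)"
        )
```

Under this toy model a 10-second clip costs a fixed-window model 30 s of input, i.e. 3x the audio with two thirds of it padding, while a variable-length model processes only the 10 s actually recorded. The gap is largest for short utterances, which is the on-device, real-time case the note highlights.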