# Moonshine
Moonshine is a family of speech-to-text models from Useful Sensors (co-founded by [[Pete Warden]]), optimized for fast, accurate automatic speech recognition (ASR) on resource-constrained devices. The authors report it running 5x-15x faster than [[Whisper]] while matching or exceeding its accuracy. It is designed for real-time, on-device use cases. The English models are released under the MIT license.
## Model variants
- **Moonshine Tiny**: 27M parameters (~190MB)
- **Moonshine Base**: 62M parameters (~400MB)
- Non-English variants available for Arabic, Chinese, Japanese, Korean, Spanish, Ukrainian, Vietnamese
## Key features
- Variable-length input windows (vs Whisper's fixed 30-second chunks), eliminating zero-padding overhead
- Runs in ~8MB of RAM for short utterances, making it viable for embedded hardware such as the Raspberry Pi
- Fully on-device, no network connectivity required
- Backends: Hugging Face Transformers, ONNX (recommended for edge), Keras (deprecated)
- JavaScript implementation (MoonshineJS) for browser deployment
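The padding overhead that variable-length windows avoid can be estimated with simple arithmetic. The sketch below counts audio samples processed under a fixed 30-second window (as in Whisper) versus a window sized to the clip. It assumes 16 kHz mono input and assumes a fixed-window model rounds long clips up to whole 30 s windows. It counts samples, not FLOPs: real compute also depends on the architecture (e.g. attention cost grows faster than linearly with sequence length).

```python
import math

# Assumption for illustration: 16 kHz mono audio input.
SAMPLE_RATE_HZ = 16_000
# Whisper's fixed input window, in seconds.
FIXED_WINDOW_S = 30


def fixed_window_samples(clip_s: float) -> int:
    """Samples a fixed-window model processes: the clip rounded up to whole 30 s windows."""
    windows = max(1, math.ceil(clip_s / FIXED_WINDOW_S))
    return windows * FIXED_WINDOW_S * SAMPLE_RATE_HZ


def variable_window_samples(clip_s: float) -> int:
    """Samples a variable-length model processes: only the clip itself, no zero-padding."""
    return round(clip_s * SAMPLE_RATE_HZ)


def padding_fraction(clip_s: float) -> float:
    """Fraction of the fixed-window input that is zero-padding."""
    fixed = fixed_window_samples(clip_s)
    return (fixed - variable_window_samples(clip_s)) / fixed


for clip in (2, 5, 10, 30):
    print(f"{clip:>2} s clip: {padding_fraction(clip):.0%} of the fixed window is padding")
```

For a 5-second utterance, about 83% of a 30-second window is padding, which is the waste that sizing the input to the audio removes.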
## Performance
- Moonshine Tiny: 12.66% WER on English benchmarks (vs. 12.81% for Whisper Tiny), with 5x less compute on 10-second clips
- 1.7x speed boost over Whisper through architectural optimizations
- Compute scales with input audio length rather than fixed window
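The WER figures above are word error rates: the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and a reference, divided by the number of reference words. A minimal sketch of that metric follows. It only splits on whitespace; published benchmark numbers are usually computed after text normalization (casing, punctuation, number formatting), which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,          # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),   # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") plus one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A reported WER of 12.66 therefore means roughly 12.66 word errors per 100 reference words, averaged over the benchmark sets.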
## Licensing
- English models and code: MIT
- Non-English models: Moonshine AI Community License (free for researchers, developers, and businesses under $1M annual revenue)
## References
- Source code: https://github.com/usefulsensors/moonshine
- HuggingFace: https://huggingface.co/UsefulSensors/moonshine
- Announcement: https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/
- Paper: https://arxiv.org/abs/2410.15608
## Related
- [[Whisper]]
- [[Parakeet V3]]
- [[Speech-to-Text (STT)]]