# LiteRT-LM
LiteRT-LM is Google AI Edge's production-ready, open-source inference framework for deploying [[Large Language Models (LLMs)]] on edge devices. It powers on-device GenAI in Chrome, Chromebook Plus, and Pixel Watch, and underlies the broadest deployments of Gemini Nano across Google products.
## Key Features
- Cross-platform: Android, iOS, Web, Desktop (Windows, macOS, Linux), IoT (Raspberry Pi)
- GPU and NPU (Neural Processing Unit) acceleration
- Multi-modality: vision and audio inputs
- Function calling / tool use for agentic workflows
- Supports [[Gemma]], Llama, Phi-4, and Qwen models
- [[Gemma 4]] support added in v0.10.1 (April 2026), including the E2B (Edge 2B) variant
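
To make the function-calling feature more concrete, here is a minimal, framework-agnostic sketch of the host-side dispatch loop that tool use generally requires. It does not use the LiteRT-LM API: the JSON shape (`name` / `arguments`), the `TOOLS` registry, and both example tools are assumptions made up for illustration, and the model output is a hard-coded string rather than real inference.

```python
import json

# Hypothetical tools; names, signatures, and return values are illustrative only.
def get_battery_level() -> int:
    return 87  # stubbed value for the sketch

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_battery_level": get_battery_level, "add": add}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call emitted by a model and run the matching tool.

    Assumed wire format (not LiteRT-LM's): {"name": ..., "arguments": {...}}.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # Report the problem back to the model instead of raising.
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"name": name, "result": result})

# Stand-in for text the model would generate.
reply = dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(reply)
```

In an agentic workflow the returned JSON string would be appended to the conversation so the model can use the tool result in its next turn.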
## Architecture
LiteRT-LM is a C++ LLM pipeline framework built on LiteRT. It handles session cloning, KV-cache management, prompt caching and scoring, and stateful [[AI Inference]] across multiple models and processing steps.
Language composition: 76.7% C++, 7% CMake, 5% Starlark, 4.1% Rust, 3.9% Python, 1.8% Kotlin.
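
The idea behind session cloning and prompt caching can be sketched with a toy model. This is a conceptual illustration only, not LiteRT-LM code: `ToySession`, its fields, and the token lists are hypothetical, and the "KV cache" is a plain list where a real engine would hold per-layer key/value tensors. The point it shows is that a shared prefix is processed once, and each cloned branch pays only for its own new tokens.

```python
from dataclasses import dataclass, field

@dataclass
class ToySession:
    """Hypothetical stand-in for an inference session."""
    tokens: list = field(default_factory=list)
    kv_cache: list = field(default_factory=list)  # one entry per processed token
    prefill_steps: int = 0  # tokens processed by this session object itself

    def prefill(self, new_tokens):
        for tok in new_tokens:
            # A real engine would compute and store key/value tensors here.
            self.kv_cache.append(("kv", tok))
            self.tokens.append(tok)
            self.prefill_steps += 1

    def clone(self):
        # Copy the cached state so the branch can diverge without
        # touching the parent; the step counter restarts at zero.
        return ToySession(tokens=list(self.tokens), kv_cache=list(self.kv_cache))

# Process the shared system prompt once.
system = ToySession()
system.prefill(["You", "are", "a", "helpful", "assistant"])

# Each branch reuses the cached prefix and only processes its own suffix.
branch_a = system.clone()
branch_a.prefill(["Summarize", "this"])
branch_b = system.clone()
branch_b.prefill(["Translate", "this"])

print(len(branch_a.kv_cache), branch_a.prefill_steps)
```

A production implementation would likely share or copy-on-write the cached tensors rather than duplicating them; the eager copy here just keeps the sketch short.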
## APIs
- Kotlin (stable): Android/JVM
- Python (stable): prototyping
- C++ (stable): native performance
- Swift (in development): iOS/macOS
## CLI
Run models locally without writing any code:
```bash
litert-lm run --from-huggingface-repo=[model] --prompt="[query]"
```
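
When the CLI needs to be driven from a script, the command above can be assembled as an argument list. The sketch below uses only the subcommand and the two flags shown above; the helper name `build_litert_lm_command` and the repository ID `example-org/example-model` are placeholders, not real identifiers. It prints the command rather than running it, since execution assumes `litert-lm` is installed and on `PATH`.

```python
import shlex

def build_litert_lm_command(repo: str, prompt: str) -> list[str]:
    """Build the argv list for the `litert-lm run` invocation shown above."""
    return [
        "litert-lm",
        "run",
        f"--from-huggingface-repo={repo}",
        f"--prompt={prompt}",
    ]

# Placeholder repository ID and prompt, for illustration only.
cmd = build_litert_lm_command("example-org/example-model", "What is LiteRT-LM?")

# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))

# To actually execute (requires litert-lm on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell-quoting problems when the prompt contains spaces or quotes.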
## License
Apache-2.0.
## References
- https://github.com/google-ai-edge/LiteRT-LM
- https://ai.google.dev/edge/litert-lm/overview
- https://developers.googleblog.com/on-device-genai-in-chrome-chromebook-plus-and-pixel-watch-with-litert-lm/
## Related
- [[Artificial Intelligence (AI)]]
- [[Large Language Models (LLMs)]]
- [[AI Inference]]
- [[AI Open Weight Models]]
- [[Gemma]]
- [[Gemma 4]]