# LiteRT-LM

LiteRT-LM is Google AI Edge's production-ready, open-source inference framework for deploying [[Large Language Models (LLMs)]] on edge devices. It powers on-device GenAI in Chrome, Chromebook Plus, and Pixel Watch, and is the framework behind the widest deployments of Gemini Nano across Google products.

## Key Features

- Cross-platform: Android, iOS, Web, Desktop (Windows, macOS, Linux), IoT (Raspberry Pi)
- GPU and NPU (Neural Processing Unit) acceleration
- Multi-modality: vision and audio inputs
- Function calling / tool use for agentic workflows
- Supports [[Gemma]], Llama, Phi-4, and Qwen models
- [[Gemma 4]] support added in v0.10.1 (April 2026), including the E2B (Edge 2B) variant

## Architecture

C++-based LLM pipeline framework built on LiteRT. Handles session cloning, KV-cache management, prompt caching/scoring, and stateful [[AI Inference]] across multiple models and processing steps.

Language composition: 76.7% C++, 7% CMake, 5% Starlark, 4.1% Rust, 3.9% Python, 1.8% Kotlin.

## APIs

- Kotlin (stable): Android/JVM
- Python (stable): prototyping
- C++ (stable): native performance
- Swift (in development): iOS/macOS

## CLI

Run models locally without code:

```bash
litert-lm run --from-huggingface-repo=[model] --prompt="[query]"
```

## License

Apache-2.0.

## References

- https://github.com/google-ai-edge/LiteRT-LM
- https://ai.google.dev/edge/litert-lm/overview
- https://developers.googleblog.com/on-device-genai-in-chrome-chromebook-plus-and-pixel-watch-with-litert-lm/

## Related

- [[Artificial Intelligence (AI)]]
- [[Large Language Models (LLMs)]]
- [[AI Inference]]
- [[AI Open Weight Models]]
- [[Gemma]]
- [[Gemma 4]]
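
## Example: scripting the CLI

The `litert-lm run` invocation from the CLI section can be driven from a script when several prompts need to be tried against the same model. The following is a minimal sketch, not part of LiteRT-LM itself: the helper name `build_run_command` and the repository name in the usage lines are hypothetical, and the only flags used are the two shown in the CLI section (`--from-huggingface-repo` and `--prompt`). It does not use the LiteRT-LM Python API.

```python
import shlex


def build_run_command(model_repo: str, prompt: str) -> list[str]:
    """Build the argv list for `litert-lm run`.

    Hypothetical helper: it only assembles the two flags shown in the
    CLI section and assumes nothing else about the tool.
    """
    if not model_repo or not prompt:
        raise ValueError("model_repo and prompt must be non-empty")
    # Passing an argv list (rather than one shell string) avoids quoting
    # problems when the prompt contains spaces or quotes.
    return [
        "litert-lm",
        "run",
        f"--from-huggingface-repo={model_repo}",
        f"--prompt={prompt}",
    ]


if __name__ == "__main__":
    # "example-org/example-model" is a placeholder, not a real repository.
    argv = build_run_command("example-org/example-model", "What is LiteRT-LM?")
    # Print the shell-quoted command for inspection.
    print(shlex.join(argv))
    # To execute it, assuming the litert-lm CLI is installed and on PATH:
    #   import subprocess
    #   subprocess.run(argv, check=True)
```

Building the command as a list keeps the prompt intact as a single argument, so no manual escaping is needed before handing it to `subprocess.run`.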