# NVIDIA Nemotron NVIDIA Nemotron is a family of open-source [[Large Language Models (LLMs)]] with open weights, training data, and training recipes. Purpose-built for agentic AI; building specialized [[AI Agents]] with reasoning, coding, math, tool calling, and instruction following capabilities. Published on Hugging Face under permissive licenses (NVIDIA Open Model License, CC-BY-4.0, Apache 2.0). ## Architecture The Nemotron 3 family uses a hybrid Mamba-Transformer MoE (Mixture-of-Experts) architecture with up to 1M token context windows. Mamba-2 layers handle efficient long-sequence processing, attention layers provide precise token interactions, and MoE ensures parameter efficiency by activating only a fraction of total parameters per forward pass. All Nemotron 3 models support a configurable "thinking" mode. When enabled, the model generates a reasoning trace (chain-of-thought) before the final answer, improving accuracy on hard problems. This can be toggled via a flag in the chat template. ## Model lineup | Model | Total / Active Params | Context | Use case | |---|---|---|---| | Nemotron 3 Nano 4B | 4B / 4B | 256K | Edge, ultra-lightweight | | Nemotron 3 Nano 30B | 30B / 3.5B | 1M | Edge/PC, targeted agents | | Nemotron Cascade 2 | 30B / 3B | 256K | Math/coding (IMO/IOI gold) | | Nemotron 3 Super | 120B / 12B | 1M | Multi-agent, production | | Llama Nemotron Ultra | 253B | - | Enterprise, highest accuracy | Legacy models: Nemotron 70B (Llama-3.1 based, RLHF), Nemotron Mini 4B (distilled, 4K context). ## Capabilities - **Agentic AI**: tool calling, multi-step reasoning, [[AI Agent Orchestration]] - **Coding**: competitive programming, code generation, SWE tasks - **Math**: Olympiad-level performance (IMO/IOI gold medals with Cascade 2) - **RAG**: [[Retrieval-Augmented Generation (RAG)]] with Nemotron RAG models - **Safety**: content moderation, jailbreak detection, PII detection - **Multilingual**: English, German, Spanish, French, Italian, Japanese, Chinese ## Ecosystem Beyond models, Nemotron provides open datasets (10T+ language tokens, 18M+ SFT samples), fully reproducible training recipes (pretraining, SFT, RL), and deployment cookbooks. Compatible with [[Ollama]], vLLM, SGLang, TensorRT-LLM, and NVIDIA NIM microservices. Runs on any NVIDIA GPU from edge to data center. Nemotron 3 Super is the default inference backend for [[NemoClaw]]. ## References - https://developer.nvidia.com/nemotron - https://github.com/NVIDIA-NeMo/Nemotron - https://ollama.com/library/nemotron ## Related - [[Large Language Models (LLMs)]] - [[AI Agents]] - [[AI Agent Orchestration]] - [[Retrieval-Augmented Generation (RAG)]] - [[NeMo]] - [[NemoClaw]] - [[OpenShell]] - [[Ollama]] - [[Reinforcement Learning From Human Feedback (RLHF)]]