# NVIDIA Nemotron
NVIDIA Nemotron is a family of open-source [[Large Language Models (LLMs)]] with open weights, training data, and training recipes. Purpose-built for agentic AI; building specialized [[AI Agents]] with reasoning, coding, math, tool calling, and instruction following capabilities. Published on Hugging Face under permissive licenses (NVIDIA Open Model License, CC-BY-4.0, Apache 2.0).
## Architecture
The Nemotron 3 family uses a hybrid Mamba-Transformer MoE (Mixture-of-Experts) architecture with up to 1M token context windows. Mamba-2 layers handle efficient long-sequence processing, attention layers provide precise token interactions, and MoE ensures parameter efficiency by activating only a fraction of total parameters per forward pass.
All Nemotron 3 models support a configurable "thinking" mode. When enabled, the model generates a reasoning trace (chain-of-thought) before the final answer, improving accuracy on hard problems. This can be toggled via a flag in the chat template.
## Model lineup
| Model | Total / Active Params | Context | Use case |
|---|---|---|---|
| Nemotron 3 Nano 4B | 4B / 4B | 256K | Edge, ultra-lightweight |
| Nemotron 3 Nano 30B | 30B / 3.5B | 1M | Edge/PC, targeted agents |
| Nemotron Cascade 2 | 30B / 3B | 256K | Math/coding (IMO/IOI gold) |
| Nemotron 3 Super | 120B / 12B | 1M | Multi-agent, production |
| Llama Nemotron Ultra | 253B | - | Enterprise, highest accuracy |
Legacy models: Nemotron 70B (Llama-3.1 based, RLHF), Nemotron Mini 4B (distilled, 4K context).
## Capabilities
- **Agentic AI**: tool calling, multi-step reasoning, [[AI Agent Orchestration]]
- **Coding**: competitive programming, code generation, SWE tasks
- **Math**: Olympiad-level performance (IMO/IOI gold medals with Cascade 2)
- **RAG**: [[Retrieval-Augmented Generation (RAG)]] with Nemotron RAG models
- **Safety**: content moderation, jailbreak detection, PII detection
- **Multilingual**: English, German, Spanish, French, Italian, Japanese, Chinese
## Ecosystem
Beyond models, Nemotron provides open datasets (10T+ language tokens, 18M+ SFT samples), fully reproducible training recipes (pretraining, SFT, RL), and deployment cookbooks. Compatible with [[Ollama]], vLLM, SGLang, TensorRT-LLM, and NVIDIA NIM microservices. Runs on any NVIDIA GPU from edge to data center.
Nemotron 3 Super is the default inference backend for [[NemoClaw]].
## References
- https://developer.nvidia.com/nemotron
- https://github.com/NVIDIA-NeMo/Nemotron
- https://ollama.com/library/nemotron
## Related
- [[Large Language Models (LLMs)]]
- [[AI Agents]]
- [[AI Agent Orchestration]]
- [[Retrieval-Augmented Generation (RAG)]]
- [[NeMo]]
- [[NemoClaw]]
- [[OpenShell]]
- [[Ollama]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]