NVIDIA Nemotron
by NVIDIA
Open-weight model family designed for agentic workflows and long-context automation — not a chatbot, a builder’s tool
See https://build.nvidia.com/nvidia/nemotron-3
Model family
| Model | Params | Context | Notes |
|---|---|---|---|
| Nemotron 3 Super | 120B | 1M tokens | Flagship; MoE + Mamba architecture |
| Nemotron 3 Nano | Smaller | — | Lightweight variant |
| Nemotron Cascade | — | — | Post-training variant with Cascade RL |
Features
- 1M token context window — designed for long-context agent workflows, document synthesis, multi-doc reasoning
- MoE + Mamba architecture — Mixture-of-Experts combined with Mamba state-space model; efficient at scale
- Open weights — fully open; run locally via Ollama or LM Studio
- Agent-first design — built for automation pipelines, not consumer chat; optimized for multi-step agent tasks
- NVIDIA NIM compatible — available via NVIDIA’s inference microservices platform
Superpowers
Nemotron 3 Super’s defining characteristic is the 1M token context window in a fully open-weight model. This makes it uniquely suited for agent workflows where the agent needs to hold an entire codebase, document corpus, or conversation history in context simultaneously. NVIDIA positioned it explicitly for business automation rather than chat — making it a strong choice for agentic pipelines where context depth matters more than raw speed. Run locally via Ollama on appropriate hardware (120B requires significant VRAM).
Local setup
ollama pull nemotron3-super # or equivalent model tag
ollama run nemotron3-super Pricing
- Open weights — free to run locally
- NVIDIA NIM API — cloud-hosted inference (check current pricing)