NVIDIA Nemotron

by NVIDIA

Open-weight model family designed for agentic workflows and long-context automation — not a chatbot, a builder’s tool

See https://build.nvidia.com/nvidia/nemotron-3

Model family

ModelParamsContextNotes
Nemotron 3 Super120B1M tokensFlagship; MoE + Mamba architecture
Nemotron 3 NanoSmallerLightweight variant
Nemotron CascadePost-training variant with Cascade RL

Features

  • 1M token context window — designed for long-context agent workflows, document synthesis, multi-doc reasoning
  • MoE + Mamba architecture — Mixture-of-Experts combined with Mamba state-space model; efficient at scale
  • Open weights — fully open; run locally via Ollama or LM Studio
  • Agent-first design — built for automation pipelines, not consumer chat; optimized for multi-step agent tasks
  • NVIDIA NIM compatible — available via NVIDIA’s inference microservices platform

Superpowers

Nemotron 3 Super’s defining characteristic is the 1M token context window in a fully open-weight model. This makes it uniquely suited for agent workflows where the agent needs to hold an entire codebase, document corpus, or conversation history in context simultaneously. NVIDIA positioned it explicitly for business automation rather than chat — making it a strong choice for agentic pipelines where context depth matters more than raw speed. Run locally via Ollama on appropriate hardware (120B requires significant VRAM).

Local setup

ollama pull nemotron3-super  # or equivalent model tag  
ollama run nemotron3-super  

Pricing

  • Open weights — free to run locally
  • NVIDIA NIM API — cloud-hosted inference (check current pricing)