NVIDIA Nemotron

by NVIDIA

Open-weight model family designed for agentic workflows and long-context automation; positioned as a builder’s tool rather than a chatbot

See https://build.nvidia.com/nvidia/nemotron-3

Model family

Model	Params	Context	Notes
Nemotron 3 Super	120B	1M tokens	Flagship; MoE + Mamba architecture
Nemotron 3 Nano	Smaller	—	Lightweight variant
Nemotron Cascade	—	—	Post-training variant with Cascade RL

Features

1M token context window — designed for long-context agent workflows, document synthesis, multi-doc reasoning
MoE + Mamba architecture — Mixture-of-Experts combined with Mamba state-space model; efficient at scale
Open weights — fully open; run locally via Ollama or LM Studio
Agent-first design — built for automation pipelines, not consumer chat; optimized for multi-step agent tasks
NVIDIA NIM compatible — available via NVIDIA’s inference microservices platform
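To make the 1M-token figure concrete, here is a minimal sketch of a pre-flight check that estimates whether a set of documents is likely to fit in the context window. The four-characters-per-token ratio and the output reserve are assumptions chosen for illustration, and the function names are hypothetical; the model's real tokenizer will give different counts.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context size listed for Nemotron 3 Super
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not the model's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserve_for_output: int = 8_000) -> bool:
    """Return True if the combined texts likely fit, leaving room for the reply."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS


# Two stand-in "documents" of 400k and 1.2M characters.
docs = ["a" * 400_000, "b" * 1_200_000]
print(sum(estimate_tokens(d) for d in docs))  # 400000
print(fits_in_context(docs))  # True
```

A check like this is only a coarse filter. For anything near the limit, count tokens with the actual tokenizer before sending the request.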

Superpowers

Nemotron 3 Super’s defining characteristic is a 1M-token context window in an open-weight model. This suits agent workflows in which the agent must hold an entire codebase, document corpus, or conversation history in context at once. NVIDIA positions it for business automation rather than chat, which makes it a strong choice for agentic pipelines where context depth matters more than raw speed. It can be run locally via Ollama on suitable hardware; at 120B parameters, it requires significant VRAM.
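As a rough illustration of what "significant VRAM" means, the sketch below computes the memory needed for the weights alone at a few common precisions. It assumes all 120B parameters must be resident, which is typical for Mixture-of-Experts models even though only some experts are active per token. It ignores the KV cache, activations, and runtime overhead, so actual requirements will be higher.

```python
PARAMS = 120e9  # 120B total parameters, per the model table


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
# 16-bit weights: ~240 GB
# 8-bit weights: ~120 GB
# 4-bit weights: ~60 GB
```

Even at 4-bit quantization the weights alone exceed the memory of a single consumer GPU, so local use generally means multi-GPU or high-memory workstation hardware.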

Local setup

ollama pull nemotron3-super  # placeholder tag; check the Ollama model library for the exact name
ollama run nemotron3-super
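Once a model has been pulled, an automation pipeline would usually call it over Ollama's local HTTP API rather than the interactive CLI. The following is a minimal sketch using only the standard library. It assumes Ollama is serving on its default port (11434), reuses the placeholder tag from the commands above (not a verified model name), and picks an illustrative `num_ctx` value; `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "nemotron3-super"  # placeholder tag; verify the real tag before use


def build_request(prompt: str, num_ctx: int = 32_768) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"num_ctx": num_ctx},  # requested context length (illustrative)
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize this repository in three bullets.")` returns the model's reply as a string. Raising `num_ctx` toward the model's full window increases memory use, so it is worth setting it to what the task actually needs.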

Pricing

Open weights — free to run locally
NVIDIA NIM API — cloud-hosted inference (check current pricing)

ThirdBrAIn.tech
