Nemotron 3 Ultra: NVIDIA’s top U.S. open model

NVIDIA Nemotron 3 Ultra arrives as the top U.S. open model

Published 04 Jun 2026, 06:04. Updated 04 Jun 2026, 06:23. By Talia Emily Rogic (Editor-in-Chief & Founder)


NVIDIA’s Nemotron 3 Ultra brings a 550B-parameter open MoE model to agent workflows, scoring 48 on the Artificial Analysis Intelligence Index while Kimi K2.6 still leads at 54.

NVIDIA’s Nemotron 3 Ultra is the company’s clearest attempt yet to turn its open-model work into a production product for long-running AI agents. The company describes Ultra as a 550-billion-parameter mixture-of-experts model built for coding, research and enterprise workflows, with about 55 billion parameters active during inference. NVIDIA’s official availability plan lists Hugging Face, ModelScope and OpenRouter, along with build.nvidia.com, where the model is offered as NVIDIA NIM microservices, with availability dated June 4. That makes this more than a benchmark headline: it is a model NVIDIA wants developers to deploy through the same agent stack it is building around NemoClaw, OpenShell and CUDA-X skills.
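As a rough illustration of what the parameter figures imply, the sketch below computes the share of parameters active per token from the two numbers quoted above. The function name is hypothetical and the calculation is plain arithmetic, not anything published by NVIDIA. It also says nothing about memory requirements, since a mixture-of-experts model typically still has to keep all of its parameters loaded.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token in a mixture-of-experts model.

    Both arguments are in billions of parameters. This helper is illustrative
    only; the name and signature are assumptions, not part of any NVIDIA API.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and total_params_b")
    return active_params_b / total_params_b


# Figures quoted in the article: 550B total, about 55B active during inference.
share = active_fraction(550, 55)
print(f"{share:.0%} of parameters active per token")
```

With the quoted figures, roughly one tenth of the model does work on each token, which is the basis for the efficiency argument that follows.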

What NVIDIA is claiming

The main technical claim is efficiency at scale. NVIDIA says Nemotron 3 Ultra can deliver up to five times faster inference and up to 30 percent lower cost than comparable open frontier models in its class. Those are vendor claims, not independent production guarantees, so they should be read as a performance target until teams test the model in their own workloads. Still, the direction is clear. NVIDIA is not presenting Ultra as a general chatbot release alone; it is positioning the model for agents that run longer tasks, call tools, write code and coordinate work across enterprise systems.

That distinction matters because model quality is only one part of the deployment question. A strong open model for agents also needs low latency, predictable memory behavior, runtime controls, orchestration support and guardrails. NVIDIA’s announcement names Hermes Agent, LangChain Deep Agents, OpenClaw, OpenHands and OpenCode among the harnesses the model is post-trained for. That gives the launch a practical enterprise angle, especially for teams already building around the NVIDIA software stack.

The benchmark context is strong, but not absolute

Artificial Analysis gives Nemotron 3 Ultra a score of 48 on its Intelligence Index, making it the highest-scoring U.S. open-weights model in that evaluation. The same analysis places Kimi K2.6 at 54, so the global open-model lead still sits outside the U.S. ecosystem. That is the key editorial boundary: Ultra is a major U.S. open-model result, but it is not the overall leader in the benchmark NVIDIA and Artificial Analysis are discussing.

Speed is another important part of the story. Artificial Analysis says a pre-release DeepInfra endpoint served Ultra at more than 300 tokens per second, far above the 50–100 tokens per second range it cites for several large Chinese peer models currently in market. That figure is promising for agent workloads, where long chains of tool calls can make latency expensive. But pre-release endpoint performance should not be treated as a universal result for every provider, quantization, batch size or enterprise deployment.
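To see why throughput matters for agents, consider a back-of-the-envelope estimate of generation time across a chain of tool calls. The workload below (40 steps of 500 output tokens each) is a made-up assumption for illustration, and the function name is hypothetical. The rates are the 300 tokens per second floor and the 50–100 range cited above. The sketch counts only token generation and ignores prompt processing, network round trips, queueing and the time the tools themselves take.

```python
def generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Estimate time spent purely on token generation across a chain of agent steps."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload, chosen only to make the arithmetic concrete.
STEPS = 40
TOKENS_PER_STEP = 500

# Rates taken from the figures cited in the article.
scenarios = [
    ("Ultra pre-release endpoint (300 tok/s)", 300.0),
    ("peer range, upper bound (100 tok/s)", 100.0),
    ("peer range, lower bound (50 tok/s)", 50.0),
]

for label, rate in scenarios:
    seconds = generation_seconds(STEPS, TOKENS_PER_STEP, rate)
    print(f"{label}: about {seconds:.0f} s of generation time")
```

Under these assumptions the same 20,000 output tokens take a little over a minute at 300 tokens per second, versus roughly three to seven minutes at 100 and 50 tokens per second. Real deployments will differ, which is why the pre-release figure should be re-measured on the provider and configuration a team actually uses.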

Why it matters for developers

For developers, the practical question is not whether Nemotron 3 Ultra “beats” every rival. It is whether an open model with this level of intelligence, speed and enterprise packaging can reduce dependence on closed APIs for agent workflows. Teams working on regulated data, private repositories or domain-specific assistants may see value in a model that can be deployed through controlled infrastructure rather than only through a public chat service.

The launch also connects to NVIDIA’s wider Computex push around local and enterprise AI hardware. The company’s recent RTX Spark and DGX Station story shows the same direction: more AI work should run closer to the developer, the enterprise or the workstation when the economics make sense. Ultra itself is still a very large model, so most users will reach it through APIs, cloud partners or NIM services rather than on a normal laptop. Smaller Nemotron models remain more realistic for local PC-class experiments.

The safest reading is therefore measured. Nemotron 3 Ultra gives NVIDIA the strongest U.S. open-model result currently visible in the Artificial Analysis ranking and a credible enterprise deployment story. It also shows that the global open-model race is still highly competitive, with Chinese models setting the top benchmark mark. The next test is not the launch post; it is whether Ultra’s speed, cost and agent reliability hold up in real production systems.

Source references

NVIDIA Newsroom - Official announcement of Nemotron 3 Ultra, 550B MoE positioning, agent platforms, speed/cost claims and June 4 availability route
Artificial Analysis - Benchmark context: 550B/55B active, Intelligence Index 48, Kimi K2.6 at 54 and pre-release speed above 300 tokens per second
NVIDIA Blog - Jensen Huang GTC Taipei/Computex keynote context and NVIDIA positioning for open models, agentic AI and lower-cost inference
NVIDIA Nemotron docs - Technical documentation context for Nemotron 3 Ultra base model and deployment-oriented model family details
The Decoder - Independent technology coverage of Nemotron 3 Ultra’s open-model ranking and the gap to Chinese open-weight competitors