NVIDIA has started putting Vera, its first standalone CPU built around agentic AI workloads, into the hands of major AI infrastructure customers. The first systems went to Anthropic in San Francisco, OpenAI in Mission Bay and SpaceXAI in Palo Alto, with Oracle Cloud Infrastructure receiving a system shortly afterward in Santa Clara. NVIDIA said Ian Buck, its vice president for hyperscale and high-performance computing, personally handled the deliveries and walked the teams through the hardware.

The theatre of the hand-off is easy to overread. The more important point is technical: NVIDIA is trying to make the CPU part of the AI stack as strategic as the GPU. Vera is not being positioned as a training accelerator. Instead, it is aimed at the orchestration work around AI agents, such as tool calls, sandbox execution, retrieval, memory handling and the coordination of multi-step tasks. All of this can run outside the GPU-heavy model call itself.

Why Vera is not another GPU story

For the past several years, NVIDIA’s AI business has been defined by GPU demand. Vera widens that story. NVIDIA describes the CPU as the successor to Grace and as a processor designed for the moment when models move from answering prompts to acting across tools and systems. That distinction matters because agent workflows can spend meaningful time outside the model: launching code sandboxes, waiting on file operations, querying databases, coordinating services or preparing the next model call.

In that context, a faster CPU is not a minor background component. It can affect how quickly an agent moves through a task, how many simultaneous tool calls a system can manage and how much infrastructure cost sits around the model itself. NVIDIA’s own framing is aggressive, but the underlying direction is credible: AI infrastructure is becoming a full system problem, not only a GPU procurement problem.

The performance claims need careful reading

NVIDIA says Vera Rubin NVL72 configurations can reduce token costs, speed up agent sandboxes and improve enterprise data-query performance compared with older CPU setups. Those claims come from NVIDIA's own benchmarks and should not be read as universal production guarantees. Real-world results will depend on workload mix, software support, memory layout, networking and whether customers are actually running agent-heavy applications rather than conventional inference.

The named recipients also matter. Anthropic and OpenAI both operate large model infrastructure; Oracle is a cloud supplier trying to win more AI workloads; SpaceXAI points to the broader xAI and infrastructure orbit around Elon Musk’s companies. Getting hardware into these environments gives NVIDIA early feedback from customers that can stress the system in ways smaller demos cannot.

Why the delivery still matters

The most useful reading is not that Vera has suddenly replaced the GPU. It has not. The GPU remains the centre of training and high-throughput inference. Vera matters because it shows NVIDIA trying to own more of the machine around the model. If agentic systems become a larger share of enterprise AI use, CPU bottlenecks, memory bandwidth, scheduling and tool execution will become business issues rather than low-level engineering details.

That push is the strategic risk for Intel and AMD, whose server CPUs commonly serve as the host processors in GPU servers. NVIDIA already sells the accelerator, the networking layer and the software stack around AI clusters. A credible CPU gives it another point of control. It also gives cloud providers and AI labs a reason to evaluate NVIDIA systems as integrated platforms rather than collections of separate chips.

For buyers, the practical question is narrower: does Vera make a specific workload cheap enough or fast enough to justify a migration? For NVIDIA, the question is larger: can the company turn agent infrastructure into a new market before competitors define it differently? The first Vera deliveries do not answer that yet, but they mark the moment the argument moves from keynote slide to customer testing.