Forward-looking: Nvidia is poised to move beyond the graphics processors that powered its rise to dominance. At this week's Nvidia GTC, now branded more explicitly as an "AI conference," chief executive Jensen Huang is expected to introduce a new processor designed specifically for inference, according to people familiar with the company's plans.

The shift would mark a rare departure from the philosophy that has guided Nvidia for most of its 33-year history: that a single class of GPU can handle every stage of AI computing, from training massive models to delivering fast, real-time responses. As competition intensifies and demand grows for quicker, more efficient AI output, the company is beginning to explore a more specialized approach.

The upcoming chip is said to be based on technology from Groq, the startup Nvidia acquired late last year in a transaction valued at roughly $20 billion. The acquisition combined a licensing partnership with the recruitment of Groq's core engineering team, including founder and former Google chip designer Jonathan Ross.

Groq's hardware – known as language processing units, or LPUs – was originally designed to accelerate the inference side of machine learning, focusing on rapid reasoning and text-generation tasks with lower latency than conventional GPUs. If unveiled as expected, the processor would be the first major product to emerge from that acquisition and a signal that Nvidia intends to expand its silicon lineup beyond its GPU-centric systems.

According to people briefed on Nvidia's plans, the Groq-based processor will be designed to operate alongside the company's next-generation Vera Rubin GPU. Together, the two chips are expected to anchor a new platform aimed at handling an increasingly diverse mix of AI workloads across modern data centers.

The move comes as Nvidia's grip on the AI hardware market faces pressure from several directions. Large customers such as Google and Meta are investing heavily in their own custom chips, while startups are developing specialized processors aimed at reducing cost and power consumption. Earlier this week, Meta unveiled an internal lineup of four inference-focused chips, reinforcing the broader shift toward heterogeneous AI infrastructure.

For Nvidia, the challenge is both technical and logistical. The company's Blackwell and Rubin-class GPUs rely on high-bandwidth memory (HBM) to process the massive datasets behind modern AI models. But HBM has become scarce and increasingly expensive, as suppliers including SK Hynix and Micron struggle to keep pace with surging demand.

To work around that bottleneck, the new Groq-derived processor is expected to use static random-access memory (SRAM) – a faster, more accessible, though typically smaller-capacity form of memory. SRAM's low latency makes it particularly well suited to reasoning-heavy inference, as opposed to the bulk computations required for training.

Many analysts expect inference to dominate AI infrastructure spending within the next few years. Bank of America estimates that by 2030 – when the AI data-center market could reach roughly $1.2 trillion – inference will represent about 75% of total spending, up from roughly half in 2025. In a recent note, the firm said it expects Nvidia to unveil a "broadened AI portfolio," including an SRAM-based processor derived from its Groq acquisition.

The pivot toward inference hardware also addresses a practical constraint facing many enterprises deploying AI today. A large share of existing data centers were never designed to support the liquid-cooling systems required by Nvidia's newest high-performance GPUs.

"Many enterprises want to do inference using their existing data centers, but the vast majority of today's data centers… can't support the latest liquid-cooled GPUs," said June Paik, chief executive of AI chip startup FuriosaAI.

A processor that fits into existing facilities could make Nvidia's new chips appealing to companies reluctant to rebuild from the ground up. Ben Bajarin, analyst at Creative Strategies, said the market is moving toward a more diverse mix of AI accelerators.

"The future of the data center is not going to be a one-size-fits-all world," he said.