Nvidia turns to silicon photonics to supercharge next-gen AI clusters

Skye Jacobs

The big picture: As artificial intelligence clusters grow to thousands of accelerators working in unison, the challenge of connecting GPUs with ever-higher bandwidth has become one of the most pressing issues in data center design. Nvidia is responding by moving away from traditional electrical signaling and adopting silicon photonics, a shift the company argues is now essential for scaling large AI systems.

Earlier this year, the company confirmed that its next-generation rack-scale AI platforms will abandon pluggable optical modules in favor of co-packaged optics. At the Hot Chips conference, Nvidia shared new details about its upcoming photonic interconnect products – Quantum-X and Spectrum-X Photonics – scheduled for launch in 2026 for InfiniBand and Ethernet, respectively.

In today's large-scale AI training clusters, servers spanning multiple racks must operate as a single, tightly coupled system. To achieve this, operators have moved switches from the top of individual racks to the end of entire rows, increasing the distance between GPUs and their first switch. At speeds of 800 Gb/s, copper cables cannot span these distances reliably, making optical connections essential.

The current standard – pluggable optical transceivers – comes with significant drawbacks. Electrical signals must exit the switch ASIC, travel through long traces and multiple connectors, and only then be converted into light.

This process introduces about 22 decibels of signal loss on 200 Gb/s channels, requiring additional equalization circuitry. That compensation comes at a cost: power consumption rises to roughly 30 watts per port, creating more heat, increasing cooling requirements, and adding potential points of failure.

Co-packaged optics addresses these inefficiencies by integrating the optical engine directly with the switch ASIC. This design couples electrical signals into fiber almost immediately, reducing signal loss to around four decibels. As a result, per-port power consumption drops to just nine watts, and numerous external connectors and components are eliminated altogether.

Nvidia cites four key advantages to this transition: power efficiency improves by 3.5×, signal integrity increases by 64×, resiliency improves tenfold, and system deployment accelerates by about 30 percent thanks to fewer components requiring assembly or maintenance.
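To put the per-port figures above in perspective, the short Python sketch below converts the quoted loss and power numbers into ratios. It uses only values stated in this article; the variable names are illustrative, and reading the 64× signal-integrity figure as a decibel-to-linear conversion is an assumption, not something Nvidia has confirmed.

```python
# Per-port figures quoted in the article (approximate values).
PLUGGABLE_LOSS_DB = 22.0   # electrical loss with pluggable transceivers
CPO_LOSS_DB = 4.0          # electrical loss with co-packaged optics
PLUGGABLE_POWER_W = 30.0   # per-port power with pluggable transceivers
CPO_POWER_W = 9.0          # per-port power with co-packaged optics


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel difference into a linear power ratio."""
    return 10 ** (db / 10)


# Difference in signal loss between the two designs, in decibels.
loss_delta_db = PLUGGABLE_LOSS_DB - CPO_LOSS_DB

# The same difference expressed as a linear ratio (assumed interpretation).
loss_ratio = db_to_power_ratio(loss_delta_db)

# Ratio of per-port power draw between the two designs.
power_ratio = PLUGGABLE_POWER_W / CPO_POWER_W

print(f"Loss difference: {loss_delta_db:.0f} dB (about {loss_ratio:.1f}x linear)")
print(f"Per-port power ratio: about {power_ratio:.2f}x")
```

The 18 dB gap works out to roughly 63× in linear terms, close to the 64× figure Nvidia cites, while 30 watts versus nine watts is about 3.3×, in the same range as the claimed 3.5× efficiency gain. The small gaps are consistent with the article's per-port numbers being rounded.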

The shift is built on TSMC's COUPE platform (Compact Universal Photonic Engine), which will roll out in three phases. The first generation introduces 1.6 Tb/s optical engines for OSFP connectors. The second phase migrates to TSMC's CoWoS packaging, enabling 6.4 Tb/s bandwidth at the board level. The final stage integrates photonics directly within processors, driving throughput toward 12.8 Tb/s while further cutting latency and power consumption.

Nvidia's roadmap brings co-packaged optics to both InfiniBand and Ethernet. On the InfiniBand side, the company plans to launch Quantum-X switches in early 2026. Each switch will deliver 115 Tb/s of aggregate throughput across 144 ports running at 800 Gb/s each.

The platform integrates an ASIC capable of 14.4 teraflops of in-network processing and supports the fourth generation of SHARP (Scalable Hierarchical Aggregation and Reduction Protocol), a technology aimed at reducing communication latency in collective AI workloads. All Quantum-X switches will feature liquid cooling.

Later in 2026, Nvidia will debut Spectrum-X Photonics, extending CPO to Ethernet. Powered by the Spectrum-6 ASIC, the lineup will launch with two models: the SN6810, offering 102.4 Tb/s across 128 ports, and the SN6800, scaling up to 409.6 Tb/s and 512 ports. Like their InfiniBand counterparts, the Ethernet switches will also rely on liquid cooling.
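The port counts and aggregate throughput figures quoted for these switches can be cross-checked with simple arithmetic. The sketch below is a rough sanity check using only numbers from this article; the dictionary layout and function name are illustrative, and the per-port rate for the Spectrum-X models is derived here rather than stated by Nvidia.

```python
PORT_RATE_GBPS = 800  # per-port rate the article states for Quantum-X

# Switch name -> (port count, quoted aggregate throughput in Gb/s).
switches = {
    "Quantum-X": (144, 115_000),  # article quotes a rounded "115 Tb/s"
    "SN6810": (128, 102_400),
    "SN6800": (512, 409_600),
}


def implied_port_rate_gbps(ports: int, aggregate_gbps: int) -> float:
    """Per-port rate implied by dividing aggregate throughput by port count."""
    return aggregate_gbps / ports


for name, (ports, aggregate_gbps) in switches.items():
    rate = implied_port_rate_gbps(ports, aggregate_gbps)
    print(f"{name}: {ports} ports, about {rate:.1f} Gb/s per port")

# Unrounded Quantum-X aggregate from the stated port count and rate.
quantum_x_exact_gbps = 144 * PORT_RATE_GBPS
print(f"Quantum-X unrounded aggregate: {quantum_x_exact_gbps / 1000} Tb/s")
```

Both Spectrum-X models divide out to 800 Gb/s per port, matching the Quantum-X port speed, and 144 ports at 800 Gb/s gives 115.2 Tb/s, which the article rounds to 115 Tb/s.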

By embedding optics at the silicon level, Nvidia aims to simplify data center build-outs for future generative AI systems. CPO-based networks eliminate thousands of discrete optical modules, speed up deployment, and improve long-term reliability. Nvidia positions these interconnects not as optional enhancements but as architectural necessities for scaling clusters into the tens of thousands of GPUs.

This strategy puts competitive pressure on rivals to accelerate their own optical networking roadmaps. Earlier this year, AMD acquired Enosemi, a photonics startup – a move widely interpreted as an effort to counter Nvidia's shift toward light-based communication.

Nvidia's approach is tightly linked to TSMC's progress. The first-generation COUPE platform stacks a 65nm electronic integrated circuit with a photonic integrated circuit using SoIC-X packaging. As TSMC advances through second- and third-generation iterations, Nvidia expects its platforms to track those improvements in both bandwidth and power efficiency.
