Los Alamos' Nvidia-powered Venado supercomputer trades accuracy for efficiency with 10 exaFLOPS for AI workloads

Alfonso Maruccia

In a nutshell: The performance of supercomputers included in the TOP500 list is measured in floating point operations per second (FLOPS) with a specific level of precision. By lowering TOP500's 64-bit double-precision floating-point format (FP64) requirements, supercomputers can seemingly achieve even higher performance levels in specific AI-based workloads.

The Los Alamos National Laboratory has a new high-performance computing (HPC) system named Venado, a supercomputer specifically designed to accelerate AI algorithms and AI-based research programs. Venado is far from being the fastest supercomputer in the world, but it packs enough Nvidia chips for AI acceleration to provide the Department of Energy's laboratory with a powerful means to integrate artificial intelligence into basic research and the "advanced national security" of the US.

Venado was built in partnership with Hewlett Packard Enterprise (HPE) and Nvidia, Los Alamos said in its official announcement. The new HPE Cray EX-based supercomputer houses 2,560 liquid-cooled GH200 Grace Hopper Superchips, Nvidia's latest solution for HPC systems, along with 920 Nvidia Grace CPU Superchips.

Venado is the first large-scale system with Nvidia Grace CPU Superchips deployed in the US. Each Grace chip packs 144 Arm computing cores, which, according to the DOE's laboratory, deliver an "immediate" performance boost to many kinds of HPC applications.

The Venado supercomputer is reportedly capable of providing ten exaFLOPS of power for AI workloads, a truly spectacular figure if we consider that Frontier, the world's first exascale supercomputer, currently tops the TOP500 rankings with "just" 1,194 petaFLOPS (about 1.19 exaFLOPS). However, Venado's AI figure is measured at FP8 precision, a format one-eighth the width of the 64-bit double-precision (FP64) format the TOP500 list requires.
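The accuracy cost of that narrower format is easy to demonstrate. The sketch below is purely illustrative and has nothing to do with Venado's actual software stack: NumPy has no FP8 dtype, so half-precision `float16` stands in for a reduced-precision format. Naively accumulating many small values in low precision stalls once the running sum grows large enough that each addend rounds away, while the FP64 reference stays accurate.

```python
# Illustrative only: reduced-precision accumulation vs. an FP64 reference.
# float16 is used as a stand-in for a low-precision format (NumPy has no FP8).
import numpy as np

values = np.full(10_000, 0.001)

# High-precision reference: sums to ~10.0
exact = values.astype(np.float64).sum()

# Naive low-precision accumulation: rounding error compounds, and once the
# running sum is large enough, adding 0.001 rounds to no change at all.
low = np.float16(0.0)
for v in values.astype(np.float16):
    low = np.float16(low + v)

print(f"FP64 sum: {exact:.4f}")        # close to 10
print(f"FP16 sum: {float(low):.4f}")   # noticeably below 10
```

Real mixed-precision AI workloads avoid the worst of this by accumulating in wider formats and scaling values carefully, which is why FP8 throughput is usable for training and inference despite the tiny mantissa.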

Despite trading accuracy for efficiency, Venado should be well suited to running large language models (LLMs) and other machine-learning applications. Nvidia's superchips can execute "millions more instructions" per second than previous chip technology, Los Alamos said, at much lower cost and power consumption.

Venado has been described as a "compact" supercomputer based on HPE's Cray EX platform. The system is networked with HPE's "extremely high-speed" Slingshot 11 interconnects, while additional HPE Cray software optimizes modeling and simulation workloads.

According to Ian Buck, Nvidia's vice president of hyperscale and HPC, Venado exploits the company's Grace Hopper architecture to deliver "groundbreaking performance" and energy efficiency in cutting-edge scientific research. The supercomputer is expected to enable significant discoveries in materials science, renewable energy, astrophysics, and elsewhere.


 
In a field that is often described as "hallucinating," I would like to know how accuracy is defined.
 
Yes, because the one constant and nearly universal criticism of AI is just how accurate it already is, so let's make it less accurate.

But hey, at least this kinda shows the cracks are already beginning to form: once you try to balance reasonable concerns like "well, if we don't run this overseas and we run it domestically, we can't just make the entire power grid unreliable and/or accelerate our emissions by a triple-digit percentage," you just might have a few companies reconsider whether they truly need to get rid of all of their office employees instead of replacing them with the fever dreams of stolen data run through unnecessary ML models.
 
If you consider how dumb the average person is, and then consider that half the population is dumber than that, then AI isn't doing too badly relative to humans. These LLMs are useful for many things, and they are improving quickly. That doesn't change the fact that AI is often described by the experts as hallucinating, so I want to know who is linking hallucinations with accuracy.
 