TL;DR: Private companies are in desperate need of additional GPU computing capacity to train their new generative AI services, but they are struggling to obtain it. Researchers working on US supercomputing projects, by contrast, currently have access to powerful Nvidia GPU nodes at a substantial discount, albeit for a limited time.
Users working with the National Energy Research Scientific Computing Center (NERSC) and the US Department of Energy can now run jobs on Perlmutter's GPU nodes at half the cost they would have paid just a few weeks ago. The offer runs until the end of September. According to NERSC's Rebecca Hartman-Baker, it is an excellent opportunity to start compute-intensive research after the summer break.
Perlmutter is NERSC's primary high-performance computing system: a supercomputer with 3,072 CPU-only nodes powered by AMD "Milan" EPYC processors and 1,792 GPU-accelerated nodes equipped with Nvidia A100 GPUs. The system incorporates several technological innovations intended to boost the scientific productivity of researchers who use NERSC services.
Importantly, it is reserved exclusively for scientific research. The system is named after Saul Perlmutter, the US astrophysicist who shared the 2011 Nobel Prize in Physics for the discovery that the expansion of the universe is accelerating.
NERSC's 50 percent discount is meant to encourage users to run scientific jobs on Perlmutter's GPU nodes now, rather than adding to the typical "end-of-the-year crunch" of long queue times and slow job turnaround. In an email announcement, Hartman-Baker explained that using the system now benefits the entire NERSC community because it spreads computing demand more evenly across the year.
Any job, or portion of a job, that runs between September 6 and the start of October 1 will be charged at half the standard rate. Hartman-Baker gave an example: at the discounted rate, a three-hour job on seven GPU nodes costs only 10.5 GPU node-hours, compared with 21 GPU node-hours (3 hours × 7 nodes) without the discount.
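To make the charging arithmetic concrete, here is a minimal sketch that applies the 50 percent rate only to the portion of a job that falls inside the discount window. It is a simplified model, not NERSC's actual accounting code: the function name `charged_node_hours` is hypothetical, the year 2023 is an assumption (the announcement gives only month and day), and it assumes one charge unit per GPU node-hour, ignoring any other charge factors NERSC may apply.

```python
from datetime import datetime, timedelta

# Assumed discount window: September 6 up to (not including) October 1.
# The year is an assumption; the announcement gives only month and day.
DISCOUNT_START = datetime(2023, 9, 6)
DISCOUNT_END = datetime(2023, 10, 1)
DISCOUNT_RATE = 0.5  # 50 percent off


def charged_node_hours(start: datetime, end: datetime, nodes: int,
                       window_start: datetime = DISCOUNT_START,
                       window_end: datetime = DISCOUNT_END,
                       rate: float = DISCOUNT_RATE) -> float:
    """Hypothetical helper: node-hours charged for a job, discounting
    only the part of the job that overlaps the discount window.

    Simplified model: one charge unit per GPU node-hour, no other
    charge factors.
    """
    if end <= start:
        raise ValueError("end must be after start")
    hour = timedelta(hours=1)
    total_hours = (end - start) / hour
    # Overlap between the job and the discount window (zero if disjoint).
    overlap_start = max(start, window_start)
    overlap_end = min(end, window_end)
    discounted_hours = max((overlap_end - overlap_start) / hour, 0.0)
    full_price_hours = total_hours - discounted_hours
    return nodes * (full_price_hours + discounted_hours * rate)


# Hartman-Baker's example: 3 hours on 7 GPU nodes, inside the window.
print(charged_node_hours(datetime(2023, 9, 10, 8), datetime(2023, 9, 10, 11), 7))  # 10.5
# The same job outside the window pays full price.
print(charged_node_hours(datetime(2023, 8, 10, 8), datetime(2023, 8, 10, 11), 7))  # 21.0
```

Computing the overlap rather than checking only the job's start time mirrors the "portion of a job" rule: a job that starts at 23:00 on September 30 and runs three hours gets the discount for its first hour only, for a charge of 7 × (2 + 0.5) = 17.5 node-hours under this model.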
NERSC is also offering Perlmutter GPU "Virtual Office Hours," where users can get help getting started on the GPU nodes and with issues such as insufficient allocation or poor performance. In its first phase, which concluded on May 27, 2022, Perlmutter reached an overall processing power of 70.9 PFLOPS.
As Microsoft HPC storage expert Glenn K. Lockwood, who was the first to report NERSC's offer, pointed out, the AI industry is grappling with a "GPU crunch" that is unlikely to ease anytime soon. If the DoE were to lease its "idle" computing capacity for commercial workloads, the US government could potentially generate a substantial revenue stream.