AMD CPUs and GPUs will power what will be the world's fastest supercomputer, 10x faster than...

William Gayde

TS Addict
Staff member

The new system will be maintained by the US Department of Energy's (DOE) National Nuclear Security Administration (NNSA). Its main purpose will be to help model how America's existing nuclear weapons stockpile is aging through simulations and artificial intelligence.

In addition to national security workloads, El Capitan will also target some other key areas. These include a partnership with the National Cancer Institute and additional DOE labs to accelerate research into cancer drugs and how certain proteins mutate. El Capitan will also be used in research to help fight climate change.

This system is a big win for both AMD and Hewlett Packard Enterprise (HPE), which designed the system. Supercomputers used to be dominated by Intel CPUs and Nvidia GPUs, but AMD's improvements in both sectors are starting to eat away at that dominance.

El Capitan will use 4th generation EPYC CPUs, codenamed "Genoa," based on the Zen 4 architecture. On the GPU side, it will use Radeon Instinct cards with the 3rd generation Infinity architecture.

The compute hardware will be implemented using Cray's Shasta system and Slingshot interconnect.

This features a 4:1 GPU-to-CPU ratio with local flash storage for improved access speeds. To help manage the massive heat generated by such a system, the blades are all individually water cooled. In addition to El Capitan, HPE and the DOE are also working on two other exascale systems, Aurora and Frontier.
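As a rough illustration of what a 4:1 accelerator ratio implies at the node level, here is a minimal sketch; note the one-CPU-per-node layout is an illustrative assumption, not a published spec:

```python
# Node-level inventory for a machine built around a 4:1 GPU-to-CPU ratio.
# The one-CPU-per-node layout is an illustrative assumption.
def node_inventory(nodes: int, cpus_per_node: int = 1, gpu_ratio: int = 4) -> dict:
    cpus = nodes * cpus_per_node
    gpus = cpus * gpu_ratio  # four accelerators per CPU
    return {"cpus": cpus, "gpus": gpus}

print(node_inventory(1000))  # 1000 nodes -> 1000 CPUs, 4000 GPUs
```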


 

amstech

IT Overlord
Impressive!
I wouldn't consider a Radeon GPU, but there's no doubt AMD is coming around after 15 years of being the little guy; nothing lasts forever. Nvidia still has the best GPUs and there is no close second, but hopefully AMD improves here, because Nvidia's pricing is just asinine.
 
  • Like
Reactions: TYguys

wiyosaya

TS Evangelist
Impressive!
I wouldn't consider a Radeon GPU, but there's no doubt AMD is coming around after 15 years of being the little guy; nothing lasts forever. Nvidia still has the best GPUs and there is no close second, but hopefully AMD improves here, because Nvidia's pricing is just asinine.
nVidia currently has the best GPUs for gaming. However, if you look at BOINC projects that have GPU versions for both AMD and nVidia cards, AMD GPUs process work units up to 10 times as fast as nVidia's - especially among consumer cards. I bet that holds true for the professional cards, too.

GPU compute is/has been AMD's world for a long time, and my bet is that is part of what drove the decision to use AMD GPUs. The compute power of those GPUs is probably a big factor in the compute power of the system.

I don't hold out much hope for price reductions on GPUs. If AMD should take the lead - however unlikely that may be - the price bar that nVidia has set will certainly be met or exceeded by AMD, IMO. Take the price of the TR 3990X, for example; IMO, we have Intel to thank for that.
 
  • Like
Reactions: Reehahs and Tams80

hahahanoobs

TS Evangelist
AMD GPUs have been far ahead of nVidia GPUs in compute for years.

I wonder what new and exciting developments we will see from AMD with the revenue from such wins.

EDIT: I almost forgot - will it play Crysis? 🙄
“GPUs now power five out of the world’s seven fastest systems as well as 17 of the 20 most energy efficient systems on the new Green500 list,” the company remarked, adding that the “majority of computing performance added to the Top500 list comes from Nvidia GPUs.”

The latest Top500 report includes 110 systems with some manner of accelerator and/or co-processor technology, up from 101 six months ago. 98 are equipped with Nvidia chips, seven systems utilize Intel Xeon Phi (coprocessor) technology and four are using PEZY technology. Two systems (ranked 52 and 252) employ a combination of Nvidia and Intel Xeon Phi accelerators/coprocessors. The newly upgraded Tianhe-2A (now in fourth position with 61.44 petaflops, up from 33.86 petaflops), installed at the National Super Computer Center in Guangzhou, employs custom-built Matrix-2000 accelerators. 19 systems now use Xeon Phi as the main processing unit.
 
  • Like
Reactions: wiyosaya

wiyosaya

TS Evangelist
“GPUs now power five out of the world’s seven fastest systems as well as 17 of the 20 most energy efficient systems on the new Green500 list,” the company remarked, adding that the “majority of computing performance added to the Top500 list comes from Nvidia GPUs.”

The latest Top500 report includes 110 systems with some manner of accelerator and/or co-processor technology, up from 101 six months ago. 98 are equipped with Nvidia chips, seven systems utilize Intel Xeon Phi (coprocessor) technology and four are using PEZY technology. Two systems (ranked 52 and 252) employ a combination of Nvidia and Intel Xeon Phi accelerators/coprocessors. The newly upgraded Tianhe-2A (now in fourth position with 61.44 petaflops, up from 33.86 petaflops), installed at the National Super Computer Center in Guangzhou, employs custom-built Matrix-2000 accelerators. 19 systems now use Xeon Phi as the main processing unit.
Interesting.

I speak from the fact that I run nVidia GPUs mainly for GPU Grid. I've surveyed other BOINC projects and the result times were in that range in my survey. Anyway, I'm just some joe internet guy but this speaks volumes - https://www.anandtech.com/show/15422/the-amd-radeon-rx-5600-xt-review/14 though they are just consumer GPUs.

Your post speaks to the fact that systems have chosen nVidia GPUs in a majority of the instances. There had to have been a very good reason for choosing AMD for this machine, and I doubt that they would have chosen an inferior GPU for pricing reasons alone.
 
  • Like
Reactions: Reehahs

deksman2

TS Member
Impressive!
I wouldnt consider a Radeon GPU but theres no doubt AMD is coming around after 15 years of being the little guy, nothing lasts forever. Nvidia still has the best GPUs and there is no close 2nd, but hopefully AMD improves here because Nvidias pricing is just assinine.
As others mentioned, AMD's compute capabilities far exceed Nvidia's, and therefore AMD IS the better choice here.

Also, NV doesn't have the best GPUs... they have a lead in gaming performance at the 2070 Super tier and above; up to that tier, AMD is holding its ground.
RTX is not really an 'advantage'... it's a proverbial gimmick that one barely notices anyway, given how fast-paced the action usually is in games that support RTX... and it reduces performance by a large amount (for a minimal difference in visuals).

Compute and gaming are two different things.
Much larger/faster compute hardware is one of the reasons GCN is more power hungry as a consumer card (that, along with AMD's aggressive factory overvolting to improve the number of functional dies).

I think you may be wondering about AMD improving on the gaming side... but their GPUs are more than adequate for that too (even at 2K) if you don't need a top-end GPU (which most people don't).
RDNA 2 will be out later this year, so it will be interesting to see what AMD does with it.

 

hahahanoobs

TS Evangelist
Interesting.

I speak from the fact that I run nVidia GPUs mainly for GPU Grid. I've surveyed other BOINC projects and the result times were in that range in my survey. Anyway, I'm just some joe internet guy but this speaks volumes - https://www.anandtech.com/show/15422/the-amd-radeon-rx-5600-xt-review/14 though they are just consumer GPUs.

Your post speaks to the fact that systems have chosen nVidia GPUs in a majority of the instances. There had to have been a very good reason for choosing AMD for this machine, and I doubt that they would have chosen an inferior GPU for pricing reasons alone.
Supercomputers have specific workloads, yes. In this case, AMD hardware is required. But as you can see, NVIDIA and Intel are dominating.
 

grumblguts

TS Maniac
I love the fact Nvidia has no competition at the very top.
It means our enemies have to pay through the nose for a tiny 20% performance increase.
They all do the hating while we do the laughing.
 
Supercomputers have specific workloads, yes. In this case, AMD hardware is required. But as you can see, NVIDIA and Intel are dominating.
Absolutely - on systems commissioned a few years ago. No sane person would have built a supercomputer based on an AMD CPU prior to Ryzen (excluding the original Opteron here).

Since Epyc established itself, on the other hand, there don't seem to be (m)any Xeon-based supercomputer announcements.
 

Puiu

TS Evangelist
AMD GPUs have been far ahead of nVidia GPUs in compute for years.

I wonder what new and exciting developments we will see from AMD with the revenue from such wins.

EDIT: I almost forgot - will it play Crysis? 🙄
For some raw compute workloads AMD is definitely ahead, but Nvidia is no slouch either. They have a ton of workloads where they are the leader (machine learning, for example).
 
  • Like
Reactions: wiyosaya

hahahanoobs

TS Evangelist
Absolutely - on systems commissioned a few years ago. No sane person would have built a supercomputer based on an AMD CPU prior to Ryzen (excluding the original Opteron here).

Since Epyc established itself, on the other hand, there don't seem to be (m)any Xeon-based supercomputer announcements.
But there is still demand for Xeon 14nm and 10nm parts if you look. Also, Xeon orders are likely placed because those chips are the best for the job at hand. AMD doesn't have one chip that rules them all. Intel and Nvidia are still the heavyweights in supercomputing and HPC. Even Xe is getting a mention. AMD has many roadblocks ahead.

Your first paragraph - That's obvious.
Your second paragraph - With new players of course others will lose out.

Neither paragraph contained anything that added to the conversation. Your comments are basic, but if you have some relevant links you think I should look at, I'd be glad to see them.
 
  • Like
Reactions: CharmsD

wiyosaya

TS Evangelist
Supercomputers have specific workloads, yes. In this case, AMD hardware is required. But as you can see, NVIDIA and Intel are dominating.
I agree about the workloads. It would be interesting to hear why they chose AMD hardware.
Absolutely - on systems commissioned a few years ago. No sane person would have built a super computer based on AMD CPU prior to Ryzen (excluding the original Opteron here).

Since Epyc established itself otoh, there don‘t seem to be (m)any Xeon based super computer announcements.
Agreed. No supercomputer designer in their right mind would have specified an AMD CPU between Opteron and Epyc. The one thing I can think of where AMD is ahead of Intel is that Epyc has 128 PCIe lanes available for use. That means perhaps as many as six GPUs running 16 PCIe lanes each per Epyc CPU, with some lanes left over for other peripherals.
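That lane budget is easy to sanity-check; a minimal sketch, assuming x16 links per GPU and some lanes held back for storage and NICs (the 32-lane reservation is an illustrative assumption):

```python
# How many x16 GPUs fit in Epyc's 128 PCIe lanes if some lanes are
# reserved for other peripherals? The 32-lane reservation is an assumption.
def gpus_per_socket(total_lanes: int = 128, lanes_per_gpu: int = 16,
                    reserved_lanes: int = 32) -> int:
    return (total_lanes - reserved_lanes) // lanes_per_gpu

print(gpus_per_socket())  # 6 GPUs, with 32 lanes left for peripherals
```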
 

Ravalo

TS Addict
AMD GPUs have been far ahead of nVidia GPUs in compute for years.
Except for the Radeon VII, which matched a $10-20k card in compute for just $700.

edit: I thought he meant far behind, lol
yeah, I have 10 IQ, sorry
still, the Radeon VII was a misunderstood and misadvertised card at the time; it could've been marketed for content creation and professional workloads instead of being sold as a gaming card
 
Last edited:

hahahanoobs

TS Evangelist
I agree about the workloads. It would be interesting to hear why they chose AMD hardware.

Agreed. No supercomputer designer in their right mind would have specified an AMD CPU between Opteron and Epyc. The one thing I can think of where AMD is ahead of Intel is that Epyc has 128 PCIe lanes available for use. That means perhaps as many as six GPUs running 16 PCIe lanes each per Epyc CPU, with some lanes left over for other peripherals.
Surprisingly to me, price/performance was a deciding factor for this computer and another whose name I forget.
 
Last edited:
  • Like
Reactions: wiyosaya
I agree about the workloads. It would be interesting to hear why they chose AMD hardware.
This article on next platform has good information:
Lawrence Livermore To Surpass 2 Exaflops With AMD Compute

To quote from the article:

“Our workloads are primarily not deep learning models, although we are exploring something we call cognitive simulation, which brings deep learning and other AI models to bear on our workloads by evaluating how they can accelerate our simulations and how they can also improve their accuracy and find where they actually work,” explained de Supinski. “And so for that, we see this system as providing some significant benefits because of those operations. But I think it's important to understand that the primary goal of this system is large scale physics simulation and not deep learning.”
Agreed. No supercomputer designer in their right mind would have specified any AMD CPU between Opteron and and Epyc. The one thing that I can think of where AMD is ahead of Intel is the fact that Epyc has 128 lanes of PCI-e available for use. That means perhaps as many as 6, GPUs running 16 PCI-e lanes and some left over for other peripherals per Epyc CPU.
Current CPUs have more PCIe lanes, and PCIe 4.0 (i.e. twice the bandwidth) at that. Whether this will still be the case in the time frame this system is planned for is questionable.

One advantage is certainly higher density in single socket systems. Intel is nowhere near that and by the time they have their own "glue", AMD has already had experience with it for several years.

Power consumption is next - it can be argued that AMD may still hold this advantage in the future.

Another quote from the article to put the cost of power consumption in perspective:

It costs roughly $1 per watt per year to power a supercomputer in the urban areas where they tend to be installed. So that is $50 million over five years for that incremental 10 megawatts of juice.
While gamers may not care about power consumption (it does not really matter from a financial perspective for one PC), this is quite different for supercomputers.

So if you need half the power for the same performance, you can save a lot, i.e. a 30 MW system will save $150 million over five years vs. a 60 MW system.
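A quick sketch of that power-cost arithmetic, using the $1 per watt per year rule of thumb quoted above:

```python
# Electricity cost at roughly $1 per watt per year (the quoted rule of thumb).
def power_cost_usd(megawatts: float, years: float = 5.0,
                   usd_per_watt_year: float = 1.0) -> float:
    return megawatts * 1e6 * usd_per_watt_year * years

savings = power_cost_usd(60) - power_cost_usd(30)
print(f"${savings / 1e6:.0f} million saved over five years")  # $150 million
```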

Another advantage may be NUMA between the CPU and GPU. AMD explored this in the past, but this time it may work thanks to the better interconnect.
 
  • Like
Reactions: TempleOrion

wiyosaya

TS Evangelist
This article on next platform has good information:
Lawrence Livermore To Surpass 2 Exaflops With AMD Compute

To quote from the article:

Current CPUs have more PCIe lanes, and PCIe 4.0 (i.e. twice the bandwidth) at that. Whether this will still be the case in the time frame this system is planned for is questionable.

One advantage is certainly higher density in single socket systems. Intel is nowhere near that and by the time they have their own "glue", AMD has already had experience with it for several years.

Power consumption is next - it can be argued that AMD may still hold this advantage in the future.

Another quote from the article to put the cost of power consumption in perspective:

While gamers may not care about power consumption (it does not really matter from a financial perspective for one PC), this is quite different for supercomputers.

So if you need half the power for the same performance, you can save a lot, i.e. a 30 MW system will save $150 million over five years vs. a 60 MW system.

Another advantage may be NUMA between the CPU and GPU. AMD explored this in the past, but this time it may work thanks to the better interconnect.
All very interesting.

This article suggests that AMD is hinting Genoa will have PCIe 5.0: https://www.truecosmos.com/amd-epyc-genoa-ddr5-memory-pcie-5-0-protocol/
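For context on why each PCIe generation matters: per-lane bandwidth roughly doubles every generation from PCIe 3.0's 8 GT/s with 128b/130b encoding. A quick sketch of the resulting x16 link bandwidth:

```python
# Approximate one-direction bandwidth of a x16 link per PCIe generation.
# PCIe 3.0 runs 8 GT/s per lane with 128b/130b encoding; each later
# generation doubles the transfer rate (16 GT/s for 4.0, 32 GT/s for 5.0).
def pcie_x16_gb_s(gen: int) -> float:
    gt_per_s = 8 * 2 ** (gen - 3)             # transfers per second per lane
    gb_per_lane = gt_per_s * (128 / 130) / 8  # bytes/s after encoding overhead
    return gb_per_lane * 16                   # 16 lanes in a x16 link

for gen in (3, 4, 5):
    print(f"PCIe {gen}.0 x16: ~{pcie_x16_gb_s(gen):.1f} GB/s")
```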
 

JimboJoneson

TS Maniac
Yeah, sadly it reached EOL because of low sales and low supply.

It didn't do everything well, but it quite excelled in certain workloads, and their Instinct cards do even better at some of these... So yeah, if one is building a supercomputer and needs a lot of very high precision compute, the last two graphs paint an interesting picture...

[FP64 compute benchmark charts]
Just look at the FP64 compute compared to the 2080 Ti... makes the Ti look like a little kid's toy... eh, quantum physics ;)? And there's the $5,000 Quadro P6000 down there near the bottom...

I wonder what Arcturus will be like at compute?
 
Last edited:

wiyosaya

TS Evangelist
It didn't do everything well, but it quite excelled in certain workloads, and their Instinct cards do even better at some of these... So yeah, if one is building a supercomputer and needs a lot of very high precision compute, the last two graphs paint an interesting picture...

[FP64 compute benchmark charts]
Just look at the FP64 compute compared to the 2080 Ti... makes the Ti look like a little kid's toy... eh, quantum physics ;)? And there's the $5,000 Quadro P6000 down there near the bottom...

I wonder what Arcturus will be like at compute?
Thanks for the backup. 🥳
 

neeyik

TS Evangelist
Staff member
Just look at the FP64 compute compared to the 2080 Ti... makes the Ti look like a little kid's toy... eh, quantum physics ;)? And there's the $5,000 Quadro P6000 down there near the bottom...
A slightly fairer comparison would be the Radeon VII vs. the Titan V (3.36 TFLOPS @ 1.75 GHz vs. 7.45 TFLOPS @ 1.46 GHz), as both products are targeted at FP64 throughput. Not on price, of course, as the AMD card costs roughly a third as much as the Nvidia product!
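Those FP64 figures follow directly from shader count, boost clock, and each card's FP64 rate (1:4 for the Radeon VII's 3840 shaders, 1:2 for the Titan V's 5120 cores); a quick check:

```python
# Theoretical FP64 throughput: shaders * 2 FLOPs per FMA * clock / FP64 divisor.
def fp64_tflops(shaders: int, clock_hz: float, fp64_divisor: int) -> float:
    return shaders * 2 * clock_hz / fp64_divisor / 1e12

print(f"Radeon VII: {fp64_tflops(3840, 1.75e9, 4):.2f} TFLOPS")   # 3.36
print(f"Titan V:    {fp64_tflops(5120, 1.455e9, 2):.2f} TFLOPS")  # 7.45
```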
 
Last edited: