Nvidia and Microsoft working to bring a GPU-based AI supercomputer to the cloud

Alfonso Maruccia

Staff
Forward-looking: Nvidia and Microsoft are working on a virtual supercomputer built from GPU-based Azure instances. The design goal is to accelerate the latest AI algorithms, whether that means creating ever more eerily realistic artwork or conducting cutting-edge AI research.

Generative AI models have proven helpful for many applications. Machine learning algorithms can create uncanny imagery or autocomplete source code, though their "in-the-wrong-hands" potential often sways public opinion against them. Now, a new partnership between two of the biggest tech companies promises to accelerate these capabilities by creating an "AI supercomputer" in the cloud.

Nvidia and Microsoft announced a "multi-year collaboration" to build the world's most powerful supercomputer designed specifically to accelerate AI and machine learning workloads. The partnership pairs Microsoft's cloud-based Azure platform with Nvidia's high-end GPU hardware, alongside several other components that accelerate the entire communication stack.

On the hardware side, the system includes "tens of thousands" of Nvidia A100 (Ampere-based) and H100 (Hopper-based) enterprise GPUs, hosted in Microsoft's GPU-accelerated ND- and NC-series virtual machines. Nvidia's Quantum-2 400Gb/s InfiniBand networking and its AI Enterprise software suite tie the communication stack together.

Essentially, the new AI supercomputer functions as a cloud service built on Azure instances. Nvidia clarified that customers can acquire resources "as they normally would with a real supercomputer," while the software layer reserves the required virtual machines.
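Neither company detailed the exact provisioning flow, but in practice "reserving the required virtual machines" maps onto Azure's standard compute APIs. The sketch below is a hypothetical example using the Azure Python SDK (azure-identity and azure-mgmt-compute); the resource group, VM name, network interface, image, and the ND A100 v4 size string are illustrative assumptions, not details from the announcement.

```python
# Hypothetical sketch: reserving a GPU-accelerated ND-series Azure VM via the
# Azure Python SDK. All names, the image, and the VM size are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"            # placeholder
compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Request one A100-class node; the size name assumes the ND A100 v4 series.
poller = compute.virtual_machines.begin_create_or_update(
    "example-rg",                                # hypothetical resource group
    "ai-node-01",                                # hypothetical VM name
    {
        "location": "eastus",
        "hardware_profile": {"vm_size": "Standard_ND96asr_v4"},
        "storage_profile": {
            "image_reference": {
                "publisher": "microsoft-dsvm",   # assumed Ubuntu HPC image
                "offer": "ubuntu-hpc",
                "sku": "2004",
                "version": "latest",
            }
        },
        "os_profile": {
            "computer_name": "ai-node-01",
            "admin_username": "azureuser",
            "admin_password": "<password>",      # placeholder
        },
        "network_profile": {
            # placeholder: an existing NIC resource ID would go here
            "network_interfaces": [{"id": "<nic-resource-id>"}]
        },
    },
)
vm = poller.result()                             # blocks until the VM is provisioned
print(vm.name, vm.provisioning_state)
```

Scaling up from a single node then becomes a matter of requesting more such instances (typically through Azure's scale sets or batch services) rather than procuring physical hardware.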

Architecturally, it's the same as a physical supercomputer but runs on VMs "in the cloud." The most obvious advantage is that it does not require a dedicated (and massive) physical installation in a research lab, yet the capabilities on offer will still let enterprises scale virtual instances "all the way up to supercomputing status."

The virtual supercomputer's primary focus is advancing generative AI models. This rapidly emerging area of AI research currently relies on models like Megatron-Turing NLG 530B as a basis for "unsupervised, self-learning algorithms to create new text, code, digital images, video or audio."
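Megatron-Turing NLG 530B itself is not publicly downloadable, but the generate-from-a-prompt workflow these models share can be illustrated with a small open model. The sketch below is a minimal, hypothetical example using Hugging Face Transformers with GPT-2 as a stand-in; the prompt and sampling settings are arbitrary.

```python
# Minimal sketch of generative text inference, using GPT-2 as a small stand-in
# for large models such as Megatron-Turing NLG 530B (not publicly available).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Cloud-based AI supercomputers let researchers"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; far larger models follow the same pattern at scale.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```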

Nvidia highlighted how AI technology advances are accelerating, industry adoption is growing, and the breakthroughs in the field have triggered a tidal wave of research, new startups, and enterprise applications. The partnership with Microsoft will provide customers and researchers with a state-of-the-art AI infrastructure and software to capitalize on the transformative power of AI.

Or, in Microsoft's own words, powerful AI capabilities "for every enterprise on Microsoft Azure."


 
Funny how they went with Nvidia when the top 5 supercomputers in the whole world are using AMD parts.

And knowing how nvidia screwed MS in the past, this tells me that MS is back into taking money from others to do projects like this, ignoring the risk of getting backstabbed again.

Oh well, I guess that's one way to become a trillion-dollar company.
 
Well, let's hope that Microsludge is smart enough to provide their own power cables, especially after those RTX 4090 power cables... eh?
 
Funny how they went with Nvidia when the top 5 supercomputers in the whole world are using AMD parts.

And knowing how nvidia screwed MS in the past, this tells me that MS is back into taking money from others to do projects like this, ignoring the risk of getting backstabbed again.

Oh well, I guess that's one way to become a trillion-dollar company.
I don't see why they wouldn't use EPYC CPUs in this; they probably just don't want to talk about it for PR purposes.
 
I don't see why they wouldn't use EPYC CPUs in this; they probably just don't want to talk about it for PR purposes.
Exactly, and also adding Instinct GPUs, which to my understanding have terrific performance.

Really strange partnership, considering how AMD has behaved with MS and how nvidia screwed them.
 
Can you imagine the electric bill for that bad boy, especially if they use processors from the RTX-4090? The AI will probably come to the conclusion that it, itself, is contributing to global warming.

Not to mention all the power connections melting at once, killing any humans in the server room with a release of cyanide gas. Still, we'll know more than we did before, which is "progress", (of a sort).

Relax kidz, it's for a "good cause". :rolleyes:
Oh well, I guess that's one way to become a trillion-dollar company.

Couldn't they just buy Twitter instead :confused:
 
Can you imagine the electric bill for that bad boy, especially if they use processors from the RTX-4090?
It's using A100s and H100s, 400W and up to 700W respectively (the non-consumer Ampere parts and the separate Hopper architecture). Possibly not being run at full chat, but given that there are thousands of them, it'll be a few power stations' worth of 'leccy.
 
Can you imagine the electric bill for that bad boy, especially if they use processors from the RTX-4090? The AI will probably come to the conclusion that it, itself, is contributing to global warming.

Not to mention all the power connections melting at once, killing any humans in the server room with a release of cyanide gas. Still, we'll know more than we did before, which is "progress", (of a sort).

Relax kidz, it's for a "good cause". :rolleyes:


Couldn't they just buy Twitter instead :confused:
If you read the review, it has record performance per watt.
 
If you read the review, it has record performance per watt.
So what? That just means they can, or will, put more individual processors into it. The only calculation that's meaningful is how much power it draws as a complete entity.

Granted, at, say, 25% more efficiency you could have 1,000 processors working on the same overall draw as 750 in the past. But again, so what? You'd still get the same electric bill for either machine.
 
So what? That just means they can, or will, put more individual processors into it. The only calculation that's meaningful is how much power it draws as a complete entity.

Granted, at, say, 25% more efficiency you could have 1,000 processors working on the same overall draw as 750 in the past. But again, so what? You'd still get the same electric bill for either machine.
I'm not going to argue with you - do a bit more research / review reading, and it will be obvious to you how efficient they are, compared to older GPUs at the same task.
 
I'm not going to argue with you
Nor did I invite you to do so.

My take (still) is that individual efficiency doesn't matter. Presumably, they'll use a higher density of individual processors, which sort of makes sense since they're trying to outdo what they already have. Any energy savings will be at least partially or wholly offset by increased capacity.

I would encourage you to read @neeyik's post #7.
 