Amazon is ditching Nvidia GPUs in favor of their own silicon

mongeese

Posts: 643   +123
Staff
What just happened? Amazon has announced that they're migrating their artificial intelligence processing to custom AWS Inferentia chips. That means Amazon's biggest inferencing workloads, like those behind the Alexa virtual assistant, will run on faster, specialized silicon instead of more general-purpose GPUs.

Amazon has already shifted about 80% of Alexa processing onto Elastic Compute Cloud (EC2) Inf1 instances, which use the new AWS Inferentia chips. Compared to the G4 instances, which used traditional GPUs, the Inf1 instances push throughput up by 30% and costs down by 45%. Amazon reckons they're the best instances on the market for natural language and voice processing inference workloads.
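To get a feel for what those two percentages mean at scale, here is a rough back-of-the-envelope sketch in Python. It assumes the 45% figure refers to cost per inference; the request volume, baseline cost, and baseline throughput are invented purely for illustration.

```python
import math

# Back-of-the-envelope comparison. Only the 30% / 45% deltas come from the
# article; every absolute number below is an assumption for illustration.
daily_inferences = 500_000_000        # assumed Alexa-scale request volume
g4_cost_per_inference = 2e-6          # assumed: $2 per million inferences on G4
g4_throughput = 1_000                 # assumed inferences/sec per G4 instance

inf1_cost_per_inference = g4_cost_per_inference * (1 - 0.45)   # 45% cheaper
inf1_throughput = g4_throughput * 1.30                         # 30% faster

g4_daily_cost = daily_inferences * g4_cost_per_inference
inf1_daily_cost = daily_inferences * inf1_cost_per_inference

# Instances needed to keep up with the same request volume in real time.
seconds_per_day = 24 * 3600
g4_instances = math.ceil(daily_inferences / (g4_throughput * seconds_per_day))
inf1_instances = math.ceil(daily_inferences / (inf1_throughput * seconds_per_day))

print(f"Daily cost:  G4 ${g4_daily_cost:,.0f}  vs  Inf1 ${inf1_daily_cost:,.0f}")
print(f"Instances:   G4 {g4_instances}  vs  Inf1 {inf1_instances}")
```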

Alexa works like this: the actual speaker box (or cylinder, as it may be) does basically nothing, while AWS processors in the cloud do everything. Or, to put it more technically: the system kicks in once the wake word has been detected by the Echo's on-device chip, which then starts streaming the audio to the cloud in real time. Off in a data center somewhere, the audio is turned into text (one example of inferencing). Then meaning is extracted from the text (another example of inferencing). Finally, any required actions are carried out, like pulling up the day's weather information.
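That request path can be sketched as a simple pipeline. This is not Amazon's code: every function below is a made-up placeholder standing in for the on-device wake-word chip and the cloud-side inference models.

```python
# Schematic of the request path described above; not Amazon's implementation.
# The "models" are trivial stand-ins for the inference steps that would run on
# Inferentia in the cloud; only the wake-word check runs on the Echo itself.

def wake_word_detected(audio_frame: bytes) -> bool:
    # Stand-in for the Echo's on-device keyword spotter.
    return audio_frame == b"alexa"

def speech_to_text(audio: bytes) -> str:
    # Inference step 1: automatic speech recognition (placeholder).
    return "what is the weather today"

def extract_intent(text: str) -> dict:
    # Inference step 2: natural-language understanding (placeholder).
    if "weather" in text:
        return {"intent": "GetWeather", "slots": {"day": "today"}}
    return {"intent": "Unknown", "slots": {}}

def fulfil(intent: dict) -> str:
    # Carry out the requested action, e.g. look up today's forecast.
    if intent["intent"] == "GetWeather":
        return "Rain expected this afternoon."
    return "Sorry, I didn't catch that."

# Device loop: nothing is streamed until the wake word fires.
for frame in [b"music", b"alexa"]:
    if wake_word_detected(frame):
        streamed_audio = b"...utterance streamed to the cloud..."
        print(fulfil(extract_intent(speech_to_text(streamed_audio))))
```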

Once Alexa has completed your request, she needs to communicate the answer to you. What she's supposed to say is chosen from a modular script. Then the script is turned into an audio file (another example of inferencing) and sent to your Echo device. The Echo plays the file and you decide to bring an umbrella to work with you.
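The response side can be sketched the same way. The template table and function names are likewise invented; only the shape of the flow (pick a script, synthesize the audio, ship it to the device) comes from the description above.

```python
# Schematic of the response path; again a sketch, not Amazon's implementation.

RESPONSE_TEMPLATES = {
    "GetWeather": "The forecast for {day} is {forecast}.",
}

def build_response(intent: str, **slots) -> str:
    # "Modular script": pick a template and fill in the blanks.
    return RESPONSE_TEMPLATES[intent].format(**slots)

def text_to_speech(text: str) -> bytes:
    # Inference step 3: speech synthesis (placeholder for a real TTS model).
    return text.encode("utf-8")

def play_on_echo(audio: bytes) -> None:
    # The Echo simply plays back whatever audio it receives.
    print(f"Echo plays {len(audio)} bytes of synthesized speech")

answer = build_response("GetWeather", day="today", forecast="rain this afternoon")
play_on_echo(text_to_speech(answer))
```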

Evidently, inferencing is a big part of the job. It's unsurprising that Amazon has invested millions of dollars in making the perfect inferencing chip.

Speaking of, each Inferentia chip comprises four NeuronCores, each of which implements a "high-performance systolic array matrix multiply engine." More or less, each NeuronCore is made up of a very large number of small data processing units (DPUs) that process data in a linear, independent fashion. Each Inferentia chip also has a large on-chip cache, which keeps latencies down.
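A systolic array keeps operands flowing between neighboring processing elements on every clock tick, so each element performs a steady stream of multiply-accumulates without repeated trips back to memory. The toy simulation below illustrates that dataflow for an output-stationary array; it is only a schematic of the general technique, not how NeuronCores are actually built.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Each processing element PE(i, j) holds one output C[i][j]. Rows of A
    stream in from the left and columns of B stream in from the top, skewed
    so that matching operands meet at the right PE on the right clock tick;
    every tick a PE does one multiply-accumulate and passes its operands on.
    """
    A, B = np.asarray(A), np.asarray(B)
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)

    # One outer iteration per clock tick; the wavefront finishes after
    # M + N + K - 2 ticks.
    for t in range(M + N + K - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j          # which operand pair reaches PE(i, j) now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]   # one multiply-accumulate
    return C

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
assert np.array_equal(systolic_matmul(A, B), A @ B)
print(systolic_matmul(A, B))
```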


 
Watch Nvidia showcase how their products beat Amazon's own alternative lol

It's not nearly that simple. The sheer size of AWS means Nvidia would be losing a big chunk of change in GPU sales, so they'd ideally want to turn to either Google Cloud or Microsoft Azure to shore up commitments.

The problem is that Nvidia has a nasty habit of being difficult to work with and wanting those kinds of long-term commitments, whereas the only way for mega cloud providers like AWS to drive costs down is to use customized solutions.

AMD has been working on that, at least for the consoles, and has shown it's willing to do custom solutions. So if the other big cloud providers don't want to get into the business of building their own silicon from scratch, AMD is the next best thing. Nvidia, meanwhile, demands too much of a commitment for a product that isn't highly customizable, doesn't play nice with open protocols, and aims to lock tremendously vast data centers into proprietary tech they might not want to commit to.
 
Compared to the G4 instances, which used traditional GPUs, the Inf1 instances push throughput up by 30% and costs down by 45%
I'm guessing those numbers are based on some old system from Nvidia, and not the latest DGX A100, which puts a whole new spin on the performance-to-price ratio. That makes the whole endeavor from Amazon both odd and dubious.

 
Considering this is just inference, it's not surprising. Why waste a whole GPU when all you need is a tiny TPU to run the inference? I doubt you can do any training on those instances. Similar to Google Coral.
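To put a rough number on that point: serving a model is just a forward pass, while a training step also needs a backward pass and a weight update, which roughly doubles the arithmetic even in a one-layer toy and gets closer to 3x for deep networks. A generic NumPy sketch follows, with nothing specific to Inferentia or Coral.

```python
import numpy as np

# Toy single-layer model, just to contrast serving with training.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 10))
x = rng.standard_normal((1, 256))
target = np.zeros((1, 10)); target[0, 3] = 1.0

# Inference: a single forward pass, no gradients, no stored activations.
logits = x @ W

# One training step: the same forward pass PLUS a backward pass and a
# weight update, which is why training wants bigger, more general hardware.
logits = x @ W
grad_logits = logits - target        # gradient of a squared-error loss
grad_W = x.T @ grad_logits           # backward pass: another matmul
W -= 0.01 * grad_W                   # weight update

forward_flops = 2 * x.size * W.shape[1]     # matmul FLOPs for the forward pass
backward_flops = 2 * x.size * W.shape[1]    # grad_W matmul costs about the same
# (Multi-layer nets also need gradients w.r.t. activations, pushing a full
#  training step toward roughly 3x the forward-pass cost.)
print(f"inference ~ {forward_flops} FLOPs, this training step ~ {forward_flops + backward_flops} FLOPs")
```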
 

Except Nvidia acquiring ARM makes them capable of offering much more customizable solutions.
The problem for Nvidia is that their GPUs are "too good" for this kind of work; using them just for inferencing is a waste of resources.

Also, making the hardware is only part of the solution. Nvidia offers a software stack that's second to none, while AMD has pretty much nothing in comparison. AMD needs to invest not just in hardware but in software too.

 