Amazon is ditching Nvidia GPUs in favor of their own silicon

mongeese

Posts: 643   +123
Staff
What just happened? Amazon has announced that they're migrating their artificial intelligence processing to custom AWS Inferentia chips. That means Amazon's biggest inferencing workloads, like those behind the Alexa virtual assistant, will run on faster, specialized silicon instead of more general-purpose GPUs.

Amazon has already shifted about 80% of Alexa processing onto Elastic Compute Cloud (EC2) Inf1 instances, which use the new AWS Inferentia chips. Compared to the G4 instances, which used traditional GPUs, the Inf1 instances push throughput up by 30% and costs down by 45%. Amazon reckons they're the best instances on the market for natural language and voice processing inference workloads.
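To get a feel for what those two percentages mean at scale, here is a rough back-of-the-envelope sketch in Python. It assumes the 45% figure refers to cost per inference; the request volume, baseline cost, and baseline throughput are invented purely for illustration.

```python
import math

# Back-of-the-envelope comparison. Only the 30% / 45% deltas come from the
# article; every absolute number below is an assumption for illustration.
daily_inferences = 500_000_000        # assumed Alexa-scale request volume
g4_cost_per_inference = 2e-6          # assumed: $2 per million inferences on G4
g4_throughput = 1_000                 # assumed inferences/sec per G4 instance

inf1_cost_per_inference = g4_cost_per_inference * (1 - 0.45)   # 45% cheaper
inf1_throughput = g4_throughput * 1.30                         # 30% faster

g4_daily_cost = daily_inferences * g4_cost_per_inference
inf1_daily_cost = daily_inferences * inf1_cost_per_inference

# Instances needed to keep up with the same request volume in real time.
seconds_per_day = 24 * 3600
g4_instances = math.ceil(daily_inferences / (g4_throughput * seconds_per_day))
inf1_instances = math.ceil(daily_inferences / (inf1_throughput * seconds_per_day))

print(f"Daily cost:  G4 ${g4_daily_cost:,.0f}  vs  Inf1 ${inf1_daily_cost:,.0f}")
print(f"Instances:   G4 {g4_instances}  vs  Inf1 {inf1_instances}")
```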

Alexa works like this: the actual speaker box (or cylinder, as it may be) does basically nothing, while AWS processors in the cloud do everything. Or, to put it more technically: the system kicks in once the wake word has been detected by the Echo's on-device chip, which then starts streaming the audio to the cloud in real time. Off in a data center somewhere, the audio is turned into text (one example of inferencing). Then meaning is extracted from the text (another example of inferencing). Finally, any required actions are carried out, like pulling up the day's weather information.
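That request path can be sketched as a simple pipeline. This is not Amazon's code: every function below is a made-up placeholder standing in for the on-device wake-word chip and the cloud-side inference models.

```python
# Schematic of the request path described above; not Amazon's implementation.
# The "models" are trivial stand-ins for the inference steps that would run on
# Inferentia in the cloud; only the wake-word check runs on the Echo itself.

def wake_word_detected(audio_frame: bytes) -> bool:
    # Stand-in for the Echo's on-device keyword spotter.
    return audio_frame == b"alexa"

def speech_to_text(audio: bytes) -> str:
    # Inference step 1: automatic speech recognition (placeholder).
    return "what is the weather today"

def extract_intent(text: str) -> dict:
    # Inference step 2: natural-language understanding (placeholder).
    if "weather" in text:
        return {"intent": "GetWeather", "slots": {"day": "today"}}
    return {"intent": "Unknown", "slots": {}}

def fulfil(intent: dict) -> str:
    # Carry out the requested action, e.g. look up today's forecast.
    if intent["intent"] == "GetWeather":
        return "Rain expected this afternoon."
    return "Sorry, I didn't catch that."

# Device loop: nothing is streamed until the wake word fires.
for frame in [b"music", b"alexa"]:
    if wake_word_detected(frame):
        streamed_audio = b"...utterance streamed to the cloud..."
        print(fulfil(extract_intent(speech_to_text(streamed_audio))))
```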

Once Alexa has completed your request, she needs to communicate the answer to you. What she's supposed to say is chosen from a modular script. Then the script is turned into an audio file (another example of inferencing) and sent to your Echo device. The Echo plays the file and you decide to bring an umbrella to work with you.
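The response side can be sketched the same way. The template table and function names are likewise invented; only the shape of the flow (pick a script, synthesize the audio, ship it to the device) comes from the description above.

```python
# Schematic of the response path; again a sketch, not Amazon's implementation.

RESPONSE_TEMPLATES = {
    "GetWeather": "The forecast for {day} is {forecast}.",
}

def build_response(intent: str, **slots) -> str:
    # "Modular script": pick a template and fill in the blanks.
    return RESPONSE_TEMPLATES[intent].format(**slots)

def text_to_speech(text: str) -> bytes:
    # Inference step 3: speech synthesis (placeholder for a real TTS model).
    return text.encode("utf-8")

def play_on_echo(audio: bytes) -> None:
    # The Echo simply plays back whatever audio it receives.
    print(f"Echo plays {len(audio)} bytes of synthesized speech")

answer = build_response("GetWeather", day="today", forecast="rain this afternoon")
play_on_echo(text_to_speech(answer))
```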

Evidently, inferencing is a big part of the job. It's unsurprising that Amazon has invested millions of dollars in making the perfect inferencing chip.

Speaking of, each Inferentia chip comprises four NeuronCores, each of which implements a "high-performance systolic array matrix multiply engine." More or less, each NeuronCore is made up of a very large number of small data processing units (DPUs) that process data in a linear, independent fashion. Each Inferentia chip also has a large on-chip cache, which keeps latencies down.
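A systolic array keeps operands flowing between neighboring processing elements on every clock tick, so each element performs a steady stream of multiply-accumulates without repeated trips back to memory. The toy simulation below illustrates that dataflow for an output-stationary array; it is only a schematic of the general technique, not how NeuronCores are actually built.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Each processing element PE(i, j) holds one output C[i][j]. Rows of A
    stream in from the left and columns of B stream in from the top, skewed
    so that matching operands meet at the right PE on the right clock tick;
    every tick a PE does one multiply-accumulate and passes its operands on.
    """
    A, B = np.asarray(A), np.asarray(B)
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)

    # One outer iteration per clock tick; the wavefront finishes after
    # M + N + K - 2 ticks.
    for t in range(M + N + K - 2):
        for i in range(M):
            for j in range(N):
                k = t - i - j          # which operand pair reaches PE(i, j) now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]   # one multiply-accumulate
    return C

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
assert np.array_equal(systolic_matmul(A, B), A @ B)
print(systolic_matmul(A, B))
```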


 
Watch Nvidia showcase how their products beat Amazon's own alternative lol

It's not nearly that simple. The sheer size of AWS means Nvidia would be losing a big chunk of change in GPU sales, so they'd ideally want to turn to either Google Cloud or Microsoft Azure to shore up commitments.

The problem is that Nvidia has a nasty habit of being difficult to work with and wanting those kinds of long-term commitments, whereas the only way for mega cloud providers like AWS to drive costs down is to use customized solutions.

AMD has been working on that, at least for the consoles, and has shown it's willing to do custom solutions. So if the other big cloud providers don't want to get into the business of building their own silicon from scratch, AMD is the next best thing. Nvidia, meanwhile, demands too much of a commitment for a product that isn't highly customizable, doesn't play nice with open protocols, and aims to lock tremendously vast data centers into proprietary tech they might not want to commit to.
 
Compared to the G4 instances, which used traditional GPUs, the Inf1 instances push throughput up by 30% and costs down by 45%
I'm guessing those numbers are based on some old system from Nvidia, and not the latest DGX A100, which puts a whole new spin on the performance-to-price ratio. That makes the whole endeavor from Amazon both odd and dubious.

 
Considering this is just inference, it's not surprising. Why waste a whole GPU when all you need is a tiny TPU to run the inference? I doubt you can do any training on those instances. Similar to Google Coral.
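To put a rough number on that point: serving a model is just a forward pass, while a training step also needs a backward pass and a weight update, which roughly doubles the arithmetic even in a one-layer toy and gets closer to 3x for deep networks. A generic NumPy sketch follows, with nothing specific to Inferentia or Coral.

```python
import numpy as np

# Toy single-layer model, just to contrast serving with training.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 10))
x = rng.standard_normal((1, 256))
target = np.zeros((1, 10)); target[0, 3] = 1.0

# Inference: a single forward pass, no gradients, no stored activations.
logits = x @ W

# One training step: the same forward pass PLUS a backward pass and a
# weight update, which is why training wants bigger, more general hardware.
logits = x @ W
grad_logits = logits - target        # gradient of a squared-error loss
grad_W = x.T @ grad_logits           # backward pass: another matmul
W -= 0.01 * grad_W                   # weight update

forward_flops = 2 * x.size * W.shape[1]     # matmul FLOPs for the forward pass
backward_flops = 2 * x.size * W.shape[1]    # grad_W matmul costs about the same
# (Multi-layer nets also need gradients w.r.t. activations, pushing a full
#  training step toward roughly 3x the forward-pass cost.)
print(f"inference ~ {forward_flops} FLOPs, this training step ~ {forward_flops + backward_flops} FLOPs")
```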
 

Except Nvidia acquiring ARM makes them capable of offering much more customizable solutions.
The problem for Nvidia is that their GPUs are "too good" for this kind of work; using them just for inferencing is a waste of resources.

Also, making the hardware is only part of the solution. Nvidia offers a software stack that's second to none, while AMD has pretty much nothing in comparison. AMD needs to invest not just in hardware but in software too.

 