Not just the hardware: How deep is Nvidia's software moat?

Jay Goldberg

The big picture: Starting tomorrow, Nvidia is hosting its GTC developer conference. Once a sideshow for semis, the event has transformed into the center of attention for much of the industry. With Nvidia's rise, many have been asking to what extent Nvidia's software provides a durable competitive moat for its hardware. As we have been getting a lot of questions about that, we want to lay out our thoughts here.

Beyond the potential announcement of the next-gen B200 GPU, GTC is not really an event about chips; it is a show for developers. This is Nvidia's flagship event for building the software ecosystem around CUDA and the other pieces of its software stack.

It is important to note that when talking about Nvidia, many people, ourselves included, tend to use "CUDA" as shorthand for all the software that Nvidia provides. This is misleading: Nvidia's software moat is more than just the CUDA development layer, and that distinction is going to be critical for Nvidia in defending its position.
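To make the distinction concrete, consider how much of Nvidia's stack sits beneath a couple of ordinary framework calls. The sketch below uses PyTorch; the library names are real, but the dispatch paths noted in the comments are the typical ones rather than an exhaustive mapping.

```python
# A rough sketch of the Nvidia software layers beneath two ordinary
# PyTorch calls (assumes a CUDA build of PyTorch and an Nvidia GPU).
import torch

a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
w = torch.randn(8, 1, 3, 3, device="cuda", dtype=torch.float16)

b = a @ a                        # matmul -> cuBLAS / cuBLASLt kernels
c = torch.nn.functional.conv2d(
    a.view(1, 1, 1024, 1024), w  # convolution -> cuDNN
)
torch.cuda.synchronize()         # CUDA runtime and driver underneath;
                                 # multi-GPU training adds NCCL on top
```

Replacing the CUDA compiler alone would leave cuBLAS, cuDNN, NCCL, TensorRT and the rest untouched, which is why the shorthand understates the moat.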

Editor's Note:
Guest author Jonathan Goldberg is the founder of D2D Advisory, a multi-functional consulting firm. Jonathan has developed growth strategies and alliances for companies in the mobile, networking, gaming, and software industries.

At last year's GTC, the company put out 37 press releases, featuring a dizzying array of partners, software libraries and models. We expect more of this next week as Nvidia bulks up its defenses.

These partners are important because there are now hundreds of companies and millions of developers building tools on top of Nvidia's offerings. Once built, those developers are unlikely to rebuild their models and applications to run on other companies' chips, at least any time soon. It is worth noting that Nvidia's partners and customers span dozens of industry verticals, and while not all of them are going all-in on Nvidia, it still demonstrates immense momentum in Nvidia's favor.

Put simply, the defensibility of Nvidia's position right now rests on the inherent inertia of software ecosystems. Companies invest in software – writing the code, testing it, optimizing it, educating their workforce on its use, etc. – and once that investment is made, they are going to be deeply reluctant to switch.

We saw this with the Arm ecosystem's attempt to move into the data center over the last ten years. Even as Arm-based chips started to demonstrate real power and performance advantages over x86, it still took years for the software companies and their customers to move – a transition that is still underway. Nvidia appears to be in the early days of building up exactly that form of software advantage, and if they can achieve it across a wide swathe of enterprises, they are likely to hold onto it for many years. This, more than anything else, is what positions Nvidia best for the future.

Nvidia has formidable barriers to entry in its software. CUDA is a big part of that, but even if alternatives to CUDA emerge, the way in which Nvidia is providing software and libraries to so many partners points to them building a very defensible ecosystem.

We point all this out because we are starting to see alternatives to CUDA emerge. AMD has made a lot of progress with its answer to CUDA, ROCm. However, when we say progress, we mean that AMD now has a good, workable platform – it will still take years for ROCm to gain even a fraction of CUDA's adoption. ROCm is only available on a small number of AMD products today, while CUDA has worked on all Nvidia GPUs for years.
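One practical consequence worth noting: at the framework level, much of the porting work is already absorbed by PyTorch itself. A minimal sketch, assuming the official ROCm build of PyTorch, where AMD GPUs are exposed under the same "cuda" device name:

```python
# A minimal sketch of vendor-agnostic PyTorch code. On official ROCm
# builds of PyTorch, AMD GPUs surface under the "cuda" device name,
# so the same script can target an MI300X or an H100.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    print(torch.cuda.get_device_name(0))
    # torch.version.hip is set on ROCm builds and None on CUDA builds
    print("ROCm build" if torch.version.hip else "CUDA build")

x = torch.randn(4096, 4096, device=device)
y = x @ x.T  # dispatches to rocBLAS on ROCm, cuBLAS on CUDA
```

The harder, slower part is everything below that line: kernel maturity, performance tuning, and the long tail of libraries. That is where the years of catch-up lie.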

Other alternatives, like UXL or varying combinations of PyTorch and Triton, are similarly interesting but also in their early days. UXL in particular looks promising, as it is backed by a group of some of the biggest names in the industry. Of course, that is also its greatest weakness, as those members have highly divergent interests.
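To give a flavor of the PyTorch-plus-Triton path: Triton kernels are written in Python and compiled for the underlying GPU, which is what makes the combination interesting as a portability layer. A minimal sketch, following the standard vector-add pattern from Triton's own tutorials:

```python
# A minimal Triton vector-add kernel (assumes the triton and torch
# packages and a supported GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                 # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The same kernel source compiles for Nvidia and, increasingly, AMD backends – exactly the kind of abstraction that erodes the compiler layer of the moat without touching the library layer.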

We would argue that little of this will matter if Nvidia can get entrenched. And here is where we need to distinguish between CUDA and the Nvidia software ecosystem. The industry will come up with alternatives to CUDA, but that does not mean they can completely erase Nvidia's software barriers to entry.

Also read: Goodbye to Graphics: How GPUs Came to Dominate AI and Compute – No Longer "Just" a Graphics Card

That being said, the biggest threat to Nvidia's software moat is its largest customers. The hyperscalers have no interest in being locked into Nvidia in any way, and they have the resources to build alternatives. To be fair, they have their own reasons to stay close to Nvidia – it remains the default solution and still has many advantages – but if anyone puts a dent in Nvidia's software ambitions, it is most likely to come from this corner.

And that, of course, opens up the question as to what exactly Nvidia's software ambitions are.

In past years, as Nvidia launched its software offerings, up to and including its cloud service Omniverse, they have conveyed a sense that they had ambitions to create a new component of their revenue stream. On their latest earnings call, they pointed out that they had generated $1 billion in software revenue. However, more recently, we have gotten the sense that they may be repositioning or scaling back those ambitions a bit, with software now positioned as a service they provide to their chip customers rather than a full-blown revenue segment in its own right.

After all, selling software risks putting Nvidia in direct competition with all its biggest customers.

"That being said, the biggest threat to Nvidia's software moat is its largest customers. The hyperscalers have no interest in being locked into Nvidia in any way, and they have the resources to build alternatives."

Apart from the Arm vs. x86 comparison, which is in no way equivalent, that was a good bit of truth.
 
Blackwell + Nvidia Software Stack = x10 over AMD in most tasks that AI companies actually do.

People who think AMD is even close when it comes to AI don't know what these companies actually want.
 
That's also why Android phones are locked into Arm CPUs.

"Apart from the Arm vs. x86 comparison, which is in no way equivalent, that was a good bit of truth."

Arm still has efforts in server CPUs against x86 – why is that not equivalent?

 
"That's also why Android phones are locked into Arm CPUs."

"Arm still has efforts in server CPUs against x86 – why is that not equivalent?"

x86 has accumulated decades and decades of legacy software, and it is very widely used. AI is a recent phenomenon, and there aren't millions of applications built on it, so the barrier is much lower.
 
"Blackwell + Nvidia Software Stack = x10 over AMD in most tasks that AI companies actually do.

People who think AMD is even close when it comes to AI don't know what these companies actually want."

These companies don't like vendor lock-in. It's a risk in many ways (cost, support...), and they already feel the pain of Nvidia's order backlog.
They will surely try AMD's MI300(X), especially if AMD offers them advantages in price, VRAM capacity, open-source support... Over time, the risk for Nvidia is that these companies get used to working with AMD, and that their usage also boosts the software ecosystem supporting AMD. (Hence the new terms and conditions for CUDA.)
 
Thank you for the excellent analysis! I used to love reading about gaming GPUs & drivers, and now I find the AI GPUs and their software to be just as interesting.

I foresee the "hyperscalers" that are a threat to Nvidia approaching AMD & Intel to accelerate their options... they certainly have the resources to make that happen. I'm looking @ you, Google. It's still early innings in this AI race, and we all saw what Intel did to its customers and how it then became complacent. Those days are gone.
 
"These companies don't like vendor lock-in. It's a risk in many ways (cost, support...), and they already feel the pain of Nvidia's order backlog.
They will surely try AMD's MI300(X), especially if AMD offers them advantages in price, VRAM capacity, open-source support... Over time, the risk for Nvidia is that these companies get used to working with AMD, and that their usage also boosts the software ecosystem supporting AMD. (Hence the new terms and conditions for CUDA.)"

Nvidia's software stack is too powerful – do you have any experience with AI at all? They deliver a full suite. The MI300 is just a GPU; without proper software behind it, it means nothing for most AI companies.

MI300 series sales numbers are minuscule compared to Nvidia's for a reason.

The H100 is used by 99% of big AI companies, and Blackwell is a slot-in replacement, meaning AMD won't be relevant this generation.

AMD loves to deliver cherry-picked marketing slides for the MI300, but in reality they have pretty much nothing relevant when you look at performance in the workloads AI companies actually do.

"NVIDIA has launched a new generation of AI GPUs called Blackwell. The top of the line B200 GPU superchip boasts a remarkable 20 petaflops of FP4 power or about 30 times faster than the H100 and about 25 times less power hungry."
 
"Nvidia's software stack is too powerful – do you have any experience with AI at all? They deliver a full suite. The MI300 is just a GPU; without proper software behind it, it means nothing for most AI companies.

MI300 series sales numbers are minuscule compared to Nvidia's for a reason.

The H100 is used by 99% of big AI companies, and Blackwell is a slot-in replacement, meaning AMD won't be relevant this generation.

AMD loves to deliver cherry-picked marketing slides for the MI300, but in reality they have pretty much nothing relevant when you look at performance in the workloads AI companies actually do."

Yes, AMD is far behind, and it knows it.
But you can be sure that companies will try AMD's products. When Nvidia cannot deliver, companies will want an alternative so they can build and offer their AI solutions to their own customers.
 
"Yes, AMD is far behind, and it knows it.
But you can be sure that companies will try AMD's products. When Nvidia cannot deliver, companies will want an alternative so they can build and offer their AI solutions to their own customers."
Nvidia can deliver, and they are selling AI GPUs left and right; they even scaled down gaming production to ramp up AI production and meet demand.

That is why their market cap exploded recently. AMD has nothing in the same league as Nvidia, which has the best AI chips and a software stack to support them, allowing for massive gains over whatever AMD has.

Nvidia steals the show for a reason. They had a massive head start, with billions already coming in from AI and flowing directly into R&D, letting them release even better products faster than AMD can follow.

AMD is a CPU company first and foremost, not a GPU company like Nvidia. AMD won't spend its entire R&D budget chasing the AI market and losing the CPU market in the process, because it can't afford to do both.
 