How to sell a CPU

Jay Goldberg

Editor's take: Much of the focus in semiconductors is on chip performance, so for many outside the industry it can be mystifying why a "better" chip sometimes loses out to a "weaker" chip. To name just one example, Intel still sells a lot of server CPUs despite comparing poorly with the latest AMD or Arm offerings.

Much of this comes down to the structure of the data center market, and it is much more complicated than many would think. This is important for any company looking to tap into this market, whether with a CPU or the latest AI accelerator.

The first issue is that the market for data center silicon is highly concentrated among ten customers – the "Super 7" – Amazon, Google, Facebook, Microsoft, Baidu, Alibaba, and Tencent, to which we would add Oracle, JD.com and Apple. These companies consume well above 50% of the industry's server-grade CPUs and 70% to 80% of other data center silicon segments.

Editor's Note:
Guest author Jonathan Goldberg is the founder of D2D Advisory, a multi-functional consulting firm. Jonathan has developed growth strategies and alliances for companies in the mobile, networking, gaming, and software industries.

Beyond these customers, the shift of enterprise IT to the cloud leaves a highly fragmented assortment of smaller customers – financial firms, research labs, a few oil and gas companies, and some of the smaller Internet companies.

For large, established semis companies, this is almost insurmountable. These companies have to target the biggest customers; anything below the top ten is too small to move the needle. Many startups in the space are looking to start with the smaller customers, who can provide sufficient revenue to keep the lights on and the VCs interested, but eventually they will need to break into the big leagues.

Those big customers are fully aware of their market position. Moreover, they are writing big checks. So they make their suppliers run a gauntlet of qualification. This begins years before a chip is actually produced, as the chip designers seek input from their customers on chip specifications. How much and what type of memory will the customer use? How many I/O channels? And so on. This is followed by models emulating the chip design, typically running on FPGA boards. Once the design is finalized, it is sent to the foundry for manufacture.

Then the real work begins.

The hyperscalers have rigorous testing processes in place, complete with their own set of confusing acronyms. Typically, this involves a handful of chips to play around with in the lab. This is followed by a few dozen – enough to build a working server rack. All of this just proves the chip performs as promised at the design stage.

The next step is to build a full-blown system – a few thousand chips. At this stage, the customers typically run their actual production software, monitoring performance very closely. This step is particularly painful for the chip designers because they have no access to the customers' software and so have no way to test out the performance ahead of time.

Around this time customers also build out sophisticated total cost of ownership (TCO) models. These look at the total performance of the system versus the cost of not only the chips but the other elements of their servers as well – memory, power consumption, cooling needs, and more.

A difficult reality in this market is that while the main processor is the most important part of any server, it typically only comprises 20%-ish of the cost of that server. These models ultimately drive the customer's purchase decisions.
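To get a feel for how a TCO model can favor a "weaker" chip, here is a deliberately simplified sketch in Python. Every figure below – prices, wattages, performance scores, the electricity rate, and the cooling overhead factor – is hypothetical and chosen purely for illustration; real hyperscaler models are far more detailed.

```python
# Simplified TCO-per-performance comparison. All numbers are
# hypothetical, for illustration only.

def tco_per_perf(cpu_cost, other_hw_cost, watts, perf,
                 years=4, power_price_kwh=0.10, cooling_overhead=0.5):
    """Total cost of ownership per unit of performance.

    cooling_overhead: extra energy spent on cooling, as a fraction
    of IT power draw (a crude stand-in for PUE-style overhead).
    """
    hours = years * 365 * 24
    energy_kwh = watts / 1000 * hours * (1 + cooling_overhead)
    tco = cpu_cost + other_hw_cost + energy_kwh * power_price_kwh
    return tco / perf

# "Faster" chip: more raw performance, but pricier and hungrier.
fast = tco_per_perf(cpu_cost=9000, other_hw_cost=16000, watts=700, perf=115)
# "Slower" chip: cheaper and lower power.
slow = tco_per_perf(cpu_cost=6000, other_hw_cost=14000, watts=500, perf=100)

print(f"fast chip: ${fast:.2f} per perf unit")
print(f"slow chip: ${slow:.2f} per perf unit")
```

With these made-up inputs, the slower chip wins on cost per unit of performance – the CPU price and power draw get amortized across everything else in the server, so a raw performance lead can evaporate once the whole rack is accounted for. Shift a few assumptions (electricity price, depreciation period, workload mix) and the answer flips, which is exactly why the customers build these models themselves.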

While all this is going on, the chip company has to scramble. When the chip first comes back from the foundry, it may have bugs, and the manufacturing process needs to be tuned for better yield. So in the early days there are never enough chips to go around. Every customer wants to try them out, forcing the chip designer to triage priorities and ration supply. When there are only a handful of customers, this step carries considerable risk – no customer ever feels they have the supplier's full support.

Even as volumes increase, new problems arise. Customers do not want to buy chips; they want to buy complete systems. So the chip companies need to line up support from the ODM ecosystem.

Those companies have to produce their own set of designs – for the board and the entire rack – and these need to be evaluated too. This is a big part of Intel's staying power – every ODM is willing to do these designs for them, as they likely do other (PC) business with Intel. Everyone else has to struggle with smaller ODMs or with the big ODMs' "B" team of designers.

From first pen-to-paper to first sizable purchase order the whole process can take three to four years. Not as painful as automotive design cycles, but in many regards even more challenging.

Earlier this week, we were discussing the news that Ampere is selling a Developer Kit version of its latest chips. While Ampere is still tiny relative to Intel, they have been doing this for long enough to have some experience navigating all the steps above.

Those developer kits are a clever way to broaden their market. Ampere is small enough that smaller customers still matter to them. However, they are not yet big enough to provide full sales support to those customers. The developer kit broadens the top of their sales funnel by letting curious engineers participate in the first two steps of the evaluation process.

None of this is easy, and these complexities all rest on top of the challenge of actually designing a chip.

 
Thanks for this interesting article shedding light on a process I didn't know much about. While this all rings true to me, I missed the part about how, after a long and exacting evaluation process, the "weaker" chip often comes out on top. The best I could guess is that it's something about the TCO models mentioned. Is the article saying that despite Intel's chips maybe having weaker performance, they include more TCO features (manageability, reliability, support processes, etc.?) so that after taking those into account the price/performance is stronger?
 
None of this is easy, and these complexities all rest on top of the challenge of actually designing a chip.
Jay, you wouldn't believe all I would have to say if I had even a rudimentary understanding of this stuff. :confused:

Seriously, even at that, it was a fascinating read, mostly because I had no idea that so much was involved AFTER the design and construct process.

(y) (Y)
 
The reason why the biggest stay the biggest. When you have such commitment, ultimate performance just isn't the point; guarantees of delivering and supporting the process matter more.

If your latest product is mediocre but you have proven you usually deliver a workable solution for decades then you still get the nod.
 
I've still been trying to reconcile all this information, when finally I remembered that AMD only re-emerged as a relevant CPU maker around 2018. If the procurement process is really years long as described in this article, maybe that's the reason their regained credibility hasn't been reflected in more purchases yet. Still, I'd expect we must be getting to the point where they are re-qualifying as a potential vendor.

The article describes the immense purchasing power of the big 7, but if they are truly only willing to consider buying from one pre-determined vendor, that purchasing power is illusory. And if it was all pre-determined, why go to the expense of an extensive evaluation process? And surely there's no guarantee of delivery & support that Intel is offering that AMD is not willing to match?
 
Azure's V5 VMs (well, at least the new D series) appear to be running on AMD Epyc CPUs.

It really does take years for this stuff to come through.
 
The reason why the biggest stay the biggest. When you have such commitment, ultimate performance just isn't the point; guarantees of delivering and supporting the process matter more.

If your latest product is mediocre but you have proven you usually deliver a workable solution for decades then you still get the nod.
Absolutely this. 80% of possible performance, delivered 100% on time and supported 100% of the time, will always beat 100% performance that is only 75% assured to deliver on time or to be supported 80% of the time. TIME IS MONEY for business, and if you lose time on task to deliver the service, you don't stay top dog. Big business will not give anyone a second chance.
 
It all makes sense - but shake-ups can happen - USA car manufacturers used to give buyers 1,001 options - that is gone now, by the way.

Where new players may get a look in is new ways of using CPUs - plus university, government, defense - you want a supercomputer, the best chatbot, etc.

Some of these small players who work with AMD, Nvidia, whoever, may become big players - if they can bring services to you and me - Midjourney, chatbots, analysis, game design and creation, thought-to-foreign-speech?
Problem is, the big guys will seek to buy them out.

The other way to force change is probably power efficiency - a system that is 25% more power efficient is extremely tempting, as power is one of the main costs - given that servers' share of world power use will probably hit 2% and keep going up.
 

I've still been trying to reconcile all this information, when finally I remembered that AMD only re-emerged as a relevant CPU maker around 2018. If the procurement process is really years long as described in this article, maybe that's the reason their regained credibility hasn't reflected in more purchases yet. Still I'd expect we must be getting to the point where they are re-qualifying as a potential vendor.

The article describes the immense purchasing power of the big 7, but if they are truly only willing to consider buying from one pre-determined vendor, that purchasing power is illusory. And if it was all pre-determined, why go to the expense of an extensive evaluation process? And surely there's no guarantee of delivery & support that Intel is offering that AMD is not willing to match?

I wonder if it’s picking which CPUs, and how many, within one vendor’s stack that the cloud provider will pick. So if it’s too inefficient or too expensive or too niche, they’ll cut orders, wait for the next generation, buy cheaper SKUs, request cheaper pricing, ask for longer support contracts, etc. It could all still be within the same vendor.

Still, I think it's finally changing. According to some rumors this past week, Microsoft apparently cancelled up to 75% of its Sapphire Rapids orders and is moving that volume to Ampere in early 2024.

Instead of toying with the same vendor, they’re moving shop.
 