The future of semiconductors is UCIe

Bob O'Donnell

Editor's take: If you want to become a serious tech industry watcher or a hardcore tech enthusiast, then you need to start closely watching what's happening in the semiconductor industry. Not only are chips at the literal heart of all our tech devices, but they also power the software and experiences we've all become so reliant on. Most important of all, however, they are the leading-edge indicator of where important technology trends are headed, because chip designs, and the technologies that go into them, must be completed years ahead of products that use them and the software needed to leverage them.

With that thought in mind, let me explain why a seemingly modest announcement about a new industry consortium and semiconductor standard, called Universal Chiplet Interconnect Express (or UCIe), is so incredibly important.

First, a bit more context. Over the last few years, there's been a great deal of debate and discussion about the ongoing viability of Moore's Law and the potential stalling of chip industry advancements. Remember that Intel co-founder Gordon Moore famously predicted just over 50 years ago that semiconductor performance would double roughly every 18-24 months, and his prognostication has proven remarkably prescient. In fact, many have argued that the incredible advances of Silicon Valley and the tech industry at large over the last half century have essentially been a "fulfillment" of that law.

As the chipmaking process has advanced, however, the industry has started to face some physical limitations that seem very challenging to overcome. Individual transistors have become so small that they're approaching the size of individual atoms -- and you can't get any smaller than that. As a result, traditional efforts to improve performance by shrinking transistors and fitting more and more of them onto a single die are coming to an end. However, chip companies recognized these challenges years ago and started focusing on other ideas and chip design concepts to keep performance advancing at a Moore's Law-like rate.

Chief among these are ideas around breaking up large monolithic chips into smaller components, or chiplets, and combining these in clever ways. This has led to a number of important advancements in chip architectures, chip packaging, and the interconnections between a number of components.

Just over 10 years ago, for example, Arm introduced the idea of big.LITTLE, which combines CPU cores of different sizes on a single chip to deliver strong performance at significantly reduced power levels. Since then, we've seen virtually every chip company leverage the concept, with Intel's new P and E cores in its 12th-gen Core CPUs being the most recent example.

The rise of multi-part SoCs, where different elements such as CPUs, GPUs, ISPs (image signal processors), and modems are all combined onto a single chip -- as Qualcomm does with its popular Snapdragon line -- is another development in the rethinking of large, monolithic chip designs. The connections between these components have also seen important advances.

When AMD first introduced Ryzen CPUs back in 2017, for example, one of the unique characteristics of the design was the use of a high-speed Infinity Fabric to connect groups of CPU cores -- and, in later designs, entire chiplets -- so that they could function more efficiently.

"Want to mix an Intel CPU with an AMD GPU, a Qualcomm modem, a Google TPU AI accelerator and a Microsoft Pluton security processor onto a single chip package, or system on package (SOP)?"

With a few exceptions, most of these packaging and interconnect capabilities were limited to a company's own products, meaning it could only mix and match various components of its own. Recognizing that the ability to combine components from different vendors could be useful -- particularly in high-performance server applications -- the industry created the Compute Express Link (CXL) standard. CXL, which is just starting to be used in real-world products, is optimized to do things like interconnect specialized accelerators, such as AI processors, with CPUs and memory in a speedy, efficient manner.

But as great as CXL may be, it didn't quite take things to the level of being able to mix and match different chiplets made by different companies using different types and sizes of manufacturing processes in a true Lego-like fashion. That's where the new UCIe standard comes in.

Started by a powerful consortium of Intel, AMD, Arm, Qualcomm, Samsung, Google, Meta, and Microsoft, along with foundry giant TSMC and packaging specialist ASE, UCIe builds on the CXL and PCIe 5.0 standards and defines the physical (die-to-die interconnect) and logical (protocol) layers by which companies can start designing and building the chips of their dreams.

Want to mix an Intel CPU with an AMD GPU, a Qualcomm modem, a Google TPU AI accelerator and a Microsoft Pluton security processor onto a single chip package, or system on package (SOP)? When UCIe-based products start to get commercialized in, say, the 2024-2025 timeframe, that's exactly what you should be able to do.
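
To make that mix-and-match idea concrete, here's a minimal sketch in Python (purely illustrative -- UCIe defines electrical and protocol layers, not any such software API, and every name and node pairing below is hypothetical) of a multi-vendor system on package as a simple data model:

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    vendor: str        # who designed the die
    function: str      # CPU, GPU, modem, accelerator, ...
    process_node: str  # node the die is manufactured on (illustrative)

@dataclass
class SystemOnPackage:
    """Hypothetical model of a package in which dies from different
    vendors and different process nodes share one substrate, with a
    standardized die-to-die link (the role UCIe plays) between them."""
    chiplets: list = field(default_factory=list)

    def add(self, chiplet: Chiplet) -> None:
        self.chiplets.append(chiplet)

# The exact scenario described above:
sop = SystemOnPackage()
sop.add(Chiplet("Intel", "CPU", "Intel 4"))
sop.add(Chiplet("AMD", "GPU", "TSMC N5"))
sop.add(Chiplet("Qualcomm", "modem", "Samsung 4LPE"))
sop.add(Chiplet("Google", "TPU AI accelerator", "TSMC N7"))
sop.add(Chiplet("Microsoft", "Pluton security processor", "GlobalFoundries 22FDX"))

for c in sop.chiplets:
    print(f"{c.function}: {c.vendor}, built on {c.process_node}")
```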

Not only is this technologically and conceptually cool, but it also opens a whole new range of opportunities for chip companies and device makers and creates many new types of options for the semiconductor industry as a whole. For example, this could enable the creation of smaller yet still financially viable semiconductor companies that focus only on very specialized chiplets, or that concentrate on putting together interesting combinations of existing parts made by others.

For device manufacturers, this theoretically allows them to build their own custom chip designs without the burden (and cost) of an entire semiconductor team. In other words, you could create an Apple-like level of chip customization at what should be a significantly lower development cost.

From the manufacturing side, there are huge benefits as well. Though it's not well known, not all chips benefit from being built at cutting-edge process nodes, such as today's 4 nm and 3 nm. In fact, many chips, particularly those that process analog signals, are actually better off being built at larger process nodes.

Things like 5G modems, RF front ends, WiFi and Bluetooth radios, etc., perform significantly better when built at larger nodes, because they can avoid issues like signal leakage. As a result, companies like GlobalFoundries and others that don't have the smallest process nodes but do specialize in unique manufacturing, process, or packaging technologies should have an even brighter future in a chiplet-driven semiconductor world.

The ability to show value won't be limited to those who remain on the cutting edge of process technology -- though, to be sure, that will continue to be extremely valuable for the foreseeable future. Instead, chip design companies or foundries that can offer unique capabilities at any of the many steps along the semiconductor supply chain should be able to build more viable businesses. Plus, the ability to mix and match across multiple companies could lead to a more competitive market and, hopefully, reduce the kind of supply chain disruptions we've seen over the last few years.

There's still a lot of work to be done to broaden support for UCIe even further and to ensure that it works as well, and as seamlessly, as the concept suggests. Thankfully, the initial set of companies launching the standard is impressive enough that they're bound to encourage both some obvious missing players (I'm looking at you, Apple and Nvidia) and a broad array of lesser-known companies to participate.

The possibilities for UCIe and, most importantly, its potential for disruption are enormous. Today's semiconductor industry has already entered an exciting and competitive new era, and because of the pandemic-driven chip shortages we've experienced in all aspects of society, awareness of the importance of semiconductors has never been higher. With the launch of UCIe, I believe there's the potential for the industry to reach an even higher level, and that, most certainly, will be interesting to watch.

Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter.

 
Excellent write-up, and thanks for this.

I do worry Apple will not give two hoots about this. They will want to run a locked ecosystem forever, IMO. Still, maybe rational minds that aren't 100% profit-driven will prevail.
 
It obviously is the near future.
Software and the OS are critical.

I like the mention that this will create a lot of smaller players that can leverage it to do things better, and for products as yet unknown.
If this can incorporate neural networks that mimic bees' navigation (something that is happening now), then we need to think beyond PCs, gaming devices, and media consumption: I'm thinking Terminators, robots, highly autonomous devices, medical devices for controlling hearts or drug delivery, artificial limbs -- devices needing real-time feedback that isn't plain linear or integer-order derivative/integral calculus, but the much harder fractional calculus that shows up in real-life examples.
Also traffic flow, proper smart traffic lights, space exploration.
Obviously it will be used for warfare -- modelling flows, multiple movements.
 
It obviously is the near future.
Software and the OS are critical.

I like the mention that this will create a lot of smaller players that can leverage it to do things better, and for products as yet unknown.
If this can incorporate neural networks that mimic bees' navigation (something that is happening now), then we need to think beyond PCs, gaming devices, and media consumption: I'm thinking Terminators, robots, highly autonomous devices, medical devices for controlling hearts or drug delivery, artificial limbs -- devices needing real-time feedback that isn't plain linear or integer-order derivative/integral calculus, but the much harder fractional calculus that shows up in real-life examples.
Also traffic flow, proper smart traffic lights, space exploration.
Obviously it will be used for warfare -- modelling flows, multiple movements.

The only problem is that a lot of the "smaller operators" will be Chinese scammers swapping out chiplets for inferior ones.
 
This will be great for the future of microprocessors! I wonder if this is something that is truly transformative, because the players involved are certainly impressive!
 
Not wanting to be "that guy", but anyway...

Moore's law wasn't that performance would double, it was that chip density would.

"Moore's law is the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years". https://en.wikipedia.org/wiki/Moore's_law

So while this is exciting news, it really doesn't have any bearing on Moore's law, which is forecast to end sometime around 2025 regardless of any other tech advances to come. Performance increases, OTOH, have a brighter future since new concepts like UCIe are being explored. But AFAIK, eventually quantum will be the only game in town.
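
As a back-of-the-envelope check of the density version (a sketch; the 4004 transistor count is the commonly cited figure):

```python
# Intel 4004 (1971): ~2,300 transistors. Double every ~2 years:
n0 = 2300
doublings = (2021 - 1971) / 2          # 25 doublings in 50 years
projected = n0 * 2 ** doublings
print(f"~{projected / 1e9:.0f} billion transistors")   # ~77 billion
# Apple's M1 Ultra (2022) actually packs ~114 billion transistors,
# so real chips have tracked the density prediction remarkably well.
```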
 
Well, having written a lot of software in my day, I still think from a software and systems perspective. What concerns me here is that the Wintel computer industry has coalesced around a limited number of CPUs (AMD and Intel), graphics subsystems, Ethernet, WiFi, and audio. This has made Microsoft's job of providing an operating system easier than Microsoft itself may realize, because the combinations of chips that need to be tested become finite and much smaller than they used to be. Note also that the API for hardware drivers has become very stable across Windows 7, 8, 10, and 11, unlike previous Windows versions where the driver APIs changed with every new Windows version.

Now stir in UCIe chips and whatta ya got? Unless managed well, we have a cacophony of hardware for which to develop and test software. If much or all of this is invisible to the software developer, no problem. But since standard APIs do not exist for AI, among other potential chiplet types, clear-cut, well-thought-out, and stable APIs for AI need to be defined. Ditto for any other new chiplet function.
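
To illustrate what a clear-cut, stable API might look like, here's a purely hypothetical sketch -- no such standard interface exists today, and every name below is invented for illustration:

```python
from abc import ABC, abstractmethod

class AIChipletDriver(ABC):
    """Hypothetical contract an OS could require every AI-accelerator
    chiplet to implement, regardless of which vendor built the die."""

    @abstractmethod
    def capabilities(self) -> dict:
        """Report supported data types, operations, and memory limits."""

    @abstractmethod
    def load_model(self, model_bytes: bytes) -> int:
        """Load a compiled model onto the chiplet; return a handle."""

    @abstractmethod
    def infer(self, handle: int, inputs: bytes) -> bytes:
        """Run inference on a loaded model; return the output buffer."""
```

The point isn't these particular methods; it's that software written against one stable contract would keep working as vendors swap chiplets underneath it.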
 
This seems mostly to be targeted at professional work or closed systems, like gaming consoles. And maybe something like the Raspberry Pi.
 
This seems mostly to be targeted at professional work or closed systems, like gaming consoles. And maybe something like the Raspberry Pi.
Nope. Not with all those big-time companies working in a consortium. This will be far more pervasive than closed systems or highly specialized ones.
 
Unless managed well, we have a cacophony of hardware for which to develop and test software.
Do programmers currently care whether the CPU their program is running on has 2 cores or 32? Whether the processor has an iGPU or a dedicated GPU? Whether it has MMX, SSE, or AVX-512 vector processing units? No, because it's all handled by the compiler or other low-level functions. Our programs will still run on systems that are missing certain features, it's just that they'll run slower. That's why we advise minimum specs for applications to run.
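
For example, a minimal sketch of that portable pattern: size the worker pool from whatever the machine reports, and the identical code runs on 2 cores or 32, just at different speeds:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x  # stand-in for real computation

# os.cpu_count() reports 2 on a dual-core laptop and 32 on a big
# workstation; the program logic doesn't care which.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(work, range(1000)))
```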
 
Do programmers currently care whether the CPU their program is running on has 2 cores or 32? Whether the processor has an iGPU or a dedicated GPU? Whether it has MMX, SSE, or AVX-512 vector processing units? No, because it's all handled by the compiler or other low-level functions. Our programs will still run on systems that are missing certain features, it's just that they'll run slower. That's why we advise minimum specs for applications to run.
Yes, but... Everything you say about the current Windows (or MacOS or Linux) environment is spot on. It all works seamlessly because there are proven and stable APIs for programmers to use hardware capabilities. Programmers in these environments rarely write code that touches the bare iron, as I once did.

To say it differently, the chiplet hardware pieces all by themselves are insufficient. Also needed are clearly defined and stable APIs for using the chiplets. I'll note again that it took many years, from Windows NT to Windows 7, to arrive at stable, little-changing hardware APIs for the current collection of hardware devices. Unless there is insane and intense intellectual activity up front to design chiplet APIs, we'll all have the same mess on our hands that we have had for years with the current collection of Windows hardware, often rendered obsolete by a changing API.

On the other hand, for closed use in specific industries like automotive, machine controls, and similar, programming bare chiplet iron is OK because the company doing it owns it, although the saga of 3G modems in vehicles is not OK. Even in industrial applications, there will be moves to standardize how to program chiplets across an industry.
 
Everyone keeps equating Moore's Law with transistor density, and it had nothing to do with that. It's always been about performance -- or, as the internal combustion engine folks say, horsepower, not cylinders. When you look at the performance of the 8008 chip that started all of this, and what happened when they released the 8088, performance did double. Even the 286 doubled performance, while the 386 increased it again. The P4 was, and still is, an anomaly, as performance didn't double for Intel, while with the current Ryzen designs we're seeing 25-plus percent increases per generation in IPC (instructions per clock), once again validating the performance doubling that Moore's theory is all about. As I said, it's got nothing to do with V8/V12 when it's a horsepower increase, and we'll continue seeing that if they quit trying to reach molecular computing levels before we develop warp drive.

Thanks, Geezer: I always thought it was performance, and you just gave me the poke in the eye about being wrong. Oh well, we all get reminded about things, but I'll stand by what I said in regard to performance being the key factor over the years when you look at IPC.
 
You can use our toys in your 4 GB/s-per-lane, PCIe 5-based sandpit, says "cat got the cream" smug Lisa.

Meanwhile though, we will have the in-house option of doing something similar with Infinity Fabric (how does two GPUs linked at ~3 TB/s sound?)... oh, and we can throw in a Xilinx FPGA matrix as well, for dynamically optimising teams of resources to the workload.

She may look like she is opening up the chiplet world she so dominates with Fabric, but in fact she draws the net tighter by pulling their products' orbits closer to her stronger one.
 
Not wanting to be "that guy", but anyway...

Moore's law wasn't that performance would double, it was that chip density would.

"Moore's law is the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years". https://en.wikipedia.org/wiki/Moore's_law

So while this is exciting news, it really doesn't have any bearing on Moore's law, which is forecast to end sometime around 2025 regardless of any other tech advances to come. Performance increases, OTOH, have a brighter future since new concepts like UCIe are being explored. But AFAIK, eventually quantum will be the only game in town.

Transistors will continue to shrink for a long time, but the cost per transistor and the economics of designing a chip are going up and up. That means in practice, fewer companies will use leading-edge nodes. Thankfully, things like UCIe and AI-based chip design tools will help counteract this trend.
 
Transistors will continue to shrink for a long time, but the cost per transistor and the economics of designing a chip are going up and up. That means in practice, fewer companies will use leading-edge nodes. Thankfully, things like UCIe and AI-based chip design tools will help counteract this trend.
Give this a read. We're fast approaching the point where the size of silicon atoms will limit how much further we can go. It's physics, which means, no, we won't see transistors shrinking for a long time to come. 2-3 more years at best.
 
Do programmers currently care whether the CPU their program is running on has 2 cores or 32? Whether the processor has an iGPU or a dedicated GPU? Whether it has MMX, SSE, or AVX-512 vector processing units? No, because it's all handled by the compiler or other low-level functions. Our programs will still run on systems that are missing certain features, it's just that they'll run slower. That's why we advise minimum specs for applications to run.
You mention lots of aspects. Some of them do need attention from the programmer, or from the user who runs the software, e.g. (see the sketch below):
a. MMX, AVX, etc. are much faster than generic instructions, e.g. for AV1 codecs, so the programmer must check for their availability and use them if they exist to gain a huge performance increase.
b. If the discrete GPU is Nvidia, then the programmer can use CUDA for related software rather than OpenCL.
c. A multicore processor is not uniform across all cores. For memory-heavy applications, the program must run within the same cores that share caches; e.g., spreading such a program across both chiplets of a 5950X can be counterproductive. This CPU affinity must be controlled by the user when launching the software.
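
A minimal sketch of (a) and (c), assuming Linux -- the /proc/cpuinfo parsing and the sched_setaffinity calls are Linux-specific, and the CPU numbering is machine-dependent, so treat the core IDs as illustrative:

```python
import os

# (a) Runtime feature detection: read the CPU's advertised flags
# rather than assuming AVX2 (or similar) is present.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
print("AVX2 available:", "avx2" in flags)

# (c) CPU affinity: pin this process to the first eight logical CPUs
# (e.g. one CCD of a 5950X; the exact CPU-to-chiplet mapping varies).
os.sched_setaffinity(0, set(range(8)))
print("Restricted to CPUs:", sorted(os.sched_getaffinity(0)))
```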
 
Transistors will continue to shrink for a long time, but the cost per transistor and the economics of designing a chip are going up and up. That means in practice, fewer companies will use leading-edge nodes. Thankfully, things like UCIe and AI-based chip design tools will help counteract this trend.
Silicon atom size is around 0.2 nm, and we are already at a 4 nm process now. In 10 years, there might be no more manufacturing improvement to be had on silicon-based chips.
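
A rough back-of-the-envelope version of that argument (a sketch only; modern "nm" node names are marketing labels rather than literal feature sizes, so the numbers are indicative at best):

```python
import math

node = 4.0     # today's leading "4 nm"-class node
shrink = 0.7   # classic ~0.7x linear shrink per generation (halves area)
limit = 0.5    # features only a few silicon atoms (~0.2 nm each) across

# Generations until features approach the atomic scale:
gens = math.log(limit / node) / math.log(shrink)
print(f"~{gens:.0f} generations, ~{2 * gens:.0f} years at 2 years each")
# ~6 generations, ~12 years -- the same ballpark as the estimate above.
```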
 
Excellent write-up, and thanks for this.

I do worry Apple will not give two hoots about this. They will want to run a locked ecosystem forever, IMO. Still, maybe rational minds that aren't 100% profit-driven will prevail.
This can be useful for a chiplet-based iGPU, such as the past Kaby Lake-G.

I don't see UCIe being relevant for other use cases.
Smartphones and tablets will stay single-chip SoCs due to space constraints.
Wireless modem chips (WiFi and cellular) in laptops need to be close to the antenna, instead of near the CPU, to reduce feeder loss. Putting them in the CPU package is even counterproductive, as the modem will add heat to the CPU package.
I used an external USB cellular modem in the past and it got really hot.
 
You mention lots of aspects. Some of them do need attention from the programmer, or from the user who runs the software
Disagree. Modern compilers handle all the instances you mention. The only exception is CUDA, which needs an Nvidia GPU (because it's produced by Nvidia), but that's a minute fraction of all software that's produced. You can even program and run OpenCL without a GPU. You can create multiple threads within your programs to run on multiple cores, but it doesn't matter if the program ends up running on a 2-core processor; it simply runs slower.

If programmers had to care about all the different features available on every different processor, it would take forever to code anything and software would cost a fortune. Instead we have programs like Excel or GTA 5 that run on everything from Intel dual-core processors to an Apple Mac to an AMD Threadripper, and on everything from integrated GPUs to AMD or Nvidia GPUs.
 
Disagree. Modern compilers handle all the instances you mention. The only exception is CUDA, which needs an Nvidia GPU (because it's produced by Nvidia), but that's a minute fraction of all software that's produced. You can even program and run OpenCL without a GPU. You can create multiple threads within your programs to run on multiple cores, but it doesn't matter if the program ends up running on a 2-core processor; it simply runs slower.

If programmers had to care about all the different features available on every different processor, it would take forever to code anything and software would cost a fortune. Instead we have programs like Excel or GTA 5 that run on everything from Intel dual-core processors to an Apple Mac to an AMD Threadripper, and on everything from integrated GPUs to AMD or Nvidia GPUs.
Much well-known software checks hardware specs.
Just open chrome://gpu in any Chromium-based browser, including mobile browsers, and you might see that several features are disabled because of the CPU or GPU type.
Another example: OpenSSL will use AES acceleration, e.g. AES-NI, if it's available in the processor.
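
A quick way to see that in action -- a sketch using the third-party cryptography package (pip install cryptography), which calls into OpenSSL under the hood:

```python
import os, time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)
data = os.urandom(100 * 1024 * 1024)  # 100 MB of plaintext

encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
t0 = time.perf_counter()
encryptor.update(data)
elapsed = time.perf_counter() - t0
print(f"AES-128-CTR: {len(data) / elapsed / 1e9:.1f} GB/s")
```

On a CPU with AES-NI this typically reports several GB/s. OpenSSL also honors an OPENSSL_ia32cap environment variable that can mask out AES-NI, which is the usual way to demonstrate the gap versus the generic software path.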
 
Much well-known software checks hardware specs.
Just open chrome://gpu in any Chromium-based browser, including mobile browsers, and you might see that several features are disabled because of the CPU or GPU type.
Another example: OpenSSL will use AES acceleration, e.g. AES-NI, if it's available in the processor.
While Chrome looks like a simple application you can install, it is also a full operating system developed by Google, so they can afford to add all the bells and whistles. It took millions of man-hours to develop and takes hours just to compile on an ordinary PC. OpenSSL is not ordinary software, or "famous software"; it offers high-level cryptography for internet servers.

I appreciate the time you're spending to find exceptions to what I stated, and I accept there will always be exceptions, but it just doesn't make sense for developers to check all the specifics of the hardware their programs are running on. Doing this would lead to much longer development times, which means more expensive software. It's just far easier to rely on compilers, or low-level libraries, to do all this work for you (for free).
 
While Chrome looks like a simple application you can install, it is also a full operating system developed by Google, so they can afford to add all the bells and whistles. It took millions of man-hours to develop and takes hours just to compile on an ordinary PC. OpenSSL is not ordinary software, or "famous software"; it offers high-level cryptography for internet servers.

I appreciate the time you're spending to find exceptions to what I stated, and I accept there will always be exceptions, but it just doesn't make sense for developers to check all the specifics of the hardware their programs are running on. Doing this would lead to much longer development times, which means more expensive software. It's just far easier to rely on compilers, or low-level libraries, to do all this work for you (for free).
OpenSSL is very, very common software.
Almost all public web servers are Linux-based, and they use OpenSSL for processing HTTPS, which is almost 100% of internet traffic today.
AES-NI in Intel and AMD processors is at least 5x faster than generic instructions.
 