Explainer: What Are Processor Threads?

Brilliant article, and I always love learning something new. I have heard rumours of 4 threads per core on desktop CPUs; do you think that will happen? Intel is going all in on cores, and I believe Arrow Lake will have up to 40 E cores alongside the 8 P cores. But E cores are single-threaded. Will that ever change too? Even so, the 15900K will support 56 threads as is. AMD is going hybrid with Zen 5 too, but their E cores will be Zen 5c cores and are SMT enabled.
 
I have heard rumours of 4 threads per core on desktop CPUs; do you think that will happen?
Well, you can never say never to such things, but to take proper advantage of more than two threads, the pipelines need to have lots of stages and the resources (low-latency wide cache, lots of internal bandwidth, large register files, etc.) to prevent a thread from stalling. When a stall happens on today's CPUs, only one other thread is impacted by it; with 4-way SMT, that's 3 other threads being held up too. Better to simply have more cores, rather than wider SMT.

E cores are single-threaded. Will that ever change too?
Again, can't say never to this, as the E cores are Intel's low-power Atom architecture, which was originally SMT capable. To say they're an odd design, compared to the P cores, would be an understatement. There's a huge number of pipelines, 17 in total, compared to the P's 12 -- there's literally one port for each primary CPU task, whereas in the P cores, some ports will handle both integer and vector tasks (but only one at a time).

The reason for this is quite clever. The E core pipelines are designed to be left idle if there isn't an instruction for a suitable port; this doesn't cause any stalls because any threads currently in flight are already being processed via another pipeline. By letting them sit idle, they consume less energy, keeping the entire core's power demand down.

P cores, on the other hand, are 'classic' core designs. They're meant to be kept as busy as possible, all the time, and SMT helps in that endeavor. Hence why they have this functionality but the Es don't.
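
For anyone wanting to check the thread counts quoted above (8 P cores with 2-way SMT plus 40 single-threaded E cores giving 56 threads), here's a minimal sketch of the arithmetic. The function name and parameters are made up for illustration, and the core counts are just the rumoured figures from this thread, not confirmed specs.

```python
def logical_threads(p_cores: int, e_cores: int, p_smt: int = 2, e_smt: int = 1) -> int:
    """Total hardware threads for a hybrid CPU (hypothetical helper).

    p_smt / e_smt are the number of threads each P core / E core can run.
    """
    return p_cores * p_smt + e_cores * e_smt


# Rumoured configuration from the thread: 8 P cores (2-way SMT) + 40 E cores (no SMT)
print(logical_threads(p_cores=8, e_cores=40))           # 56

# Same core counts, but imagining 4-way SMT on the P cores only
print(logical_threads(p_cores=8, e_cores=40, p_smt=4))  # 72
```

Going from 2-way to 4-way SMT on the P cores would add only 16 threads in this example, and all of them would share the same 8 cores' resources.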
 
Nice article, but you don't go deep enough to explain how it is implemented in the CPU itself. IIRC, the core will have 2 sets of registers, so that switching between the two threads can be done faster than saving/loading register values to/from the memory stack.
 
Thanks for the informative reply.
 
Nice article, but you don't go deep enough to explain how it is implemented in the CPU itself. IIRC, the core will have 2 sets of registers, so that switching between the two threads can be done faster than saving/loading register values to/from the memory stack.
Sure, a lot more could have been said about how CPUs properly handle SMT, but then the focus of the article would swing far more towards processor architecture. Let's face it: the question 'what are processor threads' was answered in just a few sentences, but that's not much of an article! :)
 