Researchers develop hardware component to boost core-to-core communications in CPUs

Shawn Knight


Researchers from North Carolina State University, in collaboration with Intel, have developed a method that significantly speeds up communication between the cores of a processor.

Processors powering today’s computers and mobile devices rely on multiple cores that work together to tackle workloads. Cores currently coordinate by sending and receiving software commands. While this method does work, it takes time, which slows a chip’s overall performance.
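For a feel of what that software hand-off looks like in practice, here is a minimal, illustrative Java sketch (ours, not the researchers' code) assuming the usual mechanism: one thread passes work items to another through a shared in-memory queue, and every transfer pays a synchronization cost. This is the kind of software overhead a hardware path could bypass.
Code:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: two threads (ideally on different cores)
// communicating through a software queue in shared memory.
public class SoftwareQueueDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(64);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    queue.put(i); // synchronized hand-off; blocks when full
                } catch (InterruptedException e) {
                    return;
                }
            }
        });

        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    System.out.println("got " + queue.take()); // blocks when empty
                } catch (InterruptedException e) {
                    return;
                }
            }
        });

        producer.start();
        consumer.start();
    }
}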

Researchers at NC State knew they could do better.

With help from Intel, the team created a new chip design that replaces the aforementioned software middleman with an integrated hardware solution that accelerates communication between cores.

Yan Solihin, a professor of electrical and computer engineering at NC State and co-author of a paper on the matter, calls the approach the core-to-core communication acceleration framework (CAF). According to the professor, CAF improves communication performance by two to 12 times, which translates to start-to-finish execution times that are at least twice as fast.

The magic behind the CAF design is a small device attached to the processor called a queue management device (QMD). In addition to keeping track of communication requests between cores, the QMD can perform simple computational functions of its own, letting it accelerate some basic operations by up to 15 percent.

Solihin said his team is currently looking to develop other on-chip devices that could accelerate additional multi-core computations. Their work will be presented in a paper titled “CAF: Core to Core Communication Acceleration Framework” at the 25th Annual Conference on Parallel Architectures and Compilation Techniques in Israel on September 11.


 
Am I the only guy who is just learning that this wasn't done with hardware?
I mean, I know that software has needed to be designed to run efficiently on multi-core CPUs, but.....
 
I'm sure you're not the only one, but this is why only software that was designed to use multiple cores can actually use multiple cores. We've had dual-core processors for years (upwards of a decade now?), and even now, it takes expensive programs like Photoshop or Matlab, or cranky one-off research-focused programs, to actually use all cores. Even AAA games still rarely use more than one CPU core.

Now, GPU cores are a slightly different animal. Unless I'm mistaken, they do have a kind of hardware interconnect/controller that allows software to utilize their massive core counts as if they were a single processor. So it was relatively easy to get a program to take advantage there; even freeware uses GPU multi-core processing these days. Trouble is that GPU architecture is very limited in the types of problems it can tackle. Great for anything graphics-oriented, and the occasional (and specific) machine learning task, but nowhere near as flexible as a CPU.
 
You've got the wrong idea: the article isn't talking about software running on top of the OS; it refers to software in the actual interconnect controllers. The logic of the interconnection is handled by code, not by hardwired gate logic, which is what CAF refers to.

Off-topic [unrelated to the previous paragraph, but FYI]: this pseudo-code shows how difficult it is to implement multi-processor code:
Code:
Thread n = new Thread() {
    @Override
    public void run() {
        // ... work for this thread goes here
    }
};
n.start(); // nothing runs until the thread is started

There, very hard and tiresome. The actual problem comes from balancing workloads among a variable number of available cores and actually being able to divide tasks. But programming for multi-processors by itself isn't hard at all. If you want to see cheap 100% CPU usage, just spin an infinite loop in as many threads as you have logical cores, and you're utilizing all your CPU in less than 10 lines of code, depending on the language and OS (see the sketch below); just to give you an idea.
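Here is that trick spelled out as a minimal Java sketch (my illustration of the claim above, not code from the article):
Code:
public class BurnAllCores {
    public static void main(String[] args) {
        // One busy-wait thread per logical core the JVM reports.
        int cores = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < cores; i++) {
            Thread t = new Thread() {
                @Override
                public void run() {
                    while (true) { } // infinite loop pins one core at 100%
                }
            };
            t.start();
        }
    }
}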
 
That pseudo-code is a good way to get your multi-threaded application to mess itself up. Just because you can add multiple threads doesn't mean it is easy to do so without your bugs multiplying exponentially to match, either in the code itself or at runtime, when different threads are doing different things with your variables.
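For anyone who hasn't been bitten by this yet, here is a minimal Java sketch of the classic lost-update race (my own illustration, not from the article): two threads increment a shared counter without synchronization, and the final value usually comes up short of the expected 2,000,000.
Code:
public class RaceDemo {
    static int counter = 0; // shared between threads, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter++; // read-modify-write; interleaved updates get lost
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter); // typically well below 2,000,000
    }
}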

This piece of hardware sounds like it is the CPU counterpart to the hardware that GPUs have had pretty much since the beginning. It will allow developers to execute single-threaded applications on a multi-threaded CPU, so there is no need to even specialize your code to be multi-threaded. The hardware handles it for you (for the most part; I'm sure there will still be a few things developers need to do themselves and be conscious of while they code).
 
Good deal. Will multi-processor CPUs finally be utilized? It has all seemed more gimmicky up to this point.
 
That pseudo-code is a good way to get your multi-threaded application to mess itself up. Just because you can add multiple threads doesn't mean it is easy to do so without your bugs multiplying exponentially to match, either in the code itself or at runtime, when different threads are doing different things with your variables.

Did you even read what followed the pseudo-code? I'm not digging into semaphores, shared resources among threads, etc. The example I wrote (the infinite loops) is the ideal scenario, where there are no dependencies and each thread can mind its own business. I'm just saying that coding for multiple processors isn't hard in principle; I stated some of the problems with actually achieving that for divide-and-conquer workloads. But using all your cores for the sake of it, if that's what you want, doesn't take much effort.
 
And I wasn't denying that - but when was the last time you wrote a piece of ideal, bug-free code? Or even saw a piece of ideal, bug-free code that anyone wrote? Idealism is fine to think about, but save it for the Undergrad Intro courses because it will never align with reality.
 