Why is CPU not as important as people think it is?


TS Guru
I'm taking a class right now and the teacher keeps alluding to the fact that it doesn't matter how fast the CPU is. For example the CPU could be sitting there ideally. Could someone elaborate? Is there no point in getting a faster CPU if the bus is too slow?


TS Enthusiast
I don't really understand what he is suggesting. There are too many variables. What CPU? What is the bus speed? What applications are in question? What do you mean by

"For example the CPU could be sitting there ideally"


TS Guru
For example she said there's an 800MHz SPARC that outperforms an Intel dual core at 2.2 GHz.

I'll update the details as soon as she posts the lecture slides.


TS Ambassador
The concept is the simple problem of moving data into and out of memory, which is why we depend upon instruction and data caching. Consider:

If we fetch one instruction, prepare it for execution, perform the execution and iterate, the CPU is limited by the fetch/prepare operations. Big Blue mainframes solved this in System/360 by changing first the "word size" and then adding caching. The cache was loaded and the instructions prepared, and then the processor was given a stream of "ready-to-run" instructions. In this manner, the CPU was kept active even though memory access times were much slower. (Some of you may remember that computers began with ferrite-core memory which took 35ms to read and write! UCK! We do I/O faster than that today :) ).

Then once the processor was running full steam, the problem became fetching and storing data. For example, adding 1 to a number (in C code N++ or N = N + 1) becomes: fetch the current value of the variable N, increment it by 1, and store the new result back into the variable's location.
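
To make the fetch/increment/store point concrete, here is a minimal Python sketch. The `Memory` class and `increment` function are made up for illustration and are not how any real CPU or compiler works; they only count how many memory accesses surround a single add.

```python
class Memory:
    """Toy memory that counts every access (hypothetical, illustration only)."""

    def __init__(self):
        self.cells = {}
        self.loads = 0
        self.stores = 0

    def load(self, address):
        self.loads += 1
        return self.cells.get(address, 0)

    def store(self, address, value):
        self.stores += 1
        self.cells[address] = value


def increment(memory, address):
    register = memory.load(address)   # fetch the current value of N
    register += 1                     # the only step that is actual arithmetic
    memory.store(address, register)   # write the new result back


memory = Memory()
memory.store("N", 41)                 # set up an initial value for N
increment(memory, "N")
print(memory.cells["N"], memory.loads, memory.stores)
```

One add costs one load and one store here, so if memory is much slower than the processor, the memory traffic rather than the arithmetic sets the pace.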

Additionally, we need I/O to long-term storage where all those files are kept, and once in a millennium (relative to processor speeds) we update an accounting file or view a picture of our girlfriend. A good architecture does not force the processor to choose between performing I/O and executing instructions, but rather allows I/O to proceed independently using DMA and a memory arbitration interlock to avoid a CPU fetch of data that is still inbound from external storage.

Thus, the motherboard solution for instruction and data caching becomes the limiting factor in system performance. Front-side, back-side and bus caching are the common techniques today.


TS Ambassador
"For example the CPU could be sitting there idle"
Intel/MS uses a System Idle Process, which consumes time in a loop and is prepared to yield to any other process when interrupted.

Other architectures have no such idle process but rather put the processor in an enabled wait state, which allows purely interrupt-driven use of the CPU: nothing to do => wait until there is.

Darth Shiv

TS Evangelist
For example she said there's an 800MHz SPARC that outperforms an Intel dual core at 2.2 GHz.
That's because clock rate is not the be all and end all. It depends on what you do with each clock cycle. There are quite a few variables. You'll probably go over pipelining next so here's just a little illustration of the point.

Simplistically, modern processors use what are called pipelines arranged into a core, and nowadays it is also common to have multiple cores on a single die. A die is the piece of silicon the processor's circuits are fabricated on; it sits inside the physical package.

When you compare chips, the pipeline efficiency is one part you look at to see which is fastest. This is easiest to illustrate with an example.

1) Say I have a processor that is "pipelined" and has 5 stages in the pipeline (call it Processor A). In ideal circumstances, it can be processing an instruction in each stage of the pipe. So it finishes 1 instruction per clock cycle and each instruction takes 5 clocks from start to finish.

2) Next I introduce a new architecture (call it Processor B) that has a clock speed 50% faster but each instruction takes 7 clock cycles to finish. As long as no instruction depends on the ones that go before it, we just gained 50% performance at the cost of more cycles of instruction completion latency. But that doesn't mean much when your clock rate is 2GHz! The latency is practically nothing.

3) Now we consider situations where some instructions need the results of others. This can cause the pipeline to "stall". E.g. take A x B + C, and say for argument's sake that our processor has one instruction for add and one instruction for multiply.
Here processor A may have to wait up to 5 cycles for the multiply to finish before doing the add. Processor B may have to wait up to 7 clock cycles.
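
The three steps above can be put into a rough Python sketch. Everything in it is an assumption made for illustration: the `run_time` function is hypothetical, Processor A's clock is set to 1.0 in arbitrary units (so B is 1.5), and every dependent instruction is charged a stall of one full pipeline length, which is the "up to" worst case rather than what a real chip would do.

```python
def run_time(n_instructions, depth, clock, dependent_fraction):
    """Time (arbitrary units) for an idealised in-order pipeline.

    Assumption: each dependent instruction stalls for a full pipeline
    length, the worst case described in step 3.
    """
    # Fill the pipe once, then one instruction completes per cycle.
    ideal_cycles = depth + (n_instructions - 1)
    # Extra cycles spent waiting on earlier results.
    stall_cycles = round(n_instructions * dependent_fraction) * depth
    return (ideal_cycles + stall_cycles) / clock


for fraction in (0.0, 0.5, 1.0):
    time_a = run_time(1000, depth=5, clock=1.0, dependent_fraction=fraction)
    time_b = run_time(1000, depth=7, clock=1.5, dependent_fraction=fraction)
    print(f"dependent={fraction:.0%}  A={time_a:.0f}  B={time_b:.0f}  "
          f"B speedup={time_a / time_b:.2f}x")
```

In this toy model B stays ahead, but its advantage shrinks from about 50% with no dependencies to roughly 12% when every instruction depends on the previous one. The longer pipe eats most of what the faster clock gained.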

Back to your question about the SPARC vs the Intel dual core, there are many ways an 800MHz processor can "beat" a 2.2GHz dual core.

a) The SPARC may have more cores. You didn't specify, but the SPARC may have 4 or 6 or 8 or more cores. Or it could have just 1. This is just one factor.

b) Each core can have many pipes (pipelines) in it. My example before mentioned a single pipeline, but modern cores actually have multiple. There are no physical limitations on the number of pipelines other than the size and cost of the chip, just like the number of cores. The SPARC could have lots of pipelines compared to the dual core.

c) The SPARC could have instructions that are more efficient. E.g. for my example, what if one processor had an instruction that did the add and multiply in a single instruction? It could be more efficient than one that does not, as long as the clock speed is not slowed down too much.

Hope that helps!


TS Guru
Thanks, these replies are helpful. If nothing else, they got me to reread old notes and realize there are some fundamental concepts that elude me. What is an instruction and what is a 1-clock step? By instruction, is that one "function" in assembly, e.g. add, sub, mov etc.?

T = (N x S)/R

T = execution time for a program (the performance measure)
N = actual number of executed instructions (not just the number of instructions in the code, i.e. loops are multiplied by the number of iterations)
S = number of basic 1-clock steps needed for one instruction
R = clock rate
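
Plugging numbers into the formula shows how a lower clock rate can still win. This is only a sketch: the `execution_time` function is hypothetical, the value of N and both S values are made up rather than measured for any real SPARC or Intel chip, and it assumes S means steps per instruction and that both machines execute the same number of instructions.

```python
def execution_time(n, s, r):
    """T = (N x S) / R, in seconds when R is given in Hz."""
    return (n * s) / r


N = 1_000_000_000  # hypothetical: one billion executed instructions

# Hypothetical S values: the slower-clocked chip needs fewer steps per instruction.
slow_clock = execution_time(N, s=1, r=800e6)   # 800 MHz
fast_clock = execution_time(N, s=4, r=2.2e9)   # 2.2 GHz

print(f"800 MHz, S=1: {slow_clock:.2f} s")
print(f"2.2 GHz, S=4: {fast_clock:.2f} s")
```

With these made-up values the 800 MHz machine finishes first, because R is only one of the three terms in T.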