Data centers are wasting energy by running processors at full speed

Skye Jacobs

Why it matters: Data centers are power hogs, and a key issue that operators have been trying to solve is how to reduce their energy and resource consumption. Some ingenious remedies have been found, such as using non-potable seawater to cool equipment, but one obvious solution appears to have been overlooked: enabling processors' various power-saving capabilities.

Data center power consumption has become a major concern as demand grows and utilities struggle to keep up. Operators are looking for ways to reduce energy use and costs, with many developing novel ways of cooling equipment and optimizing data center design.

A new post by Uptime Institute suggests enabling built-in power management features on servers could significantly reduce energy consumption. It says that OS-level governors and power profiles could reduce energy use by 25-50 percent, while enabling processor C-states could reduce idle power by nearly 20 percent.
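As a rough illustration of what "OS-level governors" look like in practice, the sketch below reads the active frequency-scaling governor for each CPU. It assumes a Linux server that exposes the standard cpufreq sysfs files; the article does not name an operating system, and the function name `read_governors` is made up for this example.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.

    Assumption: Linux with the cpufreq subsystem enabled. On other systems,
    or on VMs without cpufreq, the result is simply an empty dict.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq information found on this system.")
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
```

A server that reports "performance" on every core is typically pinned near full speed, which is the situation the Uptime post is describing.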

These power-saving features are often disabled by default due to concerns about performance instability and latency. However, Uptime argues the performance impact is negligible for most workloads, except very latency-sensitive ones like high-frequency trading.

Indeed, modern processors often deliver more performance than is needed for acceptable service quality, so running them at full speed may waste energy. There's a point of diminishing returns where using more power yields minimal performance gains.

To address this issue, CPU vendors have developed various power/performance management techniques. Software-based controls can cut power use by 25 to 50 percent but may impact latency more. Hardware-only implementations have less latency impact but offer only 10 percent or less in power savings. A combined software/hardware approach offers a middle ground with 15 percent to 20 percent savings.
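To get a feel for what those percentages could mean at fleet scale, here is a back-of-the-envelope sketch. The savings ranges come from the figures above; the fleet size and average per-server power draw are hypothetical inputs, and the calculation assumes the percentages apply to that average draw, which the article does not spell out.

```python
HOURS_PER_YEAR = 24 * 365

# Savings ranges quoted in the article, as fractions of power use.
# "hardware_only" is given as "10 percent or less", so the lower bound is 0.
SAVINGS_RANGES = {
    "software": (0.25, 0.50),
    "hardware_only": (0.0, 0.10),
    "combined": (0.15, 0.20),
}


def annual_kwh_saved(servers, avg_watts, fraction):
    """Energy saved per year in kWh for a given savings fraction."""
    return servers * avg_watts * fraction * HOURS_PER_YEAR / 1000


def savings_table(servers, avg_watts):
    """Return {approach: (low_kwh, high_kwh)} for the article's ranges."""
    return {
        name: (
            annual_kwh_saved(servers, avg_watts, low),
            annual_kwh_saved(servers, avg_watts, high),
        )
        for name, (low, high) in SAVINGS_RANGES.items()
    }


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 servers averaging 300 W each.
    for name, (low, high) in savings_table(1000, 300).items():
        print(f"{name}: {low:,.0f} to {high:,.0f} kWh per year")
```

The numbers are only as good as the inputs; real savings depend on utilization, workload mix, and how much of a server's draw is actually attributable to the CPU.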

Despite the performance tradeoffs, Uptime argues that power consumption, rather than maximum performance, should be the main concern for most use cases, and that enabling these features across a data center could add up to substantial energy and cost savings.

This approach makes sense, as over-performance is rarely tracked, while many tools exist to maintain minimum service levels. Additionally, the energy consumption curve for processors gets steeper as they approach peak performance, suggesting potential for savings.
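One common way to reason about why the curve steepens: dynamic power in CMOS logic scales roughly with capacitance times voltage squared times frequency, and higher clocks generally require higher voltage. The sketch below assumes, purely for illustration, that voltage scales linearly with frequency (so dynamic power goes with frequency cubed) and that performance scales linearly with frequency. Real processors deviate from both assumptions, and static power is ignored.

```python
def relative_dynamic_power(freq_fraction):
    """Dynamic power relative to full speed under the cubic assumption.

    Assumption (not from the article): P ~ C * V^2 * f with V proportional
    to f, giving P ~ f^3. freq_fraction is the clock as a fraction of peak.
    """
    if not 0 < freq_fraction <= 1:
        raise ValueError("freq_fraction must be in (0, 1]")
    return freq_fraction ** 3


def relative_perf_per_watt(freq_fraction):
    """Performance per watt relative to full speed.

    Assumption: performance is proportional to clock frequency.
    """
    return freq_fraction / relative_dynamic_power(freq_fraction)


if __name__ == "__main__":
    for f in (1.0, 0.9, 0.8, 0.7):
        power = relative_dynamic_power(f)
        efficiency = relative_perf_per_watt(f)
        print(f"clock {f:.0%}: power {power:.0%}, perf/W x{efficiency:.2f}")
```

Under these simplified assumptions, giving up the last slice of clock speed removes a disproportionate share of power, which is the shape of curve the article is pointing to.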

It's worth noting that power management techniques originated in mobile applications where energy efficiency is critical. This background suggests that for most workloads, the latency impact of power management may be less than feared.

Given these factors, data centers may be wasting energy by running processors at full speed when it's not necessary for the workload. Supporting this idea, Uptime cites benchmark data showing servers are often most energy-efficient when limited to lower performance states.

I’d love to know how to keep my 5950X at 65W and undervolt my 4080. Supposedly they’d give more or less the same performance, but it’s so complicated.
 
Years ago, where I worked, I taught the IT people how to reduce the company's electricity bill. They didn't know about Intel SpeedStep, C-states, or the power-saving modes, or how to configure those options according to the PC and how it was used. I also taught them about putting the right PC in the right place. It was incredible sometimes to see a secretary with an i7-920 running at full clock speed, even in Word...
 
Perfect, I knew I was doing the right thing disabling all my CPU's C-states. Better performance and lower latency.
 
That works for cloud and enterprise, but colo and wholesale have no control over what happens at the IT level. Furthermore, setting it all up is overly complicated, so as long as it is not enabled by default, nothing is going to be done.
It works even better for colo, as you supposedly own that equipment. So your folks need to configure it to begin with.
 
We had to run our HP servers at maximum power settings because they underperformed and could not ramp up quickly. If suppliers could fix that, I would have no problem with energy-saving settings.
 
If you look at the web and how web hosting has evolved, the funny thing I always look at and laugh about is the mass deployment of, for example, WordPress.

Now there's generally nothing wrong with that, but the amount of resources required to get that running "proper" is just out of this world.

I've seen hosting companies advise someone to buy an expensive VPS while the website was obviously being attacked due to a lack of proper security.

They are all trying to push for bigger, better, more, but looking at the very base or optimizing for the very basic needs is something nobody has heard of.

On the contrary, I run an Epyc server too; the average load with approximately 1,000 websites, heavily tuned, is 2%. I cannot run power savings, and I pay for amps by the hour.

Latency is important, especially in sensitive web applications.
 
I feel this article (or the whole website) is targeted at small-scale on-premises deployments that do not have the capability to properly tune their software deployment. These deployments usually have stranded spare capacity, and thus saving any power is helpful. That's not true for many other deployments.

Cloud vendors have to provide consistent performance because they don't know the workload so they have little choice. Meanwhile if you own the infrastructure and workload, instead of squeezing out 20% per server through reducing frequency, the obvious answer is to not reduce frequency but pack more work onto that host through improved resource colocation, isolation and load-balancing. This would enable you to shrink your fleet, saving not just the CPU power but the additional hardware entirely.

Mobile device optimization is very different from server optimization, because:
1) Your phone can't find more work to do when there isn't any, but server workload is flexible.
2) On a phone, CPU power is the main power cost apart from the display. On a server, however, memory, network, and storage also draw a lot of power. The same CPU optimization is relatively less noticeable and can easily backfire if it causes more under-utilization of other resources, even if compute performance doesn't regress much.
3) Client CPUs, such as desktop and mobile parts, climb the steepest part of the power-frequency curve to squeeze out the last bit of performance. Server CPUs (except the few SKUs for HPC, where you run at 100%) are power-optimized in the first place and don't get anywhere close to that end of the curve. The curve isn't linear, but it's pretty close.

Thus, outside very obvious situations like a server completely idling, optimizing for local CPU power saving in a fleet of sufficient scale is pursuing a local maximum while missing the big picture. One should be focusing on saving hundreds of servers, not hundreds of MHz per server.
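The consolidation argument above can be sketched with a simple bin-packing heuristic. This is a minimal illustration using first-fit decreasing, not anything the commenter described; the load figures and the 80% utilization ceiling are hypothetical, and real schedulers also have to account for memory, I/O, and isolation.

```python
def pack_first_fit_decreasing(loads, capacity=1.0):
    """Pack workload CPU loads onto as few hosts as this heuristic finds.

    loads: per-workload CPU demand as a fraction of one host.
    capacity: utilization ceiling per host (hypothetical tuning knob).
    Returns a list with the total load placed on each host.
    """
    hosts = []
    for load in sorted(loads, reverse=True):
        if load > capacity:
            raise ValueError(f"load {load} exceeds host capacity {capacity}")
        for i, used in enumerate(hosts):
            # Small tolerance so float rounding does not reject an exact fit.
            if used + load <= capacity + 1e-9:
                hosts[i] = used + load
                break
        else:
            # No existing host has room, so bring another one into service.
            hosts.append(load)
    return hosts


if __name__ == "__main__":
    # Hypothetical: seven lightly loaded servers, one workload each.
    loads = [0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1]
    packed = pack_first_fit_decreasing(loads, capacity=0.8)
    print(f"hosts before: {len(loads)}, hosts after: {len(packed)}")
```

In this made-up case the same work fits on fewer than half the hosts, which is the kind of saving the comment contrasts with trimming clock speed on every machine.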
 
I’d love to know how to keep my 5950X at 65W and undervolt my 4080. Supposedly they’d give more or less the same performance, but it’s so complicated.

Enable 65W eco mode.

Undervolting your 4080 the easy way:

I've got my 13700K/4070 drawing 200-225W combined while gaming.
 