Intel identifies cause behind Raptor Lake crashes, says mobile CPUs aren't affected

Daniel Sims

In a nutshell: Intel's latest statement on crashing 13th- and 14th-generation CPUs suggests that contrary to the company's recent denial, a faulty microcode algorithm is indeed mistakenly overvolting the processors. Intel plans to release a microcode patch in mid-August. Furthermore, the company denies Alderon Games' recent assertion that mobile CPUs are also impacted.

Chipzilla believes it has finally solved the mystery behind the instability recently reported in its latest high-end desktop processors. A patch should arrive in the middle of next month to address the issue.

After remaining silent for a worryingly long time, Intel announced that a microcode algorithm error sent the wrong amount of voltage to CPUs. The explanation sounds similar to a problem Intel discovered early in its investigation, but later denied was the true cause.

For months, users of Intel's 13th- and 14th-generation Core i9 and i7 processors have reported odd crashes related to CPU-intensive tasks. High-end games, often running on Unreal Engine 5, encounter errors when compiling shaders or performing other heavy workloads. Cinebench and Handbrake users also reported similar problems.

Suspicion turned toward overclocking and voltages early on, and motherboard manufacturers began deploying updates with more conservative power delivery profiles. Intel also suggested users and OEMs return to the company's recommended baseline voltages. This mitigated the issue but didn't solve it outright.

Intel discovered and addressed an overclocking-related microcode problem last month. However, following a report from Igor's Lab suggesting that the company had found the answer, it said it was still investigating. Now, it seems the problematic algorithm may have been the root cause after all.

While reports initially suggested the problem was restricted to consumer desktops, developer Alderon Games revealed that its Intel-powered servers, benchmarking tools, and laptops were also crashing. In likely the sharpest criticism of the whole episode, the company recommended that Intel recall the affected processors.

In response, Chipzilla told Tom's Hardware that its data indicated mobile CPUs weren't suffering from the same problem, and could be crashing from any number of other software and hardware errors. Alderon hit back by elaborating on its accusations.

Company head Matt Cassells claimed that the Core i9 13900HX and other notebook processors crashed identically to the impacted desktop CPUs. Accusing Chipzilla of downplaying the issue to avoid damaging its OEM partnerships, he said Alderon observed the crashes in Razer, MSI, and Asus laptops.

Intel is still validating its results before releasing the August microcode patch.

 
To put it another way, Intel's new patch will mean clock speeds will be lower.

That also means any results obtained with Raptor Lake CPUs using older microcode are NOT valid. Either the CPU is not stable, or its speed will be lower once it runs the fixed firmware. I advise TechSpot to leave all Raptor Lake CPUs out of the upcoming Zen 5 article. Now that the CPUs are defective, there is no reason to test them, right?
 
Since when does undervolting mean lower clocks?
 
While general undervolting does not always lead to lower clocks if you keep it "reasonable", you can see regressions in many cases. Max boost or all core boost clocks can be affected and this isn't really the usual undervolt you and me can do.
 
High clock speeds require high voltages. And we know that voltages are very high on Raptor Lake CPUs. Why? Just look at power consumption...
My i7-14700K is undervolted and the 5th and 6th cores hit 5800 MHz, while the default is 5500 MHz (P-cores). Other cores are up in clocks too. You can undervolt the CPU and raise the clocks. You can see tutorials on YT, but of course it depends on your silicon. Your point is moot. Whether clocks go lower depends on thermals and system stability, not just voltage.
 
While general undervolting does not always lead to lower clocks if you keep it "reasonable", you can see regressions in many cases. Max boost or all core boost clocks can be affected and this isn't really the usual undervolt you and me can do.
As I said in my post above, I managed to undervolt my i7 and raise the clocks. And I'm not Intel. I'm sure their solution will be better and keep the clocks the same. It could go one way or another, we'll see; all I'm saying is it's doable.
 
Exactly, as I said in my above post, I managed to undervolt my i7 and up the clocks. And I'm not Intel. I'm sure their solution will be better and keep the clocks the same. Could be, would be a certain way, we'll see, all I'm saying it's doable.
"this isn't really the usual undervolt you and me can do"
 
My i7-14700K is undervolted and the 5th and 6th cores hit 5800 MHz, while the default is 5500 MHz (P-cores). Other cores are up in clocks too. You can undervolt the CPU and raise the clocks. You can see tutorials on YT, but of course it depends on your silicon. Your point is moot. Whether clocks go lower depends on thermals and system stability, not just voltage.

You can also overclock an AMD CPU and still lower voltages. The problem is that those clocks and voltages must work for every CPU with the same model number out there, even when the cooling is not so good, the ambient temperature is high, the motherboard is not super quality, etc. That's why voltages are always much higher than "needed".

But again, the main reason for Raptor Lake power consumption is high clock speeds. To reach high clock speeds, Intel needed to put too much voltage. If voltage is lowered, Intel must also lower clock speeds. Unless voltages were way too high on purpose that of course makes zero sense.
 
Pending how aggressively they have been binning the chips, the voltages may have been too high for this purpose.

It's all conjecture however.

GN is collecting information from the masses and stated in one of their recent videos they believe silicon to be affected from March 2023 going forward.

My 13700K was purchased, and has a production date, in March 2023. I de-lidded, applied a minor undervolt w/ minor overclock and capped PL2 at 185W since the diminishing returns are huge.

I have only had a handful of system crashes in 16 months and never when playing games. I still haven't tried any of the new BIOSes/microcode updates since my custom tuning settings are more conservative than "recommended".

It will be interesting to see testing from various professional outfits for what this final "fix" will do to performance.
 
Possible, yes. Intel may have set voltages extra high to get more working chips. Still, if they did, the overvolting was very excessive.

Also, we must remember that boost clock speeds are "up to" something; nothing is guaranteed there. So Intel may lower boost clocks and still stay on the right side legally. However, doing that will also mean less than advertised performance.

Also, lowering voltages without touching clock speeds raises an interesting question about Intel making ultra-hot CPUs on purpose.

In any case, Intel only has bad and worse options here.
 
- what are they going to do with CPUs that are already damaged beyond repair?
- why did it take them so long to identify a voltage that is too high, isn't that easy to measure?
- why are people claiming laptop crashes as well, while Intel denies this?
- does this mean the original power limits can be restored so people don't lose (even more) performance?
- is the performance and frequency of a stable Intel cpu eventually the same as during release day when all is said and done?

There are so many unanswered questions here, especially Intel denying that laptops are affected smells fishy.
 
I de-lidded, applied a minor undervolt w/ minor overclock and capped PL2 at 185W since the diminishing returns are huge.

Is a delid still recommended, and does it provide significant improvements? It's not a classic delid, since you need to melt what is underneath, no?
 
You can also overclock an AMD CPU and still lower voltages. The problem is that those clocks and voltages must work for every CPU with the same model number out there, even when the cooling is not so good, the ambient temperature is high, the motherboard is not super quality, etc. That's why voltages are always much higher than "needed".

But again, the main reason for Raptor Lake power consumption is high clock speeds. To reach high clock speeds, Intel needed to put too much voltage. If voltage is lowered, Intel must also lower clock speeds. Unless voltages were way too high on purpose that of course makes zero sense.
I agree that an undervolt plus overclock can only work, i.e. be stable, depending on silicon quality and system build. However, that just means most of their silicon is trash, like an i5 they sold as an i7, or an i7 sold as an i9 and pushed to the limit. Because a "normal" i7 can sustain a 55x multiplier easily. The i7-14700K has a TDP of 125W vs 120W for the Ryzen 7800X3D. And consider that Ryzen is built on a smaller node. OK, power consumption can go higher, but my two cents: overclocking at any cost was the problem. That's why they initially had insane voltages. My Asus motherboard was "optimised" to pull 511 amps, and the short-duration package power limit was set to 4095 watts, which is basically unlimited.
 
To put it another way, Intel's new patch will mean clock speeds will be lower.

That also means any results obtained with Raptor Lake CPUs using older microcode are NOT valid. Either the CPU is not stable, or its speed will be lower once it runs the fixed firmware. I advise TechSpot to leave all Raptor Lake CPUs out of the upcoming Zen 5 article. Now that the CPUs are defective, there is no reason to test them, right?

Well that will give the Steves something to do. :)
 
I agree that an undervolt plus overclock can only work, i.e. be stable, depending on silicon quality and system build. However, that just means most of their silicon is trash, like an i5 they sold as an i7, or an i7 sold as an i9 and pushed to the limit. Because a "normal" i7 can sustain a 55x multiplier easily. The i7-14700K has a TDP of 125W vs 120W for the Ryzen 7800X3D. And consider that Ryzen is built on a smaller node. OK, power consumption can go higher, but my two cents: overclocking at any cost was the problem. That's why they initially had insane voltages. My Asus motherboard was "optimised" to pull 511 amps, and the short-duration package power limit was set to 4095 watts, which is basically unlimited.

Right now it seems most silicon is indeed trash. With limited information, that may of course change.

About wattage, those are not comparable. AMD's default TDP for the 7800X3D is 120 watts, meaning CPU power consumption on default settings will not exceed 120 watts. Never. Well, except for very small excursions. Socket power is a different thing, however.

The 14700K's default PL2 value is 253 watts. So despite the 5 watt difference in TDP, the actual difference is 133 watts. And that is before any motherboard maker's toasty special barbecue.

So yeah, overclocking at any cost should be at least part of the problem.
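For anyone who wants to check the arithmetic in the comment above, here is a minimal Python sketch. It takes the quoted figures at face value (120 W TDP for the 7800X3D, 125 W TDP and 253 W PL2 for the 14700K); the constant and function names are made up for illustration, and nothing here is a measurement.

```python
# Figures are the ones quoted in the thread, taken at face value.
# They are vendor limits as cited by commenters, not measured power draw.
RYZEN_7800X3D_TDP_W = 120  # AMD default TDP, per the comment
CORE_14700K_TDP_W = 125    # Intel TDP, per the earlier comment
CORE_14700K_PL2_W = 253    # Intel default PL2, per the comment


def watt_gap(intel_watts: int, amd_watts: int) -> int:
    """Return how many watts the Intel figure exceeds the AMD figure by."""
    return intel_watts - amd_watts


# Comparing TDP to TDP gives the small gap.
tdp_gap = watt_gap(CORE_14700K_TDP_W, RYZEN_7800X3D_TDP_W)
# Comparing Intel's short-term limit (PL2) to AMD's TDP gives the large gap.
pl2_gap = watt_gap(CORE_14700K_PL2_W, RYZEN_7800X3D_TDP_W)

print(f"TDP vs TDP: {tdp_gap} W")
print(f"PL2 vs TDP: {pl2_gap} W")
```

The point of the comparison is simply that the two headline TDP numbers hide the much larger short-term limit on the Intel side.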
 
A point that seems to be overlooked is that virtually all consumers will likely never attempt the suggested fixes. If Intel does not provide an effective solution, then the majority may face unfavorable outcomes.
 
Some thoughts:

1: "Incorrect voltages" and "algorithm" makes me think that max clocks won't be impacted; this sounds like an issue where under certain workloads more power than anticipated/allowed was making its way to the processor, causing long-term damage.

2: This almost certainly affects lifetime of the CPUs, and those already showing problems are almost certainly damaged beyond repair. Will be interesting to see how strict Intel is with the coming RMAs, because they are going to be coming in droves.
 
To reach high clock speeds, Intel needed to put too much voltage. If voltage is lowered, Intel must also lower clock speeds. Unless voltages were way too high on purpose that of course makes zero sense.
A combination of poor logic and wishful thinking. If the voltage was correctly calculated, but too high for the chip to handle, your argument would be correct. But it sounds like the algorithm is incorrectly calculating an above-spec voltage -- one higher than required -- meaning the calculation can be corrected with little to no performance impact.

- what are they going to do with CPUs that are already damaged beyond repair?
Who says CPUs were "damaged beyond repair" by this?

- why did it take them so long to identify a voltage that is too high, isn't that easy to measure?
The answer should be obvious. The specific conditions that cause the algorithm to incorrectly calculate too high a voltage needed to be duplicated.

- why are people claiming laptop crashes as well?
Confirmation bias is the most likely explanation.
 
A combination of poor logic and wishful thinking. If the voltage was correctly calculated, but too high for the chip to handle, your argument would be correct. But it sounds like the algorithm is incorrectly calculating an above-spec voltage -- one higher than required -- meaning the calculation can be corrected with little to no performance impact

There is a very big problem if this is really the case. It's very easy to spot "too high voltage". Just looking at clock speed vs power consumption curve tells very quickly that something is wrong. In other words, if power consumption on a similar load gets much higher than it should, then there must be a big jump in voltage. Either Intel had *****s doing testing or they just skipped testing altogether.
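The link between voltage and power that this comment relies on can be sketched with the usual first-order CMOS switching-power approximation, P ≈ C·V²·f. The Python below is only an illustration under stated assumptions: the effective capacitance, the clock, and both voltages are hypothetical numbers, not Intel data, and the model ignores leakage and the other inputs a real voltage algorithm uses.

```python
def dynamic_power_watts(c_eff_farads: float, volts: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = C_eff * V^2 * f (leakage ignored)."""
    return c_eff_farads * volts ** 2 * freq_hz


def power_ratio(v_actual: float, v_expected: float) -> float:
    """Relative power at a fixed clock when voltage differs from the expected value."""
    return (v_actual / v_expected) ** 2


# All values below are hypothetical, chosen only to show the shape of the effect.
C_EFF = 1.0e-8       # effective switched capacitance in farads (assumption)
FREQ = 5.5e9         # 5.5 GHz clock (assumption)
V_EXPECTED = 1.35    # voltage the algorithm "should" request (assumption)
V_ACTUAL = 1.50      # voltage it hypothetically requests instead (assumption)

expected_w = dynamic_power_watts(C_EFF, V_EXPECTED, FREQ)
actual_w = dynamic_power_watts(C_EFF, V_ACTUAL, FREQ)

print(f"expected: {expected_w:.1f} W, actual: {actual_w:.1f} W")
print(f"ratio: {power_ratio(V_ACTUAL, V_EXPECTED):.3f}")
```

Under this simple model, an overshoot from 1.35 V to 1.50 V at the same clock raises switching power by roughly 23%. It says nothing about how often such an overshoot would occur or how easy it would be to catch in testing.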
 
To put it another way, Intel's new patch will mean clock speeds will be lower.
No, Intel is saying the microcode was unnecessarily giving the CPU a dangerous amount of voltage.
IF it's true, a quick patch should solve the problem without performance impact. Though, there are probably millions of degraded CPUs out in the wild already.
 
Like said earlier, that hardly makes any sense. That kind of behaviour is very easy to spot during testing. Another thing is that if you need certain amount of coltage to reach high clock speeds, you cannot lower voltage or you don't get those clock speeds.

Only scenario that doesn't lower clock speeds is that error supplies "too much" voltage on other than max load situations. If true, then Intel's reputation is gone.
 
It's very easy to spot "too high voltage". Just looking at clock speed vs power consumption curve tells very quickly that something is wrong.
You are again incorrect. The microcode to calculate the delivered voltage depends on many factors besides clock speed. And the fact that the crashes are rare tells us the conditions to duplicate the problem are equally uncommon.

Another thing is that if you need certain amount of coltage [sic] to reach high clock speeds, you cannot lower voltage or you don't get those clock speeds.
You're still not seeing your logical error. For the voltage to be "required", the algorithm would have had to calculate it correctly. This is not happening, however. The algorithm is incorrectly calculating too high a voltage.
 
A combination of poor logic and wishful thinking. If the voltage was correctly calculated, but too high for the chip to handle, your argument would be correct. But it sounds like the algorithm is incorrectly calculating an above-spec voltage -- one higher than required -- meaning the calculation can be corrected with little to no performance impact.


Who says CPUs were "damaged beyond repair" by this?


The answer should be obvious. The specific conditions that cause the algorithm to incorrectly calculate too high a voltage needed to be duplicated.


Confirmation bias is the most likely explanation.

We've had multiple systems at work that would instantly BSOD upon launching an application that used more than a few cores. No BIOS update is going to fix that; they have been RMAed. And some of the replacement chips came unstable out of the box, which we know since we now do proper testing. We use aggressive power limits to keep them stable for now.

Haven't you seen any of the videos that are put out discussing the high failure rate? The level1techs video by Wendell drops some seriously concerning facts about failure rates.

There are also numerous articles about failing Intel laptops with Raptor Lake CPUs. I can't speak from experience because I already went exclusively AMD when COVID was still a thing. Can't be bothered with extreme temperatures and scheduling issues. (I was using only Intel before they started their P and E core joke)

If you think this is all confirmation bias, you're in a state of denial yourself. Why do you think this news is being reported on everywhere? Nobody in their right mind would want to buy an Intel 13th / 14th Gen cpu right now and it's even worse that there still isn't a fix at this very moment.
 