Software fix can double Threadripper 2990WX's performance in certain workloads

mongeese

Staff
What just happened? After the 16-core Threadripper 2950X launched with consistently great performance, AMD fans couldn’t have been more excited for the 32-core 2990WX… only to be disappointed when it finally released with far worse value and versatility. As many suspected all along, a new report suggests, pending full confirmation, that a bug in the Windows scheduler is halving the 2990WX's performance in certain workloads, and that this could be fixed in software.

In TechSpot’s benchmarking session, we found that the 2990WX was 35% slower than the 2950X in Adobe Premiere Pro, 30% slower in Handbrake and 25% slower in 7-Zip file compression. That’s not to say it didn’t have its place, however, as it was 40-50% faster in the Corona and Blender rendering tests. In the end, we concluded, “we feel many like us will be scrapping their 2990WX plans, and most will be moving to the 2950X instead.”

The tech community almost unanimously blamed the memory controller configuration AMD had chosen. The 2950X has two 8-core dies and two memory controllers, giving the Infinity Fabric link between the dies roughly 50 GB/s of throughput with 3200 MHz memory. The 2990WX has four 8-core dies but the same two memory controllers, essentially halving the memory bandwidth available to each die. This seemed like a logical explanation, as the performance deficit only showed up in memory-sensitive applications.
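The "halving" argument above is just division. As a minimal sketch, assuming the bandwidth behind the two controllers is split evenly among every die that depends on them (a simplification that ignores latency, Infinity Fabric link limits and contention), the per-die share works out as follows. The function name and the normalized bandwidth of 1.0 are illustrative, not measurements.

```python
def bandwidth_per_die(total_bandwidth: float, dies: int) -> float:
    """Even share of the shared controllers' bandwidth for each die (toy model)."""
    if dies <= 0:
        raise ValueError("dies must be positive")
    return total_bandwidth / dies


# Hypothetical, normalized figure: treat the two controllers' combined
# bandwidth as 1.0 so every result reads as a fraction of it.
SHARED_BANDWIDTH = 1.0

share_2950x = bandwidth_per_die(SHARED_BANDWIDTH, dies=2)   # 2 dies, 2 controllers
share_2990wx = bandwidth_per_die(SHARED_BANDWIDTH, dies=4)  # 4 dies, same 2 controllers

print(f"2950X per-die share:  {share_2950x:.2f}")   # 0.50
print(f"2990WX per-die share: {share_2990wx:.2f}")  # 0.25
print(f"ratio: {share_2990wx / share_2950x:.2f}")   # 0.50, i.e. halved
```

This is exactly the intuition the community relied on; the rest of the article explains why it turned out not to be the main cause.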

Wendell at Level1Techs questioned this explanation, however, because the 2990WX performed very differently in Linux. Looking for a better explanation, he ran a wide battery of tests using the Indigo bedroom render benchmark on two processors: the 2990WX and its EPYC counterpart, the 32-core 7551. He found that the vast majority of the performance woes were caused by a Windows bug rather than by memory limitations.

The main difference between the Threadripper and the EPYC is that the EPYC has four memory controllers, which should alleviate the supposed memory performance problems. With one memory controller per die, the EPYC can also switch between UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access) modes.

UMA gives each die greater memory bandwidth but adds latency, because a memory request may not go to the nearest controller. NUMA reduces latency by pairing each die with its own controller, but limits each die to that controller's bandwidth. Note that the 2990WX can only operate in NUMA mode, because there are two dies per controller.
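The trade-off above can be put into numbers with a toy model of a single die that has its own controller, such as one EPYC 7551 die. It assumes that UMA mode interleaves memory evenly across all controllers and that NUMA mode always allocates memory on the die's own node; real memory interleaving and allocation policies are more nuanced, and the names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessProfile:
    remote_fraction: float  # share of accesses served by a non-local controller (latency cost)
    controllers_used: int   # controllers a die's memory is striped across (bandwidth benefit)


def access_profile(num_controllers: int, mode: str) -> AccessProfile:
    """Toy model of one die that has its own memory controller.

    "uma":  memory is interleaved evenly across all controllers, so only
            1/num_controllers of accesses hit the local one.
    "numa": memory is allocated on the die's own node, so every access is
            local but only one controller's bandwidth is available.
    """
    if num_controllers <= 0:
        raise ValueError("num_controllers must be positive")
    if mode == "uma":
        return AccessProfile((num_controllers - 1) / num_controllers, num_controllers)
    if mode == "numa":
        return AccessProfile(0.0, 1)
    raise ValueError(f"unknown mode: {mode!r}")


# EPYC 7551 as described above: four controllers, one per die.
print(access_profile(4, "uma"))   # AccessProfile(remote_fraction=0.75, controllers_used=4)
print(access_profile(4, "numa"))  # AccessProfile(remote_fraction=0.0, controllers_used=1)
```

Under these assumptions, UMA spreads a die's memory across four controllers at the cost of three out of four accesses going off-die, while NUMA keeps every access local but draws on a single controller.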

Indigo bedroom benchmark score (higher is better)
OS                         | Threadripper 2990WX | EPYC 7551 (NUMA) | EPYC 7551 (UMA)
Windows 10                 | 1.5                 | 1.3              | 3.0
Linux                      | 3.5                 | 3.0              | 3.0
Linux on Windows 10 (WSL)  | 1.6                 | N/A              | N/A
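The reasoning that follows leans on a few ratios between these cells. This short script simply recomputes them from the scores in the table; nothing beyond those published numbers is assumed, and the dictionary layout and function name are just for illustration.

```python
# Indigo bedroom scores from the table above (higher is better).
SCORES = {
    "Windows 10": {"2990WX": 1.5, "7551 NUMA": 1.3, "7551 UMA": 3.0},
    "Linux": {"2990WX": 3.5, "7551 NUMA": 3.0, "7551 UMA": 3.0},
    "Linux on Windows 10": {"2990WX": 1.6},
}


def speedup(os_a: str, cpu_a: str, os_b: str, cpu_b: str) -> float:
    """How many times higher the score in (os_a, cpu_a) is than in (os_b, cpu_b)."""
    return SCORES[os_a][cpu_a] / SCORES[os_b][cpu_b]


comparisons = {
    "2990WX, Linux vs Windows": speedup("Linux", "2990WX", "Windows 10", "2990WX"),
    "7551 on Windows, UMA vs NUMA": speedup("Windows 10", "7551 UMA", "Windows 10", "7551 NUMA"),
    "7551 on Linux, UMA vs NUMA": speedup("Linux", "7551 UMA", "Linux", "7551 NUMA"),
    "2990WX, WSL vs native Windows": speedup("Linux on Windows 10", "2990WX", "Windows 10", "2990WX"),
}

for label, ratio in comparisons.items():
    print(f"{label}: {ratio:.2f}x")  # 2.33x, 2.31x, 1.00x, 1.07x in order
```

The 7551's UMA/NUMA gap on Windows (about 2.3x) closely matches the 2990WX's Linux/Windows gap, while UMA and NUMA score identically on Linux and WSL barely moves the needle. That is the pattern used below to rule out hardware and the application itself.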

The fact that the 2990WX performs well in Linux but poorly in Windows suggests that this isn’t a hardware issue, and the 7551 behaving the same way in NUMA mode reinforces that idea. The equally poor performance in Windows and in Linux on Windows (using the Windows Subsystem for Linux to run the Linux version of Indigo on Windows) demonstrates that differences between the two versions of the application aren’t what changes the performance.

Similarly poor performance from the 2990WX and the 7551 in NUMA mode on Windows shows that the Threadripper’s shortage of memory controllers isn’t the issue, because the 7551 has twice as many. (The 7551 performs slightly worse, probably because it’s clocked lower.)

The 7551’s impressive performance in Windows in UMA mode could suggest that the application is simply UMA-optimized, but its identical performance in UMA and NUMA modes on Linux discredits that theory.

That leaves Windows itself. To check whether it was the culprit, Wendell and Jeremy at Bitsum analyzed the tags Windows’ internal thread management attaches to each thread, and discovered that every thread spawned by the Indigo benchmark had an “ideal CPU” tag that would send the thread to a particular core.

Normally this would be a good thing, as cores closer to the memory controller can achieve better performance, but Windows wasn’t sending the threads there. Instead, it was throwing them around basically at random. Windows was so busy pointlessly reorganizing threads that it was spending 50% of the processor’s power just shuffling them around.
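To make "throwing them around basically at random" concrete, here is a toy simulation. It is not Windows' actual scheduling algorithm, which the report does not describe. It counts thread migrations under two hypothetical policies: one that keeps each thread on its ideal CPU, and one that deals threads onto a random permutation of CPUs every scheduling round. The 64-thread, 64-logical-CPU setup (matching the 2990WX's 32 cores and 64 threads) and the function name are assumptions.

```python
import random


def count_migrations(ideal: list[int], num_cpus: int, rounds: int,
                     honor_hint: bool, seed: int = 0) -> int:
    """Count how many times threads change CPU over `rounds` scheduling rounds.

    Hypothetical policies (toy model):
      honor_hint=True  -> every thread is always placed on its ideal CPU.
      honor_hint=False -> threads are dealt onto a random permutation of CPUs
                          each round, ignoring their ideal CPU.
    """
    if len(ideal) > num_cpus:
        raise ValueError("toy model assumes at most one thread per CPU")
    rng = random.Random(seed)  # seeded so runs are reproducible
    current = list(ideal)      # every thread starts on its ideal CPU
    migrations = 0
    for _ in range(rounds):
        if honor_hint:
            placement = list(ideal)
        else:
            placement = rng.sample(range(num_cpus), len(ideal))
        # A migration is any thread whose CPU differs from the previous round.
        migrations += sum(old != new for old, new in zip(current, placement))
        current = placement
    return migrations


NUM_CPUS = 64                 # assumption: 32 cores x 2 SMT threads
ideal_cpus = list(range(64))  # hypothetical: one thread per logical CPU

stable = count_migrations(ideal_cpus, NUM_CPUS, rounds=100, honor_hint=True)
thrashing = count_migrations(ideal_cpus, NUM_CPUS, rounds=100, honor_hint=False)
print(f"honoring hints:   {stable} migrations")
print(f"random placement: {thrashing} migrations")
```

Each migration can throw away a thread's warm caches and, on a multi-die chip, move it away from its memory. The model only counts the moves. It makes no claim about how much time each move costs, so it does not reproduce the 50% figure.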

This is why the 7551 could double its performance in Windows when switching from NUMA to UMA mode: in UMA mode, Windows assumes that all the cores are equal because they’re all paired with a memory controller. In NUMA mode, the bug kicks in and Windows goes back to shuffling threads.

Jeremy at Bitsum has implemented a fix in the CorePrio utility, which you can download now to see the performance improvement for yourself. Tentatively titled the ‘NUMA Dissociator’, the fix essentially creates a phony UMA mode in which Windows no longer cares about sending threads to an ‘ideal’ core. In Indigo, the 2990WX’s score doubles to 3.0, and in 7-Zip compression, performance jumps by an impressive 71%.
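Bitsum's own note on the fix, quoted later in the thread, says that once the fix is applied, thread ideal processors are "no longer constrained to a single NUMA node". The sketch below illustrates that difference in distribution. It is not CorePrio's code and not Windows' real assignment algorithm: the round-robin policy, the four-node by 16-CPU layout, the 64-thread count and the function names are all assumptions.

```python
from collections import Counter

NODES = 4           # assumption: Windows exposes the 2990WX as four NUMA nodes
CPUS_PER_NODE = 16  # assumption: 8 cores x 2 SMT threads per die


def assign_ideal_cpus(num_threads: int, dissociated: bool, home_node: int = 0) -> list[int]:
    """Hypothetical round-robin ideal-CPU assignment for one process's threads.

    dissociated=False: ideal CPUs come only from the process's home node.
    dissociated=True:  ideal CPUs are dealt across every logical CPU, as if
                       the machine were a single UMA node.
    """
    if dissociated:
        pool = list(range(NODES * CPUS_PER_NODE))
    else:
        first = home_node * CPUS_PER_NODE
        pool = list(range(first, first + CPUS_PER_NODE))
    return [pool[i % len(pool)] for i in range(num_threads)]


def threads_per_node(ideal_cpus: list[int]) -> dict[int, int]:
    """How many threads have their ideal CPU on each NUMA node."""
    return dict(Counter(cpu // CPUS_PER_NODE for cpu in ideal_cpus))


before = threads_per_node(assign_ideal_cpus(64, dissociated=False))
after = threads_per_node(assign_ideal_cpus(64, dissociated=True))
print("before fix:", before)  # {0: 64}
print("after fix: ", after)   # {0: 16, 1: 16, 2: 16, 3: 16}
```

In this model, 64 threads whose hints all point at one 16-CPU node leave a hint-following scheduler nowhere good to put most of them. Spreading the hints over every node removes that conflict, which is roughly what a "phony UMA mode" means.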

Notably, though, even with the fix the 2990WX is still 22% slower in Windows than in Linux in 7-Zip. It’s possible that a properly implemented solution from Microsoft could narrow the gap even further. Considering how lazy Microsoft has been about even looking for the problem, however, I wouldn’t be too optimistic that one is coming.


 
We saw something very similar back when the FX chips emerged to compete with the Core chips. In Windows 7, rendering was roughly 12% slower than the same render and configuration on Mac or Linux, so of course many in the CGI world were touting the Mac as superior, higher price tag and all.

Then came Process Lasso, which closed that gap entirely. My results were within the margin of error, showing that the same chip could perform the same on any of the 3 OSes once you bypassed Windows' core-scheduling silliness. Process Lasso is basically a pro version of Task Manager that optimizes core and thread usage much better. It was kind of a big "Oh wow!" moment on CGTalk when we were finally able to shut the Mac fanboys up. A system costing less than half as much as the Mac Pros was just as fast or faster, and it even brought some parity between the FX and the Intel chips.

But since most people won't go that far, the FX chips were considered weaker and slower in most benchmarks and tests, because those were still using Windows' unoptimized scheduling. I had no problem pushing the FX-8350 up to Intel's performance levels on air alone for a while, until Intel trumped it with newer chip tech a year or so later.
 
Unfortunately, AMD is stuck with Windows and its default scheduler, as that's what most people will be using. It's just a shame that Windows appears to have the worst scheduler of any operating system on the market. I guess that's the luxury of having a monopoly.

The only project currently trying to make an OS that can run Windows apps on an open platform is ReactOS.
 
Given that major corporations form secret alliances with each other, and that corporate espionage is still very much alive, one has to wonder what kind of agreements might exist between Microsoft and other companies (like Intel) to pull off this sort of thing. It doesn't have to cripple a company, just do enough damage to its reputation to affect future sales. With Microsoft holding such a huge share of the market, it risks very little by participating in such a practice, so one has to wonder: what, if anything, would hold it back?
 
It's funny how all software problems affect AMD hardware and even (NOT) funnier how these problems are NOT AMD's fault.
 
I love the bias here.... even the subtitle to this article says it’s Windows’ fault....

What came first, Windows 10 or the Threadripper?

Windows did! It’s therefore on AMD to design chips that work with Windows, not MS’ place to build Windows to work with AMD!

If a software “fix” is necessary, it is AMD’s responsibility to fix it.... that the chip runs faster on Linux than Windows is actually damning for AMD... they KNOW that the majority of their users will be on Windows - why didn’t they design their chip accordingly?!?

And the fix that now does exist - why is it designed by a “regular guy”?!? Why wasn’t AMD working on fixing it?!? How were they “beaten” by one person?!?

Makes me rethink any AMD purchases down the line.
 
You do not design chips for an OS; it's the other way around. This has always been the case. What you normally try to do is maintain compatibility with the instructions and extensions.

FYI it isn't AMD's job to fix the Windows kernel. All they can do is give the Windows devs information. Microsoft will not give AMD the keys to the kernel of their OS.

What the released utility does is just a patch that may or may not work in future Windows releases. MS needs to implement changes to the scheduler itself, and you can be pretty sure that AMD has been doing its job to help the MS devs (just as it has previously helped implement other scheduler changes).

As for your concerns about buying future AMD CPUs, why do you even care? Are you going to build a 32-core workstation and use very specific software that gets impacted by the bug? No, what you are going to do is buy the best system for the money you have on hand for your specific workload. If AMD is still the better value even with the scheduler bug, then why avoid it? O_o And your system could get even better if the bug ever gets fixed.
 
Unfortunately, AMD is stuck with Windows and its default scheduler, as that's what most people will be using. It's just a shame that Windows appears to have the worst scheduler of any operating system on the market. I guess that's the luxury of having a monopoly.

The only project currently trying to make an OS that can run Windows apps on an open platform is ReactOS.
I wonder how true that really is. By the time you have a workload that benefits from 32 cores, you have moved beyond the ordinary consumer/gaming market where Windows is the default choice. Depending on what you're doing, Linux may already be the default choice. For the markets that are still up for grabs, I'd think Microsoft would have at least some good reasons to invest effort in not losing them further.
 
I love the bias here.... even the subtitle to this article says it’s Windows’ fault....

What came first, Windows 10 or the Threadripper?

Windows did! It’s therefore on AMD to design chips that work with Windows, not MS’ place to build Windows to work with AMD!

If a software “fix” is necessary, it is AMD’s responsibility to fix it.... that the chip runs faster on Linux than Windows is actually damning for AMD... they KNOW that the majority of their users will be on Windows - why didn’t they design their chip accordingly?!?

And the fix that now does exist - why is it designed by a “regular guy”?!? Why wasn’t AMD working on fixing it?!? How were they “beaten” by one person?!?

Makes me rethink any AMD purchases down the line.
The bug certainly is Windows' fault. To get a bit technical, the reason the 2990WX and even the 2970WX are the first processors to be affected by the bug is that they're the first to employ more than two active dies, with four and three dies respectively.
Intel currently uses a ring bus system to connect all the cores together in both its low-core-count and high-core-count processors, and to the operating system, this means all the cores appear as one big bunch. This has the advantage of avoiding the memory sensitivity of Ryzen and Threadripper, but it prevents Intel from manufacturing ultra-high-core-count (24+) chips cheaply.
AMD, on the other hand, has innovated by being the first company to use chiplets (dies) to get more cores and performance without shrinking the manufacturing node, which is becoming increasingly difficult, as evidenced by Intel postponing 10nm for several years. Threadripper uses the most basic version of the chiplet principle, where all the components are manufactured together on the same node, but it still lets AMD annihilate Intel in the prosumer space.
In the future, processor manufacturers are expected to rely heavily on chiplet designs to improve performance year on year (https://www.techspot.com/news/77397-opinion-chiplets-drive-future-semis.html).
The Windows bug damages performance for all high-core-count chiplet designs (probably including Qualcomm's upcoming ARM solutions too), and that is very disappointing on Microsoft's part, because the entire industry is expected to start using chiplet designs similar to AMD's. AMD is simply the first company to run into the issue because it's the furthest ahead in that area right now, but if Microsoft doesn't fix the issue soon, it will hurt performance on a lot of processors from multiple companies.
 
I love the bias here.... even the subtitle to this article says it’s Windows’ fault....

What came first, Windows 10 or the Threadripper?

Windows did! It’s therefore on AMD to design chips that work with Windows, not MS’ place to build Windows to work with AMD!

If a software “fix” is necessary, it is AMD’s responsibility to fix it.... that the chip runs faster on Linux than Windows is actually damning for AMD... they KNOW that the majority of their users will be on Windows - why didn’t they design their chip accordingly?!?

And the fix that now does exist - why is it designed by a “regular guy”?!? Why wasn’t AMD working on fixing it?!? How were they “beaten” by one person?!?

Makes me rethink any AMD purchases down the line.

/facepalm

The article (and Wendell) already came to the conclusion that the problem lies with Windows itself, not with Threadripper or other AMD products.

If you follow the link to the fix itself, you'll find this:

"Dumping the thread info in real-time revealed the thread ideal processors are doled out differently after a fix is applied to a process, no longer constrained to a single NUMA node, but that should not itself [have] been any problem if the scheduler wasn’t brain-dead. The thread ideal processors are supposed to be just a hint the scheduler uses, not gospel. Best guess is that it is doing a lot of core thrashing."

The problem at hand is that Microsoft's scheduler is not doing its job, and, FYI, only Microsoft can edit it. You are blaming AMD for not having a fix when that option is literally impossible.

Wendell is not a regular guy, and the fix is only a band-aid until Microsoft officially fixes its own scheduler.

Lastly, Intel and AMD do not design hardware around specific software. The only hardware that does that is ASICs. If you have to ask why, I'll direct you to purchase a copy of Structured Computer Organization, a 101 on computer architecture. https://www.pearson.com/us/higher-e...puter-Organization-6th-Edition/PGM200985.html
 
I love the bias here.... even the subtitle to this article says it’s Windows’ fault....
You appear to have read the article, based on some of your comment, but the attitude displayed here makes it seem like you only read the title and subtitle and decided to rant about bias first. Or there is a reading comprehension disconnect somewhere. Because if you had read the article, you would have clearly seen that the problem is in Windows. That's not bias, just straight reporting of a fact. Immediately assuming bias perhaps points to your own potential bias?

If a software “fix” is necessary, it is AMD’s responsibility to fix it.... that the chip runs faster on Linux than Windows is actually damning for AMD... they KNOW that the majority of their users will be on Windows - why didn’t they design their chip accordingly?!?
So they were supposed to gimp their own chip's performance to account for a flaw in Windows? You do realize that is what you are suggesting, right? The chip wasn't designed to "run faster on Linux" as you seem to be implying - Linux just actually uses the hardware correctly. Windows does not.

Look at it this way: Suppose someone created an amazing new engine for a car, and installed it in 2 different brands of car. One car always veered to the right because the wheel alignment was manufactured incorrectly. The other car drove perfectly fine. Are you saying that the engine builder was at fault for the incompetence of the first car's manufacturing?

And the fix that now does exist - why is it designed by a “regular guy”?!? Why wasn’t AMD working on fixing it?!? How were they “beaten” by one person?!?
Not a "regular" guy, if he's finding this kind of issue AND knows how to create a patch for it. Shouldn't the real question be why Microsoft isn't working on its own software to fix the flaw?
 
I love the bias here.... even the subtitle to this article says it’s Windows’ fault....

What came first, Windows 10 or the Threadripper?

Windows did! It’s therefore on AMD to design chips that work with Windows, not MS’ place to build Windows to work with AMD!

If a software “fix” is necessary, it is AMD’s responsibility to fix it.... that the chip runs faster on Linux than Windows is actually damning for AMD... they KNOW that the majority of their users will be on Windows - why didn’t they design their chip accordingly?!?

And the fix that now does exist - why is it designed by a “regular guy”?!? Why wasn’t AMD working on fixing it?!? How were they “beaten” by one person?!?

Makes me rethink any AMD purchases down the line.

I actually sort of agree with you. You see it a lot here, especially with games. It’s game developers’ fault if AMD is worse than Intel, because they aren’t spending the cash developing for the tiny minority that have more than 4 cores rather than relying on IPC. Lol.

But in this case I would say that the fault lies with the reviewers for calling out AMD when the 2990 dropped (although I can see why they did). Windows will need to be updated to work with the latest silicon; I don’t think Microsoft can just sit on its OS and hope the silicon follows suit.

Although I should point out that Microsoft could just ignore TR, and if it did, AMD would lose out to the booing and hissing of its fans. Windows wouldn’t miss much. That’s because Windows has an effective monopoly. I, along with most professionals, couldn’t boycott Windows even if we wanted to.

Personally, I don’t give a damn whose fault bad or good performance is. That’s for the fanboys to decide. The vast majority of end users will snub a bad product even if the reason it’s bad is not the fault of the manufacturer. It’s still bad, and the solution would still be to buy a different product.
 
$100 says INTEL was part of this scheme. One thing I learned as a kid in the early '90s is that INTEL will do ANYTHING to keep AMD from taking its position.
 
You do not design chips for an OS; it's the other way around. This has always been the case. What you normally try to do is maintain compatibility with the instructions and extensions.
The problem with Windows is Satya Nadella. That's my story, and I'm sticking to it.

That notwithstanding, both AMD and Intel have already bent over for M$ by stating that their CPUs will only be fully compatible with Windows 10. Perhaps it's time they got a little love in return from M$, in the form of an update that allows full performance from the AMD 2990. But keep in mind, AMD chose to somewhat cripple the Threadripper offering by including only two memory controllers instead of four.

IIRC, though, there were Windows patches specifically for AMD CPUs. I suppose you could interpret that as favoring Intel by extension. But given the great disparity in CPUs in service in Intel's favor, it's almost understandable.

My favorite conspiracy theory is that someone in Samsung's design department was heavily "compensated by Apple" to inject the design flaw that caused the Note's batteries to explode. Or customers were simply clamoring for too much, and might be the same people who precipitated the release of the "foldable" iPad Pro.

Back on topic, you have to include Adobe in the conspiracy as well. They only publish for Windows and Mac and won't release for Linux, which stops a whole hell of a lot of people from abandoning Windows. Since they're now subscription-only, and some sort of activation is required for all former products on optical discs dating back at least 10 (?) years, I dunno how much help WINE would be.

Although it's nice to see them finally mostly put in their place with respect to that damned Flash.

$100 says INTEL was part of this scheme. One thing I learned as a kid in the early '90s is that INTEL will do ANYTHING to keep AMD from taking its position.
That's pretty much a given in the business world. If I had a hundred bucks, I'd bet it on Amazon doing pretty much anything to prevent Walmart from taking its place. C'est la guerre.
 
I don't understand. Don't hardware manufacturers share their hardware specs with software firms (i.e., MS) prior to release, so that newly released hardware is supported by the software from the get-go? Surely this is Windows' fault, but AMD should also have worked with MS to solve this before launch. Isn't this how it should have been? Or did AMD not test their chips properly? Now I think this is kind of a two-sided liability.
 
I don't understand. Don't hardware manufacturers share their hardware specs with software firms (i.e., MS) prior to release, so that newly released hardware is supported by the software from the get-go? Surely this is Windows' fault, but AMD should also have worked with MS to solve this before launch. Isn't this how it should have been? Or did AMD not test their chips properly? Now I think this is kind of a two-sided liability.
Yes, MS most likely had access to AMD CPU samples well before they were even publicly announced.
 
So this hasn't been asked yet, but is this going to prompt a re-review after Microsoft releases a fix? I would personally like to see performance after an official patch.
 
I don't understand. Don't hardware manufacturers share their hardware specs with software firms (i.e., MS) prior to release, so that newly released hardware is supported by the software from the get-go? Surely this is Windows' fault, but AMD should also have worked with MS to solve this before launch. Isn't this how it should have been? Or did AMD not test their chips properly? Now I think this is kind of a two-sided liability.

Microsoft having the specs is one thing. Microsoft actually deciding to test and do something with them is a whole other matter.

AMD's chips work properly. The Windows scheduler that is supposed to "intelligently" push tasks to different cores does not work as it should (as explained in the article). To be fair, it's not on AMD to fix Microsoft's mistakes.
 
I thought everyone already knew that Windows performs like **** compared to Linux on the 2990WX? Phoronix released benchmarks at launch...
 
$100 says INTEL was part of this scheme. One thing I learned when I was a kid from the early 90's is that INTEL will do ANYTHING to keep AMD from taking their position.
LOL. Intel banned AMD engineers from investigating the causes of poor performance in Windows? AMD devs do not use Linux/BSD/macOS?
 
So this hasn't been asked yet, but is this going to prompt a re-review after Microsoft releases a fix? I would personally like to see performance after an official patch.
Well, this is a "niche" issue for gearheads and gamers.

Nadella and M$ are trying to pack as much gimmickry and as many horse-poop features into the OS as possible, in order to ram it down as many ordinary people's throats as possible.

Besides the X-Box, the Surface series is the only really successful hardware product M$ has had in years.

Their hardware was (IMO, of course) crap. Their mice didn't really hold up any better than the turds you can buy at Micro Center for $3.99. (Of course, I have no inhibitions about drinking (and spilling) coffee, Coke, or chocolate milk, or eating English muffins dripping with butter and honey on top of my peripherals, either.)

All that aside (none of it on topic, for that matter), Steve Ballmer tried to counter the explosion in mobile device numbers with Windows 8, and we all know how well that turned out.

Maybe Intel is bribing them to not fix the issue. Who knows?

From what I could gather from the article though, AMD cheaped out on the memory controllers in the "Threadripper" models, but not in their server chips.

Ideally, I'm sure they felt that would be a great sales talking point, as they tried to gain traction in the server market. It just backfired big time. Couple that with the complaining being done about Windows 10, "breaking things after every update", and both parties are at fault.
 