Leaked AMD documents offer new Zen 3 details

onetheycallEric

Highly anticipated: With a Zen 3 announcement inbound in October, leaked developer documents are giving us a few key details ahead of time. The leak doesn't paint the whole picture of what Zen 3 is going to look like, but it suggests another strong CPU series for AMD with lots of generational improvements.

Twitter user CyberPunkCat has leaked allegedly confidential documents that seem to offer details on the changes Zen 3 will bring with the Ryzen 4000 desktop series, code-named "Vermeer."

We know that AMD is taking the wraps off of Zen 3 in October, and the details found in the documents reiterate some things we already know, while offering bits of new information. The document appears to be a Processor Programming Reference (PPR) for AMD's Family 19h, Model 21h B0, which would be Zen 3. Previous Zen+ and Zen 2 architectures belong to AMD's Family 17h, with various models and revisions.

AMD usually makes this type of documentation available to developers after launch, so it isn't exactly privileged information. Furthermore, these kinds of developer documents tend to circulate easily -- just ask Intel.

The most notable changes to Zen 3 appear to be happening in the CCD/CCX configuration. Zen 3 will continue to make use of an MCM (multi-chip module), or chiplet design, that will use two CCDs and one I/O die. There will only be one CCX per CCD, and this CCX will consist of eight cores capable of running in either single-thread mode (1T) or two-thread SMT mode (2T). So, that's 16 total threads per CCX.

This may suggest that Zen 3 parts will top out at 16 cores, in the same fashion as the Ryzen 9 3950X. Though, we'll have to wait and see as AMD may well have some tricks up its sleeve.

Furthermore, AMD is reworking its cache subsystem. There will be a total of 32MB of L3 cache (as opposed to 16MB per CCX with Zen 2) shared across all eight cores in the CCX. While Zen 2 offered 32MB of L3 cache per CCD, it had to be shared between two separate complexes. There's also 512KB of L2 cache per core within the CCX, for a total of 4MB of L2 cache per CCD.
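The topology and cache figures above reduce to simple arithmetic. Below is a minimal Python sketch that tallies them. The `Ccd` class and `package_totals` helper are hypothetical names made up for illustration, and the Zen 2 row assumes its widely reported layout of two four-core CCXs per CCD with 512KB of L2 per core, which the leaked document itself doesn't spell out.

```python
from dataclasses import dataclass

KB_PER_MB = 1024


@dataclass(frozen=True)
class Ccd:
    """Hypothetical model of one core complex die (CCD)."""
    ccx_per_ccd: int
    cores_per_ccx: int
    threads_per_core: int  # 2 when SMT (2T mode) is enabled
    l3_kb_per_ccx: int
    l2_kb_per_core: int

    @property
    def cores(self) -> int:
        return self.ccx_per_ccd * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mb(self) -> int:
        # Total L3 on the die, summed over every CCX.
        return self.ccx_per_ccd * self.l3_kb_per_ccx // KB_PER_MB

    @property
    def l2_mb(self) -> int:
        # L2 is private to each core, so it scales with core count.
        return self.cores * self.l2_kb_per_core // KB_PER_MB

    @property
    def l3_mb_shared_per_ccx(self) -> int:
        # The L3 pool a core shares with the other cores in its own CCX.
        return self.l3_kb_per_ccx // KB_PER_MB


# Figures from the article: one 8-core CCX per CCD, 32MB L3, 512KB L2 per core.
ZEN3 = Ccd(ccx_per_ccd=1, cores_per_ccx=8, threads_per_core=2,
           l3_kb_per_ccx=32 * KB_PER_MB, l2_kb_per_core=512)

# Assumed Zen 2 layout: two 4-core CCXs per CCD, 16MB L3 per CCX.
ZEN2 = Ccd(ccx_per_ccd=2, cores_per_ccx=4, threads_per_core=2,
           l3_kb_per_ccx=16 * KB_PER_MB, l2_kb_per_core=512)


def package_totals(ccd: Ccd, ccd_count: int = 2) -> dict:
    """Totals for a package built from `ccd_count` identical CCDs."""
    return {
        "cores": ccd.cores * ccd_count,
        "threads": ccd.threads * ccd_count,
        "l3_mb": ccd.l3_mb * ccd_count,
        "l2_mb": ccd.l2_mb * ccd_count,
    }


if __name__ == "__main__":
    for name, ccd in (("Zen 2", ZEN2), ("Zen 3", ZEN3)):
        print(name, package_totals(ccd),
              "L3 shared per CCX (MB):", ccd.l3_mb_shared_per_ccx)
```

Both generations come out at the same 32MB of L3 per CCD. What changes is the pool a single core shares within its own CCX, which doubles from 16MB to 32MB.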

Interestingly, AMD is also beefing up the Scalable Data Fabric (SDF), which is the communication backbone of Infinity Fabric responsible for the transport of data and coherency between cores, memory controllers, and other I/O elements. The documents note that the SDF can now handle 512GB per DRAM channel. It looks like there could also be some minor changes to the Scalable Control Fabric (SCF), which is the other half of the Infinity Fabric that mainly handles signaling.

Elsewhere, Zen 3 looks to be bulking up the memory interface with two unified memory controllers (UMC), with each supporting one DRAM channel and each channel supporting two DIMMs. DDR4-3200 will be supported as well, just as it was natively with Zen 2. It looks like Zen 3 will mostly retain the same features and connectivity for the Fusion Controller Hub (FCH) that were present in Zen 2.
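For a rough sense of what that memory layout means in practice, here is a small Python sketch of the arithmetic. The function names are hypothetical. The bandwidth figure is a theoretical peak that assumes the standard 64-bit DDR4 data bus per channel, and treating the 512GB-per-channel figure as addressable capacity is an assumption about what the leaked document means.

```python
def dimm_slots(umc_count: int = 2, channels_per_umc: int = 1,
               dimms_per_channel: int = 2) -> int:
    """Number of DIMMs the described memory interface can address."""
    return umc_count * channels_per_umc * dimms_per_channel


def peak_bandwidth_gb_s(transfer_rate_mt_s: int, channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


def max_capacity_gb(channels: int = 2, gb_per_channel: int = 512) -> int:
    """Upper bound on capacity if 512GB per channel is an addressing limit."""
    return channels * gb_per_channel


if __name__ == "__main__":
    print("DIMM slots:", dimm_slots())
    print("DDR4-3200 dual-channel peak (GB/s):", peak_bandwidth_gb_s(3200, 2))
    print("Capacity ceiling (GB):", max_capacity_gb())
```

Real-world throughput will land below the theoretical peak, and consumer motherboards will support far less memory than any fabric-level limit.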

In addition to some generational clock speed bumps, it looks like Zen 3 will further polish AMD's MCM approach, focusing on improving coherence and latency under the hood. We fully expect a measurable IPC improvement over Zen 2 parts as well.

 
The switch to a 'native' 8 core design with unified cache is an easy design win. Eliminates much of the inherent L3 cache latency disadvantage compared to Intel's parts. You can look at decent percentage point IPC improvements on that change alone, possibly fairly significant in terms of gaming performance.

Obviously the 16 core parts will still have that penalty, being two CCXs, but it'll surely be reduced considerably there also. Better memory bandwidth and performance is another few percentage points. Ryzen is on for another decent IPC gain, potentially couple that with small clock gains and you're looking at another nice step in performance over the existing 3000 series.

Intel need to pull something good out the bag to stop the bleeding in the desktop market. Over to you Rocket Lake.
 
8c 16t on each separate CCX would make it possible to devote one CCX purely to gaming, which would be more than enough as things stand right now, and effectively have another 8c 16t CPU dedicated to running something else entirely at the same time, without any need for cross CCX communication to slow things down.

..Nice.
 
So it doesn't really seem to be a "new architecture", but the final evolution of the existing one.
Very interesting, now I'm eager to discover how this will translate in real world performance.
 
Well, if there is something Intel is good at, it's latency/RAM management...

AMD was lagging behind in that field, and now Zen 3 will most probably fill the gap.
 
I like AMD's processor options and we have 2 of them in the house, but frankly I don't believe Zen 3 will eliminate that latency gap with Intel. Even with that latency gap remaining, AMD's cache structure could still close the few performance gaps with Intel in the areas where they lag a bit (single threaded tasks like PS, Excel, some emulation), but I still bet that the other latency-sensitive area, HiRR gaming, will remain a slight Intel lead.

However it's in everybody's interest for AMD to take the lead here, so I hope they can do it.
 
So far every AMD Ryzen CPU has been cut-down version of server CPU. I expect that trend to continue and so memory latency remains problem.

AMD has no IPC or cache problems since Zen2 vs Intel. IPC is better and caches have same speed and L2 cache is bigger.

It's just many already obsolete software are optimized for Intel architectures (perhaps even on compiler level) and on those software AMD looks "slow". Another explanation is memory latency but as seen many times before, like with Quake 3 where Intel was supposed to be faster because more memory bandwidth or lower latency, Intel still stayed ahead even when AMD had superior bandwidth and latency. Game just had crappy optimizations for AMD CPU's but when game runs over 400 FPS, who cares.
 
OR, it's the well documented latency caused by Infinity Fabric linking the CCXs together in each CCD, hence why the 3300X, which has 4 cores on a single CCX, so dramatically outperforms the 3100 in games and can tangle with the likes of the 3700X in gaming benchmarks.

Zen 3 will fix this issue by eliminating the CCX altogether.

Not everything is due to lazy developers or Intel shenanigans. Given Zen outperforms Intel in absolutely everything that isn't games or Photoshop, I don't even know where this "compilers are holding Zen back" argument is coming from.
 
Every single hardreset comment about AMD unfortunately is biased. It never is their fault. Every time it is someone else's fault.
By the way you are right, it is a well known design “limit” of their MCM approach and it’s the reason they designed Zen 3 in a different way, in order to mitigate the “issue” (CCX will still be there, but in an 8 core configuration instead of 4+4).
I don’t know if there will still be a gap compared to Intel solutions but I think they will close the gap enough to claim Zen 3 to be the faster CPU even for gaming... until Rocket Lake maybe.
 
OR, it's the well documented latency caused by Infinity Fabric linking the CCXs together in each CCD, hence why the 3300X, which has 4 cores on a single CCX, so dramatically outperforms the 3100 in games and can tangle with the likes of the 3700X in gaming benchmarks.

That's just bad programming. When using single threaded software, there should not be any issues with latencies involved on CCX/IF design. In case there is, that is software, not hardware problem.

Zen 3 will fix this issue by eliminating the CCX altogether.

We still don't know what is exact core arrangement. It could be like this (1-8 = cores):

1234-L3 cache-5678

Or it could be like:

12345678-L3 cache

First case, there is still CCX and all penalties that come from CCX design except L3 cache penalty between two cache slices. It's just CCX and CCD are exactly same things now.

Remembering AMD said previously CCX is group of cores(!) connected to same L3 cache, it essentially should mean lower arrangement is used. First leaked photo long time ago said it's above arrangement. We'll see.

Not everything is due to lazy developers or Intel shenanigans. Given Zen outperforms Intel in absolutely everything that isn't games or Photoshop, I don't even know where this "compilers are holding Zen back" argument is coming from.

It comes from the fact that some software IS clearly trying to do their own core scheduling and if that was done for Intel CPU's, it works badly with Ryzen.

Like this: https://community.amd.com/community...ance-updates-for-ryzen-customers?sf92082303=1

“Rise of the Tomb Raider splits rendering tasks to run on different threads,” Crystal Dynamics said. “By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.”

So clearly task scheduling was made for something other than Ryzen, and just adjusting it for Ryzen gives a huge performance improvement.

Another example on the same page: Ryzen optimizations gave a 204 772%, yes that is 204 thousand percent, improvement.

Not necessarily just compiler but compiler and other optimizations too.

Every single hardreset comment about AMD unfortunately is biased. It never is their fault. Every time it is someone else's fault.
By the way you are right, it is a well known design “limit” of their MCM approach and it’s the reason they designed Zen 3 in a different way, in order to mitigate the “issue” (CCX will still be there, but in an 8 core configuration instead of 4+4).
I don’t know if there will still be a gap compared to Intel solutions but I think they will close the gap enough to claim Zen 3 to be the faster CPU even for gaming... until Rocket Lake maybe.

Because 99,9% of the time it is someone else's fault. Basically there are two ways to handle thread scheduling in games: 1. let Windows handle it and 2. do it yourself. When choice 1 is used and Microsoft updates Windows, there is no real need to update the game, as Microsoft does the job (or not). With choice 2, the game developer should update the game every time a new CPU is launched. Very rarely do they do so. If they don't, it's entirely their fault if the game runs poorly on a new CPU design, not AMD's. Very simple.

We still don't know whether the cores use a 4+4 or 8+0 arrangement. That is because AMD now seems to consider 4+4 to be the same CCX if those cores are connected to the same L3 cache. We already know they are. They probably dropped the previous requirement that cores on a CCX must be in the same "group". And 8+0 is still not confirmed, it may well be 4+4. We'll see.
 
That's just bad programming. When using single threaded software, there should not be any issues with latencies involved on CCX/IF design. In case there is, that is software, not hardware

It doesn’t work like that.
Even single threaded software (is there any ?) doesn’t really use ONE SINGLE CORE to fulfill the task, for thermal reasons.
So CCX/IF latency still is relevant.

But AS USUAL with you is never AMD’s fault: it is bad programming.
Unless we are speaking about AMD drivers: in that case isn’t bad programming, it is hardware fault (even if the graphic card was replaced).
We know: AMD can’t be wrong. Never.
 
It doesn’t work like that.
Even single threaded software (is there any ?) doesn’t really use ONE SINGLE CORE to fulfill the task, for thermal reasons.
So CCX/IF latency still is relevant.

Of course there is. And when looking for maximum performance on single threaded task, you simply lock that task into single core. That was very common practice when looking for world records on software like PiFast or SuperPi.

So when looking for maximum single thread performance, process should be locked on one single core and also it should not use SMT. Anything else just slows it down.
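The single-core pinning described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `pin_to_single_core` is a hypothetical helper, `os.sched_setaffinity` is available on Linux but not on Windows or macOS, and pinning to one logical CPU does not disable SMT by itself, since the sibling thread on the same physical core can still be scheduled by other processes.

```python
import os


def pin_to_single_core(core=None):
    """Restrict the calling process to a single logical CPU.

    Returns the resulting affinity set, or None on platforms that
    do not expose os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if core is None:
        core = min(allowed)  # default to the lowest CPU we may run on
    if core not in allowed:
        raise ValueError(f"CPU {core} is not in the allowed set {sorted(allowed)}")

    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    result = pin_to_single_core()
    if result is None:
        print("CPU affinity control is not available on this platform")
    else:
        print("Now restricted to logical CPU(s):", sorted(result))
```

Once pinned, the scheduler can no longer migrate the process between cores or CCXs, which is the point being made about avoiding cross-CCX latency in single-threaded runs.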

But AS USUAL with you is never AMD’s fault: it is bad programming.
Unless we are speaking about AMD drivers: in that case isn’t bad programming, it is hardware fault (even if the graphic card was replaced).
We know: AMD can’t be wrong. Never.

Of course it's bad programming. There is nothing really wrong with AMD CPU design, it's just program doesn't understand it. Basically same if you say there was something badly wrong with Intel's first hyper threading CPU, because Windows 2000 never supported hyper threading *nerd*

Who said that replacement card could also not be defective? It's very common practice to replace "defective" card with another "defective" card and then hope that "defective" card works on different user. That is: replacement parts are not always new.
 