Testing 3rd-Gen Ryzen DDR4 Memory Performance and Scaling

I didn't find my 32 GB set of 2666 MHz memory to be pointless or a poor value at the time of purchase, and if I recall correctly it was nearly $50 less than the 3000 MHz kit. In reality I don't need any upgrades at the moment, but I've been feeling the urge, and my CPU is the oldest part in my system. If I were going to upgrade, I would spend $400 and not a penny more. Going AMD would necessitate new RAM, in my opinion, and Intel would not.

https://pcpartpicker.com/list/ydMzGG

vs

https://pcpartpicker.com/list/8nnPtg

- Stock to stock, these systems will deliver gaming performance within margin of error of each other, even with a 2080 Ti.
- The AMD system is still cheaper even with the RAM, and that's with me giving the 9700K the benefit of the doubt with one of the cheapest CPU coolers and Z390 motherboards.
- The 9700K will require more investment to overclock than the build linked. A 212 Evo cannot handle an overclocked 9700K.
- The comparison gets worse when you consider that you can resell your current DDR4, which makes the Ryzen system even better value. In fact, it gets darn close to that magic $400 figure you spoke of.

A 9700K and a board can indeed be had for $400, and what AMD currently offers for that amount isn't as good for gaming. I'm sure you'll try to prove me wrong; if you can show me a better deal for $400, that would certainly be useful.

No, a 9700K and board cannot be had for $400 or less at retail. The cheapest motherboard is an MSI B360M Micro ATX, and even that comes out to $426.89 with the 9700K. Not that you can seriously recommend any of the B360 motherboards, as they all limit the processor to 95 W, which will cripple the 9700K's gaming performance, and they disable overclocking. At that point the 3700X is better. In fact, you can go cheaper on the 3700X's motherboard, as it sips power compared to the 9700K: it doesn't require as beefy a VRM, and AMD doesn't charge extra for overclocking. You can't even run your RAM past 2666 on a B360 motherboard. Not that the AMD board I included is cheap by any measure; it's one of the top B450 motherboards and can handle a 3900X and perhaps even a 3950X.

That's all without mentioning that the 3700X massively outperforms the 9700K in multi-threaded workloads and has higher IPC in productivity applications. It will also surely last longer due to having double the threads and the platform upgradability. Not to mention the lower power consumption and lack of security holes.

How about a source for that $400 claim you made earlier, eh? From everything I'm seeing, there are a lot more points on AMD's side.
 
Ta for taking the time to blow off his BS. As far as I know, the 400-series AM4 motherboards have a precious 4 extra PCIe 3.0 lanes, allowing what I like to call a "storage co-processor": a true ~4 GB/s direct-linked NVMe drive with ~500,000 IOPS, in addition to one's normal 16-lane GPU.

I concede that until X570, Intel did seem to have a better 4-lane PCIe 3.0 chipset link, but it was just a chipset: a laggy lane multiplexer that quickly became saturated if you actually used those features, like dual (gimped) NVMe ports. I'll take the extra true lanes from AMD, thanks.

With X570, of course, the chipset link is the equal of 8 PCIe 3.0 lanes, and Intel is yet again left completely in the dust.
 
AFAICT, the conclusion should really be:

There is a substantial, measurable difference in raw performance (in AIDA, for example), but the apps we tested made no use of it.

It beats me why you go on for pages of tests on games, simply to show that RAM isn't the bottleneck for games as we now know them, which blind Freddy knows.

With the advent of a PCIe 4.0 32 GB/s GPU interconnect, I think game coders may begin to rethink the utility of system RAM as a GPU cache supplement.
 
With the advent of a PCIe 4.0 32 GB/s GPU interconnect, I think game coders may begin to rethink the utility of system RAM as a GPU cache supplement.
It would be desperately slow as GPU cache - even on a PCI Express 6.0 interface, with its 126 GiB/s of bandwidth, it would be like having the memory performance of a GeForce GTX 560 Ti. Plus DDR4 just isn't as good for graphics applications as GDDR6 is.
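For reference, the x16 bandwidth figures being thrown around in this thread can be reproduced with simple arithmetic. This is a rough sketch using raw link rates only; it ignores encoding and protocol overhead (128b/130b for gen 3-5, FLIT for gen 6), which is why quoted effective figures like the 126 GiB/s above come out a little under the raw numbers:

```python
# Back-of-the-envelope PCIe x16 bandwidth per generation (one direction).
# Raw link rate only: transfer rate (GT/s) per lane, times lanes, over 8 bits.

def pcie_x16_gbps(transfer_rate_gt: float, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s for an x16 slot."""
    return transfer_rate_gt * lanes / 8

for gen, rate in [(3, 8), (4, 16), (5, 32), (6, 64)]:
    print(f"PCIe {gen}.0 x16: ~{pcie_x16_gbps(rate):.0f} GB/s")
# PCIe 3.0: ~16, 4.0: ~32, 5.0: ~64, 6.0: ~128 GB/s raw
```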
 
Wow.

Reading the interwebs, it seems this scaling review has sparked a LOT of spin-off discussions. Good job, Steve.
 
AFAICT, the conclusion should really be:

There is a substantial, measurable difference in raw performance (in AIDA, for example), but the apps we tested made no use of it.

It beats me why you go on for pages of tests on games, simply to show that RAM isn't the bottleneck for games as we now know them, which blind Freddy knows.

With the advent of a PCIe 4.0 32 GB/s GPU interconnect, I think game coders may begin to rethink the utility of system RAM as a GPU cache supplement.


I am still wondering at what level of RDNA's architecture the shader cache exists... and whether it can be extended off-chip using Infinity Fabric 2.0 to another ringbus within another GPU cluster, etc.

Very wide 4k GPU.
 
I am still wondering at what level of RDNA's architecture the shader cache exists... and whether it can be extended off-chip using Infinity Fabric 2.0 to another ringbus within another GPU cluster, etc.

Very wide 4k GPU.
Depends on what you mean by shader cache. The L2 cache in RDNA connects to everything else via an Infinity Fabric, but as of yet, there's no onboard system to connect to another Fabric system. That means all transactions would have to be done via the PCIe interface, which is just too slow to be viable for grouping caches.
 
Depends on what you mean by shader cache. The L2 cache in RDNA connects to everything else via an Infinity Fabric, but as of yet, there's no onboard system to connect to another Fabric system. That means all transactions would have to be done via the PCIe interface, which is just too slow to be viable for grouping caches.

Infinity Fabric can go chiplet to chiplet.
 
It's just an interconnect system to go from anything to anything. The point is that there is currently no 'NVLink'-style system in RDNA, i.e. there's no way to connect two graphics cards via an IF link; it can only be done via PCIe, which is far too slow for the cache grouping you suggested.

But let's say AMD did add an IF system that allowed you to connect two graphics cards together so that the L2 caches were grouped. The first big problem is that the L2 cache would need a serious redesign to allow this, as it currently only provides access to the IF system, the L1 caches, and the control hub. Secondly, the cache-to-cache latency would be somewhat poor, possibly to the point of negating any benefit of pooling the caches.

You could go down the route of having two chips on one board to mitigate this slightly, but as Navi 10 is so small, it would be easier to simply have a larger chip with more cache.
 
It would be desperately slow as GPU cache - even on a PCI Express 6.0 interface, with its 126 GiB/s of bandwidth, it would be like having the memory performance of a GeForce GTX 560 Ti. Plus DDR4 just isn't as good for graphics applications as GDDR6 is.

Except I didn't say "as GPU cache". I said as a supplement to GPU cache.

L3 cache would be "desperately slow" as L1 cache too, but they team up nicely.

It has always been an option to include some system RAM in the cache pool, but most game code's DNA harks back to when that option was a laughable 8 GB/s on PCIe 2.0 x16.

32 GB/s is still slow, as you say, but it is quite another level.

It is roughly the same bandwidth that APUs enjoy, and they game pretty well SOLELY on system RAM.
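The APU comparison can be sanity-checked with quick arithmetic. This is a rough sketch: DDR4-3200 is an assumed example speed, and these are peak figures, so real sustained bandwidth is lower:

```python
# Peak dual-channel DDR4 bandwidth (what an APU's iGPU shares with the CPU)
# versus a raw PCIe 4.0 x16 link, in GB/s.

def ddr4_bandwidth_gbps(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s: transfers/s x 8-byte channel width x channels."""
    return mt_per_s * bus_bytes * channels / 1000  # MT/s -> GB/s

print(ddr4_bandwidth_gbps(3200))  # dual-channel DDR4-3200: 51.2 GB/s
print(16 * 16 / 8)                # PCIe 4.0 x16 raw: 32.0 GB/s
```

Same ballpark, which is the point being made: a PCIe 4.0 x16 link carries on the order of what an iGPU lives on.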
 
Except I didn't say "as GPU cache". I said as a supplement to GPU cache.
Apologies for that.

L3 cache would be "desperately slow" as L1 cache too, but they team up nicely.
Not sure what L3 cache you're referring to here. If you mean the CPU's L3 cache, it could hardly be classed as slow, given that it runs at the local clock rate.

It has always been an option to include some system RAM in the cache pool, but most game code's DNA harks back to when that option was a laughable 8 GB/s on PCIe 2.0 x16.

32 GB/s is still slow, as you say, but it is quite another level.

It is roughly the same bandwidth that APUs enjoy, and they game pretty well SOLELY on system RAM.
APUs using system RAM for the GPU's local memory works because the system memory isn't treated as cache, nor does it act as a supplement; it works no differently to, say, the GDDR6 found on a graphics card. The difference is that the latencies are significantly higher and it's only dual channel at best, unlike, for example, an RX 5700 XT, which is 8-channel.

And yes, they game well enough, but at resolutions of 1080p or less and with reduced graphics settings. That's partly because the GPU is scaled right down in terms of unified shader cores, texture units, and ROPs, but also because the 'local' memory system actually isn't local: system memory has to cope with other systems making demands on it. If an APU had something like a full Vega 20 chip in it, the performance would be hugely constrained by using system memory, to the point that all those extra CUs, ROPs, etc. would be wasted.

Let's go back to the original statement:
With the advent of a PCIe 4.0 32 GB/s GPU interconnect, I think game coders may begin to rethink the utility of system RAM as a GPU cache supplement.
Now, by 'GPU cache supplement', do you mean providing additional L3/L4-type cache to the GPU, or just an additional storage area for the GPU to use? If it's the former, the bandwidth and latency issues just don't make it sensible: for example, the L2 cache in a Radeon RX 5700 XT has a bandwidth of 512 bytes per clock cycle to the internal IF interconnect system. At a core clock of 1600 MHz, that's roughly 800 GB/s of bandwidth. The GDDR6 is about half that value to the IF system. PCIe 4.0 is just nowhere near this; even PCIe 6.0 falls well short.
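That L2 figure is easy to sanity-check, assuming 512 bytes per clock at a 1600 MHz core clock (peak numbers only, ignoring real-world utilization):

```python
# RX 5700 XT L2-to-Infinity-Fabric bandwidth estimate:
# 512 bytes per clock cycle at an assumed 1600 MHz core clock.
bytes_per_clock = 512
clock_hz = 1600e6

bw = bytes_per_clock * clock_hz  # bytes per second
print(bw / 1e9)    # ~819.2 GB/s
print(bw / 2**30)  # ~763 GiB/s
```

Either way you express it, it is well over an order of magnitude above a PCIe 4.0 x16 link's ~32 GB/s.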

On the other hand, as a general supplementary data store there's definitely some merit in using it, but that's mitigated by the fact that $350 graphics cards have as much local memory as many PCs do system memory.
 
I spend my money how I want, and some very respected content creators and tech enthusiasts agree that for my particular use case and budget, the 9700K is exactly the right choice for me. I never made any sweeping claims, but somehow people seem to have construed that I did. I didn't, for example, say that the 9700K is "better" in any way than the 3700X. I have said things here in the forums about never being "impressed" by AMD products, but I am rarely impressed in life. I have always respected AMD and made objective decisions when spending my money. Timing is a big factor, and I also haven't built umpteen systems.

I didn't even read much after my last post, and stepped away from the forum for a couple of days, because I feel people (perhaps me too at times) take all this too seriously. It's not serious for me, just a fun hobby!
 