AMD Navi vs. Nvidia Turing: An Architecture Comparison

I don't see it as a good investment to buy a video card later this year with only HDMI 2.0 and DisplayPort 1.4, when HDMI 2.1 has already been released and DisplayPort 2.0 is almost here.

It looks like video card manufacturers want to suck you in on this one, so next year they can just update the ports and get your money a second time. They won't suck me into this.
 
I think the next Radeon chips, the 5800 series, will feature those connections, since they will be pushing far more pixels.

The 5700 XT really can't drive 4K above 60 Hz, so why spend the chip resources? FreeSync 2 works around most of the bandwidth issues, but HDMI 2.1 should be supported on more powerful graphics cards.
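As a rough sanity check on the bandwidth side (assuming 8-bit RGB and the standard CTA 4K timings, so treat these as ballpark figures):

\[ 4400 \times 2250 \times 60\ \text{Hz} = 594\ \text{MHz pixel clock for 4K60} \]
\[ 594\ \text{MHz} \times 24\ \text{bit} \approx 14.3\ \text{Gbit/s, against HDMI 2.0's roughly 14.4 Gbit/s of effective bandwidth} \]
\[ 2 \times 14.3\ \text{Gbit/s} \approx 28.5\ \text{Gbit/s for 4K120, which needs HDMI 2.1's roughly 42.7 Gbit/s (or compression)} \]

So 4K60 only just squeezes through HDMI 2.0, which is why anything above 60 Hz at 4K needs HDMI 2.1, DisplayPort or DSC.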
 

Unless you need the higher refresh rates or HDR modes, and you actually buy monitors or TVs that support those features and can be driven playably by current GPUs, your point is moot.
 
Nvidia / Samsung / Ampere

That's all I needed to hear.
Nvidia's gonna kill the game - again.
It's possible that, like Intel with their 10nm node, Nvidia will have trouble reaching high clocks. That might be one of the reasons they haven't talked much about it, and why they'll "presumably" focus on power efficiency. It should also mean they'll be able to fit more RT and Tensor cores into the GPU to make those features more viable.

We probably won't see the same jump in performance as with Pascal (vs. Maxwell) from their first-gen GPUs on 7nm. I fully expect them to just refine Turing. I'll be happy if they manage 15-20% by simply adding more cores, even if they probably won't increase the clocks. The increased complexity of Nvidia's GPUs makes the transition trickier.

Hopefully AMD manages to get something out for the high-end market by the end of the year or in early 2020, and TSMC's 7nm+ won't be delayed. AMD needs to introduce ray tracing in some shape or form alongside the 7nm+ node (late 2020 or 2021?). I think Nvidia's 2nd-gen RT cores should finally be good enough to get more devs to use the feature (thanks, 1st-gen beta testers :D).
 

Getting to play Cyberpunk 2077 with all the RT effects on will be enough to justify my 2080 Ti purchase :D, just like when I bought the Titan X (Maxwell) as Witcher 3 came out (the GTX 980 was stuttering like hell).
Anyway, Turing will be stronger than ever with the release of these upcoming Unreal Engine 4 games:

_Borderlands 3 (Sep 2019)
_The Outer Worlds (Sep 2019)
_Shenmue 3 (2019)
_Star Wars Jedi: Fallen Order (2019)
_Final Fantasy VII Remake (2020)
_Outriders (2020)
_System Shock (2020)
_Vampire: The Masquerade – Bloodlines 2 (2020)
_S.T.A.L.K.E.R 2 (2021)
...
And a whole bunch of other games, but these are the ones I'm interested in. Upcoming RTX games also include Control (2019), MechWarrior 5: Mercenaries (2019) and Cyberpunk 2077 (2020).
 
Nice article! There are a few points that weren't entirely accurate, though. Navi is not a vector processing architecture. With GCN, AMD moved to scalar processing just like Nvidia, and that continues with Navi.

The article also lists the 2070 Super as having 48 SMs and being able to track 6,144 threads. It only has 40 SMs and can track 40,960 threads across the chip (the same as the 5700 XT). A Turing SM and a Navi CU can both track 1,024 threads at a time and execute 64 FP instructions per clock, so they're perfectly matched for FP, though Turing has a slight advantage with its concurrent INT pipeline.
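As a quick check on those figures (assuming 32 warps of 32 threads per Turing SM, and two SIMD32s per Navi CU with 16 wave32s each):

\[ 40\ \text{SMs} \times 32\ \text{warps} \times 32\ \text{threads} = 40{,}960 \]
\[ 40\ \text{CUs} \times 2\ \text{SIMD32} \times 16\ \text{waves} \times 32\ \text{threads} = 40{,}960 \]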
 
Nice article! There are a few points that weren't entirely accurate, though. Navi is not a vector processing architecture. With GCN, AMD moved to scalar processing just like Nvidia, and that continues with Navi.
Arguably, they're both scalar-vector architectures, but AMD themselves class the CUs as vector processors:

https://www.techpowerup.com/gpu-specs/docs/amd-gcn1-architecture.pdf
https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf

The vector ALUs operate on one work-item element (a pixel, vertex, etc.) but with multiple values per work-item, which is why there are more VGPRs per vector ALU than there are SGPRs for each scalar ALU.
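To make the VGPR/SGPR split concrete, here's a toy Python sketch (purely illustrative, not real ISA): a vector register holds a separate value for every work-item in the wave, while a scalar register holds a single value shared by the whole wave.

# Toy model of a GCN-style wavefront - illustration only, not real ISA.
WAVE_SIZE = 64  # GCN wavefront width

# Vector registers (VGPRs): one value per work-item in the wave.
v0 = [float(lane) for lane in range(WAVE_SIZE)]  # e.g. per-pixel input
v1 = [2.0] * WAVE_SIZE

# Scalar register (SGPR): a single value shared by every work-item,
# e.g. a constant or a base address.
s0 = 10.0

# A vector add is one instruction producing WAVE_SIZE results at once.
v2 = [a + b for a, b in zip(v0, v1)]

# A scalar operand is broadcast to every lane of the vector operation.
v3 = [x + s0 for x in v2]

print(v2[:4])  # [2.0, 3.0, 4.0, 5.0]
print(v3[:4])  # [12.0, 13.0, 14.0, 15.0]

The point being: one vector instruction touches a whole wave's worth of values, while the scalar unit only ever deals with one.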

Also, plenty of universities use GPUs as examples of vector processors, for instance:

https://www.archive.ece.cmu.edu/~ec...seth-740-fall13-module5.1-simd-vector-gpu.pdf
http://courses.csail.mit.edu/6.888/spring13/lectures/L14-gpus.pdf
http://twins.ee.nctu.edu.tw/courses/ca_13/lecture/CA_lec08-chpater_4-vector_processing.pdf

Of course, Nvidia class their CUDA cores as scalar, and although one could argue that they function just like AMD's CUs do, Nvidia are very explicit that each CUDA core operates on one data value, not many.

This strikes me as being more a case of architectural semantics, but it's perhaps a valid argument - one just needs to convince AMD and Nvidia of it.

The article also lists the 2070 Super as having 48 SMs and being able to track 6,144 threads. It only has 40 SMs and can track 40,960 threads across the chip (the same as the 5700 XT).
Good catch about the SMs and the thread count; I'd misread the CUDA documentation - I'll amend the article now.

Edit: Interestingly, AMD's documents contradict themselves slightly - in one of them, it says "up to 20 waves per SIMD32" but in another it says "16 waves per SIMD32." Typo, perhaps?
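For what it's worth, the two figures imply different per-CU totals (assuming two SIMD32s per CU):

\[ 2 \times 16\ \text{waves} \times 32 = 1024\ \text{threads per CU} \]
\[ 2 \times 20\ \text{waves} \times 32 = 1280\ \text{threads per CU} \]

Only the 16-wave figure lines up with the 1,024-threads-per-CU count mentioned above.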
 
The biggest problem is that game developers are bribed by Nvidia, so now AMD has to bribe them too. Resources are spent on corruption instead of on making chips better.

I mean, just think about this: everyone knows that AMD chips are better for cryptomining, which proves that AMD has faster hardware. It's just that cryptomining software is written to use it optimally (because you earn more money that way), while games are written to run faster for whoever bribed the developers more. And that's how Nvidia wins. If Nvidia were really faster, CUDA cores would outperform AMD's stream processors in cryptomining too.

That didn't happen - CUDA was clearly much slower in all the compute benchmarks. Since game calculations are nothing but massively parallel computing, yet Nvidia is somehow faster at that, the only way it can happen is if code is deliberately written to run slower on AMD. In other words, bribery and corruption. That's why I don't support Nvidia: I don't want to support criminal behavior, even if it would bring me a few more FPS in criminally written games.

That's why I don't buy those games either. If I'm tempted to play one, I'll get a pirated version so I don't finance the corrupt developers. Pirates seem to be more honest than they are.
 

In other words, you are clueless.

Nvidia GPUs like the 1070 Ti and 1080 Ti were actually more profitable for mining and had a higher return on investment due to their lower power consumption and greater versatility. Raw power isn't everything - AMD could say a lot about that, with their cards always having more TFLOPS than Nvidia's yet being slower in gaming.

People bought AMD cards at first because they were cheaper; then the situation changed.
 
So it is finally official: the Nvidia architecture is better despite being on the older node and having something like half of the chip given over to ray tracing.

A 1660 Ti (TU116) on 7nm would cannibalize the entire Navi lineup and almost all of the current Turing lineup. At a +25% performance uplift, a 7nm TU116 should be on par with the 5700 while consuming just 120 W. However, Nvidia aren't going to shoot themselves in the foot, so it might be a while before we see a die-shrunk TU116.
 
https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
Cite:
First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing.

Effectively this was changed in Volta first.
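Either way, the effect of a separate integer pipe is easy to picture with a toy issue-count model (a rough sketch, not how the real scheduler works; the 36-INT-per-100-FP mix is roughly the ratio Nvidia cites for typical shader workloads):

# Toy issue-count model - a rough sketch, not a real GPU scheduler.

def cycles_shared_pipe(fp_ops, int_ops):
    # Pre-Volta style: INT and FP instructions compete for the same
    # issue slot, so every integer instruction delays the FP stream.
    return fp_ops + int_ops

def cycles_independent_int(fp_ops, int_ops):
    # Volta/Turing style: the integer datapath runs alongside the FP
    # datapath, so the longer of the two streams sets the time.
    return max(fp_ops, int_ops)

fp_ops, int_ops = 100, 36  # ~36 INT instructions per 100 FP instructions
print(cycles_shared_pipe(fp_ops, int_ops))      # 136
print(cycles_independent_int(fp_ops, int_ops))  # 100 -> ~26% fewer issue slots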

A full Kepler GK110 (780 Ti, Titan Black) had 5 GPCs, each holding 3 SMXs of 192 cores each: 2,880 cores.
https://tpucdn.com/gpu-specs/images/g/136-block-diagram.jpg

A full Kepler GK104 (680, 770) had 4 GPCs, each holding 2 SMXs of 192 cores each: 1,536 cores.
https://tpucdn.com/gpu-specs/images/g/108-block-diagram.jpg
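Just to spell out the arithmetic behind those totals:

\[ 5\ \text{GPCs} \times 3\ \text{SMX} \times 192 = 2880\ \text{cores (GK110)} \]
\[ 4\ \text{GPCs} \times 2\ \text{SMX} \times 192 = 1536\ \text{cores (GK104)} \]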

Furthermore, the TPC moniker has been used internally for a long time, but in Maxwell, for example, a TPC was the same thing as an SMM.

The split came with the Tesla GP100, earlier than Volta.
https://www.guru3d.com/news-story/n...ecap-full-gpu-has-3840-shader-processors.html
 
Vega's scalar register file (SGPRs) is only 3.2 KB each; from Hawaii through Polaris it was 4 KB, and the HD 7870 had 8 KB each.
 
In other words, you are clueless.

Nvidia GPUs like the 1070 Ti and 1080 Ti were actually more profitable for mining and had a higher return on investment due to their lower power consumption and greater versatility. Raw power isn't everything - AMD could say a lot about that, with their cards always having more TFLOPS than Nvidia's yet being slower in gaming.

People bought AMD cards at first because they were cheaper; then the situation changed.


As we now know, nobody cared about power consumption back then, did they? And since efficiency seems to be important to you, RDNA, now or in the future, should be right up your alley.

But Maxxi, you can't protect Nvidia anymore; Jensen can't control all the channels. The truth is out. Major discounts are incoming for RTX cards soon.

AMD has a class act going, and it seems the entire gaming industry is on board.
 
AMD's GCN has been in consoles for years, yet NVIDIA IS FASTER. So quit your tales; no one is interested in them. All of the RX series, including Vega, use the GCN architecture.

The Turing chip is on an older node, is twice as large, and has half of its die dedicated to ray tracing, yet it consumes as much power as the underperforming 7nm RDNA part and offers more performance.

So yeah, when it comes to efficiency, NVIDIA is the more efficient one.

I am sorry that AMD got you again; I see their marketing worked. It was a few years ago that the current consoles were announced with AMD CPUs and GCN GPUs. "Intel is done, Nvidia is done. All the games are going to be optimized for AMD and will perform better on it," people were saying.

We know how that turned out... but, but, but.

Now the circlejerk has started again. I am really surprised how often the same trick keeps working, again and again. Now go read AMD's tweets about how awesome RDNA is and how it's cheaper to manufacture, whilst the opposite is true because 7nm is significantly more expensive and has lower yields than 12nm. Go celebrate with them that, after a year, they released something which can barely match mainstream Nvidia GPUs whilst being on a better node and having no ray-tracing support. Oh god, RDNA is truly awesome and superior.

AMD marketing: so after a year we have finally released two GPUs on the 7nm node, while Nvidia is still using the 12nm node. Despite using the clearly superior node and not supporting ray tracing, and thus not spending additional die space that is switched off most of the time, OUR CARDS CAN ONLY MATCH MAINSTREAM NVIDIA GPUS. *AMD panics* What are we going to do? What are we going to do? You know what, we'll cut the prices and pretend it was intended, we'll hype RDNA like never before and tweet about how we played Nvidia and how their chips are bigger and more expensive to make. Our fan base is dumb enough to believe anything we say, and anyone with decent knowledge won't buy our GPUs anyway, so no loss! #RDNA4LIFE #RDNA4EVER
 
For general understanding this article is very good, especially with the previous game render pipeline article as background.
Thanks for both articles!

There are some sites on the net that show "performance per watt" charts; in those, one can see, for example, at 1080p max details across a good number of games:

There is a plateau at around 85% of the 1660 Ti's efficiency, where the RX 5700 (non-XT) is nearly on par with all the RTX cards and the GTX 1080 and 1070.

The next plateau, at around 75% of the 1660 Ti's efficiency, holds the 1080 Ti, 1070 Ti, 5700 XT and 1060 6GB.

Below that, it becomes obvious that the 5700 (non-XT) is about 38% more efficient than the Radeon VII on the same 7nm process.

And yes, Nvidia will smash the efficiency charts again with 7nm.
And yes, the non-RTX Turing cards are more efficient, but is that just because of the smaller chip size, or because the RTX chips carry the Tensor and RT cores?
I don't know, but the 1050 Ti was very efficient compared to the other Pascals.
 