Spec guess for the top-end RTX 3000 (GA102) chip

neeyik

While on a coffee break from work, I cobbled together the known specs of the best past and present GPUs from Nvidia, including the current A100 chip. In all cases, the details cover the full die version of those chips, even though the likes of Volta were never used in this configuration. Then I had a go at configuring some possible cases for the forthcoming GA102 chip (assuming that this is what it's going to be called) - i.e. the version of Ampere that's going to be used in the top-end RTX 3000/Titan cards. I also stuck in the specs from the 'Moore's Law is Dead' leak/guess.

GA102 guess.png

Some things to clarify with the assumptions - we know what the Ampere architecture is like for the GA100, and my guesses work on the basis that many of the new aspects will be carried through to the consumer versions. Namely, the improvements in FP16 throughput (twice the rate of Turing) and tensor FP16 FMA + FP16 accumulate (4 times better). Moore's Law is Dead stated that the number of Tensor Cores has doubled, but this will have been a misunderstanding: the GA100 has half the number of Tensor Cores per SM compared to Volta/Turing, but they're 4 times more capable - ergo, twice as good overall. The biggest unknown is the change to the RT Cores - MLiD is claiming 4 times better; I've been more conservative and gone with twice as good.
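To put rough numbers on the 'half as many, 4 times more capable' point, here's a minimal sketch of the per-SM arithmetic. The per-core figures (8 Tensor Cores per SM at 64 FP16 FMAs per clock for Volta/Turing, 4 per SM at 256 for GA100) are taken from Nvidia's published specs rather than from the table above, and the function name is just made up for illustration.

```python
# Per-SM tensor FP16 FMA throughput, in FMA operations per clock.
# Assumed figures (from Nvidia's published V100/TU102/A100 specs):
#   Volta/Turing: 8 Tensor Cores per SM, 64 FP16 FMAs per core per clock
#   GA100:        4 Tensor Cores per SM, 256 FP16 FMAs per core per clock

def tensor_fma_per_sm(cores_per_sm: int, fma_per_core_per_clock: int) -> int:
    """Hypothetical helper: FP16 FMAs per clock for one SM."""
    return cores_per_sm * fma_per_core_per_clock

volta_turing = tensor_fma_per_sm(8, 64)
ga100 = tensor_fma_per_sm(4, 256)

per_core_gain = 256 / 64             # how much more capable each core is
per_sm_gain = ga100 / volta_turing   # net gain once the halved count is included

print(f"Volta/Turing: {volta_turing} FMA/clk per SM")
print(f"GA100:        {ga100} FMA/clk per SM")
print(f"Per core: {per_core_gain:.0f}x, per SM: {per_sm_gain:.0f}x")
```

So the halved core count and the 4x per-core gain net out at 2x per SM, which is the figure I've carried into the GA102 guesses.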

I've guessed at 4 possible variations of the GA102 - one where it's not much different to the TU102, in terms of shader count, and one where it's almost a carbon copy of the GA100 (though neither is very likely). I've gone with two very similar, full die guesses as the in-between versions. Such chips would only be used in the likes of Titan or Quadro cards; the GeForce RTX 3080 Ti Super Duper would definitely use a cut down chip, with at least 1 GPC disabled. Nvidia has been pretty consistent with some overall structural layouts from Pascal through to Ampere (namely the SM and TPC configuration), so it's a safe bet to assume they won't change.

So if we pick the 3rd GA102 guess of mine, how does it fare against the TU102 in the table? Well, the pixel, texture, and FP32 output rates are all 48% higher; FP16 is an enormous 256% better, and the tensor and ray cast rates are both just under 200% up.

The last time I did this with Navi chips, I was quite a bit out with the RX 5500, so I'm probably way off the mark!
 
Updated and altered chart, incorporating the various 'leaks':

ampere guess.png

The one thing the leaked details do tell us is that the GA102 is a big chip - there's quite a bit disabled for the 3090. The reason for this is that the various internal ratios for the SMs and TPCs haven't changed, so to get 5248 CUDA cores, the GPU for the 3090 must have 8 GPCs, as it calculates to around 7.5 GPCs.

So it looks like a complete, full GA102 GPU will pack 6144 CUDA cores - 33% more than in the TU102. Possibly for a new Titan, more likely for Quadros.
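For anyone who wants to check the full die figure, here's a rough sketch of the sums. It assumes the TU102's internal ratios carry over unchanged (64 CUDA cores per SM, 2 SMs per TPC, 6 TPCs per GPC) and that the full die has 8 GPCs; those are my guesses rather than confirmed specs, and the constant and function names are only there for illustration.

```python
# Assumed ratios, carried over from the TU102 (not confirmed for GA102).
CORES_PER_SM = 64
SMS_PER_TPC = 2
TPCS_PER_GPC = 6

TU102_FULL_CORES = 4608   # full TU102 die: 6 GPCs
RTX_3090_CORES = 5248     # leaked figure for the 3090

def full_die_cores(gpcs: int) -> int:
    """Hypothetical helper: CUDA cores for a die with the given GPC count."""
    return gpcs * TPCS_PER_GPC * SMS_PER_TPC * CORES_PER_SM

# Sanity check: the same ratios reproduce the known TU102 count.
assert full_die_cores(6) == TU102_FULL_CORES

ga102_full = full_die_cores(8)
uplift = ga102_full / TU102_FULL_CORES - 1

# How much of the guessed full die the 3090 would have enabled.
sms_full = ga102_full // CORES_PER_SM
sms_3090 = RTX_3090_CORES // CORES_PER_SM

print(f"Full GA102 guess: {ga102_full} CUDA cores ({uplift:.0%} more than TU102)")
print(f"3090: {sms_3090} of {sms_full} SMs enabled")
```

Change the GPC count or the per-GPC ratios and the full die total moves accordingly, so the 6144 figure is only as good as those assumptions.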

Edit: Based solely on these guesstimated figures, here's how the 3080 and 3090 compare to previous models:

ampere comp.png
 