AMD Polaris 10 performance will reportedly be on par with Radeon 390/390X

Shawn Knight


AMD’s next-generation Polaris GPUs are expected to break cover as early as next month. Built on a 14-nanometer FinFET manufacturing process, the new Polaris 10 will reportedly replace existing Radeon R9 390 cards, according to “well-informed” sources cited by Fudzilla.

The publication says its sources are confident that Polaris 10 should match or outperform Radeon R9 390 cards and, in certain circumstances, trump the R9 390X by a healthy margin.

These days, AMD’s R9 390X starts around $399 or so, with non-“X” variants going for just over $300. With Polaris 10, AMD is reportedly hoping to bring the price down to $299 at launch. As part of its marketing, AMD will apparently tout the card’s low power consumption and performance-per-watt, areas where it feels it can compete soundly with Nvidia.

Polaris 11, meanwhile, is expected to replace the Radeon R7 370 and take on the GeForce GTX 950 in terms of performance. Both Polaris 10 and Polaris 11 could arrive at Computex in June.

Nvidia, meanwhile, is working on its own next-generation graphics core, codenamed Pascal. The first card based on Pascal debuted last month, although it’s not for consumers. The Tesla P100 is a high-performance computing (HPC) card designed for use in some of the world’s fastest supercomputers.

AMD recently launched a new website that provides details on its Polaris architecture, though there’s nothing we didn’t already know.


 
So you are telling me that Polaris 10 will be a 390 that is 10-20 dollars cheaper and uses half the power or so? I've been waiting for this???
It's not exactly news. We've known what Polaris will be like for quite some time now. It's actually a pleasant surprise to know that it will be at around the 390X in terms of performance (or better in some cases). As for the price, it should be much cheaper than just 10-20 bucks.
But it's not just lowered power draw; the new cards should also have new features like HDMI 2.0b, full DX12 support, dedicated hardware for streamers, HDR support, etc.
 
So you are telling me that Polaris 10 will be a 390 that is 10-20 dollars cheaper and uses half the power or so? I've been waiting for this???

Think GTX 980/R9 390X performance for $100 less than they are now, with all modern features and a 50% lower TDP compared to the 980. It's an all-around win to me.
 
Based on some leaked specs, Nvidia's GTX 1080 will have the exact same shader, texture unit, and ROP count as the GTX 980. The only difference is the potential for 8GB of GDDR5X. The GTX 1080 will tout twice as many CUDA modules (SMs), but each Pascal module has half the core count of a Maxwell module. The GTX 1080 will have 32 CUDA modules.

The GTX 1080's performance boosts will come from improvements in the Pascal architecture, the die shrink, and faster memory. It will not have more shaders and such.
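If you want to sanity-check that claim, the arithmetic works out. A minimal sketch based purely on the leaked figures above (the 64-cores-per-module number is the leak's claim, not a confirmed spec):

```python
# Checking the "same shader count" claim from the leak above.
# Maxwell packs 128 CUDA cores per SM; the leak puts Pascal at half that width.
gtx980_cores = 16 * 128   # GTX 980: 16 Maxwell SMs ("CUDA modules")
gtx1080_cores = 32 * 64   # leaked GTX 1080: 32 Pascal SMs at half the width
print(gtx980_cores, gtx1080_cores)  # 2048 2048 -> identical core count
```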

So, AMD's Polaris 10 is expected to be a more efficient R9 390X, possibly branded the R9 480X. The GTX 1080 is not going to be staggeringly better than the 980. Fury X and Titan X replacements are not expected until the end of 2016 or 2017. AMD has room in the timetable to release a Fury X with 8GB of HBM2.
 
So you are telling me that Polaris 10 will be a 390 that is 10-20 dollars cheaper and uses half the power or so? I've been waiting for this???

Let's not forget that Polaris 10 probably won't just be one card. It's the high-end range of cards.
 
Read this and the original Fudzilla article. The performance makes sense but the price does not. Slightly better than a 390 is not going to cut it at $300. Unless the Polaris performance figures are coming from a card with a very modest clock and much greater overclock potential, I'm not sure this is all that accurate.
 
Doesn't this article say that "its sources are confident that Polaris 10 should match or outperform Radeon R9 390 cards and, in certain circumstances, trump the R9 390X by a healthy margin"?
I read that as: it'll pretty much be the same as a 390, but for some specialized things that nobody will ever benefit from in reality, it'll rock a 390X.

Then it says "These days, AMD’s R9 390X starts around $399 or so, with non-“X” variants going for just over $300. With Polaris 10, AMD is reportedly hoping to bring the price down to $299 at launch."

So I read that as: non-"X" 390 cards go for just over $300, and AMD is hoping to sell us this new Polaris 10 card for $299. So, one dollar? I mean, "just over $300" to $299 doesn't really get me that interested. Not sure why they keep mentioning the 390X, as from the early stuff I've seen here, it's not at the X level.

I know people have been saying this isn't going to be all that, and to wait for Vega. I was sort of hoping that wouldn't be needed, but performance-wise this looks like it is shaping up to be a sidegrade. Either would be an upgrade for me, but my OC'd 1GB 7850 is actually doing a pretty good job still. Still, wasn't the 390 basically a 290, and wasn't that sort of a 7-class something before that? I know you get modern connectivity, HDR, DX12, and better support for VCRs and Betamaxes, but really, I'd rather see actual graphics performance increases.
Anyway, it looks like many of you called this one right a while ago.
 
So you are telling me that Polaris 10 will be a 390 that is 10-20 dollars cheaper and uses half the power or so? I've been waiting for this???
Me too... time to upgrade from my mid-range Nvidia GTX 960 (and jump ship to AMD's high end)... 'perhaps'... :)
 
I could see Polaris 10 sitting in between the 390 and 390X performance-wise, wavering closer to the 390 most of the time. The big change should be much better power usage, meaning a lower TDP, meaning a quieter and cooler card, which could give Polaris a chance at being made into a mini-ITX card (Sapphire) to give people looking for a good mini-ITX card another option from AMD besides the R9 Nano and the R7 family. If the lower power comes with more overclocking headroom, that would give AMD a bonus there and allow the cards to show a better price/performance ratio. Neither Polaris nor Pascal looks to be a big performance shift, just better power economy, and once manufacturing gets a bit cheaper they should offer better price/performance than the last gen.
 
"Premium Game Streaming" from AMD's website

According to the website, Polaris will offer a near-zero-performance-loss streaming option as well. Looks like a counter to ShadowPlay, although hopefully AMD comes up with a better name.

To others around: AMD is targeting performance per watt. I'm guessing we'll see a large jump in power efficiency over the 390 and 390X with a modest performance boost at a lower price tag. The plus side of this is that AIB partners will be able to come out with awesome overclocked versions of the cards with great coolers. It sounds like a win-win. People with weaker power supplies will be able to pick up a reference card that's actually quiet and cool, while hardcore users will be able to overclock or buy an extreme aftermarket card.

At this point we do know that Nvidia's Pascal will have higher power consumption than Maxwell. What I don't know is if Nvidia is going to include async compute support in their cards. It really is a must-have going forward with DX12 and VR devices.
 
To others around: AMD is targeting performance per watt. I'm guessing we'll see a large jump in power efficiency over the 390 and 390X
You'd bloody well hope so. Hawaii will be close to three years old when Polaris launches, and Polaris will very likely sport a reduced 1:16 FP64 rate compared to Hawaii's Radeon-neutered 1:8, as well as (hopefully) some form of SIMD power gating and variable SIMD width to better optimize for on-the-fly workgroup variability (something their competitor has also been refining).
At this point we do know that Nvidia's Pascal will have higher power consumption than Maxwell.
Do we? I certainly don't and I don't consider myself a graphics noob. If you can supply some proof I'd love to increase my knowledge base.
What I don't know is if Nvidia is going to include async compute support in their cards. It really is a must-have going forward with DX12 and VR devices.
Async compute at the top end provides a negligible uplift. What it is essentially showing is that AMD's graphics pipeline was underutilizing its ALUs: too many remaining idle when they shouldn't be. As the AotS dev said:
Saying that Multi-Engine (aka Async Compute) is the root of performance increases on Ashes between DX11 to DX12 on AMD is definitely not true. Most of the performance gains in AMD's case are due to CPU driver overhead reductions. Async is a modest perf increase relative to that.
As the Hitman dev said in the same article:
On the other hand, it’s quite surprising to read that even AMD cards merely got a 5-10% performance boost, especially after AMD endorsed HITMAN’s implementation as the best one yet. Async Compute, which has been used for SSAA (Screen Space Anti Aliasing), SSAO (Screen Space Ambient Occlusion) and the calculation of light tiles in HITMAN, was also “super hard” to tune; according to IO Interactive, too much Async work can even make it a penalty, and then there’s also the fact that PC has lots of different configurations that need tuning.

You can then add in developers having to code more intensively for individual graphics architectures - and in some cases, specific GPUs - and I don't think DX12 will be as prevalent as you might seem to think. The other side of that equation is the rumour that Polaris might also lack conservative rasterization, since its development (like Pascal's) has been years in gestation. If that is the case, then AMD might tout async compute - which would give midrange CPU/GPU combos a lift, but could suffer a bloody nose if Just Cause 3 is any indication - because a swathe of AAA titles utilizing it (including those based on UE4 in DX11 and 12) are waiting in the wings.
 
This sounds good. 390 performance, sometimes as high as a 390X, but with power consumption closer to a 7850, the modern features of the newest GCN branch, and a price tag $100 lower than current sounds like a great deal. It's a win for AMD as well, since the much smaller die will be more profitable than the 390X's GPU.

My only question will be FP performance. The 290X's FP performance makes it fantastic at Folding@home and the like, easily scoring over 10x higher than a 770 (329k points per day vs 24k). If Polaris can match this, then I'm sold.
 
"Premium Game Streaming" from AMD's website

According to the website, Polaris will offer a near-zero-performance-loss streaming option as well. Looks like a counter to ShadowPlay, although hopefully AMD comes up with a better name.

Well, they have had the hardware (VCE) for ShadowPlay-like applications since the 7000 series. I can say Plays.tv, which they started bundling with the driver package, streamed with almost no performance drop on both my old 390X and my new R9 Nano, and it also worked well on the GTX 960 I had for messing around with. You can also just use Open Broadcaster, which has almost zero performance loss when streaming on AMD cards, in the opinion of some of my more stream-happy friends (they were using that before Plays.tv was bundled).
 
This is disappointing to say the least. I was really hoping that AMD was going to be able to do this time around what Nvidia did last time. When it released the 970 as a mid-range card (~$300, the same segment Polaris is supposedly aimed at), it surpassed the performance of the former high-end 780 and cut power at the same time. In other words, it redefined mid-range and set a new performance bar. AMD should be trying to do something similar now. I can only assume they're hoping game developers will do it for them by taking advantage of async compute and DX12. At least some games seem to be doing this, so it may bode well for them, but if Nvidia's Pascal is a performance jump similar to Maxwell's (probably not, but who knows?) it may still be able to win out with raw power.

Again... disappointing.
 
Do we? I certainly don't and I don't consider myself a graphics noob. If you can supply some proof I'd love to increase my knowledge base.

When Nvidia revealed the specs of the GP100, it showed what we can expect from Pascal as a whole.

http://wccftech.com/nvidia-pascal-specs/

The highest-end GM200 from Maxwell consumes 50 fewer watts than GP100 in the Tesla P100, and that's with HBM. You should expect that video cards lower down the stack, without HBM, will consume more power.
 
The highest-end GM200 from Maxwell consumes 50 fewer watts than GP100 in the Tesla P100, and that's with HBM. You should expect that video cards lower down the stack, without HBM, will consume more power.
No. That does not track at all. Simple arithmetic would show how erroneous that assumption is.
P100 has a 1:2 FP64 rate. GM200 has a 1:32 rate.
P100 has 1,792 FP64 units (1,920 on the full die) that each use around twice the power of a comparable FP32 ALU. GM200 has 96 FP64 units.
This is the exact reason that GK110 disables the boost of Titan cards when enabling FP64 at its native 1:3 rate:
Tapping in to the full-speed FP64 CUDA cores requires opening the driver control panel, clicking the Manage 3D Settings link, scrolling down to the CUDA – Double precision line item, and selecting your GeForce GTX Titan card. This effectively disables GPU Boost, so you’d only want to toggle it on if you specifically needed to spin up the FP64 cores.

So if you were making a like-for-like comparison:
P100 (current spec as opposed to full die): 3584 FP32 + 3584 FP32 equivalent (1792 FP64 * 2) = 7,168
GM200: 3072 FP32 + 192 FP32 equivalent (96 FP64 * 2) = 3,264

P100 max clock under TDP: 1480MHz
GM200 max clock under TDP: 1114MHz

So, P100 is feeding 119% more FP32 ALU equivalents at a 32.9% higher clock than GM200, yet its TDP is rising by only 20%... so how do you equate Pascal requiring more power than Maxwell for GeForce cards? Especially as the ALU-equivalent count increase between GP104 and GM204 is only 25%.
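For anyone who wants to reproduce those percentages, here's a quick sketch (the 2x FP32 equivalence per FP64 unit is this thread's working assumption, not an official spec):

```python
# Sanity check of the like-for-like comparison above.
# Assumption from this thread: one FP64 unit ~ two FP32 ALUs.
p100_fp32, p100_fp64 = 3584, 1792    # Tesla P100 as shipped (not the full GP100 die)
gm200_fp32, gm200_fp64 = 3072, 96    # GM200 (Titan X / Tesla M40)

p100_equiv = p100_fp32 + 2 * p100_fp64     # 7168
gm200_equiv = gm200_fp32 + 2 * gm200_fp64  # 3264

print(f"ALU equivalents: +{(p100_equiv / gm200_equiv - 1) * 100:.1f}%")  # +119.6%
print(f"Clock:           +{(1480 / 1114 - 1) * 100:.1f}%")               # +32.9%
print(f"TDP:             +{(300 / 250 - 1) * 100:.1f}%")                 # +20.0%
```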
 
This is disappointing to say the least. I was really hoping that AMD was going to be able to do this time around what Nvidia did last time. When it released the 970 as a mid-range card (~$300), it surpassed the performance of the former high-end 780 and cut power at the same time.
Then how is Polaris disappointing? Keep in mind the 780 was not the largest chip in the Kepler family; that would be the 780 Ti/Titan. Similarly, the 290X was an add-on to the GCN family.

If Polaris 10 comes out at $300 (which it sounds like it will) and it outperforms the 780 equivalent of the GCN series (7970/290 performance level), it will fit the exact same role as the 970 did. It isn't redefining anything, but it is taking flagship-level performance and putting it in the $300 arena, while pulling power much closer to a 7850/7870.

More importantly, we've seen how well the GCN series has aged. The same will most likely go for Polaris.
 
So, P100 is feeding 119% more FP32 ALU equivalents at a 32.9% higher clock than GM200, yet its TDP is rising by only 20%... so how do you equate Pascal requiring more power than Maxwell for GeForce cards?

Seeing as the Titan X, a card with gimped FP performance, maintained all of the power consumption of its business-class equivalent, the Tesla M40, I think my assumption is fair.

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications

The P100 is using multi-purpose FP CUDA cores just like the GM200 GPU.
 
Seeing as the Titan X, a card with gimped FP performance, maintained all of the power consumption of its business-class equivalent, the Tesla M40, I think my assumption is fair.
?????
Titan X and the Tesla M40 have the same 1:32 FP64 rate.
All you are doing is comparing two basically identical cards since they are both GM200 and GM200's native FP64 rate is 1:32.

The P100 is using multi-purpose FP CUDA cores just like the GM200 GPU.
You seem to have a fundamental block. Yes, P100 uses the same ALUs as GM200, but it uses MORE of them. The FP64 units - like the load/store units and SFUs - are separate logic blocks from the FP32 ALUs.
The P100 has 3584 FP32 ALUs + 1792 FP64 ALUs (each effectively two FP32 ALUs ganged per FP64): 300W board power.
The GM200 has 3072 FP32 ALUs + 96 FP64 ALUs: 250W board power.

FP64 units take up die real estate and require input power to function.

I really don't know how I can explain this more simply... so I won't try.
 
The P100 has 3584 FP32 ALUs + 1792 FP64 ALUs: 300W board power. The GM200 has 3072 FP32 ALUs + 96 FP64 ALUs: 250W board power. FP64 units take up die real estate and require input power to function.

You don't have to keep listing the number of FP units; these were all in plain sight to begin with. You are saying that P100 consumes more power because of its increased resources. Once again, duh. The point that you are contesting is that the increased power usage of the P100 would not translate to the lower-end cards. My Titan X and Tesla M40 comparison proves that wrong. You must have misunderstood something, because your replies have been completely off base.
 
You are saying that P100 consumes more power because of its increased resources.
Because of the power-usage-intensive architectural considerations, to be exact.
The point that you are contesting is that the increased power usage of the P100 would not translate to the lower-end cards.
Correct. The power budget of GeForce cards is alleviated by a reduction in power-intensive features - FP64, large registers and cache.
My Titan X and Tesla M40 comparison proves that wrong.
Well, no it doesn't. The Tesla M40 doesn't utilize double precision to any degree and neither does the Titan X (219 vs 192 GFLOPS, separated solely by clock speed difference), so how is that analogous to the P100, which is a double-precision monster, and the GeForce cards, which will in all likelihood have the FP64 units stripped out to save power and die space?

I can appreciate that you either think Pascal will be power-hungry - or, more likely, hope this is the case - but extrapolating from the facts of the P100 doesn't bear it out. Even a cursory look at the known specifications makes your argument odd in the extreme:
GM200 FP32: 6.8 TFLOPS / 250W = 27.2 GFLOPS/watt
P100 FP32: 10.6 TFLOPS / 300W = 35.33 GFLOPS/watt... and this is the BEST case scenario for Maxwell, since P100 is designed for FP16 and FP64 workloads, not gaming workloads.
GM200 FP16: 27.2 GFLOPS/watt; P100: 70.66 GFLOPS/watt
GM200 FP64: 0.86 GFLOPS/watt; P100: 17.68 GFLOPS/watt
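Those perf-per-watt figures are easy to re-derive from the TFLOPS and board-power numbers quoted in this thread; a quick sketch (small differences from the figures above are just rounding):

```python
# GFLOPS/watt from the TFLOPS and TDP figures quoted in this thread.
specs = {
    "GM200 FP32": (6.8, 250),       # TFLOPS, board power in watts
    "P100 FP32":  (10.6, 300),
    "GM200 FP16": (6.8, 250),       # Maxwell runs FP16 at the FP32 rate
    "P100 FP16":  (21.2, 300),      # GP100 has double-rate FP16
    "GM200 FP64": (6.8 / 32, 250),  # 1:32 FP64 rate
    "P100 FP64":  (10.6 / 2, 300),  # 1:2 FP64 rate
}
for name, (tflops, watts) in specs.items():
    print(f"{name}: {tflops * 1000 / watts:.2f} GFLOPS/watt")
```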

The only way you can extrapolate Pascal GeForce cards using more power than Maxwell is if their clocks are much more aggressive than the 32.9% increase P100 has over the M40 (1480MHz vs 1114MHz), and Pascal is going to offer a huge performance uplift over Maxwell (spoiler alert: it won't; 30-40% for GP104 over GM204 should cover it). Anyhow, this won't be resolved until independent benchmarking takes place, so I'll leave it in abeyance and revisit the subject when the numbers are in.
 
The Polaris 10 card will maybe be $250 at release. If the performance numbers are accurate, then that price would make sense. You can buy a 390 right now for $285, so Polaris 10 at $300 with similar performance shouldn't happen. AMD also stated that the Polaris cards are targeting the mid-range consumer at initial release, which fits more closely with a sub-$250 price point. Additionally, Polaris and Pascal will have different target audiences when first released, with Pascal being more high-end and Polaris being mid-range. So we should be looking at something priced significantly below the $350-400 that a GTX 1070 may launch at.
 
The Tesla M40 doesn't utilize double precision to any degree and neither does the Titan X (219 vs 192 GFLOPS, separated solely by clock speed difference), so how is that analogous to the P100, which is a double-precision monster, and the GeForce cards, which will in all likelihood have the FP64 units stripped out to save power and die space?

Facepalm on my part. I totally forgot the M40 doesn't use double precision. Comparing apples to oranges.

I guess it's really going to come down to how well Nvidia can scale down and gate that architecture.
 