Nvidia's first Pascal GPU is the Tesla P100 for HPC

Scorpus


Nvidia's first Pascal-based graphics card isn't a GeForce SKU for consumers; instead it's the Tesla P100, a high-performance compute (HPC) card with a brand new GP100 GPU on-board.

The Tesla P100 is an incredibly powerful card, boasting 10.6 TFLOPs of single-precision performance, 5.3 TFLOPs of double precision, and a whopping 21.2 TFLOPs of half precision. This is a huge performance increase over Nvidia's past single- and even dual-GPU compute cards: the Tesla M40, for example, used a fully-unlocked Maxwell GM200 GPU and achieved just 6.8 TFLOPs of single/half precision performance, and just 213 GFLOPs of double precision.

The GP100 GPU is built on TSMC's 16nm FinFET process, packs 15.3 billion transistors on-die, and carries a rated TDP of 300W. The Tesla P100 uses a partially-disabled version of this GPU, with just 56 of 60 SMs in a working state, leaving the card with 3,584 CUDA cores. The core is clocked at 1,328 MHz, with boost clocks up to 1,480 MHz.
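
Those headline figures fall straight out of the core count and clocks: peak throughput is cores × clock × 2, since each core can retire one fused multiply-add (two floating-point operations) per cycle. A quick back-of-envelope check in Python, using only the numbers above:

```python
# Tesla P100 peak throughput from published specs (back-of-envelope).
cuda_cores = 3584           # 56 of 60 SMs enabled
boost_clock_ghz = 1.480     # boost clock
flops_per_core_cycle = 2    # one fused multiply-add = 2 FLOPs

fp32 = cuda_cores * boost_clock_ghz * flops_per_core_cycle / 1000
print(f"FP32: {fp32:.1f} TFLOPs")      # ~10.6
print(f"FP64: {fp32 / 2:.1f} TFLOPs")  # half rate -> ~5.3
print(f"FP16: {fp32 * 2:.1f} TFLOPs")  # double rate -> ~21.2
```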

As for memory, Nvidia has loaded this card with 16 GB of HBM2 clocked at 1.4 Gbps per pin, providing a huge 720 GB/s of bandwidth. Unlike AMD's first-generation HBM graphics cards, which were capped at 4 GB, Nvidia has been able to include far more memory thanks to HBM2's larger per-stack capacities.
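
The bandwidth figure is similarly straightforward arithmetic: four HBM2 stacks give a 4,096-bit interface, and at 1.4 Gbps per pin that works out to roughly the quoted number:

```python
# HBM2 bandwidth sanity check (4 stacks x 1024 bits each).
bus_width_bits = 4096
pin_rate_gbps = 1.4
print(f"{bus_width_bits * pin_rate_gbps / 8:.0f} GB/s")  # ~717 GB/s, quoted as 720
```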

Nvidia is already producing the Tesla P100 in volume for use in systems like its very own DGX-1, which will be available in June, as well as in systems from IBM, Dell and Cray.

It could be some time before we see the GP100 transition from Tesla cards to GeForce due to cost and production concerns. However, we do now know that Pascal can scale to incredibly powerful GPUs, and this could make the next few releases of consumer cards very exciting.


 
a whopping 21.2 TFLOPs of half precision

What the hell is half-precision, and why would anyone care about that? Is that another bit of marketing BS, so that at 1/1000 precision they could claim infinite performance? :)

Nvidia has loaded this card with 16 GB of HBM2
So, if we do not see HBM2 in desktop video cards, we will know - Tesla ate it all. Thanks a bunch!

this could make the next few releases of consumer cards very exciting
They have been riding the excitement factor for way too long; it is starting to turn sour.

Maybe AMD will deliver something to shake nVidia off its apathy.
 
a whopping 21.2 TFLOPs of half precision

What the hell is half-precision, and why would anyone care about that? Is that another bit of marketing BS, so that at 1/1000 precision they could claim infinite performance?

Nvidia has loaded this card with 16 GB of HBM2
So, if we do not see HBM2 in desktop video cards, we will know - Tesla ate it all. Thanks a bunch!

this could make the next few releases of consumer cards very exciting
They have been riding the excitement factor for way too long; it is starting to turn sour.

Maybe AMD will deliver something to shake nVidia off its apathy.
Half-precision is basically 16-bit instead of the usual 32-bit floating point: more precise than 8-bit but less precise than full single precision. The advantage is that it uses half the memory space and memory bandwidth of full floats yet provides "good enough" precision. From what I could tell it's well supported, and is included in CPUs from Intel, ARM, and AMD. So it's not some bullshit.
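
To put numbers on it, here's a quick NumPy sketch - purely illustrative of the IEEE 754 formats, nothing GPU-specific:

```python
import numpy as np

# The same value stored at half (binary16) and single (binary32) precision.
x = 3.14159265
print(np.float16(x))  # ~3 significant decimal digits, 2 bytes
print(np.float32(x))  # ~7 significant decimal digits, 4 bytes

# Half the storage also means twice the elements per transfer:
print(np.zeros(1_000_000, dtype=np.float16).nbytes)  # 2000000 -> 2 MB
print(np.zeros(1_000_000, dtype=np.float32).nbytes)  # 4000000 -> 4 MB
```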
 
a whopping 21.2 TFLOPs of half precision

What the hell is half-precision, and why would anyone care about that? Is that another bit of marketing BS, so that at 1/1000 precision they could claim infinite performance?

Nvidia has loaded this card with 16 GB of HBM2
So, if we do not see HBM2 in desktop video cards, we will know - Tesla ate it all. Thanks a bunch!

this could make the next few releases of consumer cards very exciting
They have been riding the excitement factor for way too long; it is starting to turn sour.

Maybe AMD will deliver something to shake nVidia off its apathy.
Half-precision is basically 16-bit instead of the usual 32-bit floating point: more precise than 8-bit but less precise than full single precision. The advantage is that it uses half the memory space and memory bandwidth of full floats yet provides "good enough" precision. From what I could tell it's well supported, and is included in CPUs from Intel, ARM, and AMD. So it's not some bullshit.

Thank you for the information. I was wondering more about this myself.
 
What the hell is half-precision, and why would anyone care about that? Is that another bit of marketing BS, so that at 1/1000 precision they could claim infinite performance? :)
Deep learning is a recurrent/recursive process. Speed and threads in flight generally trump absolute accuracy.
So, if we do not see HBM2 in desktop video cards, we will know - Tesla ate it all. Thanks a bunch!
A little while ago some tinfoil hat wearers proposed that Nvidia wouldn't even have access to HBM2 because AMD had first dibs on production and AMD owned the HBM patents. Tesla chewing up all production sounds only marginally less ludicrous. Unless Samsung and Hynix have scoured Asia for an OCD-afflicted labour force to handmake the chips, production should ramp like any other memory product. By the time Nvidia's/AMD's higher volume (non-pro) products are ready, HBM2 chips should be a commodity product.
They have been riding the excitement factor for way too long; it is starting to turn sour.
Can't say that I'm particularly affected by the "excitement factor". Most of the articles and "news" have been speculation, clickbait, and broad strokes from technology conferences. Even this announcement leaves too much unanswered. GP100/P100 looks increasingly like a compute-only chip - a massive increase in registers mirrors that of the GPGPU-only GK210. A 1:2 FP64 rate also makes little sense for a gaming GPU when it sends the power budget through the roof. Reworking the logic blocks to cull FP64 and the four NVLink interfaces, and using the saved die space to increase core/ROP/TAU/GPC/SM count, seems like a better balance for a gaming chip - and is something I certainly wouldn't bet against happening.
Maybe AMD will deliver something to shake nVidia off its apathy.
You are complaining about "riding the excitement factor too long", yet AMD basically announced Vega, the high-end Fiji replacement, around a year before its proposed introduction.
 
Deep learning is a recurrent/recursive process. Speed and threads in flight generally trump absolute accuracy.

A little while ago some tinfoil hat wearers proposed that Nvidia wouldn't even have access to HBM2 because AMD had first dibs on production and AMD owned the HBM patents. Tesla chewing up all production sounds only marginally less ludicrous. Unless Samsung and Hynix have scoured Asia for an OCD-afflicted labour force to handmake the chips, production should ramp like any other memory product. By the time Nvidia's/AMD's higher volume (non-pro) products are ready, HBM2 chips should be a commodity product.

Can't say that I'm particularly affected by the "excitement factor". Most of the articles and "news" have been speculation, clickbait, and broad strokes from technology conferences. Even this announcement leaves too much unanswered. GP100/P100 looks increasingly like a compute-only chip - a massive increase in registers mirrors that of the GPGPU-only GK210. A 1:2 FP64 rate also makes little sense for a gaming GPU when it sends the power budget through the roof. Reworking the logic blocks to cull FP64 and the four NVLink interfaces, and using the saved die space to increase core/ROP/TAU/GPC/SM count, seems like a better balance for a gaming chip - and is something I certainly wouldn't bet against happening.

You are complaining about "riding the excitement factor too long", yet AMD basically announced Vega, the high-end Fiji replacement, around a year before its proposed introduction.

I was like an SC2 Archon this morning - sarcasm overwhelming. Will learn to append smileys where appropriate next time :)
 
Deep learning is a recurrent/recursive process. Speed and threads in flight generally trump absolute accuracy.

A little while ago some tinfoil hat wearers proposed that Nvidia wouldn't even have access to HBM2 because AMD had first dibs on production and AMD owned the HBM patents. Tesla chewing up all production sounds only marginally less ludicrous. Unless Samsung and Hynix have scoured Asia for an OCD-afflicted labour force to handmake the chips, production should ramp like any other memory product. By the time Nvidia's/AMD's higher volume (non-pro) products are ready, HBM2 chips should be a commodity product.

Can't say that I'm particularly affected by the "excitement factor". Most of the articles and "news" have been speculation, clickbait, and broad strokes from technology conferences. Even this announcement leaves too much unanswered. GP100/P100 looks increasingly like a compute-only chip - a massive increase in registers mirrors that of the GPGPU-only GK210. A 1:2 FP64 rate also makes little sense for a gaming GPU when it sends the power budget through the roof. Reworking the logic blocks to cull FP64 and the four NVLink interfaces, and using the saved die space to increase core/ROP/TAU/GPC/SM count, seems like a better balance for a gaming chip - and is something I certainly wouldn't bet against happening.

You are complaining about "riding the excitement factor too long", yet AMD basically announced Vega, the high-end Fiji replacement, around a year before its proposed introduction.

Vega was only on the AMD roadmap. It wasn't discussed. If you equate that to "announcing" then you have to say the same for the cards on Nvidia's roadmap as well.
 
1/2 speed for DP compute is fantastic. I hope they don't cripple DP in consumer cards to something like 1/8 or worse.
Half-precision is basically 16-bit instead of the usual 32-bit for floating precision. Basically more precise than 8-bit but less precise than full floating points. The advantage is that it uses half the memory space and memory bandwidth compared to full floating points yet provide "good enough" precision. From what I could tell, it's supported well and is included in CPUs from Intel, ARM, and AMD. So it's not some bullshit.
Exactly. It works well for computations that do not need significant accuracy.
 
1/2 speed for DP compute is fantastic. I hope they don't cripple DP in consumer cards to something like 1/8 or worse.

Exactly. It works well for computations that do not need significant accuracy.

100% chance of them crippling DP performance in consumer cards. If they didn't, there would be next to no difference between the consumer cards and workstation cards, and definitely no reason to spend 10x the money on workstation cards. This has been going on forever and will continue.
 
Vega was only on the AMD roadmap. It wasn't discussed. If you equate that to "announcing" then you have to say the same for the cards on Nvidia's roadmap as well.
Exactly my point.
Both vendors have been riding the hype train with future product announcements. Nvidia with Pascal and Volta, AMD with Polaris, Vega, Navi....and Bristol Ridge for that matter (which has been front and centre in AMD's public AM4 socket plans for at least ten months). Both vendors are fighting a marketing battle for future purchases - the decision of which they hope will reflect in current buying trends. Hardly a new strategy - Intel have been actively touting the 200-series Union Point chipset and its feature set well ahead of Kaby Lake/Cannonlake's introduction.
1/2 speed for DP compute is fantastic. I hope they don't cripple DP in consumer cards to something like 1/8 or worse.
If GP100 makes it to a consumer series, it will probably retain the 1:2 FP64 rate. Contrary to Evernessince's assertion, the GTX Titan, Titan Black, and Titan Z - despite being consumer cards - all feature their GPU's full FP64 rate (1:3 on GK110).
As for the other GPUs, I doubt that you'll see any meaningful double precision from either vendor. GP100 includes 1,920 dedicated FP64 cores. In a gaming GPU these would be largely unused - it makes more sense to cull them and either reduce die size and power draw, or rework the GPCs/SMs to have a higher ALU/ROP count.
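
For what it's worth, the arithmetic on those dedicated cores lines up with the headline figure - with 56 SMs enabled the P100 exposes 1,792 of them, and at boost clock:

```python
# P100 FP64 throughput from its dedicated double-precision cores.
fp64_cores = 1792        # 32 per SM x 56 enabled SMs (1,920 on the full GP100)
boost_clock_ghz = 1.480
print(f"{fp64_cores * boost_clock_ghz * 2 / 1000:.1f} TFLOPs")  # FMA = 2 FLOPs -> ~5.3
```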
 
Exactly my point.
Both vendors have been riding the hype train with future product announcements. Nvidia with Pascal and Volta, AMD with Polaris, Vega, Navi....and Bristol Ridge for that matter (which has been front and centre in AMD's public AM4 socket plans for at least ten months). Both vendors are fighting a marketing battle for future purchases - the decision of which they hope will reflect in current buying trends. Hardly a new strategy - Intel have been actively touting the 200-series Union Point chipset and its feature set well ahead of Kaby Lake/Cannonlake's introduction.

If GP100 makes it to a consumer series, it will probably retain the 1:2 FP64 rate. Contrary to Evernessince's assertion, the GTX Titan, Titan Black, and Titan Z - despite being consumer cards - all feature their GPU's full FP64 rate (1:3 on GK110).
As for the other GPUs, I doubt that you'll see any meaningful double precision from either vendor. GP100 includes 1,920 dedicated FP64 cores. In a gaming GPU these would be largely unused - it makes more sense to cull them and either reduce die size and power draw, or rework the GPCs/SMs to have a higher ALU/ROP count.

I'm tired of the hype in the PC industry lately. Most of it's turned out to be fluff. Even if both AMD's and Nvidia's cards turn out to be awesome, there hasn't been a game that's really warranted an expensive GPU purchase in some time. First-gen HMDs don't look like they are quite there either.
 
The real question is, can the consumer cards based on this play top-of-the-line games at 4k with highest detail?

So far, you need at least 2 Titan Xs to do this - and not even that setup is perfect....
 
The real question is, can the consumer cards based on this play top-of-the-line games at 4k with highest detail?

So far, you need at least 2 Titan Xs to do this - and not even that setup is perfect....

I'm guessing the consumer version of this card will be able to do 4k near 60 FPS. It won't be a steady 60 all the time, but it will be around that most of the time. This assumes that the increase in transistor count results in a linear performance increase. I have heard that Pascal might support Async Compute in hardware, and it'll be interesting to see how they implemented it. AMD has Async Compute engines at the core of its architecture, so it's hard to imagine that Nvidia's implementation is going to be better, especially considering Pascal is only supposed to be Maxwell 2.0. There's really no time to revise the architecture in one year.
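
That linear-scaling assumption is easy to put a rough number on - GM200 in the Titan X has about 8 billion transistors against GP100's 15.3 billion, so a naive extrapolation (ignoring clocks and architecture entirely) looks like this:

```python
# Naive scaling estimate from transistor counts alone - a ceiling, not a prediction.
gm200 = 8.0e9    # Titan X / GTX 980 Ti (Maxwell)
gp100 = 15.3e9   # from the article
print(f"{gp100 / gm200:.2f}x")  # ~1.91x before any clock-speed gains
```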
 
So, if we do not see HBM2 in desktop video cards, we will know - Tesla ate it all. Thanks a bunch!
Yup.

I will be very disappointed if all the consumer cards don't use HBM2 after seeing this.

8GB VRAM better be standard on everything, with 16GB+ on high-end cards. Enough of this restrictive VRAM era, especially on NVIDIA's side up to now.
 
I will be very disappointed if all the consumer cards don't use HBM2 after seeing this.
Well, prepare to be disappointed:
1. HBM2 is still some time away from volume ramping
2. GPU + HBM2 + interposer package assembly is not straightforward. There are significant yield issues at every stage of the assembly process. Between the cost of components, the cost of assembly, and the cost of QA (X-ray metrology to confirm the microbumps and TSVs are completely viable), you won't see HBM2 on anything lower than the top-tier cards. Pretty hard to justify the expenditure on cards that sell for $200, or even $350, especially when the GPU is going to be the limiting factor rather than bus width.
8GB VRAM better be standard on everything
Yeah, good luck with that. The largest discrete graphics market falls in the $99-$149-$199 segments. Adding vRAM might be a great marketing point, but at these performance levels the GPUs are going to choke to death from lack of ALUs or rasterization well before they saturate the framebuffer.
 
Well, prepare to be disappointed:
1. HBM2 is still some time away from volume ramping
2. GPU + HBM2 + interposer package assembly is not straightforward. There are significant yield issues at every stage of the assembly process. Between the cost of components, the cost of assembly, and the cost of QA (X-ray metrology to confirm the microbumps and TSVs are completely viable), you won't see HBM2 on anything lower than the top-tier cards. Pretty hard to justify the expenditure on cards that sell for $200, or even $350, especially when the GPU is going to be the limiting factor rather than bus width.

Yeah, good luck with that. The largest discrete graphics market falls in the $99-$149-$199 segments. Adding vRAM might be a great marketing point, but at these performance levels the GPUs are going to choke to death from lack of ALUs or rasterization well before they saturate the framebuffer.

Yep, HBM2 production lines aren't supposed to start till this summer either, and it'll be even longer till any real volume. I think we should pin both AMD's and Nvidia's big consumer cards to around when AMD's Vega is due to release, as that is when AMD expects HBM2 to be ready for market. That is, not until late in the 2nd quarter.
 
Yep, HBM2 production lines aren't supposed to start till this summer either, and it'll be even longer till any real volume. I think we should pin both AMD's and Nvidia's big consumer cards to around when AMD's Vega is due to release, as that is when AMD expects HBM2 to be ready for market. That is, not until late in the 2nd quarter.
Just a point of reference... 2nd quarter has already begun... "late 2nd quarter" is next month.... summer is actually 3rd quarter....
 
Yep, HBM2 production lines aren't supposed to start till this summer either, and it'll be even longer till any real volume. I think we should pin both AMD's and Nvidia's big consumer cards to around when AMD's Vega is due to release, as that is when AMD expects HBM2 to be ready for market. That is, not until late in the 2nd quarter.
The HBM2 ramp started some time ago. The DGX-1 modules indicate that Pascal qualified on both HBM(1) and HBM2.

Quanta's Pascal-based system demonstrated at GTC also tends to indicate that, while HBM2 isn't at volume (read: commodity) production levels, initial production is well underway.
 