NVIDIA Pascal Architecture Details, Tesla P100, GP100 GPU

Posted by Scott Michaud | Apr 6, 2016 | Graphics Cards, Shows and Expos | 54

Manufacturer: NVIDIA

93% of a GP100 at least…

Big Pascal finally embraces FP64 performance!

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

NVIDIA provided a comparison table, to which we added what we know about a full GP100:

	Tesla K40	Tesla M40	Tesla P100	Full GP100
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GP100 (Pascal)
SMs	15	24	56	60
TPCs	15	24	28	(30?)
FP32 CUDA Cores / SM	192	128	64	64
FP32 CUDA Cores / GPU	2880	3072	3584	3840
FP64 CUDA Cores / SM	64	4	32	32
FP64 CUDA Cores / GPU	960	96	1792	1920
Base Clock	745 MHz	948 MHz	1328 MHz	TBD
GPU Boost Clock	810/875 MHz	1114 MHz	1480 MHz	TBD
Peak FP64 GFLOPS (boost)	1680	213	5304	TBD
Texture Units	240	192	224	240
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2
Memory Size	Up to 12 GB	Up to 24 GB	16 GB	TBD
L2 Cache Size	1536 KB	3072 KB	4096 KB	TBD
Register File Size / SM	256 KB	256 KB	256 KB	256 KB
Register File Size / GPU	3840 KB	6144 KB	14336 KB	15360 KB
TDP	235 W	250 W	300 W	TBD
Transistors	7.1 billion	8 billion	15.3 billion	15.3 billion
GPU Die Size	551 mm²	601 mm²	610 mm²	610 mm²
Manufacturing Process	28 nm	28 nm	16 nm	16 nm

This table is aimed at developers who are interested in GPU compute, so a few figures (like the ROP count) are still unknown, but it still gives us a lot of insight into the “big Pascal” architecture. The jump to 16nm allows for nearly twice as many transistors, 15.3 billion versus 8 billion for GM200, in roughly the same die area: 610 mm², up from 601 mm².
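As a quick check on "nearly twice the transistors in roughly the same area," here is a minimal Python sketch that divides each chip's transistor count by its die area, using only the figures from the table. The dictionary and helper names are illustrative, not from any NVIDIA tool.

```python
# Transistor counts and die areas (mm²) from the comparison table above.
chips = {
    "GM200 (Maxwell)": (8.0e9, 601),
    "GP100 (Pascal)": (15.3e9, 610),
}


def density_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistors per square millimetre of die area."""
    return transistors / area_mm2


densities = {name: density_per_mm2(t, a) for name, (t, a) in chips.items()}
for name, d in densities.items():
    print(f"{name}: {d / 1e6:.1f} million transistors per mm²")

ratio = densities["GP100 (Pascal)"] / densities["GM200 (Maxwell)"]
print(f"GP100 packs {ratio:.2f}x the transistors per mm² of GM200")
```

That works out to roughly 13.3 million versus 25.1 million transistors per mm², or about 1.88x the density.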

A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many FP32 shaders (64 versus 128). The GP100 part listed in the table above is actually partially disabled, with four of the sixty SMs turned off. This leaves 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores, but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell's: 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPS of FP32 (at the 1480 MHz boost clock) is amazing, it's only about 25% more than what GM200 could pull off with an overclock.
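The 10.6 TeraFLOPS figure comes from the usual peak-throughput rule of thumb: cores × 2 FLOPs per clock (one fused multiply-add) × clock rate. This Python sketch applies it to the P100's 3584 cores at its 1480 MHz boost clock and to GM200's 3072 cores at the ~1390 MHz overclock mentioned above. Treating every core as retiring one FMA per clock is an idealized assumption, not a measured result.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Idealized peak throughput: every core retires one FMA (2 FLOPs) per clock."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


p100 = peak_tflops(3584, 1480)      # Tesla P100 at its rated boost clock
gm200_oc = peak_tflops(3072, 1390)  # GM200 (Titan X / 980 Ti) overclocked to ~1390 MHz

print(f"Tesla P100 FP32: {p100:.2f} TFLOPS")
print(f"GM200 @ 1390 MHz FP32: {gm200_oc:.2f} TFLOPS")
print(f"P100 advantage: {100 * (p100 / gm200_oc - 1):.0f}%")
```

At those clocks the P100 lands at about 10.61 versus 8.54 TFLOPS, a gap of roughly 24%, which is in line with the modest FP32 lead described above.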

Pascal's advantage is that these shaders are significantly more capable. First, double-precision performance is finally at a 1:2 ratio with single-precision, which is the best ratio you can expect when both precisions are first-class citizens. (With enough parallelism in your calculations, you can compute two 32-bit values for each 64-bit one.) This yields a double-precision performance of 5.3 TeraFLOPS at the rated boost clock, with just 56 operational SMs, for GP100. Compare this to GK110's 1.7 TeraFLOPS, or Maxwell's 0.2 (yes, 0.2) TeraFLOPS, and you'll see what a huge upgrade this is for calculations that need extra precision (or range).
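The FP64 row in the table can be reproduced with the same idealized one-FMA-per-clock assumption, using each card's FP64 core count and boost clock (the higher 875 MHz state for the K40). A short Python sketch; the list and function names are illustrative:

```python
# (name, FP64 cores per GPU, boost clock in MHz) from the comparison table.
gpus = [
    ("Tesla K40 (GK110)", 960, 875),
    ("Tesla M40 (GM200)", 96, 1114),
    ("Tesla P100 (GP100)", 1792, 1480),
]


def fp64_gflops(fp64_cores: int, boost_mhz: float) -> float:
    # One FMA (2 FLOPs) per FP64 core per clock; MHz * 1e6 / 1e9 == MHz / 1e3.
    return fp64_cores * 2 * boost_mhz / 1e3


results = {name: fp64_gflops(cores, mhz) for name, cores, mhz in gpus}
for name, gflops in results.items():
    print(f"{name}: {gflops:.0f} GFLOPS FP64")
```

The K40 and P100 match the table exactly (1680 and 5304 GFLOPS); the M40 works out to 213.9, which the table lists as 213.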

Second, NVIDIA has also made FP16 a first-class citizen, yielding a 2:1 performance ratio with FP32. This means that, in situations where 16-bit values are sufficient, you can get a full 2x speed-up by dropping to 16-bit. GP100, with 56 SMs enabled, will have a peak FP16 performance of 21.2 TeraFLOPS.
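To make "where 16-bit values are sufficient" concrete: doubling the 10.6 TeraFLOPS FP32 figure gives the 21.2 quoted above, but half precision only keeps an 11-bit significand. This sketch uses Python's standard `struct` module (format code `'e'` is IEEE 754 half precision) to show what survives a round trip. The comment about paired FP16 operations reflects our assumption about how the 2:1 rate is reached (as with CUDA's `half2` type), not something stated in NVIDIA's table.

```python
import struct

FP32_TFLOPS = 10.6              # Tesla P100 FP32 peak at boost clock
FP16_TFLOPS = 2 * FP32_TFLOPS   # 2:1 ratio, assumed to come from paired FP16 ops
print(f"Peak FP16: {FP16_TFLOPS:.1f} TFLOPS")


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# An 11-bit significand means integers above 2048 can no longer all be represented.
for value in (0.1, 1000.0, 2048.0, 2049.0):
    print(f"{value!r:>8} -> {to_fp16(value)!r}")
```

Values like 1000.0 and 2048.0 come back unchanged, while 2049.0 rounds to 2048.0 and 0.1 picks up error in the fifth decimal place, which is fine for many neural-network weights but not for most scientific simulations.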

You can multiply by 60/56 to see what the full GP100 processor could be capable of, but we're not going to do that here. The reason: the FLOP rating also depends on the clock rate. If GP100's 1328 MHz (1480 MHz boost) is conservative, as we found with GM200, then this rate could get much higher. Alternatively, if NVIDIA is cherry-picking the heck out of GP100 dies for Tesla P100, the full chip might be slower. That said, enterprise components are usually clocked lower than gaming ones, for consistency in performance and heat management, so I'd guess that the number might actually go up.

Third (yes, this list is continuing), there is a whole lot more memory performance. GP100 increases the L2 cache from 3MB on GM200 to 4MB. Since Maxwell, NVIDIA has been able to disable L2 cache blocks (remember the 970?), so we're not sure if this is its final amount, but I expect that it will be. 4MB is a nice, round number, and I doubt they would mess with the memory access patterns of a professional GPU for scientific applications.

They also introduced this little thing called "HBM2" that seems to be making waves. While it will not achieve the 1 TB/s bandwidth that was rumored, at least not in the 16GB variant announced today, 720 GB/s is nothing to sneer at. This is a little more than double what the Titan X can do (336 GB/s), and it should be lower latency as well. NVIDIA hasn't mentioned this, but lower latency would mean that a global memory access takes fewer cycles to complete, reducing stalls in large tasks, like drawing complex 3D materials. That said, GPUs already have clever ways of overcoming this issue, such as parking shaders mid-execution when they hit a global memory access, letting another shader do its thing, then returning to the original task when the needed data is available. HBM2 also supports ECC natively, which allows error correction to be enabled without losing capacity or bandwidth. It's unclear whether consumer products will have ECC, too.
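The "little more than double" comparison falls out of the standard bandwidth formula: bus width times per-pin data rate, divided by eight bits per byte. This Python sketch assumes the Titan X's 7 Gbps GDDR5 data rate (not listed in the table) and solves for the per-pin rate that the P100's 4096-bit HBM2 interface would need to reach 720 GB/s.

```python
def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = bus width x per-pin rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


titan_x = bandwidth_gb_s(384, 7.0)  # GM200 Titan X: 384-bit GDDR5, assumed 7 Gbps
p100_target = 720.0                 # GB/s quoted for the 16GB Tesla P100

# Invert the formula: per-pin rate needed across a 4096-bit HBM2 interface.
p100_pin_rate = p100_target * 8 / 4096

print(f"Titan X: {titan_x:.0f} GB/s")
print(f"P100 per-pin rate: {p100_pin_rate:.2f} Gbps")
print(f"P100 / Titan X: {p100_target / titan_x:.2f}x")
```

The very wide bus is what lets HBM2 hit 720 GB/s at only about 1.4 Gbps per pin, roughly a fifth of GDDR5's per-pin rate, for about 2.14x the Titan X's total bandwidth.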

Pascal also introduces NVLink and builds on Unified Memory. NVLink is useful for multiple GPUs in an HPC cluster, allowing them to communicate at a much higher bandwidth. NVIDIA claims that Tesla P100 will support four "Links", yielding 160 GB/s of combined bandwidth across both directions. For comparison, that is about half of the bandwidth of Titan X's GDDR5, which is right there on the card beside it. This also plays into Unified Memory, which allows the CPU to share a memory space with the GPU. Developers could write serial code that, without performing a copy, can be modified by a GPU for a burst of highly-parallel acceleration.
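Here is one way the 160 GB/s figure decomposes, as a Python sketch. We assume the 160 GB/s is the total across both directions and that each link carries 20 GB/s per direction; neither split is spelled out above. The PCIe 3.0 x16 line is included only as a familiar baseline for "much higher bandwidth."

```python
LINKS = 4
GB_S_PER_LINK_PER_DIRECTION = 20.0  # assumption: per-link, per-direction rate

per_direction = LINKS * GB_S_PER_LINK_PER_DIRECTION  # 80 GB/s each way
bidirectional = 2 * per_direction                     # 160 GB/s total

titan_x_gddr5 = 384 * 7.0 / 8  # 384-bit bus at 7 Gbps per pin -> 336 GB/s

# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 16 lanes, bits -> bytes.
pcie3_x16 = 8.0 * 128 / 130 * 16 / 8  # ~15.75 GB/s each way

print(f"NVLink: {per_direction:.0f} GB/s each way, {bidirectional:.0f} GB/s total")
print(f"NVLink total vs Titan X GDDR5: {bidirectional / titan_x_gddr5:.2f}x")
print(f"NVLink vs PCIe 3.0 x16 (each way): {per_direction / pcie3_x16:.1f}x")
```

Under these assumptions, NVLink's total is about 0.48x the Titan X's on-card GDDR5 bandwidth and roughly 5x a PCIe 3.0 x16 slot in each direction.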

Where can you find this GPU? Well, let's hear what Josh has to say about it on the next page.

About The Author

Scott Michaud

Scott joined PC Perspective in May 2011. Before joining PC Perspective, Scott worked on personal projects and completed degrees in Physics and Education at Queen's University. While he does not write for other hardware sites, Scott works full-time as a software developer for Eliot Research & Consulting. He is also a geek, go figure.

54 Comments

Anonymous on April 6, 2016 at 6:35 am

I am not sure how they are fitting that on an interposer. I thought that HBM2 die stacks are specified to be 92 mm2 compared to 49 mm2 for HBM1. Four of them should be close to 400 mm2. With a 610 mm2 GPU die, that would require a 1000 mm2 interposer. Is that a possibility? I guess they could be using only two 8 GB stacks, but they would need to increase the clock speed significantly to go from 512 (spec) to 720 GB/s.
- Pixy Misa on April 6, 2016 at 8:32 am
  
  As I understand it, silicon interposers aren’t restricted by the reticle size. The features are large enough that there isn’t an alignment problem; they can take up the entire wafer if need be.
  - Anonymous on April 8, 2016 at 7:16 am
    
    That is probably a lot more expensive if they have to resort to making them larger than the reticle size.
- Josh Walrath on April 6, 2016 at 12:55 pm
  
  I'm not entirely sure either. Yes, you can use multiple exposures to make the interposer larger, but I think they are getting around it because the 4 GB HBM2 dies are much smaller than the 8 GB HBM2 units.
  - Anonymous on April 6, 2016 at 6:20 pm
    
    What about asynchronous compute for Pascal? Has there been any improvement in Nvidia's new Pascal micro-architecture over Maxwell to better schedule processor threads and better utilize the GPU's core execution resources? Is there still the need to wait until the end of a draw call to schedule graphics or compute threads in Nvidia's new Pascal-based GPUs, or have they improved their thread scheduling granularity to the point that few execution resources are left idle while work is backed up in the execution queues?
    - Scott Michaud on April 6, 2016 at 8:39 pm
      
      These issues aren't going to be discussed at a CUDA/OpenCL summit. No idea.
      - Anonymous on April 6, 2016 at 10:33 pm
        
        And yet it was covered at The Register, a non-gaming website!
        
        “Software running on the P100 can be preempted on instruction boundaries, rather than at the end of a draw call. This means a thread can immediately give way to a higher priority thread, rather than waiting to the end of a potentially lengthy draw operation. This extra latency – the waiting for a call to end – can really mess up very time-sensitive applications, such as virtual reality headsets. A 5ms delay could lead to a missed Vsync and a visible glitch in the real-time rendering, which drives some people nuts.
        
        By getting down to the instruction level, this latency penalty should evaporate, which is good news for VR gamers. Per-instruction preemption means programmers can also single step through GPU code to iron out bugs.”(1)
        
        (1)
        http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/
        
        It looks like the websites that cover mostly professional server news have a better handle on the technical aspects of new GPU hardware than the gaming websites do! I cannot wait for the Zen server SKUs to be reviewed at The Register for some more information on that front! It looks like Nvidia fixed this on their server/HPC GPU variants, so hopefully the same can be said for the consumer variants!
        Having unallocated compute resources when the queue is backed up is very bad for processor utilization, so it's good that Nvidia fixed that! The HPC/workstation market is not going to tolerate any hardware gimping from Nvidia in this matter, so keep up the improvements, Nvidia; asynchronous compute is very important!
        
        Allyn Malventano on April 14, 2016 at 9:19 pm
        
        The fact that they think a single draw call can stall the pipeline for 5ms and cause a missed VR sync pretty much means they are only going off of one line of information and trying to write much smarter than they are. 50ns is a far cry from 5ms.
  - Anonymous on April 8, 2016 at 8:01 am
    
    Looking at the Anandtech article (again):
    
    http://www.anandtech.com/show/9969/jedec-publishes-hbm2-specification
    
    The slide titled “Mechanical Outline : Molded KGSD” indicates that 40 mm2 and 92 mm2 are the package sizes in the specification. The chips would need to be this size for the micro-bumps to line up with the pads or micro-bumps on the interposer. The micro-bump array is part of the JEDEC specification. Unless they are making non-standard HBM2, it doesn’t look like it changes anything if they are going with 4 GB x 4 packages. It says 16 GB and a 4096-bit interface, which does imply a 4×4 system. It looks like they will be ~92 mm2 each, and the interposer will need to be around 1000 mm2. Is the picture in the article supposed to be an actual device or just a mockup? We already know these things will not be cheap, so they may just be making interposers larger than the reticle. Don’t expect such a device in the consumer market anytime soon, if ever, though.
- Scott Michaud on April 6, 2016 at 8:43 pm
  
  It's definitely 4x4GB.
- Anonymous on April 8, 2016 at 7:18 am
  
  It is actually about 40 mm2 for HBM1, not 49.
Anonymous on April 6, 2016 at 6:44 am

It is surprising that they are making such a large die at 16 nm. I suspect that when, or if, any of these make it to the consumer market, they will be significantly more expensive than the Titan X was. There is no way that they are going to be able to make a 600+ mm2 die on 16 nm with yields anywhere close to what was possible with 28 nm. They may have a huge number of defective parts that can be salvaged, though, so perhaps they can do a very cut-down consumer version.
- Spunjji on April 6, 2016 at 1:41 pm
  
  I will not be surprised if this ends up as another Fermi – late, initially low-yielding and brute forcing its way to performance dominance. If so then it will probably really shine in its second iteration when they bring down the defect rate and/or with the smaller die variants.
  
  Funnily enough it looks like AMD are taking the same approach they did back then, with a smaller and more area-efficient product.
  - Josh Walrath on April 6, 2016 at 3:24 pm
    
    It is interesting to see that NV might be taking that route again. But at least with this implementation, there is no other competition at this extremely high end. It does not matter how hot it runs or how it yields; if they sell each card for 15K, then they are more than covering their expenses getting these parts out.
- Anonymous on April 6, 2016 at 6:10 pm
  
  If they are clocking their GPUs higher, then maybe the 16nm process is engineered to have a larger pitch (distance between circuits/gates) than would normally be used for a GPU's layout. CPUs have less dense layouts to run at higher clocks, but that does not negate the circuit gains from having 14nm/16nm gate sizes; it just means that the circuits are spaced further apart for better heat dissipation. Circuit pitch (spacing) plays an important part in a processor's thermal ability to handle higher clock speeds, at the cost of space savings, even at 14nm/16nm gate sizes with their inherent advantages. So maybe the larger die size is a trade-off for higher clock speeds, at the cost of some space savings, for these high-end server/pro SKUs.
  - Scott Michaud on April 6, 2016 at 8:41 pm
    
    Yeah, it's interesting that NVIDIA's enterprise clock is around 400 MHz higher than it used to be. I'm curious to see how high consumer will go.
[CoFR]Prodeous on April 6, 2016 at 8:11 am

Seeing Fury X single precision being at 8.6 Gigaflops, Pascal's 10.6 Gigaflops doesn't sound like such a major improvement.

Will admit that dual precision is where Pascal shines, exceeding 5 Gigaflops. Nicely done.

I am highly confused about the 1/2 precision. It feels like such a marketing play... after all, 20 Gigaflops of 1/2 precision is a big number. My question is: what would this be used for?
- [CoFR]Prodeous on April 6, 2016 at 8:12 am
  
  Correction. Teraflops, not Gigaflops on all numbers …
- Vicen on April 6, 2016 at 8:41 am
  
  Usually raw teraflops is not a very good measure of how fast you can compute in practice; memory bandwidth tends to be the bottleneck. In this case I would expect the performance of the P100 to exceed the Fury X by a larger margin than 20%. For the deep learning or the CFD that we use the GPUs for, the P100 could easily run twice as fast as the Fury.
  - [CoFR]Prodeous on April 6, 2016 at 1:04 pm
    
    Is FP16 or half-precision used for deep learning?
    
    With regards to 20% margin. For me it is just about their claim. I agree that actual performance might be bigger. HBM2 will in itself play a big role.
    
    So I'm not even going to guess the actual performance.
    
    Will admit I am really really happy that they are back to 1/2 speed dual precision compared to 1/32? in the previous ones.
    - Josh Walrath on April 6, 2016 at 3:25 pm
      
      Yeah, there are a lot of workloads where FP16 is more than adequate for their needs. Deep learning is one of them.
- renz on April 6, 2016 at 2:59 pm
  
  Deep learning. SP might be underwhelming, but then again you have to look at how well Nvidia's GK110/GK210 defend their position against the much superior AMD Hawaii FirePro.
- Anonymous on April 6, 2016 at 6:31 pm
  
  That is done to compete with the Xeon Phi, and also to meet the requirements of the server/HPC market, for which this SKU is the intended target. That DP FP to SP FP ratio delivers more of the DP compute performance that the server/HPC market requires, so this SKU is ahead of the Xeon Phi by a larger margin.
Anonymous on April 6, 2016 at 1:09 pm

“AMD’s Fury may be the first GPU to feature an interposer and HBM1 memory, but the P100 will quickly outclass this product. If high end enthusiasts can even get a hold of it…”

And therein lies the rub. AMD was able to do it and get it into the hands of high-end gamers. NVIDIA can’t even do that.
- Anonymous on April 6, 2016 at 1:39 pm
  
  Well, it hardly matters if the memory isn't the bottleneck, and considering how well the Ti line performs compared to Fury, it has to be said that in gaming it really isn't, at least not yet. AMD failed by relying too much on HBM technology too soon, because it seems evident that they can't bring it to consumer products in any meaningful (monetizing) way before Nvidia does too (HBM2). I doubt Fury brought them enough market share and profits compared to the costs of making the product in the first place.
  - Spunjji on April 6, 2016 at 1:42 pm
    
    Agreed, but it does mean they’re on their second-gen product with the tech. They have already worked out the complex inter-company relationships needed to get these products running and out the door.
    
    Whether that gives them any benefit /in practice/ remains to be seen.
  - Anonymous on April 6, 2016 at 10:57 pm
    
    Really, AMD developed the HBM technology/standard with SK Hynix, and AMD will also be using HBM2. I do not see how that can be a fail for AMD when AMD is already demonstrating consumer-based Polaris SKUs. AMD is the co-creator of HBM; I do not see Nvidia on the front lines of developing any open/JEDEC standards for ALL to use like AMD did with HBM. Hopefully AMD will make some inroads into the HPC/workstation market with their server/HPC APUs on an interposer; AMD needs that business more than it needs only the consumer side of things.
    
    Google will be using Power9-based servers, so AMD may need to get a Power9 license from OpenPower and do x86, ARM (K12), and some Power-based GPU acceleration products. Nvidia has the Power GPU acceleration market all to itself currently.(1) PCIe 3.0 is not fast enough for the HPC/exascale market, so both AMD and Nvidia will have to compete, with Nvidia currently leading in the server/HPC market. x86-based SKUs are no longer the only game in town across all markets except the PC/laptop market, but that will change too if some Power8/Power licensee builds a PC variant using Power ISA-based CPUs.
    
    (1)
    
    http://www.theregister.co.uk/2016/04/06/google_power9/
    - Anonymous on April 6, 2016 at 11:26 pm
      
      P.S. Another article on Google and Rackspace using Power9s:
      
      http://www.theregister.co.uk/2016/04/06/google_rackspace_power9/
      
      LOOK out, makers of x86-only products (Intel), as at least AMD has its K12 custom ARM cores (Jim Keller designed). OpenPower is licensing Power8s and newer designs, and Nvidia has a lead supplying GPU accelerators for Power8/9-based systems! Better look into that market too, AMD, and get a Power8/9 license and integrate your GCN-based Polaris/Vega IP into that Power ISA-based marketplace.
    - Anonymous on April 9, 2016 at 11:19 pm
      
      Well, maybe AMD will be working with Intel on a server-based option, since they have been getting so chummy lately.
jcaf77 on April 6, 2016 at 4:36 pm

so… when are the consumer enthusiast cards coming out????
- Josh Walrath on April 6, 2016 at 5:17 pm
  
  My guess would be 2017?
  - jcaf77 on April 6, 2016 at 6:30 pm
    
    So I should go ahead and buy the 980 Ti then… btw Josh, you are hilarious. My type of humor.
    - Josh Walrath on April 7, 2016 at 4:19 am
      
      Thanks for watching. Don't invest in a 980 Ti yet… give it a couple of weeks for more rumors and leaks to come out before spending $600 on a card that could be overshadowed in 3 months.
  - svnowviwvn on April 8, 2016 at 6:39 pm
    
    Wrong.
    
    July 2016.
    
    http://techreport.com/news/29961/rumor-nvidia-to-launch-gtx-1080-and-gtx-1070-at-computex
    
    Nvidia will unveil a consumer Pascal chip to the public at Computex 2016, in the form of its GeForce GTX 1080 and 1070 cards. Digitimes says that card makers will fire up mass production of Pascal-based GeForces during July. Asus, Gigabyte, and MSI are among the players expected to show cards at Computex.
    - John H on April 15, 2016 at 3:22 pm
      
      1070/1080 are considered 'high end' (GP104); the 1080 Ti/Titan are 'enthusiast' (GP100 core). The 1080 will likely be faster than a 980 Ti, but later a 1080 Ti/Titan will come out that will smack that down pretty strongly. That card is likely to be either a 'holiday shopping special' or a 2017 card, based on the timing of everything here.
      
      I bought a 980 Ti a month ago to replace my 970 SLI and am very happy, although I bought it because I wanted to play Elite Dangerous at a reasonable setting on my Oculus. If you can wait, the new 1070 or 1080 should be a bargain compared to 980 Ti pricing and offer equal or better performance. Not to mention AMD has a whole new generation coming soon too.
skysaberx8x on April 6, 2016 at 4:38 pm

shut up and take my money 🙂
Idiot on April 6, 2016 at 5:06 pm

Didn’t read the article. Does Pascal fully support Async Compute? Thanks.
- Josh Walrath on April 6, 2016 at 5:18 pm
  
  They didn't go into that level of granularity or address that exact question.
  - funandjam on April 6, 2016 at 7:17 pm
    
    I get how async compute works in gaming. But their keynote was about server, development and professional technologies and applications. Would any of the technologies presented benefit from async compute in any significant way?
    - Scott Michaud on April 6, 2016 at 8:36 pm
      
      Not sure, but I doubt it affects OpenCL or CUDA. It's designed to independently load the 3D and compute engines, but the former isn't used there.
      - funandjam on April 6, 2016 at 8:46 pm
        
        If what you say is true, then that makes sense of why it wasn’t brought up in their keynote.
        
        Now we just wait and see when they make their announcements about the consumer desktop GPU side of things.
Danny on April 6, 2016 at 5:22 pm

If Pascal performs like crap in DirectX 12, I will buy AMD Polaris instead. Plus, I don't want to spend a ridiculous amount of money on a G-Sync monitor, since there's less benefit with a high refresh rate monitor. Moreover, not all games would work with G-Sync, I don't want to cope with extra input latency which majority PC gamers strongly prefer Vertical-Sync Off.

I don't know about NVIDIA, but it seems like NVIDIA is stepping into a monopoly position from a business standpoint, with things like GameWorks, G-Sync, PhysX, etc. But in the meantime, I'd wait until there's a real benchmark between Polaris and Pascal GPUs. And hopefully next-gen GPUs will be announced at Computex 2016 in Taipei, Taiwan, at either the end of May or in June.
- Scott Michaud on April 6, 2016 at 8:38 pm
  
  Generally speaking, it makes sense to choose the best results for your budget. If Pascal under-performs, then it makes sense to use Polaris. Likewise, vice-versa. We'll see.
  - Anonymous on April 9, 2016 at 11:20 pm
    
    Like the guy said, if it's the same, he does not want a G-Sync monitor, and Nvidia won't support FreeSync like Intel.
- Allyn Malventano on April 6, 2016 at 10:00 pm
  
  >> Moreover, not all games would work with G-Sync,
  
  I've played a lot of games on FreeSync / G-Sync and the only real requirement appears to be full screen. G-Sync also has a mode that works on desktop / windowed games, but not as well as the full-screen experience.
  
  >> I don't want to cope with extra input latency which majority PC gamers strongly prefer Vertical-Sync Off.
  
  VRR displays continue to draw at the 'max' speed even when in varying frame rates, which means that the speed of the scan at 40 FPS is just as fast as it is at 144/165. The end result is that the advantage to running VSYNC-off is nearly negligible.
john vitz on April 6, 2016 at 9:50 pm

No TDP improvement. One day a jillion-watt GPU will be the norm.
- JesusBaltan on April 11, 2016 at 3:32 pm
  
  no improvement?
  
  look at the clock frequencies of the GPU and then we’ll talk
Anonymous on April 7, 2016 at 2:08 am

And again, we get a nice little 3D rendering of “Pascal,” and that’s pretty dandy.
But where the hell is a physical Pascal chip? Seriously, something must have gone WAY wrong if they haven’t even shown a chip publicly, let alone in a demo...

Meanwhile RTG has been setting up demos for what, three months now?

Something has definitely gone wrong.
Mandrake on April 7, 2016 at 6:36 am

Nice write-up! For those who didn’t see it, Nvidia did indeed have a P100 demo at GTC.

http://a.disquscdn.com/uploads/mediaembed/images/3460/3367/original.jpg

It appears to be eight GP100 GPUs running in parallel.
PL on April 7, 2016 at 1:02 pm

Am I reading this right? “GPU Die Size 551 mm2 601 mm2 610 mm2 610mm2”

61cm² dies? That’s almost as big as my entire case… did someone get their metric system wrong?
- Tim Verry on April 7, 2016 at 6:44 pm
  
  6.1cm^2 dies
- Scott Michaud on April 8, 2016 at 4:26 am
  
  100mm² = 1cm²
  
  1cm² = 1cm * 1cm = 10mm * 10mm = 100mm²
  - John H on April 15, 2016 at 3:22 pm
    
    How many inches is that?
    
    😉