A new architecture with GP104
We have a review of the GeForce GTX 1080 Founders Edition for you. It’s the new king. Get in here and read it. Now.
Table of Contents
- Asynchronous compute discussion
- Is only 2-Way SLI supported?
- Overclocking over 2.0 GHz
- Dissecting the Founders Edition
- Benchmarks begin
- VR Testing
- Impressive power efficiency
- Performance per dollar discussion
- Ansel screenshot tool
The summer of change for GPUs has begun with today’s review of the GeForce GTX 1080. NVIDIA has endured leaks, speculation and criticism for months now, with enthusiasts calling out NVIDIA for not including HBM technology or for not having asynchronous compute capability. Last week NVIDIA’s CEO Jen-Hsun Huang went on stage and officially announced the GTX 1080 and GTX 1070 graphics cards with a healthy amount of information about their supposed performance and price points. Issues around cost and what exactly a Founders Edition is aside, the event was well received and clearly showed a performance and efficiency improvement that we were not expecting.
The question is, does the actual product live up to the hype? Can NVIDIA overcome some users’ negative view of the Founders Edition and craft a product message that gives the wide range of PC gamers looking for an upgrade path an option they’ll actually take?
I’ll let you know through the course of this review, but what I can tell you definitively is that the GeForce GTX 1080 clearly sits alone at the top of the GPU world.
GeForce GTX 1080 Specifications
Much of the information surrounding the specifications of the GTX 1080 was revealed last week with NVIDIA’s “Order of 10” live stream event. There are some more details we can add now on clock speeds that should paint a very interesting picture of where NVIDIA has gone with the GTX 1080 and the GP104 GPU.
| | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano | R9 390X |
|---|---|---|---|---|---|---|---|---|
GPU | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT | Hawaii XT |
GPU Cores | 2560 | 2816 | 3072 | 2048 | 4096 | 3584 | 4096 | 2816 |
Rated Clock | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz |
Texture Units | 160 | 176 | 192 | 128 | 256 | 224 | 256 | 176 |
ROP Units | 64 | 96 | 96 | 64 | 64 | 64 | 64 | 64 |
Memory | 8GB | 6GB | 12GB | 4GB | 4GB | 4GB | 4GB | 8GB |
Memory Clock | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz | 6000 MHz |
Memory Interface | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit |
Memory Bandwidth | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 320 GB/s |
TDP | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts | 275 watts |
Peak Compute | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS |
Transistor Count | 7.2B | 8.0B | 8.0B | 5.2B | 8.9B | 8.9B | 8.9B | 6.2B |
Process Tech | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
MSRP (current) | $599 | $649 | $999 | $499 | $649 | $549 | $499 | $329 |
There are two direct comparisons worth looking at with the GeForce GTX 1080. Both the GTX 980 and the GTX 980 Ti are competitors to the GTX 1080 – the GTX 980 in terms of GPU-specific placement and the GTX 980 Ti in terms of “king of the hill” single GPU consumer graphics card performance leadership. (The Titan X is obviously faster than the 980 Ti, but not by much, and its price tag puts it in a different class.)
With 2560 CUDA cores, the GTX 1080 has 10% fewer than the GTX 980 Ti but 25% more than the GTX 980. Those same ratios apply to the texture units on the cards as well, though the GTX 980 and GTX 1080 both are configured with 64 raster operators (ROPs). The GTX 980 Ti has 96 ROPs, an increase of 50%. Despite the modest advances the new GTX 1080 has over the GTX 980, and the supposed deficit it has when compared to the GTX 980 Ti, this new card has something else on its side.
Clock speed.
The GTX 1080 will have a base clock speed of 1607 MHz and a rated Boost clock of 1733 MHz! The base clock is 60% higher than the GTX 980 Ti’s and 42% higher than the GTX 980’s, and that is clearly where the new GP104 GPU gets so much of its performance.
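If you want to sanity check the peak compute figure in the spec table above, the math is straightforward (assuming the standard 2 FMA FLOPS per CUDA core per clock that NVIDIA uses for its ratings):

```python
# Peak FP32 compute = CUDA cores x 2 FLOPS per core per clock (FMA) x clock
def peak_tflops(cuda_cores, clock_mhz):
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12

print(peak_tflops(2560, 1607))  # GTX 1080 at base clock:  ~8.2 TFLOPS (the table value)
print(peak_tflops(2560, 1733))  # GTX 1080 at Boost clock: ~8.9 TFLOPS
print(peak_tflops(2048, 1126))  # GTX 980 at base clock:   ~4.6 TFLOPS
```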
A quick glance at the memory specifications indicates that the move to GDDR5X (G5X) has helped NVIDIA increase performance here as well. With just a 256-bit memory bus, the GTX 1080 produces 320 GB/s of bandwidth via a 10 Gbps (5.0 GHz) signaling rate, outpacing the GTX 980 by 42% yet again. The GTX 980 Ti and Titan X do have higher total memory throughput, with 384-bit buses rated at 336 GB/s, but NVIDIA has made improvements in the compression algorithms with Pascal that should increase effective bandwidth even above that.
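For reference, those bandwidth numbers fall out of simple bus-width arithmetic (per-pin data rate times bus width, divided by eight to convert bits to bytes):

```python
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    # per-pin data rate x bus width, /8 for bits -> bytes
    return bus_width_bits * data_rate_gbps / 8

gtx_1080  = bandwidth_gbs(256, 10)  # GDDR5X at 10 Gbps -> 320 GB/s
gtx_980   = bandwidth_gbs(256, 7)   # GDDR5 at 7 Gbps   -> 224 GB/s
gtx_980ti = bandwidth_gbs(384, 7)   # GDDR5 at 7 Gbps   -> 336 GB/s
print(gtx_1080 / gtx_980)           # ~1.43x, the roughly 42% uplift over the GTX 980
```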
The first consumer GPU we have seen built on the 16nm (or 14nm) FinFET process consists of 7.2 billion transistors but only has a rated TDP of 180 watts. That is slightly higher than the GTX 980 (165 watts) but significantly lower than the GTX 980 Ti (250 watts). After looking at performance results I think you’ll be impressed with the performance/watt efficiency improvements that NVIDIA has made with Pascal, despite the increased transistor count and clock speeds.
Pascal and GP104 Architecture – How we got the GTX 1080
How does the GTX 1080 get this level of clock speed improvement and performance uptick over Maxwell? Pascal combines a brand new process technology and a couple of interesting architecture changes to achieve the level of efficiency we see today.
One interesting change visible in the block diagram above is a shift to embedding five SMs (streaming multiprocessors) in a single GPC (Graphics Processing Cluster). This changes the processing ratios inside the GPU compared to Maxwell, which had four SMs per GPC. Essentially, this modification puts more shading horsepower behind each of the raster engines in the GPCs, a balance that NVIDIA found to be an improvement for the shifting workloads of games.
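To put that ratio change in concrete numbers, here is a rough sketch (assuming 128 CUDA cores per SM and four GPCs on both chips, which lines up with the published core counts):

```python
def shader_balance(gpcs, sms_per_gpc, cores_per_sm=128):
    total_cores = gpcs * sms_per_gpc * cores_per_sm
    cores_per_raster = sms_per_gpc * cores_per_sm  # one raster engine per GPC
    return total_cores, cores_per_raster

print(shader_balance(gpcs=4, sms_per_gpc=4))  # GM204 / GTX 980:  (2048, 512)
print(shader_balance(gpcs=4, sms_per_gpc=5))  # GP104 / GTX 1080: (2560, 640)
```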
16nm FinFET Improvements and Challenges
The first and easily most important change to Pascal is the move away from the 28nm process technology that has been in use for consumer graphics cards since the introduction of the GeForce GTX 680 back in March of 2012. Pascal and GP104 are built around the 16nm FinFET process from TSMC and with it come impressive improvements in power consumption and performance scaling.
A comment on YouTube properly summed up this migration in a way that I think is worth noting here.
Using Intel parlance, Pascal is a tick and tock in the same refresh (making up for Kepler>Maxwell being no tick and half a tock), so it's understandable that it's blowing the doors off the 980 that it replaces. -Jeremiah aka Critical Hit
Who knew such interesting commentary could come from YouTube, right? But it is very much the case that the GPU industry had some “pent up” ability to scale that was being held back by the lack of a step between 28nm and 16/14nm process nodes. (20nm just didn’t work out for all parties involved.) Because of it, I think most of us expected Pascal (and in theory AMD’s upcoming Polaris architecture) to show accelerated performance and efficiency with this generation.
Migrating from 28nm to 16nm FinFET is not a simple copy and paste operation. As NVIDIA’s SVP of GPU Engineering, Jonah Alben, stated at the editors’ day earlier this month, “some fixes that helped with 28nm node integration might actually degrade and hurt performance or scaling at 16nm.” NVIDIA’s team of engineers and silicon designers worked for years to dissect and perfect each and every path through the GPU in an attempt to improve clock speed. Alben told us that when Pascal engineering began optimization, the Boost clock was in the 1325 MHz range, limited by the slowest critical path through the architecture. With a lot of work, NVIDIA increased the speed of that slowest path to enable the 1733 MHz Boost clock rating the GTX 1080 carries today.
Optimizing to this degree allows NVIDIA to increase clock speeds, increase CUDA core counts and increase efficiency on GP104 (when compared to GM204), all while shrinking the die from 398 mm² to 314 mm².
Simultaneous Multi-Projection - A new part of the PolyMorph Engine
The only true addition to the GPU architecture itself is the inclusion of a new section to the PolyMorph Engine, now branded as version 4.0. The Simultaneous Multi-Projection block is at the end of the geometry portion of the pipeline but before the rasterization step. This block creates multiple projection schemes from a single geometry stream, up to 16 of them, that share a single viewpoint. I will detail the advantages that this feature will offer for gamers in both traditional and VR scenarios, but from a hardware perspective, this unit provides impressive functionality.
Software will be able to tell Pascal GPUs to replicate geometry in the stream up to 32 times (16 projections x 2 projection centers) without the overhead affecting the software as that geometry flows through the rest of the GPU. All of this data stays on chip and is hardware accelerated, and any additional workload that would go into setup, OS handling or geometry shading is saved. Obviously all of the rasterized pixels created by the multiple projections will still have to be shaded, so that compute workload won’t change, but in geometry-heavy situations the performance improvements are substantial.
Displays ranging from VR headsets to multiple-monitor Surround configurations will benefit from this architectural addition.
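To illustrate what that buys you, here is a simplified sketch (not NVIDIA’s API, just the bookkeeping): the saving is in how many times geometry has to travel through the front end of the pipeline.

```python
# Count how many times geometry is pushed through the front end of the pipeline.
# Illustrative only -- pixel shading work is the same either way.
def geometry_passes(num_projections, has_smp):
    if has_smp:
        # submitted once, replicated on-chip for up to 32 projections
        return 1
    # without SMP the application re-submits the scene once per projection
    return num_projections

# Hypothetical VR case: 2 eyes x 4 viewports per eye (lens-matched style shading)
print(geometry_passes(2 * 4, has_smp=False))  # 8 geometry passes on older GPUs
print(geometry_passes(2 * 4, has_smp=True))   # 1 geometry pass on Pascal
```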
Updated Memory - GDDR5X and New Compression
If you thought the 28nm process on the GTX 980 and GM204 was outdated, remember that GDDR5 memory was first introduced in 2009. That is what made AMD’s move to HBM (high bandwidth memory) with the Fiji XT GPU so impressive! And while NVIDIA is using HBM2 for the GP100 GPU used in high performance computing applications, the consumer-level GP104 part doesn’t follow that path. Instead, the GTX 1080 will be the first graphics card on the market to integrate GDDR5X (G5X).
GDDR5X was standardized just this past January by JEDEC so it’s impressive to see an implementation this quickly with GP104. Even though the implementation on this GPU runs at 5.0 GHz, quite a bit slower than the 7.0 GHz the GTX 980 runs at with GDDR5 (G5), it runs at double the data rate, hitting 10 Gbps of transfer. The result is a total bandwidth rate of 320 GB/s with a 256-bit bus.
NVIDIA talked quite a bit about the design work that went into getting a GDDR5X memory bus to operate at these speeds, throwing impressive comparisons around. Did you know that NVIDIA’s new memory controller has only about 50 ps (picoseconds) to sample data coming at this speed, a time interval that is so small, light can only travel about half an inch in its span? Well now you do.
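That light-travel comparison checks out, for what it’s worth:

```python
c = 299_792_458             # speed of light in m/s
window = 50e-12             # 50 picoseconds
print(c * window * 1000)    # ~15 mm
print(c * window / 0.0254)  # ~0.59 inches -- "about half an inch"
```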
I don’t want to undersell the amount of work the memory engineers at NVIDIA went through to implement G5X at these speeds, including the board and channel design necessary to meet the new tolerances. Even better, NVIDIA tells us that the work they put into the G5X integration on the GTX 1080 will actually improve performance for the GTX 1070 with G5 memory.
NVIDIA has also improved the memory compression algorithms implemented on the GPU to raise the effective memory bandwidth of the product. Compressing data with a lossless algorithm as it flows inside the GPU and in and out of GPU memory reduces the amount of bandwidth required for operations across the board. It’s an idea that has been around for a very long time, and as the algorithms improve, we see it as an additive gain in GPU memory interface performance.
Maxwell introduced a 2:1 delta color compression design that looked at the pixel color values in a block and stored them using as few fixed values as possible, recording offsets from those fixed values to reduce the amount of data to be stored. Pascal improves on the 2:1 algorithm so that it can be utilized in more situations, and also adds 4:1 and 8:1 modes. The 4:1 mode looks for blocks where the pixel-to-pixel changes are much smaller and can be represented by even less offset data. And if a block is lucky enough to qualify for the 8:1 mode, it combines the 4:1 option with the 2:1 option, looking for blocks that share enough data that they can be compressed against each other.
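To make the idea concrete, here is a minimal sketch of delta color compression. This is illustrative only, not NVIDIA’s actual (and unpublished) hardware scheme: store an anchor value for a block and keep only small per-pixel offsets, and the block compresses when every offset fits in a reduced number of bits.

```python
import numpy as np

def delta_compressible(block, offset_bits):
    """True if an 8x8 block of 8-bit values can be stored as one anchor value
    plus signed per-pixel offsets of `offset_bits` bits. Illustrative only."""
    anchor = int(block.flat[0])
    deltas = block.astype(np.int16) - anchor
    limit = 2 ** (offset_bits - 1)
    return bool((deltas >= -limit).all() and (deltas < limit).all())

sky = np.full((8, 8), 200, dtype=np.uint8)       # a nearly uniform patch
sky[3, 4] = 201
print(delta_compressible(sky, offset_bits=4))    # True: 2:1-style savings
print(delta_compressible(sky, offset_bits=2))    # True: flat enough for a 4:1-style mode
noise = np.random.randint(0, 256, (8, 8)).astype(np.uint8)
print(delta_compressible(noise, offset_bits=4))  # almost always False: stored uncompressed
```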
The images above show a screenshot from Project CARS and compare memory compression on Maxwell and Pascal. Every pixel compressed with at least the 2:1 delta color algorithm is colored pink; Pascal has clearly improved.
In general, the compression algorithm changes over Maxwell give GP104 an effective 20% increase in memory bandwidth over the GTX 980. Combine that with the roughly 40% improvement in rated bandwidth between the two cards and you get a total improvement of about 1.7x in effective memory performance from the GTX 980 to the GTX 1080.
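The arithmetic behind that 1.7x figure, for the curious:

```python
raw_gain = 320 / 224                # rated bandwidth, GTX 1080 vs GTX 980 (~1.43x, the ~40% above)
compression_gain = 1.20             # NVIDIA's claimed ~20% from improved delta compression
print(raw_gain * compression_gain)  # ~1.71x effective memory bandwidth
```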
Stop drinking the nvidia Kool-aid Ryan, consistently comparing the 1080 to the 980 like nvidia wants you to. You should be comparing it to the 980 TI if you want to give a true impartial review. 980 TI had the same launch price, and is considered to be the card to beat.
980Ti launched at a higher MSRP. Also, I included 980Ti in VR results. 1080 is marketed as their mid-level GPU (as in there will be a 1080Ti), so 1080 to 980 is fair…
*edit* wait a sec, Ryan *did* test 980Ti. What point were you trying to make exactly again?
Unfortunately Nvidia hasn’t given a date to the $599 1080. What we have is the $699 Founders Edition.
As of now the 1080 will launch higher than the 980 TI.
Will Nvidia extend the time for founders edition sales and push back non founders cards………you bet ! Watch for customers getting pissed off once again !
Nice higher end enthusiast card. Can’t wait for the 1080ti and Titan variants that will eventually follow.
When the prices drop substantially, a second 980Ti will be my choice.
Rather that than ditch the current card..
Pity there wasn’t SLI results for 980Ti SLI shown too. Yeah, no SLI for DX12, but like a lot of people I’m still on Win7.
Good review, thanks!
Holy crap this review is good. New DX12 tool, VR test and power measuring. The amount of data crammed in here was fantastic. Great work Ryan, Allyn, et al.
The GPU ain’t bad neither.
Thanks, it is nice to see someone appreciate how much work it takes to put those together.
Thanks! It was a lot of work for sure!
Kind of disappointing for a new architecture on a double node shrink. Also worrying pricing: if this midrange card is $700, are we going to see $1000 high end (1080ti)? $1500 titan?
The 980ti was released at a lower price and not that much lower performance. One would expect two node shrinks and a $50 increase to provide more than 20% or so additional performance. Even if this is a $600 card, that’s still not much performance considering how long we’ve been waiting for those shrinks – and this is not a $600 card. The product that Ryan received and reviewed costs $700. Sure, you will be able to get similar cards for less in the future, but that’s the case for all graphics cards, and you can’t judge a product based on future price drops.
The tech (and especially frequency) is pretty nice, but this is a midrange card released for above high-end price.
Currently the fastest single GPU gaming card = midrange? Lol.
If you don’t like the price don’t buy it. There’s plenty of opportunity for competition to come in and offer a better product at a lower price. Until that happens Nvidia will extract as much profit as possible.
I suspect supply is going to be constrained, which is probably most of the reason for the higher Founder’s edition price. If the Ti version is based on the 600 mm2 part, then I would expect prices to be ridiculously high. Yields on 16 nm will not be anywhere near as high as they are for 28 nm parts. They probably do have a huge number of defective parts though, and these may be functional enough to be sold as a 980 Ti of some flavor eventually. Perhaps they will have multiple configurations of such salvage parts. It will have to wait for HBM2 production to reduce memory prices before they can make a consumer grade part.
Considering the tone of the reviews, it was probably worth it to do an early launch, even if it ends up being a bit of a paper launch, to get all of the favorable publicity. This way, Nvidia’s cards get compared to old 28 nm parts rather than AMD’s 14 nm parts. It is obvious that the 16 nm part should have significantly better power efficiency than 28 nm parts. Comparisons with 28 nm parts are mostly irrelevant. We don’t know how this compares until we have AMD parts for comparison.
allyn, what’s the reason behind you guys using driver 348.13??
It was the newest we were provided at the time of testing. OC was done with a slightly newer build.
Can you please add another DX12 title, Ashes of the Singularity? Because we only have one real DX12 title, Hitman, until we have another game to try out.
(Gears of War was just a bad port… so I guess it doesn’t really count.)
Because it looks like the async is closing the gap in fps.
I know Nvidia is using a different type of “async”, let’s say on the software side of things.
It doesn’t really matter how they do it; what will matter in the end is how well games run.
Will PCPer conduct GPGPU testing for both CUDA and OpenCL performance?
I understand the 1080 is marketed toward gaming, but would be great for video processing as well!
Wow, a lot of whining going on here in the comments. You know, if Ryan didn’t benchmark the game you are interested in seeing, PC Per isn’t the only review site. There are 1080 reviews on all the major sites: Guru3D, HardOCP, Tweak Town, Tom’s Hardware, etc. All the whiners are acting as if they have to be paying registered users of other sites to view their reviews, so they want PC Per to benchmark the games they wanted to see. Literally no intelligent person gets their news / product reviews from a single source. There is nothing wrong with this review. The problem is that the results don’t fit the whiners’ agenda. So what would you have Ryan do, find the 1 or 2 games that AMD can compete against the 1080 with and only benchmark them? I’m not sure there are that many games the Fury X could compete with the 1080 in. So that review would be very interesting: here are all the new technologies that come with this new GPU, these are the hardware specs, oh and sorry, no benchmarks today (whispers behind his hand “because AMD couldn’t beat or compete against it”), so sorry.
As to pricing, sure it could be cheaper, so could Tesla’s Model S, or anything that’s new and awesome. However that’s not how things work. If you don’t like the pricing, don’t buy one. Free market will dictate how much they go for. So if nVidia finds these are sitting on the shelf for a month with little turnover, then the price will come down. Don’t hold your breath for that to happen though. These $700 Founders Editions are going to fly off the shelf. For one reason, anyone wanting to water cool these right away is going to want them. It’s almost guaranteed that there will be waterblocks for these “reference” cards on day 1. Even if nVidia had to run the fan at 100% to hit 2.1GHz, that’s still on air. I can’t believe anyone is complaining about a GPU that can run @ 2.1GHz on air. Before this, that sort of clocking required serious cooling, at the very least custom watercooling or more likely LN2. So if it’ll run @ 2.1GHz on air with the fan at maximum, what can we expect it to run under water? The benchmark kings must be creaming their jeans right now. They are shining their LN2 pots right now just waiting to get their hands on a 1080. How long before we start seeing Fire Strike Ultra scores showing up at the top of the benchmark charts with GPU clocks north of 3GHz?!?
I for one am not disappointed. This thing is a beast. For those wanting more, just wait for the 1080Ti. I’d tell you to wait for Polaris 10, but what would be the point? They’ve already said these cards are going to be mid level cards with a reduced power profile. Not much to get excited about. Also, if rumors are to be believed, we may not see Polaris until October now. According to the rumors Polaris failed validation, so it looks like it might need another silicon spin.
I’m not trying to beat up on AMD, I root for all the new hardware. I really liked the 390/X series, and the Fury series was also decent when it released. But now they aren’t the new shiny anymore. Obviously their prices are going to need to come down, especially when the $600 1080s appear in a month and a half. Timing isn’t great for AMD; they didn’t get nearly long enough on top after the release of the Fury series. They really needed another 6 months. Unfortunately for them, their release schedule for new high end products is spaced out too far apart. It always seems that when AMD releases the new top dog, nVidia is right around the corner with something to slap it back down. Then nVidia releases something even faster before AMD has even had a chance to respond to the previous release.
TL;DR
The asynchronous compute still sounds like it will be mostly unusable. Even if it can preempt at the instruction level, it sounds like it still needs an expensive context switch. A 100-microsecond context switch seems really slow, comparatively speaking.
It would be great to get more in depth analysis of this without spin. It seemed to me that multi-GPU was going to be implemented partially with asynchronous compute going forward. With the asynchronous compute that AMD has implemented for years, compute can be handled by any available compute resource, regardless of location. They could even run on an integrated GPU to supplement the main GPU.
The current techniques are probably a bit of a dead end. AFR and split screen type rendering just doesn’t scale well. Splitting up the workload with finer granularity will be required to make good use of multiple GPUs. Nvidia is holding back the market in this case. If developers want to support multi-GPU on the large installed Nvidia base, then they will not be able to use asynchronous compute to achieve it. Hopefully the market can move forward with asynchronous compute independently of Nvidia due to the installed base in all of the consoles. It will be worthwhile to implement asynchronous compute features for the consoles, so PC ports for AMD cards can hopefully make use of that.
The multi-projection stuff seems interesting, but it also seems like something that can probably be done quite efficiently in software. It would be good if you can get info from game developers on how this will compare to software solutions. I tend to think that VR will be best served by multi-GPU set-ups, as long as the software support is there. Nvidia seems to have gone the big GPU route, so it is not in their best interest to support multi-GPU set-ups; this opens them to more competition from makers of smaller GPUs. This may not only be AMD going forward, especially in mobile.
Really comprehensive analysis here team, thanks for putting it together! Videos with Tom are quite informative too. It will be hard to justify the price for me personally, but imagine how great it would fold! 1070 might be more for me 😉
1. HDR – what kind? Dolby Vision? Standard Dynamic Range (SDR)?
Or HDR10?
Does the HDR in the card target 1000 nits or 4000 nits?
As usual, is this supported by the card’s hardware or by software?
2. In the picture
http://www.guru3d.com/index.php?ct=a…=file&id=21784
it says contrast over 10,000:1 with the nvidia 1080.
In other words, with previous video cards we got 2000:1 contrast???
3. What color system does the nvidia 1080 video card support?
The Rec. 2020 color gamut, or P3? Or only Rec. 709?
http://www.guru3d.com/articles_pages…_review,2.html
If anyone has an answer, please bring links.
It is not a question of logic.
Any GPU can support any color system – the catch is having enough additional bit depth available to prevent obvious banding when operating in that expanded system. Support for the color system itself really boils down to the display capability and calibration.
It is interesting to note that Ashes disables Async Compute if it detects an nVidia card – including Pascal, so we still don’t have an accurate representation of how nVidia really does in that benchmark/game.
I’m upgrading from my 970 to this beast of a card. Simply brilliant for new games.
Been seeing some thermal throttling on benchmark results while testing takes place in an actual case and not an open test bench. Might be something to look into. Considering most people don’t game on a bench.
Hi Ryan/Allyn
Would it be possible (when the o/c and custom boards come out) to include results for 980Ti in 2 way SLI for comparison please?
There must be many of us seeing discounted 980Tis now, and a second one would probably be >= a single 1080. So if you already have a 980Ti, adding a second one will keep you going for a couple of years.. And obviate the need to replace a 980Ti.
I know the results are there in other articles, but seeing them on the same graph means so much more!
Still can’t afford this new flashy stuff, intel hd 4400 forever
Ran some numbers for myself and it seems the flops per clock on NVidia gpus hasn’t increased at all, all the way back to the gtx 680. The gtx 580 seems to have twice the flops per clock of the current NVidia gpus. Not sure where the miracles they claim are coming from.