The GM204 Architecture
NVIDIA’s new GM204 GPUs are finally revealed.
James Clerk Maxwell's equations are the foundation of our society's knowledge about optics and electrical circuits. It is a fitting tribute from NVIDIA to include Maxwell as a code name for a GPU architecture and NVIDIA hopes that features, performance, and efficiency that they have built into the GM204 GPU would be something Maxwell himself would be impressed by. Without giving away the surprise conclusion here in the lead, I can tell you that I have never seen a GPU perform as well as we have seen this week, all while changing the power efficiency discussion in as dramatic a fashion.
To be fair though, this isn't our first experience with the Maxwell architecture. With the release of the GeForce GTX 750 Ti and its GM107 GPU, NVIDIA put the industry on watch and let us all ponder if they could possibly bring such a design to a high end, enthusiast class market. The GTX 750 Ti brought a significantly lower power design to a market that desperately needed it, and we were even able to showcase that with some off-the-shelf PC upgrades, without the need for any kind of external power.
That was GM107 though; today's release is the GM204, indicating that not only are we seeing the larger cousin of the GTX 750 Ti but we also have at least some moderate GPU architecture and feature changes from the first run of Maxwell. The GeForce GTX 980 and GTX 970 are going to be taking on the best of the best products from the GeForce lineup as well as the AMD Radeon family of cards, with aggressive pricing and performance levels to match. And, for those that understand the technology at a fundamental level, you will likely be surprised by how little power it requires to achieve these goals. Toss in support for things like a new AA method, Dynamic Super Resolution, and even improved SLI performance and you can see why doing it all on the same process technology is impressive.
The NVIDIA Maxwell GM204 Architecture
The NVIDIA Maxwell GM204 graphics processor was built from the ground up with an emphasis on power efficiency. As was stated many times during the technical sessions we attended last week, the architecture team learned quite a bit while developing the Kepler-based Tegra K1 SoC and much of that filtered its way into the larger, much more powerful product you see today. This product is fast and efficient, but it was all done while working on the same TSMC 28nm process technology used on the Kepler GTX 680 and even AMD's Radeon R9 series of products.
The fundamental structure of GM204 is set up like the GM107 product shipped as the GTX 750 Ti. There is an array of GPCs (Graphics Processing Clusters), each composed of multiple SMs (Streaming Multiprocessors, also called SMMs for this Maxwell derivative), alongside external memory controllers. The GM204 chip (the full implementation of which is found on the GTX 980) consists of 4 GPCs, 16 SMMs and four 64-bit memory controllers.
Each SMM features 128 CUDA cores, or stream processors, bringing the total for this product to 2048. That is a significant drop from the 2880 CUDA cores found in the GTX 780 Ti (full GK110 chip) but as you'll soon find out, thanks to the higher clock speeds and performance efficiency changes, the GTX 980 matches or beats the GTX 780 Ti in every test we have run.
The SMM also features an improved Polymorph engine for geometry processing and 8 texture units. All of the above combines for a total of 128 texture units, a 256-bit memory bus, 64 ROPs (raster operators), and 2MB of L2 cache.
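As a quick sanity check, the totals quoted above fall straight out of the per-unit counts. The short Python sketch below simply multiplies them through; the variable names are ours, not NVIDIA's, and the even split of SMMs across GPCs is an assumption based on the 4 GPC / 16 SMM figures.

```python
# Per-unit counts for the full GM204 (GTX 980), as quoted in the text.
GPCS = 4
SMMS = 16
CUDA_CORES_PER_SMM = 128
TEXTURE_UNITS_PER_SMM = 8
MEMORY_CONTROLLERS = 4
CONTROLLER_WIDTH_BITS = 64

# Assumption: SMMs are spread evenly across the GPCs.
smms_per_gpc = SMMS // GPCS

# Chip-wide totals are just per-unit counts multiplied out.
total_cuda_cores = SMMS * CUDA_CORES_PER_SMM
total_texture_units = SMMS * TEXTURE_UNITS_PER_SMM
memory_bus_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

print(f"SMMs per GPC:   {smms_per_gpc}")
print(f"CUDA cores:     {total_cuda_cores}")
print(f"Texture units:  {total_texture_units}")
print(f"Memory bus:     {memory_bus_bits}-bit")
```

The ROP count and L2 size are not derived here because the text only gives them as chip-wide totals.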
The GeForce GTX 680, the GK104 based graphics card that was launched in March of 2012, provides some interesting comparisons. First, the obvious: the GTX 980 will have 33% more processor cores, higher clock speeds, and thus a much higher peak compute rate. Texture unit count remains the same but the doubling of ROP units gives the Maxwell GPU better performance in high resolution anti-aliasing. Memory bandwidth is increased by a modest amount, but NVIDIA has made more enhancements to improve in that area as well with GM204.
Look at those bottom four statistics though: GM204 is 1.66 billion transistors larger and has a 35% larger die size yet is able to quote a TDP that is 30 watts lower than GK104 using the same 28nm process. If you take performance into consideration though, the GTX 980 should be going up against the GK110-based GTX 780 Ti – a GPU that has a 250 watt TDP and a 7.1 billion transistor count (and a die size of 551 mm^2). As we dive into the benchmarks on the following pages, you will gain an understanding of why this is so interesting.
The SMMs of Maxwell are a fundamental change when compared to Kepler. Rather than a single block of 192 shaders, the SMM is divided into four distinct blocks that each have a separate instruction buffer, scheduler, and 32 dedicated, non-shared CUDA cores. NVIDIA states that this simplifies the design and scheduling logic required for Maxwell, saving on area and power. Pairs of these blocks are grouped together and share four texture filtering units and a texture cache. Shared memory is a different pool of data that is shared amongst all four processing blocks of the SMM.
With these changes, the SMM can offer 90% of the compute performance of the Kepler SM but with a smaller die area that allows NVIDIA to integrate more of them per die. GK104 had 8 SMs (1536 CUDA cores) while GM204 has 16 SMs (2048 CUDA cores) giving it a 2x SM density advantage.
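To put numbers on that trade-off, here is a rough Python sketch that takes NVIDIA's 90% figure at face value and ignores clock speed differences entirely. The constant names are ours, and the "Kepler SM equivalents" unit is only a convenient yardstick for this illustration, not an official metric.

```python
# Figures quoted in the text.
KEPLER_SMS, KEPLER_CORES_PER_SM = 8, 192      # GK104
MAXWELL_SMMS, MAXWELL_CORES_PER_SMM = 16, 128  # GM204
SMM_RELATIVE_PERF = 0.90  # NVIDIA's claim: one SMM ~ 90% of one Kepler SM

kepler_cores = KEPLER_SMS * KEPLER_CORES_PER_SM
maxwell_cores = MAXWELL_SMMS * MAXWELL_CORES_PER_SMM
core_increase = maxwell_cores / kepler_cores - 1

# If 128 cores deliver 90% of what 192 cores did, each core does more work.
per_core_gain = SMM_RELATIVE_PERF * KEPLER_CORES_PER_SM / MAXWELL_CORES_PER_SMM

# Chip-level throughput at equal clocks, measured in "Kepler SM equivalents".
chip_gain = MAXWELL_SMMS * SMM_RELATIVE_PERF / KEPLER_SMS

print(f"CUDA cores: {kepler_cores} -> {maxwell_cores} (+{core_increase:.0%})")
print(f"Per-core throughput vs Kepler: {per_core_gain:.2f}x")
print(f"Chip throughput at equal clocks: {chip_gain:.2f}x")
```

In other words, a 33% bump in core count understates the change: under these assumptions, the doubling of SM count matters more than the raw core total.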
NVIDIA's Jonah Alben indicated to us that the 192-core based SMs used in Kepler seemed great at the time but that they introduced quite a few inefficiencies due to a non-power-of-2 count. It was more difficult for the scheduling hardware to keep the cores full and processing on data and the move to 128-core SMMs helps address this. Speaking of scheduling, there were efficiency changes there as well. The arrangement of instruction scheduling now occurs much earlier in the pipeline, preventing re-scheduling in many instances, helping to keep power down and performance high.
Other than the dramatic changes to the SMM, the 2 MB L2 cache that NVIDIA has implemented on Maxwell is another substantial change. Considering that Kepler had an L2 cache implementation at 512 KB, we are seeing a 4x increase in available capacity, which should reduce the demand on the integrated memory controller of GM204 dramatically.
Texture fill rate between the GK104 and GM204 increases by 12% (thanks to clock speeds, not texture unit counts) though pixel fill rate more than doubles, going from 32.2 Gpixels/s to 72 Gpixels/s on GTX 980.
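Those fill rate figures can be cross-checked against each other. The sketch below assumes peak fill rate is simply unit count times clock, that GK104 has 32 ROPs (half of GM204's 64, per the doubling mentioned earlier), and that both chips carry 128 texture units. The clocks it prints are implied by the quoted pixel rates, not taken from a spec sheet.

```python
TEXTURE_UNITS = 128  # same on both chips, per the text

# (ROP count, quoted pixel fill rate in Gpixels/s)
gk104 = {"rops": 32, "pixel_rate": 32.2}  # GTX 680; ROP count is assumed
gm204 = {"rops": 64, "pixel_rate": 72.0}  # GTX 980

def implied_clock_ghz(chip):
    """Clock implied by peak pixel rate = ROPs x clock."""
    return chip["pixel_rate"] / chip["rops"]

def texture_rate(chip):
    """Peak texture rate (Gtexels/s) = texture units x clock."""
    return TEXTURE_UNITS * implied_clock_ghz(chip)

pixel_ratio = gm204["pixel_rate"] / gk104["pixel_rate"]
texture_gain = texture_rate(gm204) / texture_rate(gk104) - 1

print(f"Implied clocks: {implied_clock_ghz(gk104):.3f} GHz -> "
      f"{implied_clock_ghz(gm204):.3f} GHz")
print(f"Pixel fill rate ratio: {pixel_ratio:.2f}x")
print(f"Texture fill rate gain: {texture_gain:.1%}")
```

With identical texture unit counts, the texture gain reduces to the clock ratio alone, which is why it lands at roughly 12% while the pixel rate more than doubles.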
A 256-bit memory bus might seem like a downgrade for a flagship card as the GTX 780 Ti featured a 384-bit offering (though the GTX 680 featured a 256-bit controller as well). But there are several changes NVIDIA has made to improve memory performance with that smaller bus. First, the clock speed of the memory is now 7.0 GHz and the GM204 cache is larger and more efficient, reducing the number of memory requests that have to be made into DRAM.
Another change is the implementation of a third-generation delta color compression algorithm that attempts to lower the bandwidth required for any single operation. The compression happens both when data is written out to memory and when it is read again for the application, attempting to get as high as 8:1 compression on blocks of matching color value (4×2 pixel regions). The delta color compression compares neighboring color blocks and attempts to minimize the number of bits stored by looking at color differences. Obviously, if the data is very random and cannot be compressed at all, then it will just be written to the memory in a 1:1 mode.
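NVIDIA has not published the details of its compression scheme, so the Python sketch below is only a toy model of the general idea: store one anchor pixel per 4×2 block, then store every other pixel as per-channel differences from it using as few bits as possible. The 32-bit RGBA pixel format, the fixed-width delta encoding, and all function names are our assumptions for illustration.

```python
RAW_BITS_PER_PIXEL = 32  # assumption: 8-bit RGBA
BLOCK_PIXELS = 8         # one 4x2 pixel region

def signed_width(delta):
    """Bits needed to hold delta in two's complement (0 bits for 0)."""
    if delta == 0:
        return 0
    magnitude = delta if delta > 0 else ~delta
    return magnitude.bit_length() + 1

def compression_ratio(block):
    """Toy delta compression of one block of (r, g, b, a) tuples."""
    assert len(block) == BLOCK_PIXELS
    anchor = block[0]
    # Per-channel differences of every other pixel from the anchor.
    deltas = [c - a for pixel in block[1:] for c, a in zip(pixel, anchor)]
    width = max(signed_width(d) for d in deltas)
    raw_bits = BLOCK_PIXELS * RAW_BITS_PER_PIXEL
    stored_bits = RAW_BITS_PER_PIXEL + len(deltas) * width
    # If the deltas do not save space, fall back to writing the block 1:1.
    if stored_bits >= raw_bits:
        return 1.0
    return raw_bits / stored_bits

flat = [(40, 80, 120, 255)] * 8                        # one solid color
gradient = [(10 + i, 20, 30, 255) for i in range(8)]   # gentle ramp
noisy = [(0, 0, 0, 0), (255, 255, 255, 255)] * 4       # worst case

for name, block in [("flat", flat), ("gradient", gradient), ("noisy", noisy)]:
    print(f"{name}: {compression_ratio(block):.2f}:1")
```

Even this crude version shows the behavior described above: a block of matching color collapses to the anchor alone for 8:1, smooth content lands somewhere in between, and content with large swings falls back to 1:1.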
NVIDIA claims that Maxwell requires 25% less memory bandwidth on a per-frame basis when you combine the improved caching and compression techniques on Maxwell. As a result, even though the raw GB/s values of GM204 are only marginally higher than that of GK104, the effective memory bandwidth of the new GTX 980/970 cards will appear much better to developers and in games.
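To see what that claim could mean in GB/s terms, here is a back-of-the-envelope Python sketch. It assumes the 7.0 GHz memory figure is the effective per-pin data rate in Gbps, uses the standard bus width times data rate formula for peak bandwidth, and treats the 25% saving as a uniform reduction in traffic. The function names are ours.

```python
def raw_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak DRAM bandwidth in GB/s: bits per transfer x transfers/s / 8."""
    return bus_width_bits * data_rate_gbps / 8

def effective_bandwidth_gbs(raw_gbs, traffic_saving):
    """Bandwidth an older design would need to move the same frame data."""
    return raw_gbs / (1 - traffic_saving)

gtx980_raw = raw_bandwidth_gbs(bus_width_bits=256, data_rate_gbps=7.0)
gtx980_effective = effective_bandwidth_gbs(gtx980_raw, traffic_saving=0.25)

print(f"GTX 980 raw bandwidth:       {gtx980_raw:.0f} GB/s")
print(f"GTX 980 effective bandwidth: {gtx980_effective:.0f} GB/s")
```

This is only as good as the 25% figure behind it, and real savings will vary from frame to frame depending on how compressible the content is.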
There are other changes in GM204 that do not exist in GM107 to help improve performance of certain features that NVIDIA is bringing to the market. Those help build the foundation for VXGI global illumination and more.
It is great to see a high performing card not pushing the limits of power consumption and cooling. I would caution against assuming that all of these GPUs will be able to reach such high clock speeds though. If you significantly reduce the thermal limitations, you will bump up against another limitation quite quickly, and these GPUs are already clocked high and the voltage is already high. We may see overclocks vary considerably, as we do with some CPUs.
I don’t care much about power consumption in a desktop system, and most high-end PC gamers don’t seem to care either. I only really care about the load power consumption from a noise perspective, as long as it isn’t too ridiculous; I don’t need a space heater. The increased performance and new features are more of a selling point as far as I am concerned. I think my next system should support hardware HEVC decode/encode. I may be getting a new laptop before I build a new desktop, so will the 980 GPU make a good laptop part? It seems that a lot of performance comes from the high clock, which may not be doable within notebook thermal limitations. The wider chips may actually do better if you push them down to much lower power.
GTX 970 is the real game changer. It obsoletes AMD’s R9 290 series and embarrasses them in perf/watt. AMD will have to price R9 290 at $279 – $299 and R9 290X at $379 – $399.
Sorry but that is a stupid statement to make; how can this card render another card obsolete?
It merely makes the argument to purchase the R9 series a little harder.
This review has really tempered my impression of the 970’s performance against a 290; they seem to spar in this data, versus other reviews that more often show the 970 as strong as a 290X. While the power reduction is noteworthy, by Ryan’s numbers isn’t the 290 using something close to 25% more? It’s good, don’t get me wrong, but I’d say that with a nice custom OC 290 at $300 or less, it’s still in the fray.
Thanks for the review Ryan, seems like the 970 will be a lot more popular. Myself, I just bought a fresh new 980 from Newegg. I am upgrading from a 560 Ti, no joking. Think I’ll see any difference? 😉
Thank you for testing CF vs SLI. So it seems the XDMA engine shows its strengths and 290X CF is practically tied with 980 SLI. So “upgrading” would be pointless if you own a 290X CF setup like I do, other than getting lower power consumption.
Ryan, if you guys don’t mind ofc, would you kindly test the HDMI 2.0 port with an actual TV, e.g. the UE55HD6900, for 4K (UHD really) desktop’ping?
I would appreciate that so much. Maybe keep 1x 970/980 fixed and then include like 4 different UHD TVs, all over 40 inches ofc, as viewing from normal monitor distance is where 4K/UHD really shines in desktop usage. I.e. using a 40-65 inch display from 2 feet max is where you can really use the UHD and not feel like looking for your magnifying glass.
Great job as usual!
We are working on getting in an HDMI 2.0 display currently!
Seems the aftermarket ones that don’t use the reference PCB have HDMI 1.4a instead of HDMI 2.0. Even if they have the same or a similar connector layout as the reference, they use HDMI 1.4a.
So all of them.
MSI
Zotac
Gigabyte
EVGA
etc..
If you want HDMI 2.0 for 4K@60Hz you need to get the reference model or else you’ll be limited to 4K@30Hz.
With all the 4K monitors coming out, this should be pointed out.
Umm… or you can just use the DisplayPort, right?
He’s talking about Samsung UHD TVs. They don’t use DisplayPort. They all have 4 HDMI 2.0 ports.
Sorry, but I think the memory interface is just not enough for a high-end product. There seems to be effectively no progress over the last generation. I guess the next higher NVIDIA GPU will have more bandwidth.
Otherwise it looks great.
I too was worried about a 256-bit memory bus, but clearly the performance is able to keep up. Dismissing a part because of an arbitrary spec rather than its performance doesn't make sense.
TechPowerUp tested the 970 in SLI, and it kept up pretty well with the 295×2. Yeah, at the highest resolutions like 4K and 5760×1080 the 295×2 was generally faster, but the 970 wasn’t that far off. So even with a 256-bit bus it uses it well, even with half the RAM path. Most results were around a 10% difference.
Just bought a 4790K with an H75 cooler to go with my MSI 970 Frozer, or whatever its name is. I also bought an IPS 27 inch screen; when G-Sync comes out with IPS screens this one can find a new home. I too have been waiting on these new cards. Your show is the reason I bought the 970. A big hug over frame rating too, forcing all vendors down the new path of not putting out junk drivers etc. Great work PC crew.
Still rocking the awful green logo that throws off my theme. Guess I’ll get ACX again; sucks because they overheat very quickly on air when sandwiched.
edit
on tri sli
All I can say is that Nvidia hit a homerun to center field. Very Impressive results!! The Maxwell Architecture really flexed its muscle. The GTX 980 Is A BEAST! Thanks Ryan for the great review and thanks for adding Sli benchmarks and the video!!!
I’ll wait for the real Maxwell, the GM110, and even then you have to wait an additional year because the first batch are not the flagships; the Ti’s are.
The GPU market is becoming a joke, not to mention the quality of games makes it not even worth spending the money. Hell, even my 780s are not being utilized to their full extent because all these shit games are running on old/unoptimized engines.
A clean victory for Nvidia, definitely! It’s been a long time since Nvidia was the clear winner on all fronts with a new GPU product release.
I see only one player on the field and you already call victory? You have information we don’t have?
Or are you comparing nvidia’s 2014 arch against AMD’s 2011 ?
Any news on when they expect these cards to go on sale?
now
EVGA had them in stock a couple hours ago but now out of stock
I see how you are Ryan, not saying anything Wednesday night, lol. The 970 looks to be the sweet spot for most. Anything about a 960?
Sorry, I couldn't say anything! 🙂
Ryan, we knew what was up on Wednesday when the 900 series story popped up and you excused yourself from talking about it along with Allyn. So most of us in the chat knew you had the cards there for sure, not like those boxes sitting there didn’t confirm it either.
The thing I’m impressed with is the reduced power demand. I’m actually a little surprised at the higher 970 idles, but…. I’ve heard the 390X is going to be about 300W. It also may not hit the market until 12/1. If it doesn’t totally own this ….
What? No DP 1.3? They’ve had a whole 4 days since it was officially released!
So, I have a single 760 right now. Get a second 760 and SLI with the price drop, or save for another month or two and get a 970?
Ryan? What’s the exact setting in your Catalyst? Was frame pacing active? I’m curious about the frame times with the 290X. I mean: AMD has XDMA, frame pacing on/off capability and Mantle (BF4/Thief/Sniper Elite 3). So why are they so bad? That’s hilarious since they tweeted a lot about it and as I see it nothing has dramatically changed until now.
In addition: a very nice review (as always) and thanks for the hard work. Can you tell us something about the voltage-range? Is it ~1.2 Volts again?
There is much talk about Maxwell’s new features but most of them are not really all that new or remarkable. Consequently IPC did not significantly increase when compared to Kepler, so performance per transistor per clock is only slightly improved. What makes Maxwell really notable is its ability to reach previously unseen high frequencies with remarkably low voltage.
I’m completely lost as to how it does that, and can’t find any explanation in any of the reviews. This kind of effect I would expect from a node shrink, or from implementation of advanced transistor types such as FinFET or FD-SOI, but Maxwell is made on standard bulk planar 28nm FETs so I don’t understand how Nvidia is able to make them stable at such high frequencies at such low voltage.
If someone has an explanation please tell me.
[Troll]
Alien Technology Bro!
[/Troll]
The stability at a lower voltage and higher clocks is easy to explain because Ryan already explained in the article that SoC experience (low voltage experience from mobile products) has helped Nvidia innovate in areas that AMD has no experience in yet (if ever)… Nvidia is trying to squeeze water out of the rock that was Tegra and drip it into Maxwell.
I don’t think that “mobile SoC experience” is an adequate explanation. Mobile chips sacrifice frequency to get minimal voltage, and being optimized for such low voltages and frequencies they don’t handle high frequencies well.
Maxwell behaves differently, and handles high frequencies excellently.
Besides, even if “mobile SoC experience” is the answer, it tells us nothing about what concrete technical feature imported from the SoC is providing the effect.
In all seriousness I think it had something to do with the 192 nodes vs the now 128 nodes?
Fantastic review. The best I have seen. WOW. much time.