The GM204 Architecture
NVIDIA’s new GM204 GPUs are finally revealed.
James Clerk Maxwell's equations are the foundation of our society's knowledge about optics and electrical circuits. It is a fitting tribute from NVIDIA to include Maxwell as a code name for a GPU architecture, and NVIDIA hopes that the features, performance, and efficiency it has built into the GM204 GPU would impress Maxwell himself. Without giving away the surprise conclusion here in the lead, I can tell you that I have never seen a GPU perform as well as we have seen this week, all while changing the power efficiency discussion in just as dramatic a fashion.
To be fair though, this isn't our first experience with the Maxwell architecture. With the release of the GeForce GTX 750 Ti and its GM107 GPU, NVIDIA put the industry on watch and let us all ponder whether they could possibly bring such a design to the high-end, enthusiast-class market. The GTX 750 Ti brought a significantly lower power design to a market that desperately needed it, and we were even able to showcase that with some off-the-shelf PC upgrades, without the need for any kind of external power.
That was GM107 though; today's release is the GM204, indicating that not only are we seeing the larger cousin of the GTX 750 Ti but we also have at least some moderate GPU architecture and feature changes from the first run of Maxwell. The GeForce GTX 980 and GTX 970 are going to be taking on the best of the best products from the GeForce lineup as well as the AMD Radeon family of cards, with aggressive pricing and performance levels to match. And, for those that understand the technology at a fundamental level, you will likely be surprised by how little power it requires to achieve these goals. Toss in support for things like a new AA method, Dynamic Super Resolution, and even improved SLI performance and you can see why doing it all on the same process technology is impressive.
The NVIDIA Maxwell GM204 Architecture
The NVIDIA Maxwell GM204 graphics processor was built from the ground up with an emphasis on power efficiency. As was stated many times during the technical sessions we attended last week, the architecture team learned quite a bit while developing the Kepler-based Tegra K1 SoC, and much of that filtered its way into the larger, much more powerful product you see today. This product is fast and efficient, and it was all done while working on the same TSMC 28nm process technology used on the Kepler GTX 680 and even AMD's Radeon R9 series of products.
The fundamental structure of GM204 is set up like the GM107 product shipped as the GTX 750 Ti. There is an array of GPCs (Graphics Processing Clusters), each composed of multiple SMs (Streaming Multiprocessors, also called SMMs for this Maxwell derivative), alongside external memory controllers. The GM204 chip (the full implementation of which is found on the GTX 980) consists of 4 GPCs, 16 SMMs and four 64-bit memory controllers.
Each SMM features 128 CUDA cores, or stream processors, bringing the total for this product to 2048. That is a significant drop from the 2880 CUDA cores found in the GTX 780 Ti (full GK110 chip) but as you'll soon find out, thanks to the higher clock speeds and performance efficiency changes, the GTX 980 matches or beats the GTX 780 Ti in every test we have run.
The SMM also features an improved Polymorph engine for geometry processing and 8 texture units. All of the above combines for a total of 128 texture units, a 256-bit memory bus, 64 ROPs (raster operators), and 2MB of L2 cache.
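The chip-level totals follow directly from the per-SMM figures. As a quick sanity check, here is a minimal Python sketch of that arithmetic; the constant names are my own, and the numbers are the ones quoted in the text.

```python
# Per-chip totals for the full GM204 (GTX 980), derived from the per-unit
# figures quoted in the text. Constant names are illustrative only.
SMM_COUNT = 16                 # SMMs spread across the 4 GPCs
CORES_PER_SMM = 128            # CUDA cores per SMM
TEXTURE_UNITS_PER_SMM = 8      # texture units per SMM
MEMORY_CONTROLLERS = 4         # memory controllers on the chip
CONTROLLER_WIDTH_BITS = 64     # width of each controller

cuda_cores = SMM_COUNT * CORES_PER_SMM
texture_units = SMM_COUNT * TEXTURE_UNITS_PER_SMM
bus_width_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

print(cuda_cores, texture_units, bus_width_bits)  # 2048 128 256
```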
The GeForce GTX 680, the GK104-based graphics card that was launched in March of 2012, provides some interesting comparisons. First, the obvious: the GTX 980 will have 33% more processor cores, higher clock speeds, and thus a much higher peak compute rate. Texture unit count remains the same but the doubling of ROP units gives the Maxwell GPU better performance in high resolution anti-aliasing. Memory bandwidth is increased by a modest amount, but NVIDIA has made further enhancements in that area with GM204 as well.
Look at those bottom four statistics though: GM204 is 1.66 billion transistors larger and has a 35% larger die size yet is able to quote a TDP that is 30 watts lower than GK104 using the same 28nm process. If you take performance into consideration though, the GTX 980 should be going up against the GK110-based GTX 780 Ti – a GPU that has a 250 watt TDP and a 7.1 billion transistor count (and a die size of 551 mm^2). As we dive into the benchmarks on the following pages, you will gain an understanding of why this is so interesting.
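The differences quoted above can be reproduced from the two chips' headline specifications. The absolute figures in this sketch are the published numbers for each chip as I understand them, not values stated in the paragraph itself, so treat them as assumptions.

```python
# Assumed published specs; the text only quotes the differences between them.
gk104 = {"transistors_bn": 3.54, "die_mm2": 294, "tdp_w": 195}  # GTX 680
gm204 = {"transistors_bn": 5.2, "die_mm2": 398, "tdp_w": 165}   # GTX 980

extra_transistors_bn = gm204["transistors_bn"] - gk104["transistors_bn"]
die_growth_pct = (gm204["die_mm2"] / gk104["die_mm2"] - 1) * 100
tdp_drop_w = gk104["tdp_w"] - gm204["tdp_w"]

print(f"{extra_transistors_bn:.2f} billion more transistors")
print(f"{die_growth_pct:.0f}% larger die")
print(f"{tdp_drop_w} W lower TDP")
```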
The SMMs of Maxwell are a fundamental change when compared to Kepler. Rather than a single block of 192 shaders, the SMM is divided into four distinct blocks that each have a separate instruction buffer, scheduler, and 32 dedicated, non-shared CUDA cores. NVIDIA states that this simplifies the design and scheduling logic required for Maxwell, saving on area and power. Pairs of these blocks are grouped together and share four texture filtering units and a texture cache. Shared memory is a different pool of data that is shared amongst all four processing blocks of the SMM.
With these changes, the SMM can offer 90% of the compute performance of the Kepler SM but with a smaller die area that allows NVIDIA to integrate more of them per die. GK104 had 8 SMs (1536 CUDA cores) while GM204 has 16 SMs (2048 CUDA cores) giving it a 2x SM density advantage.
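One way to read NVIDIA's 90% figure: if an SMM delivers 90% of a Kepler SM's throughput with 128 cores instead of 192, each individual core must be doing more useful work. A rough sketch of that arithmetic follows. The 0.90 ratio is NVIDIA's claim as reported above, not something we measured, and the chip-level number assumes equal clocks and ignores every other bottleneck.

```python
KEPLER_CORES_PER_SM = 192
MAXWELL_CORES_PER_SMM = 128
SMM_RELATIVE_PERF = 0.90       # NVIDIA's claim: one SMM vs. one Kepler SM

# Each SMM has two thirds of the cores of a Kepler SM...
core_ratio = MAXWELL_CORES_PER_SMM / KEPLER_CORES_PER_SM
# ...so delivering 90% of the throughput implies ~1.35x per-core efficiency.
per_core_gain = SMM_RELATIVE_PERF / core_ratio

# SM counts from the text: GK104 has 8 SMs, GM204 has 16 SMMs.
GK104_SMS, GM204_SMMS = 8, 16
chip_level_ratio = GM204_SMMS * SMM_RELATIVE_PERF / GK104_SMS

print(f"per-core gain: {per_core_gain:.2f}x")
print(f"chip-level throughput at equal clocks: {chip_level_ratio:.1f}x")
```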
NVIDIA's Jonah Alben indicated to us that the 192-core based SMs used in Kepler seemed great at the time but that they introduced quite a few inefficiencies due to a non-power-of-2 count. It was more difficult for the scheduling hardware to keep the cores full and processing on data and the move to 128-core SMMs helps address this. Speaking of scheduling, there were efficiency changes there as well. The arrangement of instruction scheduling now occurs much earlier in the pipeline, preventing re-scheduling in many instances, helping to keep power down and performance high.
Other than the dramatic changes to the SMM, the 2 MB L2 cache that NVIDIA has implemented on Maxwell is another substantial change. Considering that the Kepler-based GK104 had an L2 cache of 512 KB, we are seeing a 4x increase in available capacity, which should reduce the demand on the integrated memory controller of GM204 dramatically.
Texture fill rate between the GK104 and GM204 increases by 12% (thanks to clock speeds, not texture unit counts) though pixel fill rate more than doubles, going from 32.2 Gpixels/s to 72 Gpixels/s on GTX 980.
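Peak fill rates are simply unit count multiplied by clock speed. The sketch below reproduces the figures above, assuming the reference base clocks of 1006 MHz for the GTX 680 and 1126 MHz for the GTX 980 and 32 ROPs on GK104; those inputs are not stated in this paragraph, so treat them as assumptions.

```python
def fill_rate_g(units: int, clock_mhz: float) -> float:
    """Peak fill rate in Gpixels/s or Gtexels/s: one result per unit per clock."""
    return units * clock_mhz / 1000.0

GTX680_BASE_MHZ = 1006   # assumed reference base clock
GTX980_BASE_MHZ = 1126   # assumed reference base clock

pixel_680 = fill_rate_g(32, GTX680_BASE_MHZ)    # 32 ROPs on GK104 (assumed)
pixel_980 = fill_rate_g(64, GTX980_BASE_MHZ)    # 64 ROPs on GM204
texel_680 = fill_rate_g(128, GTX680_BASE_MHZ)   # 128 texture units on both chips
texel_980 = fill_rate_g(128, GTX980_BASE_MHZ)

texture_gain_pct = (texel_980 / texel_680 - 1) * 100

print(f"pixel fill: {pixel_680:.1f} -> {pixel_980:.1f} Gpixels/s")
print(f"texture fill gain: {texture_gain_pct:.0f}%")
```

Because the texture unit count is identical, the texture fill gain is just the ratio of the two clocks.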
A 256-bit memory bus might seem like a downgrade for a flagship card as the GTX 780 Ti featured a 384-bit offering (though the GTX 680 featured a 256-bit controller as well). But there are several changes NVIDIA has made to improve memory performance with that smaller bus. First, the clock speed of the memory is now 7.0 GHz and the GM204 cache is larger and more efficient, reducing the number of memory requests that have to be made into DRAM.
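For reference, raw memory bandwidth is the bus width in bytes multiplied by the effective data rate. This sketch treats the quoted 7.0 GHz as an effective 7.0 Gbps per pin, and assumes roughly 6.0 Gbps for the GTX 680 and 7.0 Gbps for the GTX 780 Ti, neither of which is stated in the text.

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps

gtx980_bw = bandwidth_gbs(256, 7.0)     # 256-bit bus at 7.0 Gbps (from the text)
gtx680_bw = bandwidth_gbs(256, 6.0)     # 256-bit bus, ~6.0 Gbps (assumed)
gtx780ti_bw = bandwidth_gbs(384, 7.0)   # 384-bit bus, 7.0 Gbps (assumed)

print(gtx980_bw, gtx680_bw, gtx780ti_bw)  # 224.0 192.0 336.0
```

On raw numbers alone the GTX 980 sits well below the GTX 780 Ti, which is why the cache and compression changes matter.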
Another change is the implementation of a third-generation delta color compression algorithm that attempts to lower the bandwidth required for any single operation. The compression happens both when data is written out to memory and when it is read again for the application, attempting to get as high as 8:1 compression on blocks of matching color value (4×2 pixel regions). The delta color compression compares neighboring color blocks and attempts to minimize the number of bits stored by looking at color differences. Obviously, if the data is very random and cannot be compressed at all, then it will just be written to the memory in a 1:1 mode.
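To make the idea concrete, here is a toy model of delta-based block compression. It is emphatically not NVIDIA's algorithm, whose details are not described here: the function names, the 4-bit delta width, and the three storage modes are all hypothetical, chosen only to show why a uniform 4×2 block can reach 8:1 while noisy data falls back to 1:1.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # 8-bit R, G, B, A channels (32 bits per pixel)

def compressed_bits(block: List[Pixel], delta_bits: int = 4) -> int:
    """Bits needed to store a block in this toy scheme (hypothetical modes)."""
    raw_bits = len(block) * 32
    anchor = block[0]
    # Per-channel difference of every other pixel from the anchor pixel.
    deltas = [c - a for px in block[1:] for c, a in zip(px, anchor)]
    if all(d == 0 for d in deltas):
        return 32                                # uniform block: anchor only
    limit = 1 << (delta_bits - 1)
    if all(-limit <= d < limit for d in deltas):
        return 32 + delta_bits * len(deltas)     # anchor plus small deltas
    return raw_bits                              # incompressible: stored 1:1

def ratio(block: List[Pixel]) -> float:
    """Compression ratio achieved for the block."""
    return len(block) * 32 / compressed_bits(block)

uniform = [(40, 80, 120, 255)] * 8                         # 4x2 block, one color
gradient = [(100 + i, 50, 50, 255) for i in range(8)]      # smooth red ramp
noisy = [(0, 0, 0, 255), (255, 255, 255, 255)] * 4         # high-contrast pattern

print(ratio(uniform), ratio(noisy))  # 8.0 1.0
print(round(ratio(gradient), 2))
```

The smooth gradient lands between the two extremes, which is the general behavior the paragraph describes: the more alike neighboring pixels are, the fewer bits need to cross the memory bus.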
NVIDIA claims that Maxwell requires 25% less memory bandwidth on a per-frame basis when you combine the improved caching and compression techniques on Maxwell. As a result, even though the raw GB/s values of GM204 are only marginally higher than that of GK104, the effective memory bandwidth of the new GTX 980/970 cards will appear much better to developers and in games.
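If we take the 25% claim at face value, a rough "effective" bandwidth can be estimated by asking how much raw bandwidth an uncompressed design would need to move the same frames. This is back-of-the-envelope arithmetic on NVIDIA's claim, not a measurement, and it assumes the 7.0 GHz memory speed means 7.0 Gbps per pin on the 256-bit bus.

```python
BUS_WIDTH_BITS = 256
DATA_RATE_GBPS = 7.0
BYTES_SAVED = 0.25   # NVIDIA's claimed per-frame reduction in memory traffic

raw_bandwidth_gbs = BUS_WIDTH_BITS / 8 * DATA_RATE_GBPS
# Moving 75% of the bytes for the same work is equivalent to 1 / 0.75 of the bandwidth.
effective_bandwidth_gbs = raw_bandwidth_gbs / (1 - BYTES_SAVED)
effective_data_rate_gbps = DATA_RATE_GBPS / (1 - BYTES_SAVED)

print(f"raw: {raw_bandwidth_gbs:.0f} GB/s")
print(f"effective: {effective_bandwidth_gbs:.1f} GB/s")
print(f"equivalent data rate: {effective_data_rate_gbps:.1f} Gbps")
```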
There are other changes in GM204 that do not exist in GM107 to help improve performance of certain features that NVIDIA is bringing to the market. Those help build the foundation for VXGI global illumination and more.
From the benchmarks it seems obvious to me that the game is CPU bottlenecked. The highest FPS is outside of combat. During combat the FPS drops to 100fps whether at 1440p or 4k. This is my experience as well, from looking at CPU/GPU usage while playing Skyrim. Since it’s limited to 2 cores, basically a highly overclocked 4 core is the way to go. However, with something like ENB, I wonder if that’ll shift enough of the burden towards the GPU to the point where a better GPU solution actually does something.
I am talking about Skyrim, BTW, in my above comment.
Ryan are you planning on testing multimonitor in the near future?
Below is my question.
For example: I have my 7950 Sapphire Flex in CrossFire. I play Eyefinity games at 5920×1440. My 3 screens are 1680×1050 ×2 (16×10) for the outside screens and my U2711 at 2560×1440 (16×9) in the middle. All are DP monitors.
I do this because, for the games that do not support multi-monitor, I like the bigger 2560×1440 screen.
It would be great to know if Nvidia has updated surround capabilities to match AMD’s!
I’m happy to see nVidia endorse downsampling in the form of a supported feature. I’m curious about the downsampling filter they use though – a 13-tap Gaussian filter should produce a decently sharp image without ringing, but is there any word on whether or not it is gamma-aware? That last detail is important when downsampling and particularly for high-contrast details.
Hi,
I have a request to benchmark Skyrim with ENB at full quality.
The RealVision option at full quality is a good video card destroyer!
my system is i7 4820k @ 4.5ghz and a 290X
skyrim at 4k without enb is 45-50fps
enb on full quality is 17-19FPS..
can you setup a skyrim enb benchmark for reference from now on?
im very interested in your benchmarks for skyrim enb with 290x, 780ti, 970gtx and 980gtx
I know its alot of work but please please please! hehehe
ohhh, if you do, please add unoffical hd textures, flora overhaul and that hurts performance even more! makes the game so beautiful to play…
I have been watching your channel for a long time now. I would like to say that I enjoy the thorough way in which you benchmark every card. That being said, I would guess that 98% of PC gamers play in 1080p. I’m wondering why you test such high resolutions? I’m sure you have 1080p benchmarks on another page. I just feel raked over the coals with G-Sync and 4K. I’m tired of forking over thousands for small increases in performance. This bleeding edge is making my wallet bleed!
I agree with Shaun: the RealVision ENB for Skyrim would be a great benchmark. With it I averaged 15fps in open areas outside of Whiterun and 20-35fps elsewhere on a Gigabyte R9 290X OC 4GB. I’m not sure if that was the cause, but my card eventually broke or developed overheating issues: it wouldn’t load into Windows, just a black screen with the fans spinning at full speed after the Windows load screen following POST. So I RMA’d that card, got a credit refund, and bought the MSI GTX 970 4GB, which I’m waiting on to arrive along with a new motherboard.
So I think that it would be a great benchmark, as it really pushes the GPU rather than the CPU, and Skyrim with mods normally uses up to 4GB of VRAM.
Anyway Thanks
Awesome Review, SLI power consumption for dual 980’s is hard to come by and you sir have slayed my doubts about overdrawing a 850W power supply. THANKYOU!!! 😀
Im going to build this system
I7 4790K
SLI gtx 970s
16gb ram 1600mhz
Is a 630 watt PSU sufficient to run in sli? If yes can i also overclock?
630W? are you using a brand name PSU?
Dont trust PSU’s that come preinstalled with a case..
I would think 630 would be buggy for SLI..
you want at least 25% to spare.. I’d say at least a 750W..
Tho my Coolermaster ran 2x R9 280’s fine.. but that was me slightly underclocking my cpu so allow that..
just make sure the psu is a quality one.
Where AMD will have problems is not so much in pricing as in the thermals required for the mini/micro-sized systems for HTPC and the like, which may not be able to take the AMD SKUs even if the prices are lower. Getting as much GPU power into as small a form factor as possible is going to be a much more important market segment as more of these products are introduced.
Small, portable mini desktop systems, linked wirelessly to tablets and relying on the mini desktop for most of the processing, are going to appear: systems that can be easily carried around in a laptop bag along with a tablet, with the tablet acting as the host for a direct (via ad hoc WiFi) remote session into the mini desktop PC. These types of systems (the mini PC part of the pair) will be more powerful than a laptop but just as portable, plugged in at coffee houses and the like, and wirelessly serving games and compute to one or more tablets. Fitting more powerful discrete GPUs into these systems without overburdening the limited cooling available in the mini/micro form factor will be a big business, especially for gaming and game streaming on the go, for taking these devices along while traveling, and for having a device that can be configured to act more like a laptop when on battery power but ramp up beyond what a laptop is capable of while plugged in.
Can i run gtx970 on my intel DH61HO Motherboard??
It’s obvious Ryan you have taken heaps of time doing this (well done mate), but as someone wanting to build a rig to use on a big TV, I’m holding back until I can get my head around the 4k TV vs PC gaming output thing.
HDMI and 4k is my worry. I’ll be buying a big (thinking 65″) TV, only 4k for the gaming. It’ll do service as a normal TV too, but in Australia it’ll be obsolete before we see 4k content on the air! So that leaves gaming.
Is a big 4k TV a good option for high resolution gaming? Or are there land mines hidden in HDMI 1.x/2.x specs that’ll catch out the unaware? Certainly look better than 3 monitors.
It looks like the 980 will push BF4 to 4k @ ~30fps, but is that enough, or is SLI to get 45-60fps needed to be a pleasure to play?
A pair of watercooled 290Xs in CrossFire would have to be an option, quiet yet still in the running on fps. OK, it uses more power, but the purchase price difference buys a lot of electricity, unless water cooling costs a bomb!
Why are the specific settings not disclosed?
That’s pretty much benchmarking 101, and things like AA & AO can make a massive difference.
This is the only place I have seen that has benchmarked Skyrim in 4k with a 970, and thank you very much for that! But what settings did you use? Did you post them at the top? And did the 970 really pull ~50 fps at 4k with 8x AA and ultra? I find that hard to believe. I have a new 4k Samsung and really just want to play Skyrim in vanilla 4k, with no need for AA, and I am trying to decide if a 970 is enough.