A new GPU, a familiar problem

NVIDIA’s GM206-based GTX 960 card is finally launching after weeks of rumors and leaks. Can it topple AMD’s lead at $199?

Editor's Note: Don't forget to join us today for a live streaming event featuring Ryan Shrout and NVIDIA's Tom Petersen to discuss the new GeForce GTX 960. It will be live at 1pm ET / 10am PT and will include ten (10!) GTX 960 prizes for participants! You can find it all at http://www.pcper.com/live

There are no secrets anymore. Calling today's release of the NVIDIA GeForce GTX 960 a surprise would be like calling another Avengers movie unexpected. If you didn't just assume it was coming, chances are the dozens of leaked slides and performance numbers got your attention. So here it is: today's the day NVIDIA finally upgrades the mainstream segment that has been fed by the GTX 760 for more than a year and a half. But does the brand new GTX 960 based on Maxwell move the needle?

As you'll soon see, the GeForce GTX 960 is a bit of an odd duck in terms of new GPU releases. As we have seen several times in the last year or two with a stagnant process technology landscape, the new cards aren't going to be wildly better performing than the current cards from either NVIDIA or AMD. In fact, there are some interesting comparisons to make that may surprise fans of both parties.

The good news is that Maxwell and the GM206 GPU will start at $199, including overclocked models at that price. But to understand what makes it different from the GM204 part, we first need to dive a bit into the GM206 GPU and how it matches up with NVIDIA's "small" GPU strategy of the past few years.

The GM206 GPU – Generational Complexity

First and foremost, the GTX 960 is based on the exact same Maxwell architecture as the GTX 970 and GTX 980. The power efficiency, the improved memory bus compression and the new features all make their way into the smaller version of Maxwell selling for $199 as of today. If you missed the discussion of those new features, including MFAA, Dynamic Super Resolution and VXGI, you should read that page of our original GTX 980 and GTX 970 story from last September for a bit of context; these are important aspects of Maxwell and the new GM206.

NVIDIA's GM206 is essentially half of the full GM204 GPU that you find on the GTX 980. That includes 1024 CUDA cores, 64 texture units and 32 ROPs for processing, a 128-bit memory bus and 2GB of graphics memory. This results in half of the memory bandwidth at 112 GB/s and half of the peak compute capability at 2.30 TFLOPS.
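The bandwidth and compute figures quoted above follow from simple arithmetic, which can be sketched as below. This is an illustrative calculation, not NVIDIA's own methodology: it assumes the 7000 MHz memory clock is the effective per-pin data rate and that each CUDA core retires one fused multiply-add (two floating point operations) per clock at the rated clock. The function names are made up for this example.

```python
def memory_bandwidth_gbs(bus_width_bits, effective_clock_mhz):
    """Peak memory bandwidth in GB/s.

    Assumes effective_clock_mhz is the effective data rate per pin,
    so each pin moves effective_clock_mhz million bits per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz / 1000


def peak_compute_tflops(cuda_cores, clock_mhz):
    """Peak single-precision throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per core per clock.
    """
    return cuda_cores * 2 * clock_mhz / 1e6


# GTX 960 (GM206) versus GTX 980 (full GM204), using the figures in the text
gtx960_bw = memory_bandwidth_gbs(128, 7000)
gtx980_bw = memory_bandwidth_gbs(256, 7000)
gtx960_tf = peak_compute_tflops(1024, 1126)
gtx980_tf = peak_compute_tflops(2048, 1126)

print(f"GTX 960: {gtx960_bw:.0f} GB/s, {gtx960_tf:.3f} TFLOPS")
print(f"GTX 980: {gtx980_bw:.0f} GB/s, {gtx980_tf:.3f} TFLOPS")
```

Under these assumptions the GTX 960 works out to 112 GB/s and roughly 2.306 TFLOPS, which the spec sheet lists as 2.30, and both figures are exactly half of the GTX 980 results.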

Those are significant specification hits and will result in a drop of essentially half the gaming performance for the GTX 960 compared to the GTX 980. Some readers and PC enthusiasts will immediately recognize the GTX 960 as a bigger drop from the flagship part than recent generations of graphics cards from NVIDIA. You're not wrong.

| | GTX 960 | GTX 970 | GTX 980 | GTX 760 | GTX 770 | GTX 780 | GTX 660 | GTX 670 | GTX 680 |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GM206 | GM204 | GM204 | GK104 | GK104 | GK110 | GK106 | GK104 | GK104 |
| GPU Cores | 1024 | 1664 | 2048 | 1152 | 1536 | 2304 | 960 | 1344 | 1536 |
| Rated Clock | 1126 MHz | 1050 MHz | 1126 MHz | 980 MHz | 1046 MHz | 863 MHz | 980 MHz | 915 MHz | 1006 MHz |
| Texture Units | 64 | 104 | 128 | 96 | 128 | 192 | 80 | 112 | 128 |
| ROP Units | 32 | 64 | 64 | 32 | 32 | 48 | 24 | 32 | 32 |
| Memory | 2GB | 4GB | 4GB | 2GB | 2GB | 3GB | 2GB | 2GB | 2GB |
| Memory Clock | 7000 MHz | 7000 MHz | 7000 MHz | 6000 MHz | 7000 MHz | 6000 MHz | 6000 MHz | 6000 MHz | 6000 MHz |
| Memory Interface | 128-bit | 256-bit | 256-bit | 256-bit | 256-bit | 384-bit | 192-bit | 256-bit | 256-bit |
| Memory Bandwidth | 112 GB/s | 224 GB/s | 224 GB/s | 192 GB/s | 224 GB/s | 288 GB/s | 144 GB/s | 192 GB/s | 192 GB/s |
| TDP | 120 watts | 145 watts | 165 watts | 170 watts | 230 watts | 250 watts | 140 watts | 170 watts | 195 watts |
| Peak Compute | 2.30 TFLOPS | 3.49 TFLOPS | 4.61 TFLOPS | 2.25 TFLOPS | 3.21 TFLOPS | 3.97 TFLOPS | 1.81 TFLOPS | 2.46 TFLOPS | 3.09 TFLOPS |
| Transistor Count | 2.94B | 5.2B | 5.2B | 3.54B | 3.54B | 7.08B | 2.54B | 3.54B | 3.54B |
| Process Tech | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP | $199 | $329 | $549 | $249 | $399 | $649 | $230 | $399 | $499 |

This table compares the last three brand generations of NVIDIA's GeForce cards from x80, x70 and x60 products. Take a look at the GTX 680, a card based on the GK104 GPU and the GTX 660 based on GK106; the mainstream card has 62.5% of the CUDA cores and 75% of the memory bus width. The GTX 760 is actually based on the same GK104 GPU as the GTX 680 and GTX 770 and includes a wider 256-bit memory bus though dropped to half of the CUDA cores of the GK110-based GTX 780.
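To make the generational comparison concrete, here is a small sketch that computes what share of the x80 card's CUDA cores and memory bus width each x60 card retains. The numbers are copied from the table above; the `SPECS` dictionary and the `share_of_flagship` helper are names invented for this illustration.

```python
# (CUDA cores, memory bus width in bits), taken from the spec table
SPECS = {
    "GTX 660": (960, 192),
    "GTX 680": (1536, 256),
    "GTX 760": (1152, 256),
    "GTX 780": (2304, 384),
    "GTX 960": (1024, 128),
    "GTX 980": (2048, 256),
}


def share_of_flagship(mainstream, flagship):
    """Return (core share, bus width share) of the mainstream card
    relative to the flagship card of the same brand generation."""
    m_cores, m_bus = SPECS[mainstream]
    f_cores, f_bus = SPECS[flagship]
    return m_cores / f_cores, m_bus / f_bus


for x60, x80 in [("GTX 660", "GTX 680"),
                 ("GTX 760", "GTX 780"),
                 ("GTX 960", "GTX 980")]:
    cores, bus = share_of_flagship(x60, x80)
    print(f"{x60} vs {x80}: {cores:.1%} of cores, {bus:.1%} of bus width")
```

Read this way, the GTX 960 is the only x60 card of the three that keeps just half of the flagship on both counts, which is the "bigger drop" described earlier.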

It's complicated (trust me, I know), but NVIDIA definitely wants to get to smaller GPU dies again on the lower-priced parts. Way back in 2012 NVIDIA released the GTX 660 with a 2.54 billion transistor die on the 28nm process, but stayed performance competitive with the 700-series. The GTX 760 jumped up a lot to a 3.54 billion transistor die and increased the price up to $250 at launch. Today's release of the GTX 960 is down to 2.94 billion transistors, near that of the GTX 660, but with a lower starting price point of $199.

Power use on the GTX 960 is amazingly low with a rated TDP of 120 watts and in our testing the GPU almost never even approaches that level. In fact, when playing a game like DOTA 2 with V-Sync off (60 FPS cap) the card barely draws more than 35 watts! (More details on that on the power page.)

In the press documentation from NVIDIA, the company makes several attempts to put a better spin on the specifications surrounding the GeForce GTX 960. For the first time, NVIDIA mentions an "effective memory clock" rate that is justified by the efficiency improvement in memory compression of Maxwell over Kepler. While this is definitely true, it's been true between generations for years and is part of the reason analyses of GPUs like ours continue to exist. Creating metrics to selectively improve line items is a bad move, and I expressed as much during our early meetings.

Separately, NVIDIA is moving forward with its continued emphasis on MFAA performance numbers. Remember that multi-frame sampled anti-aliasing (MFAA) was launched with the GTX 980 and GTX 970, and uses a post-processing filter to combine multiple frames temporally at 2xMSAA quality with shifted sample points. The result is a 4xMSAA look at 2xMSAA performance, at least in theory. When the GTX 980 and GTX 970 launched, game support was incredibly limited, making the feature less than exciting. With this new driver, Maxwell GPUs will be able to support MFAA on all DirectX 11 and 10 games that support MSAA, excluding only Dead Rising 3, Dragon Age 2 and Max Payne 3. That applies to most games we test in our suite, including Crysis 3, Battlefield 4 and Skyrim; other games like Metro: Last Light or Bioshock Infinite use internal AA methods, not driver-based MSAA, and thus are unable to utilize MFAA.
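The idea of shifting sample points between frames and blending the results can be illustrated with a toy example. This is only a sketch of the general principle, not NVIDIA's implementation: the sample offsets, the edge equation and the simple two-frame average are all assumptions chosen for illustration, and the scene is assumed to be completely static.

```python
# Hypothetical sub-pixel sample offsets (not NVIDIA's actual patterns).
PATTERN_EVEN = [(0.25, 0.25), (0.75, 0.75)]  # 2 samples used on even frames
PATTERN_ODD = [(0.75, 0.25), (0.25, 0.75)]   # 2 shifted samples on odd frames
PATTERN_4X = PATTERN_EVEN + PATTERN_ODD      # all 4 positions in one frame


def inside_edge(x, y):
    """Toy polygon edge crossing the pixel: covered where x + 0.5*y < 0.6."""
    return x + 0.5 * y < 0.6


def coverage(samples):
    """Fraction of the given sample points that fall inside the edge."""
    hits = sum(1 for (x, y) in samples if inside_edge(x, y))
    return hits / len(samples)


even_frame = coverage(PATTERN_EVEN)        # what a 2x frame sees on even frames
odd_frame = coverage(PATTERN_ODD)          # what a 2x frame sees on odd frames
blended = (even_frame + odd_frame) / 2     # temporal blend of the two frames
reference_4x = coverage(PATTERN_4X)        # single frame with all 4 samples

print(f"even: {even_frame}, odd: {odd_frame}")
print(f"blended 2x + 2x: {blended}, single-frame 4x: {reference_4x}")
```

For a static edge the blended result matches the four-sample result exactly, since the two frames together visit the same four positions. Once objects or the camera move between frames, that equivalence no longer holds, which is where the "at least in theory" caveat comes from.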

When NVIDIA defaults to using MSAA in its own performance comparisons, it is comparing 2xMSAA with MFAA enabled (essentially 4xAA quality) to 4xMSAA on other cards. To its credit, NVIDIA says it is only comparing this way to previous NVIDIA hardware, not to AMD's competing hardware. My thoughts on this are mixed at this point, as it will no doubt start a race between both parties to fully integrate and showcase custom, proprietary AA methods exclusively going forward. See my page on MFAA performance later in the review for more details.

There are interesting comparisons to be made between the new GTX 960 and the currently shipping competing parts from AMD. Some of the specification differences will be claimed as important advantages for the Radeon lineup. Obviously our performance evaluation will be the final deciding factor, but is there anything to these claims?

| | GTX 960 | GTX 760 | R9 285 | R9 280 |
|---|---|---|---|---|
| GPU | GM206 | GK104 | Tonga | Tahiti |
| GPU Cores | 1024 | 1152 | 1792 | 1792 |
| Rated Clock | 1126 MHz | 980 MHz | 918 MHz | 827 MHz |
| Texture Units | 64 | 96 | 112 | 112 |
| ROP Units | 32 | 32 | 32 | 32 |
| Memory | 2GB | 2GB | 2GB | 3GB |
| Memory Clock | 7000 MHz | 6000 MHz | 5500 MHz | 5000 MHz |
| Memory Interface | 128-bit | 256-bit | 256-bit | 384-bit |
| Memory Bandwidth | 112 GB/s | 192 GB/s | 176 GB/s | 240 GB/s |
| TDP | 120 watts | 170 watts | 190 watts | 250 watts |
| Peak Compute | 2.30 TFLOPS | 2.25 TFLOPS | 3.29 TFLOPS | 3.34 TFLOPS |
| Transistor Count | 2.94B | 3.54B | 5.0B | 4.3B |
| Process Tech | 28nm | 28nm | 28nm | 28nm |
| MSRP | $199 | $249 | $249 | $249 |

While comparing GPU core counts is useless between architectures, many of the other data points can be debated. The most prominent difference is the 128-bit memory bus that GM206 employs when compared to the R9 285 with a 256-bit memory bus or even the R9 280 with its massive 384-bit memory bus. Raw memory bandwidth is the net result of this - the GTX 960 only sports 112 GB/s while the R9 280 tosses out 240 GB/s, more than twice the value. The wider bus allows the R9 280 to have a 3GB frame buffer, but it also means the card is at a disadvantage in TDP and transistor count / die size. An additional 1.4 billion transistors and 130 watts of thermal headroom are substantial. The Tonga GPU in the R9 285 has more than 2.0 billion additional transistors when compared to the GTX 960 - what they all do, though, is still up for debate.
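The gaps discussed here can be checked directly against the comparison table. The sketch below is just that arithmetic; the `CARDS` dictionary and the `compare` helper are names invented for this illustration, and the values are the table's bandwidth, TDP and transistor figures.

```python
# Values copied from the comparison table (transistors in billions)
CARDS = {
    "GTX 960": {"bandwidth_gbs": 112, "tdp_w": 120, "transistors_b": 2.94},
    "R9 285": {"bandwidth_gbs": 176, "tdp_w": 190, "transistors_b": 5.0},
    "R9 280": {"bandwidth_gbs": 240, "tdp_w": 250, "transistors_b": 4.3},
}


def compare(card, baseline="GTX 960"):
    """Return how much more bandwidth, TDP and transistor budget
    `card` carries relative to `baseline`."""
    c, b = CARDS[card], CARDS[baseline]
    return {
        "bandwidth_ratio": c["bandwidth_gbs"] / b["bandwidth_gbs"],
        "extra_tdp_w": c["tdp_w"] - b["tdp_w"],
        "extra_transistors_b": round(c["transistors_b"] - b["transistors_b"], 2),
    }


for name in ("R9 280", "R9 285"):
    d = compare(name)
    print(f"{name}: {d['bandwidth_ratio']:.2f}x bandwidth, "
          f"+{d['extra_tdp_w']} W TDP, "
          f"+{d['extra_transistors_b']}B transistors vs GTX 960")
```

By these table values, the R9 280 has a little over twice the GTX 960's raw bandwidth at a cost of 130 watts and about 1.36 billion transistors, while Tonga in the R9 285 carries about 2.06 billion more transistors than GM206.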

There is no denying that from a technological point of view, having a wider memory bus and higher memory bandwidth is a good thing for performance. But it comes at a cost - both in terms of design and in terms of AMD's wallet. Can NVIDIA really build a GPU that is both substantially smaller and equally powerful?
