The Kepler Architecture
The wait is finally over: here is our review of the brand-new GeForce GTX 680 from NVIDIA!
Join us today at 12pm EST / 9am PST as PC Perspective hosts a Live Review of the new GeForce GTX 680 graphics card. We will discuss the new GPU technology and important features like GPU Boost, talk about performance compared to AMD’s lineup, and have NVIDIA’s own Tom Petersen on hand to run some demos and answer questions from viewers. You can find it all at https://pcper.com/live.
NVIDIA fans have been eagerly waiting for the new Kepler architecture ever since CEO Jen-Hsun Huang first mentioned it in September 2010. In the interim, we have seen the birth of a complete lineup of AMD graphics cards based on its Southern Islands architecture, including the Radeon HD 7970, HD 7950, HD 7800s and HD 7700s. To the gamer looking for an upgrade it would appear that NVIDIA had fallen behind, but the company is hoping that today’s release of the GeForce GTX 680 will put it back in the driver’s seat.
This new $499 graphics card will directly compete against the Radeon HD 7970, and it brings quite a few "firsts" to NVIDIA’s lineup. It is NVIDIA’s first 28nm desktop GPU, its first to offer a clock speed over 1 GHz, its first to support triple-panel gaming on a single card, and its first to offer "boost" clocks that vary from game to game. Interested yet? Let’s get to the good stuff.
In many ways, the new 28nm Kepler architecture is just an update to the Fermi design that was first introduced in the GF100 chip. NVIDIA’s Jonah Alben summed things up nicely for us in a discussion, stating that "there are lots of tiny things changing (in Kepler) rather than a few large things which makes it difficult to tell a story."
GTX 680 Block Diagram
The chip that the GeForce GTX 680 is built on, GK104, is seen in block diagram form above. Already, you can see a big difference between this and the GTX 580 flagship card before it: there are 1536 stream processors (CUDA cores) on the GTX 680 compared to the 512 cores found in GTX 580 cards. The divisions of the GPU still exist in NVIDIA’s design (a GPC, or Graphics Processing Cluster, is a combination of SMs), though they have changed as well. A GPC now includes two SMX units (seen below), whereas each GTX 580 GPC included four SMs.
With each SM growing from 32 cores to 192 cores, NVIDIA is claiming a 2x improvement in performance per watt, a metric that is becoming crucial as designers focus on the thermal limits and power consumption of GPUs.
Kepler SMX Block Diagram
The SMX unit consists of 192 CUDA cores, an updated PolyMorph Engine, 16 texture units and thread scheduling hardware, among other components. Further, the cores are arranged differently than we saw in Fermi, with six cores per special function unit (SFU) instead of eight. The maximum number of warps (groups of 32 threads) resident on each SM has also gone from 48 on Fermi to 64 on Kepler.
With 128 total texture units on the GTX 680 (twice what we had on the GTX 580) and three times as many cores, you might be wondering how it all balances out. You may also be curious whether Kepler is really 3x as fast as Fermi.
Gone is the "hot clock" of NVIDIA GPUs, where the cores operated at twice the clock rate of the rest of the GPU. Instead, Kepler now runs the entire chip at the same clock rate. The reasoning is a trade-off between die space and power consumption: running the cores at the lower clock means more of them are needed for the same throughput, which costs die area. Engineers were able to cut clock power in half and logic power by 10% at that expense, and with this design’s focus on power efficiency it was a trade they were obviously willing to make.
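The per-unit figures multiply out to the chip totals. The short Python sketch below does that arithmetic for both chips. It assumes four GPCs on each chip and four texture units per Fermi SM; those two counts come from NVIDIA’s published GK104 and GF110 specifications rather than from this page, and the `ChipConfig` class is purely a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical helper describing a GPU's unit hierarchy."""
    name: str
    gpcs: int              # graphics processing clusters
    sms_per_gpc: int       # SMs (Fermi) or SMX units (Kepler) per GPC
    cores_per_sm: int      # CUDA cores per SM/SMX
    tex_units_per_sm: int  # texture units per SM/SMX

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cores(self) -> int:
        return self.sms * self.cores_per_sm

    @property
    def texture_units(self) -> int:
        return self.sms * self.tex_units_per_sm


# Assumption: 4 GPCs per chip and 4 texture units per Fermi SM, taken from
# published specs rather than from the review text.
GF110 = ChipConfig("GF110 (GTX 580)", gpcs=4, sms_per_gpc=4,
                   cores_per_sm=32, tex_units_per_sm=4)
GK104 = ChipConfig("GK104 (GTX 680)", gpcs=4, sms_per_gpc=2,
                   cores_per_sm=192, tex_units_per_sm=16)

if __name__ == "__main__":
    for chip in (GF110, GK104):
        print(f"{chip.name}: {chip.sms} SMs, {chip.cores} cores, "
              f"{chip.texture_units} texture units")
    print(f"Core ratio: {GK104.cores / GF110.cores:.1f}x")
```

Fewer, much wider SMX units is how GK104 triples the core count while halving the number of SM-level blocks.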
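To see why tripling the core count does not triple shader throughput, here is a back-of-the-envelope peak-FLOPS comparison in Python. It assumes each CUDA core completes one fused multiply-add (two floating-point operations) per clock, and it uses NVIDIA’s reference clocks for each card (772 MHz core / 1544 MHz shader on the GTX 580, 1006 MHz base on the GTX 680); those clocks are not stated on this page. Peak FLOPS is only a theoretical ceiling, not a measure of game performance.

```python
def peak_gflops(cores: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak: cores x shader clock x FLOPs per core per clock.

    flops_per_clock=2 assumes one fused multiply-add per core per cycle.
    """
    return cores * shader_clock_mhz * flops_per_clock / 1000.0


# GTX 580 (Fermi): cores run on a "hot clock" at 2x the 772 MHz base clock.
gtx580 = peak_gflops(cores=512, shader_clock_mhz=2 * 772)

# GTX 680 (Kepler): the whole chip runs at one clock, 1006 MHz base (GPU Boost ignored).
gtx680 = peak_gflops(cores=1536, shader_clock_mhz=1006)

if __name__ == "__main__":
    print(f"GTX 580: {gtx580:.0f} GFLOPS")
    print(f"GTX 680: {gtx680:.0f} GFLOPS")
    print(f"Ratio:   {gtx680 / gtx580:.2f}x")
```

On paper, then, GK104 roughly doubles Fermi’s peak shader throughput rather than tripling it, because each core now runs at the base clock instead of a doubled hot clock.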
Another change in Kepler is found in the scheduling component, where much of the process has actually moved from hardware into software run in the NVIDIA driver. Because the software is already handling so much of the decoding process from DirectX, CUDA, OpenCL and more, NVIDIA found it more power efficient to keep increasing the workload in software rather than on the chip itself. Some scheduling remains on the die because of latency concerns, though, such as for texture operations.
Because Kepler has half as many SMX units as Fermi had SMs (8 versus 16), NVIDIA had to double the performance of each individual PolyMorph engine; the net result is that total geometry throughput across the chip hasn’t changed much.
In tessellation tests against AMD’s Radeon HD 7970, the GTX 680 is actually a bit slower at lower expansion factors, and it’s not until we hit 11x that we start to see the advantage NVIDIA once claimed to have across the whole scale. The two companies disagree about which factors matter most to game developers, with AMD claiming that the lower factors are used much more often.
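As a rough illustration of what moving scheduling into software can mean, here is a toy Python model of static scheduling. For a straight-line instruction sequence whose latencies are known in advance, software can compute ahead of time how many cycles each instruction must wait for its inputs, so the hardware does not have to track those dependencies itself. Variable-latency work such as texture fetches cannot be resolved this way, which is why it still needs on-chip handling. The instruction format, register names and latency values are all hypothetical; this is a generic technique, not NVIDIA’s actual driver, compiler or ISA.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: writes `dest`, reads `srcs`, result ready `latency` cycles after issue."""
    op: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int


def static_stalls(program: List[Instr]) -> List[int]:
    """Return, for each instruction, the stall cycles inserted before it issues.

    Assumes in-order issue of at most one instruction per cycle and fixed,
    known latencies, so every count can be computed before the program runs.
    """
    ready_at: Dict[str, int] = {}  # register -> first cycle its value is available
    cycle = 0                      # earliest cycle the next instruction could issue
    stalls: List[int] = []
    for ins in program:
        # Earliest cycle at which every source operand is ready.
        need = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        stall = max(0, need - cycle)
        cycle += stall
        stalls.append(stall)
        ready_at[ins.dest] = cycle + ins.latency
        cycle += 1  # issuing takes one cycle
    return stalls


# Hypothetical example program with made-up 4-cycle arithmetic latencies.
example = [
    Instr("FMUL", "r2", ("r0", "r1"), latency=4),
    Instr("FADD", "r3", ("r0", "r0"), latency=4),        # independent: no stall
    Instr("FFMA", "r4", ("r2", "r3", "r1"), latency=4),  # must wait for r2 and r3
]

if __name__ == "__main__":
    print(static_stalls(example))
```

Because the stall counts depend only on the instruction stream, they can be computed once in software instead of being rediscovered by scoreboard hardware on every execution.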
For the new memory design NVIDIA has gone with a 256-bit controller (compared to the 384-bit controller on the GTX 580), though the memory now runs at 6 Gbps (1500 MHz)! The total memory bandwidth provided by this design is 192 GB/s, which is basically identical to that of the GTX 580. ROP count has decreased, however, from 48 on the GTX 580 to 32 on Kepler/GTX 680.
Today’s GTX 680 will ship with a 2GB frame buffer, and some users may lament that NVIDIA did not match AMD’s 3GB memory configuration on the HD 7900 cards. While we would never say we don’t want MORE memory on our GPUs, in our testing we have not seen detrimental effects from 2GB versus 3GB of memory, even in multi-display gaming.
The GTX 680 is indeed a PCI Express 3.0 compatible card, and the GPU supports DX11.1 features as well, but neither is really anything to get excited about just yet.
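The bandwidth figure follows directly from bus width and per-pin data rate. Here is a quick Python check, assuming the GTX 580’s published effective memory rate of about 4.008 Gbps (not stated on this page):

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


gtx680 = memory_bandwidth_gbs(256, 6.0)    # 256-bit GDDR5 at 6 Gbps
gtx580 = memory_bandwidth_gbs(384, 4.008)  # 384-bit GDDR5 at ~4 Gbps (assumed published rate)

if __name__ == "__main__":
    print(f"GTX 680: {gtx680:.1f} GB/s")
    print(f"GTX 580: {gtx580:.1f} GB/s")
```

In other words, NVIDIA gave up a third of the bus width for roughly 50% more speed per pin and landed at essentially the same bandwidth.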
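For a sense of scale, here is a rough Python estimate of how much memory the render targets alone occupy in a triple-monitor 5760x1080 setup. The resolution, 4x MSAA, 4-byte color and 4-byte depth/stencil samples, and double-buffered output are all assumptions for illustration. Textures, geometry and driver overhead are not modeled and typically account for much of the rest, so this does not settle the 2GB versus 3GB question on its own.

```python
def render_target_mib(width: int, height: int, msaa_samples: int = 4,
                      color_bytes: int = 4, depth_bytes: int = 4,
                      swap_buffers: int = 2) -> float:
    """Rough memory (MiB) for one multisampled color+depth target plus resolved swap-chain buffers.

    All defaults are illustrative assumptions, not measured values.
    """
    pixels = width * height
    msaa_target = pixels * msaa_samples * (color_bytes + depth_bytes)
    swap_chain = pixels * color_bytes * swap_buffers  # resolved, non-MSAA buffers
    return (msaa_target + swap_chain) / 2**20


# Three 1920x1080 panels side by side (assumed Surround layout).
surround = render_target_mib(3 * 1920, 1080)

if __name__ == "__main__":
    print(f"~{surround:.0f} MiB of a 2048 MiB frame buffer")
```

Even with these generous settings the render targets are a modest slice of 2GB, which is consistent with (though not proof of) the testing result above.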
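For context, here is what PCI Express 3.0 changes on paper, as a small Python calculation. The transfer rates and line encodings (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0) come from the PCIe specifications rather than from this page; the figures are per direction for an x16 slot and ignore packet overhead.

```python
def pcie_gbs(transfer_rate_gt: float, payload_bits: int, line_bits: int, lanes: int = 16) -> float:
    """Per-direction link bandwidth in GB/s after line-encoding overhead (packet overhead ignored)."""
    return transfer_rate_gt * payload_bits / line_bits * lanes / 8


gen2_x16 = pcie_gbs(5.0, 8, 10)     # PCIe 2.0: 5 GT/s per lane, 8b/10b encoding
gen3_x16 = pcie_gbs(8.0, 128, 130)  # PCIe 3.0: 8 GT/s per lane, 128b/130b encoding

if __name__ == "__main__":
    print(f"PCIe 2.0 x16: {gen2_x16:.2f} GB/s per direction")
    print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s per direction")
```

That is roughly twice the per-direction bandwidth of PCIe 2.0 x16, which only pays off when a workload actually saturates the older link.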
One interesting change is the addition of NVENC, a dedicated video encoding engine built (essentially) to rival the Quick Sync technology found in Intel’s Sandy Bridge processors. The logic is now completely fixed-function (it no longer uses the CUDA cores to encode video), and NVIDIA claims that it is even more power efficient than Intel’s implementation. In fact, designers told me that NVENC could be used while the rest of the GPU was powered down.
Another important change is found in the display support on Kepler, as NVIDIA has finally moved away from the two-display limit on single-GPU cards. You can now run up to four displays on a single card, and run three of them in an NVIDIA Surround or 3D Vision Surround configuration for multi-display gaming. This is obviously a feature NVIDIA has needed for quite some time, and we are glad to see it in Kepler. DisplayPort 1.2 support is included as well.
And here she is, the Kepler die in all her glory. The 28nm GPU is built with 3.54 billion transistors and measures 294 mm².
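Those two figures give GK104’s average transistor density, about 12 million transistors per square millimeter:

```latex
\frac{3.54 \times 10^{9}\ \text{transistors}}{294\ \text{mm}^2} \approx 1.20 \times 10^{7}\ \text{transistors/mm}^2
```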
There is quite a bit more to Kepler and the GeForce GTX 680, though.