More Musings on GTX 480 and Fermi

AMD certainly stole the limelight with their 5000 series of products, which struck a very good compromise between functionality and performance. While the design was much more akin to the previous HD 4000 series of parts, AMD threw in enough changes to keep things interesting. GPGPU was almost an afterthought for these cards, though. Before the AMD acquisition, ATI made some waves with the X1800 and X1900 chips by offering, at the time, unrivaled GPGPU performance in applications like Folding@Home. Since then, however, NVIDIA has been the one pushing the envelope.
The associated items within the box are basic, but needed. Not shown is the coupon for the Pick Your Poison game download.
My gut feeling, from looking at the architecture and talking with people around the industry, is that NVIDIA wanted to take a no-compromise approach with Fermi. Not only would it hold its own in 3D graphics, but its GPGPU functionality and performance would exceed their previous generation of parts. They have apparently done this with the high-end Fermi parts, but it has obviously come at a price. NVIDIA also included slightly better texture filtering support, enhanced Coverage Sample AA modes, and built-in 7.1-channel audio over HDMI (8-channel LPCM, though no HD audio bitstreaming). Those who really like their AA (and I’m one of them) will be very happy to see the new modes and their ability to really clean up a scene.
To achieve excellent performance in 3D gaming, the chip had to run fast enough to outpace the competition and still justify its price. To achieve breakthrough performance in GPGPU, far more transistors are needed for efficient operation in those applications. The combination of these goals required a 3.0+ billion transistor part that was notoriously hard to fabricate. Throw in the issues with TSMC’s 40 nm process, and getting this part to market was a nightmare for NVIDIA.
Personally, I like what NVIDIA has done with the architecture. It is highly scalable, very flexible, and radically changes the workflow required for 3D graphics and GPGPU duties, fusing those sometimes disparate workloads into one (mostly) harmonious product. We have also seen the flexibility of this architecture in the GF104 chip, which powers the GTX 460 series of products. In that product NVIDIA was able to remove the ECC support, cut down the L2 cache size, lower double-precision performance, and pack more CUDA cores into each SM, converting it into a superscalar unit. These are some fairly hefty changes for a derivative, cut-down part.
This may be considered the “sticker” version, but MSI was very particular about what components were used for this build.
Perhaps what I like best about the direction NVIDIA is heading is the ability to utilize the available CUDA cores as much as possible. Unlike AMD’s 5000 series, which has a monolithic, standalone tessellation unit, Fermi's tessellation relies on the CUDA cores. Each SM is essentially its own tessellation unit, and it leverages the CUDA cores to do a lot of the heavy lifting for it. So the CUDA cores handle not only geometry, vertex, and pixel shading, but also tessellation and DirectCompute/OpenCL/CUDA workloads. There are still inefficiencies in the architecture, and the CUDA cores are likely not 100% utilized all of the time. But this is true of any current-generation graphics architecture. In fact, it appears as though AMD’s architecture is less efficient overall, though arguably simpler. The same issues that haunt CPU designers also apply to GPU designers: that last 5% to 10% of performance costs many times that number in transistor budget.
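For readers who haven't touched GPGPU code, it's worth seeing what one of those compute workloads actually looks like. The sketch below is my own minimal example (not anything NVIDIA ships): a SAXPY kernel, where each thread, executing on a CUDA core, computes a single element. The same cores running this math are the ones doing vertex, pixel, and tessellation work a frame earlier.

```cuda
#include <cstdio>

// SAXPY: y = a*x + y. One thread (one CUDA core lane) per element.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;

    // Allocate device memory; a real program would then fill x and y
    // from the host with cudaMemcpy before launching the kernel.
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The point of the example is the execution model: the scheduler carves the launch into warps and distributes them across the SMs, which is exactly why a chip with more SMs (or more cores per SM, as in GF104) scales these workloads so naturally.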
Due to its highly scalable nature and dynamic balancing of workloads across available units, the Fermi architecture excels at tessellation. In pure tessellation workloads, the GTX 480 far outpaces the HD 5870, and even the dual-chip HD 5970. My goal in this review is to take a deeper look at tessellation performance in both theoretical and real-world scenarios. I also wanted to take a good, hard look at the state of GPGPU applications available to end users.