Architecture Efficiency Experimental Testing
While writing this review I became enamored with the idea of somehow testing the architecture efficiency of the new GF104 GPU compared to the pre-existing GF100 GPU found in the GTX 480 and GTX 470.  In the world of processors we do this all the time and call it testing IPC, or instructions per clock.  Doing something similar on a GPU is a bit more problematic, and noticeably less scientific, as it involves so many other pieces of hardware as well as game software that might not be written in a way that really shows off the particulars we are looking for.

In the end I came up with this as a rough estimate to test my theories: compare the new 1GB version of the GeForce GTX 460, running at clock speeds of 675 MHz, 1350 MHz and 3.6 GHz (core, shader, memory), to the GeForce GTX 480 running at 700 MHz, 1401 MHz and 3.7 GHz.  These clock rates aren't exact, and given some more time we could likely test them at exactly the same clock speeds, but this gives us comparable speeds to within a margin of error of about 3.7% (700 / 675 ≈ 1.037).

Obviously we also need to take into account the different number of CUDA cores in each GPU: 480 on the GTX 480 and 336 on the new GTX 460.  What I am looking for is any noticeable difference in performance PER CORE that might be the result of the shift in balance NVIDIA made with the GF104's core count per SM, texture units, etc., detailed on the first two pages of this article.  My hack of a solution?  Take the average frame rate scores from the GTX 480 and the GTX 460 and divide them by the CUDA core count of each GPU.  Then, for good measure, I multiplied by 1000 to make the numbers more readable.  My final algorithm looked like this:

(Avg FPS / CUDA Core Count) * 1000
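
If you want to play along at home, here is a minimal Python sketch of that same calculation.  The frame rates in it are purely hypothetical placeholders to show the shape of the comparison, not our benchmark results.

# Rough per-core "efficiency" metric: (Avg FPS / CUDA core count) * 1000
# The FPS values below are made-up placeholders, not measured results.

CUDA_CORES = {"GTX 480": 480, "GTX 460": 336}

def kiloframes_per_core(avg_fps, cuda_cores):
    # Normalize average frame rate by CUDA core count, scaled by 1000 for readability
    return (avg_fps / cuda_cores) * 1000

hypothetical_fps = {"GTX 480": 60.0, "GTX 460": 45.0}

for gpu, fps in hypothetical_fps.items():
    score = kiloframes_per_core(fps, CUDA_CORES[gpu])
    print(f"{gpu}: {score:.1f} kilo-frames per CUDA core")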

Pretty simple, right?  The results are seen below:

[Chart: Kilo-frames per CUDA Core results, GeForce GTX 480 vs. GeForce GTX 460]

Our made-up metric is being called "Kilo-frames per CUDA Core" for no other reason than that it was late and things were starting to blur.  We are looking for higher bars; those would seemingly indicate an architecture that is more efficient in how it takes advantage of its CUDA cores in relation to the rest of the GPU.

The results were not nearly as impressive as I'd hoped: in many cases the GTX 480 has the edge, and in a few cases the new GF104 architecture takes the lead, but never by much.  The only place where the gap is really noticeable is with Metro 2033: the GTX 480 is about 50% more efficient at 1680×1050 and 1920×1200, if our numbers are to be believed.  This could be due to the frame buffer difference (1.5 GB vs 1.0 GB) or the larger number of tessellation-ready PolyMorph engines available per CUDA core on the GTX 480.

For now, I am willing to leave our results in this admittedly vague state until we can get some more data from NVIDIA and debate internally the value of extended testing like this.  Let us know what you think!
