What G80 Brings
NVIDIA is launching the world's first DX10-ready hardware today with the first unified architecture. Not only did they get it right, they wiped the floor with the competition!
Introduction
DirectX 10 is sitting just around the corner, hand in hand with Microsoft Windows Vista. It requires a new unified architecture in the GPU department that neither hardware vendor had implemented until now, and it is not compatible with DX9 hardware. The NVIDIA G80 architecture, now known as the GeForce 8800 GTX and 8800 GTS, has been the known DX10 candidate for some time, but many of the rumors and much of the information about the chip were just plain wrong, as we can now officially tell you today.
Come find out why the GeForce 8800 GTX should be your next GPU purchase.
What is a Unified Architecture?
The requirement of a unified architecture is one of the key changes in the upcoming release of DirectX 10 on Windows Vista. The benefits and pitfalls of a unified graphics architecture have been hotly debated since DX10 specs first became known several years ago. With Vista just months away now, both NVIDIA and ATI no longer get to debate the logic of the move; now they have to execute on it.
A unified graphics architecture, in its most basic explanation, is one that does away with separate pixel pipelines and vertex pipelines in favor of a single “type” of pipeline that can be used for both.
Traditional GPU Architecture and Flow
This diagram shows what has become the common GPU architecture flow, starting with vertex processing and ending with memory access and placement. In G70, and all recent NVIDIA and ATI architectures, a pattern was closely followed to turn data into graphics on your monitor. First, the vertex engine, which started out as pure transform and lighting hardware, processes the vertex data into cohesive units and passes it on to the triangle setup engine. Pixel pipes then take the data, apply shading and texturing, and pass the results on to the ROPs, which are responsible for culling the data, anti-aliasing it (in recent years) and placing it in the frame buffer for drawing on your screen.
This scheme worked fine and was still going strong with DX9, but as game programming became more complex, the hardware became less efficient. Chip designers basically had to “guess” whether pixel or vertex processing was going to be more important in future games and design their hardware accordingly.
A unified architecture simplifies the pipeline significantly by allowing a single floating point processor (known as a pixel pipe or vertex pipe before) to work on both pixel and vertex data, as well as new types of data such as geometry, physics and more. These floating point processors then pass the data on to the traditional ROP system and memory frame buffer for output that we have become familiar with.
I mentioned above that because of the inefficiencies of the two-pipeline style, hardware vendors had to “guess” which type was going to be more important. This example showcases the point very well: in the top scenario, the scene is very vertex shader heavy while the pixel shaders are underutilized, leaving idle hardware. In the bottom scenario, the reverse is happening, as the scene is very pixel shader intensive, leaving the vertex shaders sitting idle.
Any hardware designer will tell you that having idle hardware when there is still work to be done is perhaps the single most important issue to address. Idle hardware costs money, it costs power and it costs efficiency. Unified shaders help to prevent this curse on computing hardware.
In the first example, notice that the sample “GPU” had 4 sets of vertex shaders and 8 sets of pixel shaders: a total of 12 processing units that were used inefficiently. Here we have another GPU running with 12 unified processing shaders that can be dynamically allocated to work on vertex or pixel data as the scene demands. In this case, the geometry-heavy top scene uses 11 of the 12 shaders for vertex work and 1 for pixel shading, using all 12 shaders to their maximum potential.
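To put rough numbers on the comparison above, here is a minimal Python sketch of a toy model. The 4 vertex plus 8 pixel split and the 12 unified shaders come from the example; the work amounts (11 units of vertex work and 1 of pixel work, or the reverse) and the function names are assumptions made purely for illustration, and the model ignores everything else a real GPU does.

```python
# Toy model only: work amounts are assumed values, not measurements.

def fixed_split_time(vertex_work, pixel_work, vertex_units=4, pixel_units=8):
    """Frame time when each unit type can only process its own kind of work."""
    # The frame is done only when the slower of the two groups has finished.
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_time(vertex_work, pixel_work, units=12):
    """Frame time when any unit can process either kind of work."""
    return (vertex_work + pixel_work) / units

def utilization(vertex_work, pixel_work, units, elapsed):
    """Fraction of available unit-time that was spent doing useful work."""
    return (vertex_work + pixel_work) / (units * elapsed)

# Hypothetical scenes: (vertex work, pixel work) in arbitrary units.
scenes = {"vertex-heavy": (11, 1), "pixel-heavy": (1, 11)}

for name, (v, p) in scenes.items():
    t_fixed = fixed_split_time(v, p)
    t_unified = unified_time(v, p)
    print(f"{name}: fixed split {t_fixed:.2f} time units "
          f"({utilization(v, p, 12, t_fixed):.0%} busy), "
          f"unified {t_unified:.2f} time units "
          f"({utilization(v, p, 12, t_unified):.0%} busy)")
```

Under these assumed numbers, the vertex-heavy scene takes 2.75 time units on the fixed 4 + 8 design, because the 4 vertex units are the bottleneck while the pixel units sit mostly idle. The unified pool finishes the same work in 1.0 time unit with every shader busy.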
This is of course the perfect, theoretical idea behind unified architectures, and in the real world the problem is much more complex.
In the real world, there are more than 12 processor pipelines, and breaking down a scene into “weights” like we did above is incredibly complex. NVIDIA’s new G80 hardware has the ability to dynamically load balance in order to get as high an efficiency out of the unified shaders as possible. As an example from Company of Heroes, on the left is a scene with little geometry and on the right is one with much more geometry to process. The graph at the bottom here shows the total percentage usage of the GPU, with yellow representing pixel shading work and red representing vertex shading work. When the scene shifts from the left view to the right view, you can see that the amount of vertex work increases dramatically, but the total GPU power being used remains the same; the GPU has load balanced the available processing queue accordingly.
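The idea of shifting shaders between vertex and pixel work as the scene changes can be sketched in a few lines of Python. This is not NVIDIA's scheduler, whose details are not described here; `allocate_units` is a hypothetical helper that simply splits a pool of 12 units in proportion to the pending work, and the per-frame work figures are invented for illustration.

```python
# Hypothetical proportional allocator, not the actual G80 scheduling logic.

def allocate_units(vertex_work, pixel_work, units=12):
    """Split a unified pool between vertex and pixel work, proportionally."""
    total = vertex_work + pixel_work
    if total == 0:
        return 0, 0  # nothing queued, nothing to allocate
    vertex_units = round(units * vertex_work / total)
    # Keep at least one unit on any queue that still has work waiting.
    if vertex_work > 0:
        vertex_units = max(1, vertex_units)
    if pixel_work > 0:
        vertex_units = min(units - 1, vertex_units)
    return vertex_units, units - vertex_units

# Assumed (vertex, pixel) work per frame as the camera pans toward more geometry.
frames = [(2, 10), (3, 9), (7, 5), (8, 4)]

for vertex_work, pixel_work in frames:
    v_units, p_units = allocate_units(vertex_work, pixel_work)
    print(f"vertex work {vertex_work}, pixel work {pixel_work} -> "
          f"{v_units} vertex units + {p_units} pixel units "
          f"= {v_units + p_units} in use")
```

In every frame the two allocations add up to the full pool, which mirrors the graph: the mix of vertex and pixel work changes while total usage stays flat.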