Introduction and 8800 GTX Review
We finally had a pair of GeForce 8800 GTX cards sitting here long enough to get some SLI testing done, and we think you’ll be impressed by the results!
My initial review of the NVIDIA GeForce 8800 GTX GPU showed it to be a stellar product. With increased DX9 gaming performance, support for DX10 and better performance per watt than ATI’s X1950 XTX card, the 8800 GTX was a winner on nearly all fronts. The one exception was that SLI configurations weren’t ready at the initial release, and reviewers and gamers had to wait a week or so for the driver support to be implemented.
That wait is over, and I have had a couple of weeks to sit down and play with an SLI-configured gaming system here. Before we get into the system setup and, of course, the benchmark results, here is a quick summary of the 8800 GTX G80 architecture from my previous review.
The GeForce 8800 GTX
A unified graphics architecture, in its most basic explanation, is one that does away with separate pixel pipelines and vertex pipelines in favor of a single “type” of pipeline that can be used for both.
Traditional GPU Architecture and Flow
This diagram shows what has become the common GPU architecture flow, starting with vertex processing and ending with memory access and placement. In G70, and all recent NVIDIA and ATI architectures, there was a pattern that was closely followed to allow data to become graphics on your monitor. First, the vertex engine, which started out as pure transform and lighting hardware, processes the vertex data into cohesive units and passes it on to the triangle setup engine. Pixel pipes then take the data, apply shading and texturing, and pass the results on to the ROPs, which are responsible for culling the data, anti-aliasing it (in recent years) and placing it in the frame buffer for drawing on your screen.
This scheme worked fine and was still going strong with DX9, but as game programming became more complex, the hardware was becoming more inefficient. Chip designers basically had to “guess” what was going to be more important in future games, pixel or vertex processing, and design their hardware accordingly.
A unified architecture simplifies the pipeline significantly by allowing a single floating point processor (previously known as a pixel pipe or vertex pipe) to work on both pixel and vertex data, as well as new types of data such as geometry, physics and more. These floating point processors then pass the data on to the traditional ROP system and memory frame buffer that we have become familiar with for output.
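To make the "guessing" problem concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the workloads are invented numbers in arbitrary work units, the 8/24 split of vertex and pixel units is purely illustrative, and real GPUs schedule work in far more complicated ways. It compares a fixed split of units against a unified pool of the same total size.

```python
def fixed_frame_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Frame time when vertex and pixel units are separate, dedicated pools."""
    # Each pool can only process its own kind of work, so the slower
    # pool decides how long the whole frame takes.
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_frame_time(vertex_work, pixel_work, total_units):
    """Frame time when every unit can take either kind of work."""
    return (vertex_work + pixel_work) / total_units


# Hypothetical per-frame workloads (vertex_work, pixel_work), arbitrary units.
workloads = {
    "pixel-heavy": (10, 90),
    "vertex-heavy": (60, 40),
}

for name, (vertex_work, pixel_work) in workloads.items():
    # Illustrative split: 8 vertex units and 24 pixel units, 32 in total.
    fixed = fixed_frame_time(vertex_work, pixel_work, vertex_units=8, pixel_units=24)
    unified = unified_frame_time(vertex_work, pixel_work, total_units=32)
    print(f"{name}: fixed={fixed:.2f}, unified={unified:.2f}")
```

In this toy model the fixed design is only competitive when the game's workload happens to match the split the designers chose, while the unified pool takes the same time regardless of the mix.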
All hail G80!! Well, um, okay. That’s a lot of pretty colors and boxes and lines and whatnot, but what does it all mean, and what has changed from the past? First, compared to the architecture of the G71 (GeForce 7900), whose block diagram you can reference here, you’ll notice that there is one less “layer” of units to see and understand. Since we are moving from a dual-pipe architecture to a unified one, this makes sense. Those eight blocks of processing units there with the green and blue squares represent the unified architecture and work on pixel, vertex and geometry shading.
There are 128 streaming processors that run at 1.35 GHz and accept dual issue MAD+MUL operations. These SPs (streaming processors) are fully decoupled from the rest of the GPU design, are fully unified and offer exceptional branching performance (hmm…). The 1.35 GHz clock rate is independent of the rest of the GPU, though all 128 of the SPs are driven by the same 1.35 GHz clock generator; in fact, you can even modify the clock rate on the SPs separately from that of the GPU in the overclocking control panel! The new scalar architecture on the SPs allows longer shader applications to run more efficiently than on the vector architecture of the G70 and all previous NVIDIA designs.
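For a rough sense of what those numbers add up to, the theoretical peak shader throughput can be worked out from the figures above. This is a back-of-the-envelope sketch, assuming a MAD counts as two floating point operations and a MUL as one, and that every SP issues both on every clock; real games will not sustain this.

```python
# Figures quoted in the text above.
STREAM_PROCESSORS = 128
SHADER_CLOCK_GHZ = 1.35

# Assumption: a MAD (multiply-add) counts as 2 operations, a MUL as 1.
FLOPS_PER_MAD = 2
FLOPS_PER_MUL = 1


def peak_gflops(processors, clock_ghz, flops_per_clock):
    """Theoretical peak throughput in GFLOPS, assuming full utilization."""
    return processors * clock_ghz * flops_per_clock


flops_per_clock = FLOPS_PER_MAD + FLOPS_PER_MUL
peak = peak_gflops(STREAM_PROCESSORS, SHADER_CLOCK_GHZ, flops_per_clock)
print(f"Theoretical peak: {peak:.1f} GFLOPS")
```

Under those assumptions the result works out to roughly 518 GFLOPS, which is a ceiling rather than something you should expect to see in practice.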
With the new G80 architecture, NVIDIA is introducing a new antialiasing method known as coverage sampled AA (CSAA). Because of the large amount of memory storage required by multisampled AA (the most commonly used AA), moving beyond 4xAA was not efficient, and NVIDIA is hoping that CSAA can solve the issue by offering higher quality images with lower storage requirements. For much more detail and examples of this new AA method, look here in our previous architecture article.
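To see why storage becomes a problem with multisampled AA, here is a rough estimate of how the buffer footprint grows with the sample count. This is a simplified sketch, not a description of how the G80 actually lays out memory: it assumes every sample stores 4 bytes of color and 4 bytes of depth/stencil, ignores any compression, and uses 1600x1200 purely as an example resolution.

```python
def msaa_buffer_mib(width, height, samples, bytes_color=4, bytes_depth=4):
    """Approximate multisampled color + depth storage in MiB.

    Assumes each sample keeps its own color and depth value and that
    nothing is compressed, which overstates real-world usage.
    """
    total_bytes = width * height * samples * (bytes_color + bytes_depth)
    return total_bytes / (1024 * 1024)


# Example resolution chosen only for illustration.
for samples in (1, 4, 8, 16):
    size = msaa_buffer_mib(1600, 1200, samples)
    print(f"{samples:>2}x: {size:.1f} MiB")
```

Because every additional sample carries its own color and depth data in this model, the storage grows in direct proportion to the sample count, which is the cost that a coverage-based approach is trying to avoid.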
I mentioned in the discussion on the new G80 architecture that the texture filtering units are much improved and offer us better IQ options than ever before. While we haven’t looked at it in depth on PC Perspective recently, there has been growing concern over the filtering options that both ATI and NVIDIA were setting in their drivers, and the quality they produced. If you have ever been playing a game like Half-Life 2 or Guild Wars (probably one of the worst) and noticed “shimmering” along the ground, where textures seem to “sparkle” before they come into focus, then you have seen filtering quality issues. For more information on the improved filtering, again, look here in our previous article.