NVIDIA was extremely open about its technology features and configurations during this chip launch, a welcome change from most of my briefings. The media was given a very in-depth look at the GeForce 7800 architecture, which we will share with you below.
This diagram is the complete G70 architecture at a glance. Just by looking at the patterns and colors you can see that what the 7800 offers is a lot of parallelism in its processing. At the heart of every GPU is shading power, pixel shading in particular. NVIDIA recognizes this, and the G70 architecture is designed to push the most pixel processing power of any GPU currently available. At the top you can spot 8 separate vertex shader pipelines, leading to a setup stage and then a shader instruction dispatch. The dispatch is the logic that load balances the pixel pipelines below it and optimizes shader instructions for performance. Immediately following are the 24 pixel pipes, organized into 6 partitions of 4 pipes each. The fragment crossbar then takes the data and decides which of the 16 ROPs (raster operators) will work on the pixels next, finishing with the memory bus, which is 256 bits wide and broken into four 64-bit partitions.
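To make the unit counts easier to keep straight, here is a minimal sketch that records them in a small Python structure and derives the totals. The class and field names are purely illustrative (they are not NVIDIA terminology), and the only numbers used are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class G70Layout:
    """Unit counts as quoted in the article; names are hypothetical."""
    vertex_pipes: int = 8
    pixel_partitions: int = 6          # groups of pixel pipes
    pipes_per_partition: int = 4
    rops: int = 16
    memory_partitions: int = 4
    bits_per_memory_partition: int = 64

    @property
    def pixel_pipes(self) -> int:
        # 6 partitions of 4 pipes each
        return self.pixel_partitions * self.pipes_per_partition

    @property
    def memory_bus_bits(self) -> int:
        # four 64-bit partitions
        return self.memory_partitions * self.bits_per_memory_partition


layout = G70Layout()
print("pixel pipes:", layout.pixel_pipes)
print("memory bus width (bits):", layout.memory_bus_bits)
print("pixel pipes per ROP:", layout.pixel_pipes / layout.rops)
```

Note that there are more pixel pipes (24) than ROPs (16), which is why the fragment crossbar has to decide where each pixel goes next.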
Breaking down the large-scale diagram into smaller pieces, we can see here the detail on a single vertex shading pipeline. There are no dramatic changes from the NV40 architecture here, but NVIDIA has increased the number of vertex pipelines from 6 to 8. The VPE (vertex processing engine) is still dual issue and can hide the latency in the vertex texture fetch unit.
After analyzing a good number of games and shaders, NVIDIA came to the conclusion that applications are becoming increasingly shader compute bound and less dependent on the raw bandwidth that a GPU can provide. They also found that HDR, higher precision render targets and more high quality filtering are growing trends in game software. It is with those indicators that NVIDIA designed the G70 pixel pipelines to support both more math per pipe and more math per clock, meaning that a higher clock frequency wouldn’t be necessary to improve performance.
Most of the changes that the G70 sees over the NV40 are in this portion of the architecture. There are still two FP32 shader units in each pixel pipe, both supporting dual-issue and a smaller ALU. But while the NV40 could do only a single MADD operation per pipe per clock, the G70 can do 2 MADDs (each a combined MUL and ADD) per pipe per clock, improving performance on a wide array of shader applications right off the bat.
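For readers who want to see what that doubling means in numbers, here is a rough sketch of a MADD and of the per-pipe MADD throughput. It assumes a 4-component vector and counts each multiply and each add as one flop; both are my assumptions for illustration, not figures from NVIDIA's slides, and the function names are made up.

```python
def madd(a, b, c):
    """Component-wise multiply-add: a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def madd_flops_per_clock(madds_per_pipe, components=4, flops_per_madd=2):
    # Assumed convention: one MADD on one component is
    # 1 multiply + 1 add = 2 flops.
    return madds_per_pipe * components * flops_per_madd


result = madd([1, 2, 3, 4], [2, 2, 2, 2], [1, 1, 1, 1])

nv40_madd_flops = madd_flops_per_clock(1)  # one MADD per pipe per clock
g70_madd_flops = madd_flops_per_clock(2)   # two MADDs per pipe per clock

print(result)
print(nv40_madd_flops, g70_madd_flops)
```

This only covers the MADD portion of each pipe's work, so it should not be read as the full per-pipe flop count.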
The pixel shader ALUs in each pipe are capable of handling 5 instructions per pixel and 10 operations per pixel, for a total of 27 flops per pixel per clock per pipe. With 24 pipelines, that gives us a total of 648 flops per clock on the G70, just for pixel shading.
The ROP is the raster operator that gives the GPU the ability to write pixels and handle antialiasing as well as Z and color compression. Functionally, the ROPs remain largely the same as in the NV40 architecture. The G70 has 16 of these, just as the NV40 did.
With all that power, NVIDIA claims they are seeing as much as a 50% increase in performance per pixel per clock on common shader applications, ranging from 3DMark05 to Doom 3 and Half-Life 2. We’ll see in our benchmarks, of course, how this holds up in real world gaming scenarios.
This final slide gives you tech-heads a complete overview of the kind of raw numbers the G70 architecture can spit out. The vertex shader units can handle 34.4 billion floating point operations per second and the pixel shaders can handle 278.6 billion. For comparison's sake, the numbers available for the ATI Xbox 360 GPU put it at 240 billion of the same calculations. Of course, there is more to gaming performance than GFlops, and that leads us into the new features that NVIDIA added to the 7800 GTX besides raw processing power.
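As a quick sanity check on those headline figures, the sketch below works backwards from them. The core clock is not stated in this section, so the script derives an implied clock from the 278.6 billion and 648 flops-per-clock pixel figures, and it assumes the vertex figure is quoted at that same clock; treat both as assumptions rather than official specifications.

```python
PIXEL_GFLOPS = 278.6          # quoted pixel shader throughput
VERTEX_GFLOPS = 34.4          # quoted vertex shader throughput
XBOX360_GFLOPS = 240.0        # quoted figure for the ATI Xbox 360 GPU
PIXEL_FLOPS_PER_CLOCK = 648   # quoted pixel shading flops per clock
VERTEX_UNITS = 8              # quoted vertex pipeline count

# Clock implied by the pixel figures (an inference, not a quoted spec).
implied_clock_mhz = PIXEL_GFLOPS * 1e9 / PIXEL_FLOPS_PER_CLOCK / 1e6

# Assumes the vertex figure is quoted at the same clock.
vertex_flops_per_clock = VERTEX_GFLOPS * 1e9 / (implied_clock_mhz * 1e6)
flops_per_vertex_unit = vertex_flops_per_clock / VERTEX_UNITS

# Ratio of the two quoted pixel shading figures.
g70_vs_xbox360 = PIXEL_GFLOPS / XBOX360_GFLOPS

print(f"implied clock: {implied_clock_mhz:.0f} MHz")
print(f"vertex flops per clock per unit: {flops_per_vertex_unit:.1f}")
print(f"G70 pixel GFlops vs Xbox 360 figure: {g70_vs_xbox360:.2f}x")
```

Under those assumptions the figures hang together: the implied clock comes out at about 430 MHz and each vertex unit at about 10 flops per clock.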