RV790 Potential Specifications
The rumors that the RV790 is going to be new and different are lent a lot of weight by both AMD’s history of design choices and their product introductions. Considering how TSMC’s 40 nm process is shaping up, AMD could certainly surprise the industry with a new high-end offering at the 55 nm node.
The G71 was NVIDIA’s last foray into the “more affordable, efficient, and not quite as complex” philosophy of GPU design. The 7900 GTX that it powered was a very successful card that outsold ATI’s more advanced and slightly faster X1900 XTX.
Now is the time I get my guessing hat out and see what AMD has in store for us. These are simply guesses based on what we have seen before, as well as a feel for where the industry has been going. I have no inside information, or anything even close to an official or unofficial source. They are also based on us being 10 months out from the last major update to AMD’s chip lineup.
I do not think we will see a massive increase in stream units as we saw going from the RV670 to the RV770. I think we will see an increase to 960 stream units (divided into 12 SIMDs), but the number of RBEs will likely be unchanged (AMD would include another texture unit per SIMD, but adding another RBE would upset the orthogonality of the setup). Considering the current texturing and AA performance of the HD 4870, those changes are likely not needed in the new chip. Pure pixel fillrate and texturing performance will improve through clock speed increases as well as further internal optimizations to these units. I think we will also see an increase in die size in the same 30% to 35% range again, which would make the new die 320 to 350 square millimeters.
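As a rough check on those numbers, the sketch below works through the arithmetic. It assumes the RV770's layout of 800 stream units in 10 SIMDs and a die of roughly 256 square millimeters; neither figure is stated in this article, and the function names are purely illustrative.

```python
# Assumed RV770 baseline (not stated in this article): 800 stream units
# in 10 SIMDs, and a die area of roughly 256 square millimeters.
RV770_STREAM_UNITS = 800
RV770_SIMDS = 10
RV770_DIE_MM2 = 256.0


def simd_count(stream_units, units_per_simd=RV770_STREAM_UNITS // RV770_SIMDS):
    """Number of SIMDs needed if each SIMD keeps the assumed RV770 width."""
    return stream_units // units_per_simd


def grown_die_area(base_mm2, growth_fraction):
    """Die area after growing the base area by the given fraction."""
    return base_mm2 * (1.0 + growth_fraction)


if __name__ == "__main__":
    print("SIMDs for 960 stream units:", simd_count(960))
    for growth in (0.30, 0.35):
        area = grown_die_area(RV770_DIE_MM2, growth)
        print(f"{growth:.0%} larger die: about {area:.0f} square millimeters")
```

With that assumed baseline, the estimate lands at roughly 333 to 346 square millimeters, inside the range guessed above.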
That number should of course make anyone paying attention to this little article scratch their head. Why would an extra 160 stream units take up all that extra die space, when more than double that amount going from RV670 to RV770, plus all those extra texture units and all that work on the RBEs, took up the same percentage? The answer to that one, according to my often wrong brain processes, is that AMD is focusing on extracting far more efficiency from those 960 units than they were with the introduction of the RV770. If we look at overall performance going from the HD 3870 to the HD 4870, it was a significant improvement. But it was not double the performance, even in shader heavy applications. 960 stream units at 900 MHz would give approximately 1.7 TFLOPs in single precision, which is up from the 1.2 TFLOPs that the current HD 4870 gives at peak. My guess is that the majority of the extra die space in the RV790 will be used to more effectively get that horsepower to the pavement, so to speak.
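For anyone who wants to check the TFLOPs figures, here is a minimal sketch of the usual peak-throughput arithmetic. It assumes each stream unit retires one multiply-add (counted as 2 floating point operations) per clock, and it assumes a 750 MHz core clock for the HD 4870, which is not stated in this article. The function name is made up for illustration.

```python
def peak_sp_tflops(stream_units, clock_mhz, flops_per_unit_per_clock=2):
    """Theoretical single precision peak in TFLOPs.

    Assumes every stream unit issues one multiply-add (2 flops) each clock,
    which is a best case that real workloads rarely reach.
    """
    flops_per_second = stream_units * flops_per_unit_per_clock * clock_mhz * 1e6
    return flops_per_second / 1e12


if __name__ == "__main__":
    # HD 4870: 800 stream units, 750 MHz core clock (assumed, see above).
    print(f"HD 4870 peak: {peak_sp_tflops(800, 750):.2f} TFLOPs")
    # Guessed RV790 configuration from the text: 960 units at 900 MHz.
    print(f"RV790 guess:  {peak_sp_tflops(960, 900):.2f} TFLOPs")
```

These are paper numbers only. The whole argument here is that how much of that peak reaches real applications matters more than the peak itself.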

The RV770 chip is still not exactly small, but it does remind me of the old R300 in its overall size. Significant? Time will tell.
Making the chip more efficient will actually be aimed at two different goals. The first is obviously that of making games go faster, making shader code run more efficiently, and providing a better gaming experience by improving speed while applying higher level effects. The second is improving the GPGPU aspects of the design. If AMD is clever about this, both goals can be achieved by improving internal communications and more effectively scheduling and sharing data between the SIMDs. The one area where NVIDIA has really stood out with their GT200 generation of chips is the amount of die space given to GPGPU concerns. The GTX 260/280/285 cards are all GPGPU monsters, and the work at the transistor level reflects this. I believe that AMD is now pursuing this goal as well. While the current HD 4870 does well in apps like Folding@Home, the flexibility and power of NVIDIA’s GT200 series in such applications easily overshadows what AMD currently has.
Make no mistake, 960 stream units provide a lot of raw power. I believe that AMD will also work to improve the double precision float performance of this product. The current RV770 provides 1.2 TFLOPs of single precision float performance and roughly 240 GFLOPs of double precision float, or one fifth of the single precision rate. If AMD could change around their stream units to give 1.7 TFLOPs of single precision along with 680 GFLOPs of double precision, they would raise more than a few eyebrows with this part. That 680 GFLOPs assumes AMD converts 2/5 of the stream units to double precision. Now, if AMD were to convert 3/5 of those units to double precision, a single card could do slightly over 1 TFLOP. One single video card could replace the floating point power of an 8P rack of high end Xeons, with extra performance to spare. Then consider that this card would pull around 180 to 190 watts at full load. Pretty impressive math, from nearly every angle.
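The double precision guesses above are simple fractions of the single precision peak, and the short sketch below spells that out. It treats double precision throughput as the single precision peak multiplied by the share of stream units doing double precision work, which is the simplification used in this article rather than a description of how the hardware actually schedules such work. The helper names are hypothetical.

```python
def dp_gflops(sp_tflops, dp_fraction):
    """Double precision GFLOPs, assuming a fixed fraction of the SP peak."""
    return sp_tflops * 1000.0 * dp_fraction


def gflops_per_watt(gflops, board_watts):
    """Throughput per watt at the given full-load board power."""
    return gflops / board_watts


if __name__ == "__main__":
    sp_peak = 1.7  # guessed RV790 single precision peak from the text, in TFLOPs
    for label, fraction in (("2/5", 2 / 5), ("3/5", 3 / 5)):
        dp = dp_gflops(sp_peak, fraction)
        print(f"{label} of units: about {dp:.0f} GFLOPs double precision")
        # Full-load power range guessed in the text: 180 to 190 watts.
        for watts in (180, 190):
            ratio = gflops_per_watt(dp, watts)
            print(f"  at {watts} W: about {ratio:.1f} GFLOPs per watt")
```

Running it gives about 680 GFLOPs for the 2/5 case and about 1020 GFLOPs for the 3/5 case, matching the figures in the text.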