Don’t Forget the ROPS!
    The last area on which NVIDIA lavished a lot of attention is the ROPS.  For a while there, we were primarily shader bound in most of our applications.  Now that we have 512 CUDA cores spitting out a tremendous number of pixels, shading is no longer what holds us back.  Instead we have seen a new performance bottleneck emerge, and it is one we have seen before.  Pixel fillrate is now very much back in demand, and that is due to the rise of multi-screen gaming and NVIDIA’s 3D Vision.  The last generation of products from NVIDIA and AMD was able to handle resolutions of 2560×1600 with 4X to 8X anti-aliasing, but with multi-monitor gaming becoming more mainstream, combined with the doubled fillrate needs of 3D Vision, these cards needed a massive expansion of pixel fillrate capabilities.

NVIDIA GF100 Architecture Preview - Fermi brings DX11 to the desktop - Graphics Cards  21

The new Coverage Sample and Transparency AA solutions are a lot more effective than what we saw in previous generations of parts.

    NVIDIA expects the sweet spot in multi-monitor gaming to be three 1920×1080 monitors.  This means that a single card needs to adequately fill about 6 million pixels at least 60 times per second, and this is not counting anti-aliasing.  Throw in 4X or 8X AA, and we are looking at essentially 24 to 48 million samples 60 times a second for smooth framerates (though MSAA does not work exactly that way, there is still a lot of work being done by the ROPS).
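The arithmetic behind those figures can be sketched in a few lines of Python.  This is only a rough upper bound: the function names are made up for illustration, and it naively treats every AA sample as a full pixel write, which (as noted above) is not exactly how MSAA works.

```python
def pixels_per_frame(width, height, monitors=1):
    """Pixels the card must fill for one frame across all monitors."""
    return width * height * monitors


def samples_per_second(width, height, monitors=1, aa_samples=1,
                       fps=60, stereo=False):
    """Naive upper bound on samples written per second.

    Assumes each AA sample costs a full write and that stereo 3D
    simply doubles the work (one view per eye).
    """
    views = 2 if stereo else 1
    return pixels_per_frame(width, height, monitors) * aa_samples * fps * views


if __name__ == "__main__":
    # Three 1920x1080 monitors, as in the surround setup described above.
    base = pixels_per_frame(1920, 1080, monitors=3)
    print(f"pixels per frame: {base:,}")

    for aa in (1, 4, 8):
        rate = samples_per_second(1920, 1080, monitors=3, aa_samples=aa)
        print(f"{aa}X: {rate / 1e9:.2f} billion samples per second at 60 fps")
```

Passing `stereo=True` doubles each figure, which is the "double fillrate" cost of 3D Vision mentioned earlier.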


The new AA functionality can have a programmable scatter pattern for both MSAA and CSAA.  This should help cut down on banding and shadowing artifacts by dynamically and randomly adjusting sample positions.

    NVIDIA’s solution to this is to massively increase the throughput of their ROP units.  There are now six ROP partitions, each of which contains eight ROP units, for 48 in total.  These 48 ROP units have been reworked to improve efficiency over previous architectures.  Pure pixel pushing performance has increased dramatically with GF100.  Raw pixel performance simply overshadows everything that has come before.
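To put the ROP count in perspective, here is a minimal sketch of how a theoretical peak fillrate falls out of it.  It assumes each ROP unit writes one pixel per clock, and the 600 MHz clock is purely a placeholder, since no clock speeds are given here.  Other parts of the chip may cap the real number lower.

```python
ROP_PARTITIONS = 6        # from the GF100 description above
ROPS_PER_PARTITION = 8    # from the GF100 description above


def total_rops(partitions=ROP_PARTITIONS, per_partition=ROPS_PER_PARTITION):
    """Total ROP units on the chip."""
    return partitions * per_partition


def peak_fillrate_gpixels(rops, clock_mhz, pixels_per_rop_per_clock=1):
    """Theoretical peak fillrate in gigapixels per second.

    Assumption: every ROP retires `pixels_per_rop_per_clock` pixels
    each cycle, with no stalls from memory or the rest of the pipeline.
    """
    return rops * pixels_per_rop_per_clock * clock_mhz / 1000


if __name__ == "__main__":
    rops = total_rops()
    hypothetical_clock_mhz = 600  # placeholder, NOT a published GF100 clock
    rate = peak_fillrate_gpixels(rops, hypothetical_clock_mhz)
    print(f"{rops} ROPs at {hypothetical_clock_mhz} MHz: {rate:.1f} Gpixels/s peak")
```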


Increased quality and increased performance are the primary reasons behind the changes NVIDIA has made.  This is simply not a case of, “We offered better quality, but you are taking a hit in the pants in terms of performance.”

    Not everyone will utilize 3D Vision and multi-monitor support, so it might seem that all that extra pixel pushing power would be wasted, even for people running 30” monitors.  Happily for us image quality junkies, NVIDIA has again redefined what image quality should be.


Going from regular 8x MSAA to 32x CSAA (8x MSAA + 24 coverage samples) only results in a 7% drop in performance.

    Coverage Sample AA was introduced with the G80 architecture, and it was one of my favorite things about those products.  While 4X AA looked good, going to 4X AA with 8 coverage samples gave some of the cleanest scenes a person could want at the time.  Plus CSAA was fairly easy on performance, as it was around 10% slower than standard 4X AA.  With the GF100, NVIDIA introduces a new CSAA implementation.  It can combine 8X AA with up to 24 coverage samples.  It also adds coverage point shifting algorithms to further fool the human eye into not seeing patterns and banding.  On top of all this is a new transparency AA algorithm with significantly improved performance and quality.  Obviously more testing needs to be done on these new features to see how well they work, but NVIDIA has taken another large leap over AMD in terms of final anti-aliased output with their new solution.
