NVIDIA GT200 Architecture (cont’d)
Memory Controller Gets an Upgrade
Remember when the G80 launched and we saw odd frame buffer sizes on 8800-series cards, like 768MB or 640MB? That was because the G80 used a 384-bit memory controller (320-bit on the cut-down 640MB cards) at a time when the Radeon HD 2000/3000 series cards used up to a 512-bit memory controller. When G92 was introduced it was developed with a 256-bit memory controller, and that turned out to be a noticeable hindrance to performance – comparisons of similarly clocked G92 and G80 parts showed the G80 having a big memory bandwidth advantage.
NVIDIA addresses that in the GT200 design with its own 512-bit memory controller; or more precisely a combination of 8 separate 64-bit memory controllers. Each of the 8 memory controllers is connected to a single block of ROPs as we are accustomed to. This doesn’t mean we’ll only be getting 512MB or 1024MB memory configurations though – as we’ll soon see the GTX 260 actually uses 896MB!
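As a rough sanity check on those odd frame buffer sizes, the arithmetic can be sketched in a few lines of Python. This is only a sketch: it assumes every 64-bit controller has the same amount of memory hanging off it, and the 128MB-per-controller figure is an assumption picked because it reproduces the sizes quoted above, not a number NVIDIA gave us. The helper name is made up for illustration.

```python
CONTROLLER_WIDTH_BITS = 64   # each memory controller is 64 bits wide
MB_PER_CONTROLLER = 128      # assumption: equal memory behind every controller


def frame_buffer_mb(bus_width_bits, mb_per_controller=MB_PER_CONTROLLER):
    """Estimate frame buffer size from total bus width (hypothetical helper)."""
    if bus_width_bits % CONTROLLER_WIDTH_BITS != 0:
        raise ValueError("bus width must be a multiple of 64 bits")
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    return controllers * mb_per_controller


if __name__ == "__main__":
    # 384-bit (G80), 448-bit (seven controllers active), 512-bit (full GT200)
    for bus in (384, 448, 512):
        print(bus, "bit ->", frame_buffer_mb(bus), "MB")
```

Under these assumptions an 896MB card would correspond to seven of the eight controllers being active, which is how a 512-bit design can end up with a size that is not a power of two.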
Much has been made recently about AMD’s pre-announcement that their next-generation part would utilize GDDR5 memory; NVIDIA was quick to point out that using technology for technology’s sake is a waste if it does not net you additional performance. The memory controller on GT200 can support either GDDR3 or GDDR4 memory, but all the initial boards will use GDDR3 because NVIDIA doesn’t see the benefits of GDDR4 from a cost/frequency perspective. With GDDR3 supplying sufficient clock speed and data rate per pin to mostly saturate GT200’s memory bus, a move to a solution that is half as wide but twice as fast doesn’t always save you on transistor budget. We’ll have to see how AMD’s design takes advantage of GDDR5 before committing to a final analysis.
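The "half as wide but twice as fast" point is easy to see with the usual peak bandwidth arithmetic: bus width times per-pin data rate, divided by eight bits per byte. The sketch below uses illustrative per-pin rates (2.0 and 4.0 Gbit/s) that are assumptions for the sake of the example, not the specifications of any shipping card.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bandwidth in GB/s: pins * per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


# Hypothetical wide bus with slower memory
wide_slow = peak_bandwidth_gb_s(512, 2.0)
# Hypothetical bus that is half as wide but runs twice as fast per pin
narrow_fast = peak_bandwidth_gb_s(256, 4.0)

print(wide_slow, narrow_fast)
```

Both configurations land on the same peak figure, so the choice between them comes down to cost, board complexity and die area rather than raw bandwidth.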
Looking at the Chip “As Big As Your Head”
Keeping in mind that NVIDIA is building these on 300mm wafers, let’s look at this shot:
This is probably the first time you can actually look at the wafer shot provided by a company and easily count how many GT200s the company could make assuming 100% yield. The answer, by the way, is 95.
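For the curious, that count can be cross-checked with a common rule-of-thumb estimate for gross dies per wafer. Two loud assumptions here: the die area of 576 mm² is a commonly cited figure for GT200 that does not come from this article, and the formula is a generic approximation for edge loss, not anything NVIDIA or its foundry provided.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Rule-of-thumb estimate: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


# 300mm wafer (from the article), 576 mm^2 die (assumed, not from the article)
estimate = gross_dies_per_wafer(300, 576)
print(round(estimate))
```

With those inputs the estimate lands in the mid-90s, in the same range as the number counted off the wafer shot, and it ignores yield entirely.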
This die shot highlights a single shader processor and a cluster of 24 with corresponding memory and logic.
And again, here is the GT200 die with an overlay of all the common GPU functionality: SPs, texture units, ROPs, memory controllers and “mystery logic” in the middle that likely includes the VP2 engine, SLI support and more.
Power Efficiency Increases
Another of NVIDIA’s key improvements with the GT200 design comes in the form of power management and efficiency increases. The new core design is much more granular in the way it powers down segments that are not in use, which saves power at idle and under light loads. For example, while the G80 used about 80 watts at idle and the G92 used 45 watts, the new GT200 will use only 25 watts at idle. Considering the increased size of the chip and the increase in gaming performance, this is an impressive feat.
How is it done? The GT200 integrates some advanced power saving features such as improved clock gating and clock and voltage scaling. NVIDIA even claimed the ability to turn off components unit-by-unit, though I am unsure if this applies to each stream processor, each block of 8 SPs or each block of 24 SPs – my guess is the last option. The slope of power can be more finely adjusted, with an order of magnitude more “steps” on the ladder between powered off and full speed. As an example of this, NVIDIA’s Tony Tomasi said that for video decoding the GT200 has only about half as much area powered up as the G92, even with the larger die size taken into consideration.
The Hybrid Power technologies that were introduced with the 9800 GTX and the 9800 GX2 are again present in the GT200 series of graphics cards, but one has to wonder how useful they have become. If the GT200 cards are only using 25 watts at idle as NVIDIA states, then powering off that last 25 watts shouldn’t be as big a “boost” in power savings as it was on the G92, which used 45+ watts. Oh well, I guess any power savings is good power savings at this point.
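To put numbers on that, here is a quick back-of-the-envelope sketch using only the idle figures NVIDIA quoted. It makes a simplifying assumption that a card switched off by Hybrid Power draws nothing at all, and it ignores whatever the motherboard GPU draws in its place; the function names are hypothetical.

```python
# Idle power figures as quoted by NVIDIA, in watts
IDLE_WATTS = {"G80": 80, "G92": 45, "GT200": 25}


def idle_reduction_percent(old_chip, new_chip):
    """Percentage drop in idle power going from old_chip to new_chip."""
    old, new = IDLE_WATTS[old_chip], IDLE_WATTS[new_chip]
    return 100 * (old - new) / old


def hybrid_power_saving_watts(chip):
    """Watts saved at idle if the card is fully powered off (simplified)."""
    return IDLE_WATTS[chip]


print(idle_reduction_percent("G80", "GT200"))
print(hybrid_power_saving_watts("G92") - hybrid_power_saving_watts("GT200"))
```

The second figure is the gap in what Hybrid Power can claw back at idle on a G92 card versus a GT200 card under these assumptions.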
If we look just at marketable features besides the obvious of “better performance”, the new architecture doesn’t have much to add. HybridPower still exists as I just discussed, 2-Way and 3-Way SLI support continues and the PureVideo 2 engine that was introduced on the GeForce 9-series is here as well.
Perhaps the only “new” feature is one we couldn’t test yet: PhysX support. Since NVIDIA purchased AGEIA some months ago, the promise of running PhysX on your GeForce GPU has been there. The CUDA port of PhysX has apparently been going very well – in just two months of work the team has successfully converted soft bodies, fluid and cloth to the GPU, with rigid bodies as the last piece remaining.
As far as PhysX performance is concerned, I asked about a crossover point where the GPU and the PPU (the dedicated hardware that AGEIA sold at retail) performed the same. The PhysX team didn’t have an exact answer yet but said it probably fell around a mid-range GeForce 8-series card; 8600 GT or so. What was good to hear was that even with the penalty of “context switching”, a single GT200 card should be able to render faster than a single 9800 GTX card paired with a dedicated PPU could have done. Context switching is the process by which a GPU is forced to change states, from rendering graphical data to computing physics or other data and back; the faster this can occur, the less latency the system will see from the inclusion of PhysX and other simulation add-ons.
The GT200 continues to use an external display chip for digital and analog outputs (and inputs, if any), which was kind of surprising. Essentially the GT200 outputs a single stream to the dedicated display chip, which is responsible for branching out connections like HDMI, DVI, VGA and TV output. NVIDIA claimed this helped with board design, making custom designs much more straightforward for third parties. This IO chip is also the first from NVIDIA to officially support 10-bit digital output – for whenever those accompanying monitors start showing up.
One thing you will NOT find on the GT200: official support for the DX10.1 standard. This came as quite a shock to us since NVIDIA has had plenty of time to integrate it into their core – AMD has had DX10.1 support in their GPUs since the HD 3800 series was released last year. NVIDIA did say that they have quite a bit of UNOFFICIAL support for DX10.1 features in their GT200 chip, but because the DX10 rules state it’s “all or nothing” for claiming conformity with a standard, NVIDIA is left with only a DX10 part. They did commit to “working with developers and ISVs that want to use those deferred rendering paths” for any titles; of course we DID have a big fallout from Assassin’s Creed recently, and we are curious to know the truth behind it (though we likely never will)…
Read more about the GT200 and general purpose parallel computing in our separate article: Moving Away From Just a GPU.