The piece is definitely worth a read, though, for anyone interested in how the GT200 and CUDA function.
Figure 3 above shows the system architecture of three throughput-oriented processors: the G80, the GT200 and Niagara II. Note that the caches in the two GPUs are read-only texture caches, unlike the fully coherent caches in Niagara II. The GT200 frame buffer memory interface is 512 bits wide, composed of eight 64-bit GDDR3 memory controllers, compared to a 384-bit wide interface on the previous generation. Memory bandwidth varies across models, but peaks at 141.7GB/s when the memory controller and memory are running at 1107MHz, approximately 65% higher than the previous generation. On top of the wider, higher-bandwidth memory interface, the GDDR3 memory controller coalesces a much greater variety of memory access patterns, improving efficiency as well as peak performance.
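The peak bandwidth figure can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything taken from NVIDIA's documentation: the function name `peak_bandwidth_gb_s` is made up for illustration, and it assumes GDDR3 moves two transfers per memory clock (double data rate). The 900MHz clock used for the 384-bit comparison is also an assumption (the GeForce 8800 GTX memory clock), since the text above does not state which previous-generation model is meant.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 assumes double data rate signalling, as in GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# GT200: eight 64-bit controllers form a 512-bit interface at 1107MHz.
gt200 = peak_bandwidth_gb_s(8 * 64, 1107)

# Previous generation: 384-bit interface; the 900MHz clock is an assumption.
previous = peak_bandwidth_gb_s(384, 900)

print(f"GT200 peak:    {gt200:.1f} GB/s")
print(f"Previous peak: {previous:.1f} GB/s")
print(f"Improvement:   {100 * (gt200 / previous - 1):.0f}%")
```

Under these assumptions the GT200 figure comes out at 141.7GB/s, matching the number quoted above, and the gain over the assumed 384-bit configuration is roughly in line with the "approximately 65%" claim. Note that this is a theoretical ceiling; how much of it a kernel actually achieves depends heavily on how well its accesses coalesce.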