Stream Processing Features

Accelerated Computing Hardware

Just as we saw when the ATI R580 was released, AMD is pushing hard for the acceptance of the GPU as a general purpose processor for specific applications besides graphics.  The R600 saw even more development work put into the idea of stream and/or accelerated computing.

AMD ATI Radeon HD 2900 XT Review: R600 Arrives - Graphics Cards 157

This diagram should look familiar: it’s the R600 architecture we saw before but with some renaming to indicate usage on the GPGPU side of things.  Basically, all of the features we looked at before can be “double dipped” and used in another fashion to allow for friendlier computing on non-graphics data.

First, the thread generator was specifically optimized for both low-latency and high-throughput thread generation.  The lower latency would allow interactive compute applications (more serial work like AI) to run on the R600, while the high throughput would be used for graphics and other large compute tasks like imaging and high-performance computing.  I don’t think you’ll find the R600 running Windows Vista any time soon, so don’t expect that kind of low latency on this chip quite yet.

The thread scheduler and the attached caches allow for unlimited application length (important for general purpose computing) as well as unlimited constants for developers to access quickly.  The SPUs are where all the magic happens, though: the 320 stream processors support both floating point and integer ops while remaining IEEE 754 compliant.
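To make the stream model a little more concrete, here is a minimal sketch in plain Python. It is not AMD's API: `run_kernel`, `saxpy_kernel` and `mask_kernel` are hypothetical names, and a small thread pool merely stands in for the stream processors. The point is only that one small kernel is applied independently to every element of a stream, whether the data is floating point or integer.

```python
from concurrent.futures import ThreadPoolExecutor


def run_kernel(kernel, stream, workers=4):
    """Apply `kernel` independently to every element of `stream`.

    The thread pool is only a stand-in for the stream processors;
    results come back in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, stream))


def saxpy_kernel(pair, a=2.0):
    """Floating point kernel: a * x + y for one (x, y) element."""
    x, y = pair
    return a * x + y


def mask_kernel(value):
    """Integer kernel: extract bits 4..7 of one element."""
    return (value >> 4) & 0xF


floats = run_kernel(saxpy_kernel, [(1.0, 0.5), (2.0, 0.25), (3.0, 0.0)])
ints = run_kernel(mask_kernel, [0x12, 0xAB, 0xF0])
print(floats)  # [2.5, 4.25, 6.0]
print(ints)    # [1, 10, 15]
```

Because no element depends on any other, the hardware is free to spread the work across as many processors as it has, which is what makes this style of code a good fit for a GPU.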

In the top left you’ll find the parallel DMA engine, which can be used to maximize usage of the PCIe bandwidth and keep the CPU and the stream processors working in parallel more closely.
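The value of a separate DMA engine is easier to see as a pipeline: while one chunk of data is being processed, the next is already being copied across. The sketch below illustrates that overlap in plain Python under stated assumptions. `dma_upload`, `compute` and `pipeline` are hypothetical names, a thread plays the role of the DMA engine, and a bounded queue plays the role of the device-side buffers; none of this reflects the actual R600 driver interface.

```python
import queue
import threading


def dma_upload(chunks, inbox):
    """Stand-in for the DMA engine: copies chunks across while compute runs."""
    for chunk in chunks:
        inbox.put(list(chunk))  # the "transfer": copy into a device-side buffer
    inbox.put(None)             # sentinel: no more data


def compute(inbox):
    """Stand-in for the stream processors: sum of squares per chunk."""
    results = []
    while True:
        chunk = inbox.get()
        if chunk is None:
            break
        results.append(sum(v * v for v in chunk))
    return results


def pipeline(chunks, depth=2):
    """Overlap transfer and compute with at most `depth` chunks in flight."""
    inbox = queue.Queue(maxsize=depth)
    uploader = threading.Thread(target=dma_upload, args=(chunks, inbox))
    uploader.start()
    results = compute(inbox)
    uploader.join()
    return results


sums = pipeline([[1, 2, 3], [4, 5], [6]])
print(sums)  # [14, 41, 36]
```

With `depth=2` this is classic double buffering: the uploader can stay one or two chunks ahead, so the compute side rarely has to wait on the bus.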

Programming Model

Of course, having all the hardware in the world is only useful if you have the software to back it up.  NVIDIA has been pushing forward with its CUDA project while AMD has taken a slightly different approach: open up the hardware.


The CTM (Close to Metal) abstraction layer that ATI announced back in 2005 is still at work here on the R600 and allows for very detailed control of the hardware by the software developer.  NVIDIA’s CUDA project is more high level: it provides an easier programming experience but less controllable performance.  With the merger of AMD and ATI, the two have combined their efforts to form an “AMD Runtime” that is forward compatible and works the world of multi-core AMD processors into the mix as well.

By adding libraries of functions, support for more compiler extensions and more developer assistance, AMD is hoping to get more and more developers into the stream computing world.  It is an incredibly large undertaking, but giving developers access to the kind of processing power GPUs provide will change the way processors are developed forever.
