The idea of GPGPU (general purpose graphics processing unit) isn’t new, but the momentum has been gaining on the benefits of GPGPU work since ATI and NVIDIA started pushing it over a year ago. ATI recently made headlines by working with Stanford to produce a GPU-based Folding @ Home client while NVIDIA was quiet on the subject. I think now we’ll all know why — NVIDIA didn’t want to talk up standard GPGPU when they had something much better lined up.
If you paid attention on the previous pages, you have surely noticed that with the changed DX10 brings in stream output and unified pipelines, and with NVIDIA’s work in threading and branching, the G80 architecture is looking more and more like a processor than the GPU we have come to love. But worry not, it’s all for the best! In NVIDIA’s view, the modern CPU is aimed towards multi-tasking; using large cores that are instruction focused (able to do MANY different things) but are not granular in the way a GPU is. Current GPGPU configurations are difficult to use and require programmers to learn graphics APIs and streaming memory in ways they are not used to at all.
NVIDIA CUDA (Compute Unified Device Architecture) attempts to remedy those GPGPU and CPU issues; by adding some dedicated hardware to G80 for computing purposes, NVIDIA is able to design a complete development solution for thread computing. Probably the most exciting point is the fact that NVIDIA is making available a C compiler for the GPU that will work to thread programming for parallel data and that scales with new GPUs as they are released; super computing developers aren’t interested in re-tooling their apps every 6 months! NVIDIA is working to create a complete development environment for programming on their GPUs.
This first example of how this might be used was physics; a likely starting point knowing what we know about GPGPU work. The work is about finding the positions of the flying boxes by doing some work on the old position and taking into account the velocity and time.
Looking at how the CPU would solve this problem, based on solving one equation at a time (maybe 2 or 4 with multiple core processors now), we can see that the design is inefficient for solving many of these identical equations in a row. Operating out of the CPU cache, the large complete control logic is used to keep the CPU busy, but even it can’t do anything about the lack of simultaneous processing that curent CPUs offer.
The current generation of GPGPU options would solve this problem faster due to the parallel nature of GPUs. The shaders would solve the equations and could share information using the video memory.
NVIDIA’s CUDA model would thread the equations and the GPUs shaders would be able to share data much faster using the shared data cache as opposed to the video memory.
What this example doesn’t take into consideration is the need for the threads to communicate during execution; something that is ONLY possible on DX10 capable hardware using stream output. Take an example of calculating air pressure: the equation involves calculating the influences of all neighboring air molecules. Only through the stream output option could the equations being truly run in parallel, using the shared cache to talk to each other much faster than the current generation of GPUs could in GPGPU architectures.
The G80 has a dedicated operating mode specifically for computing purposes outside of graphics work. It essentially cuts out the unncessary functions and unifies the caches into one cohesive data cache for ease of programming. To super computing fanatics, the idea of having 128 1.35 GHz processors working in parallel processing modes for under $600 is probably more than they can handle — and NVIDIA hopes they buy into it and is doing all they can to get the infrastructure in place to support them.
Some quick performance numbers that NVIDIA gave us comparing the GeForce 8800 to a dual core Conroe running at 2.67 GHz show significant leaps. Ranging from 10x speed up on rigid body physics to 197x on financial calculations (think stock companies), if these applications can come to fruition, it would surely bring a boom in the super computing era.