Caustic Graphics finds the answer
James was quick to reiterate to me on a few occasions that the company initially had no desire to build hardware, but as they developed their unique software algorithms it became apparent that no processor on the market today, in either GPU or CPU form, did exactly what they needed. These new algorithms attempt to find order in what we all see as the randomness of ray tracing and to drastically increase memory locality for efficiency reasons, and the founders decided that this required a custom co-processor. But they seem to have played the game intelligently by continuing to use existing hardware where it is most efficient: the Caustic card will handle only the operations that modern components are inefficient at, while still leveraging the power of the GPU for pure shading horsepower.
Obviously, much of what the Caustic co-processor does and how it interacts with the rest of the system is being kept secret to protect intellectual property; Caustic understandably wants to keep its “secret sauce” to itself. I was able to get quite a bit of information in my interview, though, that I think sheds some light on the technology.
The RTPU (ray tracing processing unit) that Caustic has developed handles only the ray tracing portion of the calculations required for a complete rendering system. When a ray is created in the shader software, the entire process of that ray bouncing around the scene and activating shaders is handled solely by the Caustic card. The magic in the design is that they claim to have figured out a software and hardware combination that allows them to compute tremendous numbers of rays, with much higher bounce allowances than modern ray tracing systems, while maintaining real-time efficiency. The CausticGL software then passes the resulting shaders that need to be run on various pixels to the GPU in such a way that it can efficiently run the code on its architecture. In other words, the Caustic hardware is essentially taking what would normally be randomly ordered shader results from a ray tracing algorithm and compiling the data into a format that the GPU is used to seeing: code that can run with high memory locality. The GPU can then do what it does very well in terms of mass shading power while leaving the work that it is less efficient at to the Caustic software and hardware design.
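To make the idea of turning scattered shader requests into GPU-friendly work a little more concrete, here is a minimal Python sketch. To be clear, this is not Caustic's method, which has not been disclosed; it only illustrates the general technique of binning ray hits by the shader they trigger so that each shader runs over one coherent batch. The names `Hit`, `shader_id` and `batch_by_shader` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    pixel: int      # index of the pixel whose ray produced this hit
    shader_id: int  # hypothetical ID of the material shader the ray triggered


def batch_by_shader(hits):
    """Group hits so each shader can run once over a batch of pixels."""
    batches = defaultdict(list)
    for hit in hits:
        batches[hit.shader_id].append(hit.pixel)
    # Sort pixels inside each batch so neighbouring pixels are shaded together.
    return {sid: sorted(pixels) for sid, pixels in sorted(batches.items())}


# Hits arrive in whatever order the rays happened to finish bouncing.
hits = [Hit(0, 2), Hit(1, 7), Hit(2, 2), Hit(3, 7), Hit(4, 2)]
batches = batch_by_shader(hits)
for shader_id, pixels in batches.items():
    print(f"shader {shader_id}: pixels {pixels}")
```

Instead of switching shaders five times in arrival order, a GPU fed these batches would run shader 2 once over three pixels and shader 7 once over two, which is the kind of coherent workload it handles well.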
What makes the Caustic Graphics approach even more intriguing is that it is completely independent of the specific graphics card in use: the GPU literally has no idea that the Caustic hardware is in the machine and doesn’t need to be aware of its function. This is accomplished thanks in large part to the new OpenGL-based graphics API that the Caustic hardware requires, but I will touch on that a bit later in the article.
The Caustic Graphics Memory System
With all ray tracing algorithms today there is a large amount of setup time devoted to creating and then maintaining a database of triangles and information representing the scene to be rendered. This will usually hinder things like changing the geometry in real time because programmers would essentially have to rebuild the database as changes were made; the data tree for the scene could shift quite a bit. That is why a lot of ray tracing demos simply move the camera around a pre-built scene; the geometry stays fixed in that case. And because high-end production environments can in some cases involve 400-500 million triangles per scene, the memory issue for ray tracing is very important.
While the compute power of the Caustic Graphics hardware is still wrapped in mystery, we do have a bit more information on how the memory architecture works on the card. The system on the CausticOne (the name of the first hardware available) is able to manipulate objects in real time with little to no effect on performance, according to its designers. Again, exactly how this is accomplished is part of the “secret sauce” that they would rather not reveal at this time.
The CausticOne only uses standard DDR2 SO-DIMM memory
As I mentioned before, when a traditional GPU does ray tracing, the poor locality of reference that develops from high numbers of rays and bounce limits quickly overwhelms its small caches (16KB or so). When working on a 2GB database of triangles, for example, the cached data will quickly be exhausted and the graphics card will be forced to access memory much more frequently than with rasterization. This is why most current ray tracing algorithms are memory bound on modern GPUs.
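The effect of locality on a small cache is easy to demonstrate with a toy simulation. The Python sketch below models a 16KB cache as a simple LRU set of 64-byte lines and compares a coherent access pattern against a random one. The 64-byte line size, the LRU policy, the dataset size and both access patterns are my own assumptions for illustration; they do not describe any real GPU or the Caustic hardware.

```python
import random
from collections import OrderedDict


def hit_rate(addresses, cache_lines):
    """Fraction of accesses served from a simple LRU cache of `cache_lines` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark line as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


rng = random.Random(0)            # fixed seed so runs are repeatable
CACHE_LINES = 16 * 1024 // 64     # 16KB cache with assumed 64-byte lines
DATASET_LINES = 100_000           # assumed dataset, far larger than the cache
ACCESSES = 50_000

# Coherent: accesses stay inside a 64-line window that slides slowly forward.
coherent = [(i // 100) * 8 + rng.randrange(64) for i in range(ACCESSES)]
# Incoherent: every access lands anywhere in the dataset, like scattered rays.
incoherent = [rng.randrange(DATASET_LINES) for _ in range(ACCESSES)]

coherent_rate = hit_rate(coherent, CACHE_LINES)
incoherent_rate = hit_rate(incoherent, CACHE_LINES)
print(f"coherent hit rate:   {coherent_rate:.1%}")
print(f"incoherent hit rate: {incoherent_rate:.1%}")
```

With these assumptions the coherent pattern is served almost entirely from the cache while the random pattern almost never is, so nearly every access in the second case has to go out to memory.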
I do know that with Caustic’s particular hardware and software implementation, memory bandwidth requirements are surprisingly low. The CausticOne card used for the reference design work featured only a 64-bit, single-channel DDR2 SO-DIMM memory configuration, a VERY far cry from the GDDR5 memory controller and bus configurations used in the RV770 design from AMD, for example. Caustic has designed a memory algorithm that allows the ray tracing processor to optimize computations based on a particular memory dataset fetched from the on-board controller. Again, Caustic has taken what was a very inefficient part of ray tracing on a GPU and increased memory locality in a dramatic way.
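For a rough sense of scale, the short Python calculation below compares theoretical peak bandwidth for the two memory setups. The speed grade of the CausticOne's memory was not given to me, so DDR2-800 is purely an assumption here; the GDDR5 figures are those of the RV770-based Radeon HD 4870 (256-bit bus, 3.6 gigatransfers per second per pin). The helper name `peak_bandwidth_gb_s` is made up for this example.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfer rate."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# Assumption: DDR2-800 on a 64-bit single channel (actual speed grade not stated).
caustic_ddr2 = peak_bandwidth_gb_s(64, 800e6)
# Radeon HD 4870 (RV770): 256-bit GDDR5 at 3.6 GT/s.
rv770_gddr5 = peak_bandwidth_gb_s(256, 3600e6)

print(f"64-bit DDR2-800:     {caustic_ddr2:.1f} GB/s")
print(f"256-bit GDDR5 (4870): {rv770_gddr5:.1f} GB/s")
print(f"ratio:               {rv770_gddr5 / caustic_ddr2:.0f}x")
```

Under that assumption the CausticOne would have about 6.4 GB/s of peak bandwidth against 115.2 GB/s on the graphics card, a gap of roughly 18x, which shows how much the design must depend on memory locality rather than raw bandwidth.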