Rendering Performance – OpenCL and CUDA

LuxMark 3.1

GPGPU compute performance is a big part of any modern GPU design, especially in the workstation environment. LuxMark is a long-standing OpenCL benchmark based on the LuxRender engine, and it provides a good look at how different GPU architectures compare in typical OpenCL workloads. Today we are testing our field of graphics cards in the most compute-intensive scene, Hotel.

Looking at LuxMark's Hotel scene, we see a more competitive result than expected. While AMD GPUs traditionally have an advantage over NVIDIA's options in OpenCL, here we see a chart that scales well with rated GPU compute performance. At around 8.5 TFLOPS, a single AMD Fiji XT GPU is neck and neck with the 8.9 TFLOPS-rated P5000.
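To put the "neck and neck" comparison in numbers, here is a minimal sketch that computes the gap in rated compute between the two parts. It uses only the two TFLOPS figures quoted above; the function and constant names are our own, and rated throughput is of course not the same thing as a measured benchmark score.

```python
def rated_advantage_percent(tflops_a: float, tflops_b: float) -> float:
    """Return how much higher rating A is than rating B, as a percentage."""
    return (tflops_a / tflops_b - 1.0) * 100.0


# Rated single-precision figures as quoted in the text (approximate).
FIJI_XT_TFLOPS = 8.5
P5000_TFLOPS = 8.9

gap = rated_advantage_percent(P5000_TFLOPS, FIJI_XT_TFLOPS)
print(f"P5000 rated advantage over a single Fiji XT: {gap:.1f}%")
```

With a rated gap this small, closely matched scores are what you would expect if the benchmark tracks rated compute.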

OpenCL scales well across multiple compute devices, which means a very impressive score for the Radeon Pro Duo when both of its GPUs are utilized.

Blender 2.78b

Blender is a popular open-source project for 3D modeling and animation. Blender supports both OpenCL and CUDA pipelines, which allows for an interesting comparison across GPU vendors.

It's important to note that we tested Blender with the default settings. This means that the tile size is not necessarily optimized for the specific GPUs we were testing. Additional performance should be possible by experimenting with tile size for each specific GPU.
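For readers who want to experiment, a tile-size sweep could look something like the sketch below. This is not the procedure we used for our testing. It assumes the Blender 2.7x Python API, where tile dimensions are exposed as tile_x and tile_y on the scene's render settings; a plain stand-in object is used here so the sketch runs outside Blender, and the frame size and candidate tile sizes are hypothetical.

```python
import math
from types import SimpleNamespace


def tile_count(width, height, tile_size):
    """Number of square tiles needed to cover a width x height frame."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


def apply_tile_size(render_settings, tile_size):
    """Set square tiles on an object exposing tile_x / tile_y.

    Assumption: inside Blender 2.7x this object would be
    bpy.context.scene.render.
    """
    render_settings.tile_x = tile_size
    render_settings.tile_y = tile_size


# Stand-in for bpy.context.scene.render so the sketch runs outside Blender.
render = SimpleNamespace(tile_x=0, tile_y=0)

# Hypothetical frame size and candidate tile sizes for the sweep.
WIDTH, HEIGHT = 1920, 1080
for size in (32, 64, 128, 256, 512):
    apply_tile_size(render, size)
    # In a real sweep, a render would be timed here for each setting.
    tiles = tile_count(WIDTH, HEIGHT, size)
    print(f"{size}x{size} tiles -> {tiles} tiles per frame")
```

The tile count matters because each GPU works on one tile at a time, so the best setting depends on how well a given tile size keeps a particular GPU busy.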

Looking at the results, it appears that the CUDA pipeline provides benefits over the OpenCL code path. Using OpenCL, the Radeon Pro Duo, with its peak 16.38 TFLOPS of compute performance in dual-GPU mode, provides only a modest improvement over a single Quadro P5000.

3ds Max (NVIDIA Iray+)

3ds Max is a professional-level 3D modeling, animation, and rendering application. Due to its popularity, there are many rendering engines available for use in 3ds Max. In this test, we are looking at the NVIDIA-developed Iray+ engine. This renderer uses CUDA and cannot run on the AMD Radeon Pro Duo; on computers with AMD GPUs, it runs exclusively on the CPU.

We wanted to test Iray to get an idea of the relative performance of the Quadro options in a real-world application. As you can see, even the sub-$400 Quadro P2000 provides a huge performance improvement over ray tracing on the 8-core Intel i7-5960X CPU.

While the Quadro P4000 renders the scene 59% faster than the P2000, the performance increase going to the P5000 is just 15%. These results will vary depending on workload, but they show the possible diminishing returns as you increase GPU power.
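To see how those two steps stack up, the sketch below compounds the quoted gains. It assumes the 15% figure is measured relative to the P4000 and that "faster" refers to rendering throughput; the function and constant names are our own.

```python
def compound_speedup(*step_gains_percent):
    """Multiply successive percentage gains into one overall speedup factor."""
    factor = 1.0
    for gain in step_gains_percent:
        factor *= 1.0 + gain / 100.0
    return factor


# Step gains quoted in the text.
P2000_TO_P4000 = 59.0
P4000_TO_P5000 = 15.0  # assumption: measured relative to the P4000

total = compound_speedup(P2000_TO_P4000, P4000_TO_P5000)
print(f"P5000 vs. P2000: about {(total - 1.0) * 100.0:.0f}% faster")
```

Under those assumptions, the P5000 lands at roughly 83% faster than the P2000, with most of that gain already delivered by the step up to the P4000.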

