Memory Mode (UMA/NUMA) and Memory Speed Performance

One of the most interesting aspects of Threadripper from a technological and architectural standpoint is the new set of options surrounding memory configuration. Previous consumer platforms never had the ability or the need to dive into the weeds of memory locality, threading, and operating system behavior. Threadripper is a different beast altogether, and sometimes when new tech hits the market, there are growing pains as we all learn best practices. This is one of those times.

Below is a summary of performance, comparing our suite of benchmarks and games in both distributed and local modes. We have normalized the data so it can all appear on a single graph, giving us a look at the impact of NUMA/UMA at a glance. We are using the Threadripper 1950X CPU with 32GB of memory running at DDR4-2400.
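For readers who want to build a similar chart from their own results, here is a minimal sketch of one way to do the normalization. It assumes each test is expressed relative to a baseline mode, with 100 meaning no difference, and that time-based tests (where lower is better) are inverted so that a higher number always means faster. The function name and the sample values are hypothetical placeholders, not our measured data, and this is not necessarily the exact math behind the graph.

```python
def normalize_to_baseline(baseline, candidate, higher_is_better=True):
    """Return candidate performance as a percentage of baseline (100 = equal)."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("scores must be positive")
    # For time-based tests a smaller number is better, so flip the ratio.
    ratio = candidate / baseline if higher_is_better else baseline / candidate
    return ratio * 100.0


# Hypothetical placeholder results: (test name, UMA value, NUMA value, higher_is_better)
sample_results = [
    ("Synthetic score (points)", 1000.0, 1040.0, True),
    ("Render time (seconds)", 600.0, 640.0, False),
]

for name, uma_value, numa_value, higher_is_better in sample_results:
    relative = normalize_to_baseline(uma_value, numa_value, higher_is_better)
    print(f"{name}: NUMA at {relative:.1f}% of UMA")
```

Inverting the lower-is-better results is what lets scores, frame rates, and render times share a single axis.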

There is a lot of data here, some of it uninteresting but some very telling of the differences between these two modes. Take a look at the Geekbench results as our first example. The single-threaded score is 4% higher in the Local/NUMA mode, but the Distributed/UMA mode is 9% faster in the multi-threaded results. This benchmark shows us the best and worst case scenarios for Threadripper: yes, you can adjust the memory mode to suit your needs (which is great), but you may also have to adjust the memory mode to suit your needs (which could be a concern).

Our Blender rendering times show an advantage for the UMA mode, at least with the longer Gooseberry workload, where it holds a 7% performance advantage. The BMW workload has a different profile and, as such, we see almost no difference between the memory mode settings.

X264 benchmark results show a 7% and 14% advantage to the UMA mode depending on the pass you compare. Cinebench R15 shows almost no difference between UMA and NUMA settings due to its highly localized memory patterns (that fit inside L2).

The gaming tests at 1080p, the only resolution we show in this graph because the 1440p and 4K results were homogeneous, show a range of performance deltas. Ashes of the Singularity: Escalation and Ghost Recon Wildlands have slightly better performance in the UMA/Distributed mode, while Civ 6 and Rise of the Tomb Raider prefer the NUMA/Local mode. Other titles show very little shift in performance between these modes.

Clearly there are instances where either memory mode proves advantageous for your new Threadripper processor. However, as I see it today, the deltas that stand out lean in favor of leaving your system in the UMA/Distributed mode. This ensures you have the best possible performance for the highly threaded workloads typical of content creators and developers, without having to remember what mode you are in or being forced to reboot your PC. Gamers who insist on the absolute best performance from their platform can get there with a move into NUMA/Local (or even legacy compatibility) mode.

Impact of Memory Speed on Threadripper Performance

Just as we did in the previous section, we will be showing you normalized data comparing our Threadripper 1950X running at DDR4-2400 and DDR4-3200 memory speeds. We are sticking with the Distributed/UMA mode for our testing here, as I believe it offers the best overall performance to consumers looking at a 16-core, high-end platform.

Unlike the memory access mode testing above, the memory speed performance changes are well understood with Ryzen. With only a couple of exceptions, running memory at DDR4-3200 rather than DDR4-2400 results in higher performance across the board. Our 1080p gaming tests see the biggest impact, with gains as high as 8-11% in four different titles: Tomb Raider, Hitman, Deus Ex, and Civ 6. Geekbench sees a 5-6% advantage in both single and multi-threaded results and, of course, our SiSoft Sandra memory performance jumps by 14% as well.
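To put those gains in context, here is a back-of-the-envelope sketch of theoretical peak memory bandwidth at the two speeds. It assumes Threadripper's quad-channel memory controller with standard 64-bit DDR4 channels, and the function name is purely illustrative. These are ceilings on paper, not measurements from our test bed.

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels=4, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s for a DDR memory configuration.

    transfer_rate_mts: data rate in megatransfers per second (2400 for DDR4-2400).
    channels: number of memory channels (assumed 4 for Threadripper).
    bus_width_bits: width of each channel in bits (64 for standard DDR4).
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * bytes_per_transfer * channels / 1000.0


ddr4_2400 = peak_bandwidth_gbs(2400)
ddr4_3200 = peak_bandwidth_gbs(3200)
uplift_pct = (ddr4_3200 / ddr4_2400 - 1.0) * 100.0

print(f"DDR4-2400: {ddr4_2400:.1f} GB/s theoretical peak")
print(f"DDR4-3200: {ddr4_3200:.1f} GB/s theoretical peak")
print(f"Theoretical uplift: {uplift_pct:.1f}%")
```

On paper the faster memory is worth roughly a third more bandwidth, so the gains in our results, including the 14% Sandra increase, sit well below that ceiling.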

Our Performance Selection

As we dive into the heart of our performance results, I feel it warrants a discussion of why I have decided on Distributed/UMA and DDR4-3200 memory as the standard we will be using for Threadripper testing today and going forward. I firmly believe that the prospect of adjusting memory modes and rebooting between gaming sessions and rendering (for example) is a trade-off that no reasonable consumer would make. Extreme gaming enthusiasts who want the very best out of their hardware may do it, but in general, that consumer is going to be more interested in other hardware platforms. Instead, the Threadripper and Skylake-X buyer has a “content creation first, gaming second” mentality, by and large. While they value high performance gaming, the time and hassle of interrupting their workflow outweighs those few added frames per second.

Memory performance is something of a sticking point for me too. With the Ryzen 7/5/3 family I used only DDR4-2400 speeds to compare against Intel Core i7/i5/i3 processors using the same DDR4-2400 settings. For Threadripper, I am running tests at DDR4-3200 by default, with the assumption that a consumer buying a $799 or $999 processor from AMD will also buy higher performing memory. Along with that purchase comes the knowledge to run it at the advertised speed. We will likely adjust the Intel and lower-tier Ryzen performance settings in future reviews.
