Memory Mode (UMA/NUMA) and Memory Speed Performance
One of the most interesting aspects of Threadripper from a technological and architectural standpoint is the new set of options surrounding memory configuration. Previous consumer platforms never had the ability, or the need, to dive into the weeds of memory locality, threading, and operating system behavior. Threadripper is a different beast altogether, and sometimes when new tech hits the market there are growing pains as we all learn best practices. This is one of those times.
Below is a summary of performance, comparing our suite of benchmarks and games in both distributed and local modes. We have normalized the data so it can all appear on a single graph, giving us a look at the impact of NUMA/UMA at a glance. We are using the Threadripper 1950X CPU with 32GB of memory running at DDR4-2400.
There is a lot of data here, some of it uninteresting but some very telling of the differences between these two modes. Take a look at the Geekbench results as our first example. The single-threaded score is 4% higher in the Local/NUMA mode, but the Distributed/UMA mode is 9% faster in the multi-threaded results. This benchmark shows us the best and worst case scenarios for Threadripper: yes, you can adjust the memory mode to suit your needs (which is great), but you also may have to adjust the memory mode to suit your needs (which could be a concern).
Our Blender rendering times show an advantage for the UMA mode, at least with the longer Gooseberry workload, where it holds a 7% lead. The BMW workload has a different profile and, as such, we see almost no difference between the memory mode settings.
X264 benchmark results show a 7% and 14% advantage to the UMA mode depending on the pass you compare. Cinebench R15 shows almost no difference between UMA and NUMA settings due to its highly localized memory patterns (that fit inside L2).
The gaming tests at 1080p, which are all we include in this graph because the 1440p and 4K results were homogeneous, show varying performance deltas. Ashes of the Singularity: Escalation and Ghost Recon Wildlands perform slightly better in the UMA/Distributed mode, while Civ 6 and Rise of the Tomb Raider prefer the NUMA/Local mode. Other titles show very little shift in performance between the modes.
Clearly there are instances where either memory mode can prove advantageous for your new Threadripper processor. However, as I see it today, the only deltas that stand out to me lean in favor of leaving your system in the UMA/Distributed mode, as this ensures the best possible performance for the highly threaded workloads typical of content creators and developers, without having to remember which mode you are in or being forced to reboot your PC. For gamers that desperately want the absolute best performance from their platform, a move to NUMA/Local (or even legacy compatibility) mode will get you there.
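If you want to double check which mode your system actually booted into without going back to the BIOS, the NUMA node count that Windows reports is the quickest tell. Below is a minimal sketch, assuming Python with ctypes on Windows; GetNumaHighestNodeNumber is the stock kernel32 call for this, and on a 1950X you should see one node in Distributed/UMA mode and two in Local/NUMA mode.

import ctypes

# Minimal sketch: ask Windows how many NUMA nodes it currently exposes.
kernel32 = ctypes.windll.kernel32
highest_node = ctypes.c_ulong(0)
if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest_node)):
    raise OSError("GetNumaHighestNodeNumber failed")

node_count = highest_node.value + 1
print("NUMA nodes visible to the OS:", node_count)
print("Memory mode looks like", "Local/NUMA" if node_count > 1 else "Distributed/UMA")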
Impact of Memory Speed on Threadripper Performance
Just like we did in the previous section, we will be showing you normalized data comparing our Threadripper 1950X running at DDR4-2400 and DDR4-3200 memory speeds. We are sticking with the Distributed/UMA mode for this testing, as I believe it offers the best overall performance for consumers looking at a 16-core, high-end platform.
Unlike the memory access mode testing above, memory speed scaling is already well understood with Ryzen. With only a couple of exceptions, running memory at 3200 MHz rather than 2400 MHz results in higher performance across the board. Our 1080p gaming tests see the biggest impact, with gains as high as 8-11% in four different titles including Tomb Raider, Hitman, Deus Ex, and Civ 6. Geekbench sees a 5-6% advantage in both single- and multi-threaded results and, of course, our SiSoft Sandra memory performance spikes up by 14% as well.
Our Performance Selection
As we dive into the heart of our performance results, I feel it warrants a discussion of why I have decided on Distributed/UMA and 3200 MHz memory as the standard we will be using for Threadripper testing today and going forward. I firmly believe that the prospect of adjusting memory modes and rebooting between gaming sessions and rendering jobs (for example) is a trade-off that no reasonable consumer would make. Extreme gaming enthusiasts that want the very best out of their hardware may do it, but in general, that consumer is going to be more interested in other hardware platforms. Instead, the Threadripper and Skylake-X buyer has a “content creation first, gaming second” mentality, by and large. While they value high performance gaming, the time and hassle of interrupting their workflow outweighs those few added frames per second.
Memory performance is something of a sticking point for me too. With the Ryzen 7/5/3 family I used only DDR4-2400 speeds to compare against Intel Core i7/i5/i3 processors running the same DDR4-2400 settings. For Threadripper, I am running tests at DDR4-3200 by default, on the assumption that a consumer buying a $799/$999 processor from AMD will also buy higher performing memory, and will have the knowledge to run it at its advertised speed. We will likely adjust the Intel and lower-tier Ryzen performance settings in future reviews.
I’m very curious how the two dies and memory modes will affect virtualization. I’ve only experimented with VMs in the past, but is it possible to run two hexa-core Windows VMs, with each individual memory node assigned to its own VM?
Are you setting the Blender tile sizes to 256 or 16/32?
Just wondering since an overclocked 5960x gets 1 minute 30 seconds on the BMW at 16×16 tile size. Significant difference that shouldn’t just be a result of the OC.
For reference: 256 or 512 are for GPU and 16 or 32 are for CPU – at least for getting the best and generally more comparable results to what we get over at BlenderArtists.
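For anyone who wants to keep tile size consistent between runs, it can also be set from Blender’s Python console instead of the UI. A rough sketch, assuming the 2.7x-era bpy API and the Cycles engine used for the BMW and Gooseberry scenes:

import bpy

# Rough sketch: force Cycles CPU rendering with 16x16 tiles so timings are comparable.
scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.device = 'CPU'   # small tiles suit CPU rendering; 256/512 are GPU territory
scene.render.tile_x = 16
scene.render.tile_y = 16
bpy.ops.render.render()       # kick off the render for timing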
When reading is not enough, the mistakes are OVER 9000!
“If you content creation is your livelihood or your passion, ”
” as consumers in this space are often will to pay more”
” Anyone itching to speed some coin”
” flagship status will be impressed by what the purchase.”
” but allows for the same connectivity support that the higher priced CPUs.”
“”””Editor””””
Now just point me to the pages… 😉
Nice to see a review with more than a bunch of games tested. Keep up the good work!
Shouldn’t a test like 7-zip use 32 threads as the max, since that is what is presented to the OS?
Right now it only uses 50% of the threads on TR but 80% on the i9-7900X.
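For reference, the built-in 7-Zip benchmark accepts an explicit thread count via the -mmt switch, which makes it easy to check whether forcing 32 threads changes the result on the 1950X. A rough sketch, assuming the 7z executable is on the PATH:

import subprocess

# Rough sketch: run the 7-Zip benchmark at 16 and then 32 threads and compare.
for threads in (16, 32):
    print("--- 7-Zip benchmark with -mmt%d ---" % threads)
    subprocess.run(["7z", "b", "-mmt%d" % threads], check=True)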
Silly performance, looking forward to the 1900X and maybe 1900.
I sometimes wonder why nobody ever points out that within a CCX (4 cores, which is enough for a lot of games to run comfortably) Zen has core-to-core latencies of half those of Intel CPUs. Binding a game to those 4 cores (8 threads, like any i7) has a significant impact on performance. It does not change memory latencies, of course, but core-to-core is much better.
I’m glad someone else noticed this besides myself. I noted this during the Ryzen launch and quickly found that by using CPU affinity, along with CPU priority, to force my games to run exclusively within one CCX and get high CPU processing time on those same cores, I could take advantage of this up to a point.
What all this shows me is that the OS and game developers’ software need to be revised to better handle this architecture at the core logic level, instead of users/AMD having to provide/use methods that cannot be applied in a more dynamic fashion. I’ve run some testing on Win 10’s Game Mode and discovered that MS is actually trying to use CPU affinity to dynamically schedule running game threads on the fastest/lowest latency CPU cores to “optimize” game output through the CPU, but it still tends to cross the CCXs at times if left on its own.
What I’ve found is that by doing this my games run much smoother with a lot less variance, which gives the “feel” of games running faster (actual FPS is the same) due to lower input lag and much better GPU frametime variance graph lines with very few spikes… essentially a fairly flat GPU frametime line, which is what you want to achieve performance-wise.
Just to note… my box is running an AMD Ryzen 7 1800X CPU and a Sapphire R9 Fury X graphics card, with no OCs applied to either the CPU or GPU.
It’s a step in the right direction but needs more refinement at the OS level……
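The affinity and priority tweaks described above can also be scripted rather than set by hand in Task Manager at every launch. A rough sketch using psutil, assuming Windows, SMT enabled, and that logical CPUs 0-7 map to the first CCX on an 1800X (worth verifying against your own topology); the process name is a placeholder:

import psutil

GAME_EXE = "game.exe"        # placeholder name; substitute the real game executable
FIRST_CCX = list(range(8))   # logical CPUs 0-7: 4 cores + SMT on a Ryzen 7 1800X

# Rough sketch: pin a running game to one CCX and raise its priority.
for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == GAME_EXE:
        proc.cpu_affinity(FIRST_CCX)            # keep all threads inside one CCX
        proc.nice(psutil.HIGH_PRIORITY_CLASS)   # Windows "High" priority class
        print("Pinned PID", proc.pid, "to CPUs", FIRST_CCX)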
As expected, performance per dollar is crap in single threaded tasks, which most workloads are. Games don’t even use more than 1 or 2 cores.
Yea games only use 2 cores lol
http://i.imgur.com/Hg3Ev5p.png
And “as expected”, we have yet another Intel shill complaining about gaming performance on a production CPU, which isn’t made for gaming (although it’s not bad in the least, and it has a longer future as devs code for more cores than Intel offers under $1,000).
-“performance per dollar is crap in single threaded workloads”…
Well, since these aren’t sold as a single or dual core CPU, performance per dollar as a unit is beyond everything on Intel’s menu.
– “Games don’t even use more than 1 or 2 cores”
Well, I’ve been using an FX-8350 for 2 years now, and I always see all 8 cores loaded up in every single game I play (and I have many). Windows 10 makes use of these cores even when it’s not coded into programs. It would work even better if devs started coding for at least 8 cores, and I believe they will start doing so in earnest now that 8-core CPUs are considered an average core count (unless you’re with Intel).
You would have been better off stating that core-for-core performance is in Intel’s favor on the 4-core chips and some others, but ironically the “performance per dollar” you mention is superior with AMD… in every way.
What memory are you using, and could you name a 64GB kit that works with XMP? And why 3200 MHz over 3600?
Intel is still superior both in raw performance and in perf/$. If you were being objective you wouldn’t have slapped an Editor’s Choice award on this inferior product.
In Handbrake the 1800X is 40% slower than the 1950X, and in reverse the 1950X is 67% faster than the 1800X.
Open Cinebench with a TR or even an 1800X. Show me any Intel chip that can come within 20% of the 1950X. The entire Ryzen 7 lineup is king of the “perf/$” category. The 1800X is $365 on eBay right now. Look how closely it matches Intel products that are double the price or worse.
If you want to compare single-core performance vs Intel, you can win that argument… at the cost of very high power draw and even worse cash draw. Perf/$ is a dead argument for any Intel fanboy. Find something else. BTW, are you also commenting under “Thatman007” or something? You sound like the same Intel mouthpiece.
Sorry for necroposting, but it really belongs here:
The recent Meltdown vulnerability and its performance implications for Intel CPUs have pretty much leveled the playing field now. After reading the article and all the comments above, I opted for a very good B350 motherboard and a Ryzen 1800X to replace my Core i7-5930K (Haswell). The reason is that my CPU will likely be hit very badly, performance-wise, by the upcoming Windows 10 security update. Intel should pay back 30% to all affected CPU owners, actually…
Another reason is that I likely would not gain anything from NUMA, except for the additional complications. So I opted for the easier-to-manage setup, with (lower) power consumption and less noise from cooling as a result.
Thank you for collecting all the great info.