Media Rendering and Encoding
Cinebench R15
Single-threaded rendering in Cinebench R15 tracks with what we'd expect, with the 2950X and 2990WX showing similar performance to the 2700X.
Cinebench R15's multi-threaded rendering test is the first place where we see a definite advantage for the Threadripper 2990WX, which scores over 58% higher than the highest-end Intel part, the i9-7980XE.
In a more impressive feat, the Threadripper 2950X comes within 5% of the performance of the i9-7980XE, despite a 2-core (4-thread) disadvantage.
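For reference, the percentage figures in this review are relative score differences. A minimal sketch of the calculation (the example scores below are placeholders for illustration, not our measured results):

```python
def pct_advantage(score_a: float, score_b: float) -> float:
    """Percent by which score_a exceeds score_b (higher score = faster)."""
    return (score_a / score_b - 1.0) * 100.0

# Hypothetical example values, not measured Cinebench scores:
print(round(pct_advantage(158.0, 100.0), 1))  # relative advantage in percent
```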
Blender 2.79b
In Blender, the 2990WX remains the fastest CPU we've ever tested. However, we start to see diminishing returns on core count in the Gooseberry test, with the 2990WX providing only a 23% lead over the i9-7980XE, as opposed to the 35% performance advantage we saw in the BMW workload.
The 2950X has a strong showing in Blender as well, remaining extremely competitive with the i9-7980XE, and winning in the BMW workload.
POV-Ray 3.7.0
POV-Ray's all-CPU test gives the 2990WX a 57% advantage over the highest-end Intel part, while the 2950X sits between the 16- and 18-core Skylake-X options.
Handbrake
Handbrake encoding is tested with a 4K 100 Mb/s source file, transcoded to 1080p at a constant 10 Mb/s in a single-pass encode. The encoder used is the x264 encoder bundled with Handbrake.
H.264 transcoding from 4K to 1080p starts to show a plateau in Handbrake performance scaling: above 16 cores, the results begin to level out. The advantage still goes to AMD here, however, with the 2950X taking 8% less time to finish the video encode than the 7980XE.
The Threadripper 2990WX sees a performance regression, likely due to poor interaction between the Windows scheduler, Handbrake, and the chip's massive thread count.
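The leveling-off above 16 cores is consistent with Amdahl's law: once the serial portion of an encode dominates, extra cores add little. A hedged sketch of the effect (the 5% serial fraction is an illustrative assumption, not a measured property of x264):

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Theoretical speedup on `cores` cores when `serial_fraction`
    of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# With an assumed 5% serial fraction, doubling from 16 to 32 cores
# yields only a modest additional gain:
for n in (8, 16, 32):
    print(n, round(amdahl_speedup(n, 0.05), 2))
```

Under that assumption, 16 cores deliver roughly a 9x speedup but 32 cores only about 12.5x, which is the kind of flattening curve these Handbrake results suggest.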
X264 Benchmark 5.0.1
The 2950X takes the cake in the x264 benchmark, with nearly identical Pass 2 performance to the Intel Skylake-X CPUs and an almost 20% performance advantage in Pass 1.
Audacity MP3 Encode
MP3 encoding with LAME in Audacity shows little multi-threaded scaling: the single-threaded performance advantage of the 8700K makes it the fastest, while all the other CPUs take a similar amount of time to finish the audio encode.
“Due to this, the WX-series Threadripper processors must remain in a NUMA configuration, and present themselves as four individual NUMA nodes to an operating system, akin to a quad-CPU system. Additionally, the Infinity Fabric link between each of these dies is effectively running at half the speed of the 2-die arrangement found with the X-series processors.”
What are you referring to here? AFAIK, it is fully connected in the 4-die Threadripper, just like it is in Epyc. In the two-die variant, you only have one link between the two dies, and that is it. In the 4-die variant, each die uses 3 links to connect to the other 3 dies with single-hop latency. I don't think I would refer to anything as half speed other than the memory bandwidth. I suspect that Windows does not have the necessary NUMA optimizations to handle such a configuration properly anyway. I would be running Linux on such a system. It gets a lot of use in HPC and can handle, in some cases, thousands of processor cores with a wide variety of memory configurations.
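The fully connected four-die layout described above can be checked with a trivial graph model (the die numbering is arbitrary and for illustration only; this models the claimed topology, not anything queried from real hardware):

```python
from itertools import combinations

# Model the claimed topology: each of the 4 dies links directly
# to the other 3, forming a complete graph (as in Epyc).
dies = range(4)
links = set(combinations(dies, 2))

print(len(links))  # 6 die-to-die links in total (3 per die, shared)
# Every pair of dies is reachable in a single hop:
print(all((a, b) in links for a, b in combinations(dies, 2)))
```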
The MP3 encode as a benchmark does seem a bit odd. The gaming benchmarks, while not really odd, are of little importance. If you are going to buy a $900 or $1700 processor for gaming at 1080p, unless you are using a software renderer, it would be a complete waste. For game developers, this might still be a good system, assuming you are a developer capable of making your game perform well with many cores available, or at least not crash on start-up. As noted, Windows looks like a problem here. It might have been good to test at 4K, just to see whether the graphics card or the CPU is the bottleneck. It could hit Windows scaling issues though. Also, Nvidia's driver is probably a near worst-case scenario on any system that doesn't have a single last-level cache. It seems to have a lot of fine-grained, thread-to-thread communication. Maintaining a single last-level cache with good latency is a major bottleneck to scaling to more cores, so it would be better in most cases if it would just go away and developers would optimize their software for multiple core clusters. They have to do that anyway for the consoles, which have similar 4-core cluster architectures. I wouldn't be surprised to see cell phones go with core clusters also, due to better power consumption.
Well, off to look for linux compile benchmarks on Threadripper.
I stumbled on this customer review of the TR 1900X at Newegg:
“- Large 20% memory performance difference between NUMA and non-NUMA settings.
“On my system, the NUMA setting (memory interleave on) for some reason reduces CPU performance by about 15%, while boosting RAM performance by 20%.”
…
“Wish amd could improve the memory controller and reduce CPU-RAM latency to competitive levels.”
Perhaps this is something to consider, and compare, when benching TR2 CPUs.
p.s. I believe der8auer at YouTube switched memory interleave ON when running 2 x ASUS Hyper M.2 x16 add-in cards with 8 x Samsung 960 Pro NVMe SSDs:
https://www.youtube.com/watch?v=9CoAyjzJWfw
fast-forward starting around 7:30 on the counter
for the BIOS setup in that video
@ 8:07 on the counter: “Memory Interleaving”
Paul Alcorn’s recent article is a fun and easy read:
“AMD Ryzen Threadripper 2 vs. Intel Skylake-X:
Battle of the High-End CPUs” (August 14, 2018)