Synthetics and Scientific
Geekbench 4.2.3
Starting off with a single-threaded synthetic benchmark, we see gains of about 6% generationally from the 1950X to the 2950X.
Ultimately, the new Threadripper CPUs fall about 2% behind the Ryzen 7 2700X in single-threaded performance. Likewise, there is a performance gap of more than 6% between the new Threadripper CPUs and Intel's Skylake-X CPUs.
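For readers who want to reproduce percentages like these from their own results, here is a minimal sketch of the arithmetic. The function name and the two scores are hypothetical, chosen only to illustrate the calculation; they are not results from this review.

```python
def percent_change(baseline: float, new: float) -> float:
    """Return the change from baseline to new, as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical single-threaded scores, NOT measured values from this article.
old_score = 4000.0
new_score = 4240.0
print(f"{percent_change(old_score, new_score):+.1f}%")  # prints "+6.0%"
```

Note that the result depends on which chip is used as the baseline, so it is worth stating the baseline when quoting a gap.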
In Geekbench's multi-threaded test, we see some unexpected results. It appears that the benchmark has issues handling very large thread counts, like the 64 found in the 2990WX, which lets the 16-core 2950X score higher in comparison.
Euler3D
This fluid dynamics simulation is very CPU and memory intensive. From the benchmark source website:
"The benchmark test case is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, a taper ratio of 0.66, and a 45-degree quarter-chord sweep angle. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes. Figure 1 shows the CFD predicted Mach contours for a freestream Mach number of 0.960.
The benchmark CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes. The benchmark executable advances the Mach 0.50 AGARD flow solution. Our benchmark score is reported as a CFD cycle frequency in Hertz."
Similarly, despite its thread count advantage, the 2990WX falls short of the 2950X in the Euler3D benchmark. This performance issue is likely due to the four-NUMA-node configuration of the 2990WX, versus the unified memory configuration of the 2950X.
However, the Skylake-X processors far and away take the performance crown, with up to double the score of the 2950X in the 32T test.
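To see how many NUMA nodes a given system actually exposes, something like the following can help. It is a minimal sketch, assuming a Linux system where the kernel publishes NUMA topology under /sys/devices/system/node; the function name list_numa_nodes is our own, and this is not part of our test setup.

```python
import re
from pathlib import Path


def list_numa_nodes(base: str = "/sys/devices/system/node") -> dict:
    """Map each NUMA node id to its CPU list string (e.g. "0-7,32-39").

    Assumes the Linux sysfs layout; returns an empty dict if the
    directory is missing (for example on a non-Linux system).
    """
    nodes = {}
    root = Path(base)
    if not root.is_dir():
        return nodes
    for entry in root.iterdir():
        # Only directories named node0, node1, ... describe NUMA nodes.
        match = re.fullmatch(r"node(\d+)", entry.name)
        if not match:
            continue
        cpulist = entry / "cpulist"
        cpus = cpulist.read_text().strip() if cpulist.is_file() else ""
        nodes[int(match.group(1))] = cpus
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node_id, cpus in list_numa_nodes().items():
        print(f"node{node_id}: CPUs {cpus}")
```

On a system presenting four NUMA nodes, this would be expected to list four entries, while a unified memory configuration would show a single node.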
7-Zip Compression
Compression in 7-Zip sees similar results to the previous test: the 2990WX simply doesn't seem to scale in performance above 24T, and the Skylake-X processors outperform the Threadripper 2950X.
“Due to this, the WX-series Threadripper processors must remain in a NUMA configuration, and present themselves as four individual NUMA nodes to an operating system, akin to a quad-CPU system. Additionally, the Infinity Fabric link between each of these dies is effectively running at half the speed of the 2-die arrangement found with the X-series processors.”
What are you referring to here? AFAIK, it is fully connected in the 4-die Threadripper, just like it is in Epyc. In the two-die variant, you only have one link between the two dies and that is it. In the 4-die variant, each die has 3 links in use to connect to the other 3 dies with a single-hop latency. I don’t think I would refer to anything as half speed other than the memory bandwidth. I suspect that Windows does not have the necessary NUMA optimizations to handle such a configuration properly anyway. I would be running Linux on such a system. It gets a lot of use in HPC and can handle, in some cases, thousands of processor cores with a wide variety of memory configurations.
The MP3 encode as a benchmark does seem a bit odd. The gaming benchmarks, while not really odd, are of little importance. If you are going to buy a $900 or $1700 processor for gaming at 1080p, unless you are using a software renderer, it would be a complete waste. For game developers, this might still be a good system, assuming you are a developer capable of making your game perform well with many cores available, or at least not crash on start-up. As noted, Windows looks like a problem here. It might have been good to test at 4K, just to see if it is graphics card limited, or whether the CPU is the bottleneck. It could hit Windows scaling issues though. Also, Nvidia’s driver is probably a near worst-case scenario on any system that doesn’t have a single last-level cache. It seems to have a lot of fine-grained, thread-to-thread communication. Maintaining a single last-level cache with good latency is a major bottleneck to scaling to more cores, so it would be better in most cases if it would just go away, and developers would optimize their software for multiple core clusters. They have to do that anyway for the consoles with similar 4-core cluster architectures. I wouldn’t be surprised to see cell phones go with core clusters also, due to better power consumption.
Well, off to look for Linux compile benchmarks on Threadripper.
I stumbled on this customer review of the TR 1900X at Newegg:
“- Large 20% memory performance difference between NUMA and non-NUMA settings.
“On my system, the NUMA setting (memory interleave on) for some reason reduces CPU performance by about 15%, while boosting RAM performance by 20%.”
…
“Wish amd could improve the memory controller and reduce CPU-RAM latency to competitive levels.”
Perhaps this is something to consider, and compare, when benching TR2 CPUs.
p.s. I believe der8auer at YouTube switched memory interleave ON when running 2 x ASUS Hyper M.2 x16 add-in cards with 8 x Samsung 960 Pro NVMe SSDs:
https://www.youtube.com/watch?v=9CoAyjzJWfw
Fast-forward to around 7:30 on the counter for the BIOS setup in that video.
@ 8:07 on the counter: “Memory Interleaving”
Paul Alcorn’s recent article is a fun and easy read:
“AMD Ryzen Threadripper 2 vs. Intel Skylake-X: Battle of the High-End CPUs” (August 14, 2018)