Widening the Offerings
AMD doesn’t fail to impress with their 2nd Generation Threadrippers
Today, we are talking about something that would have seemed impossible just a few short years ago: a 32-core processor for consumers. While I realize that talking about the history of computer hardware can be considered superfluous in a processor review, I think it's important to understand the context of why this is such a momentous shift for the industry.
May 2016 marked the launch of what was then the highest core count consumer processor ever seen, the Intel Core i7-6950X. At 10 cores and 20 threads, the 6950X was easily the highest performing consumer CPU in multi-threaded tasks but came at a staggering $1700 price tag. In what we will likely be able to look back on as the peak of Intel's sole dominance of the x86 CPU space, it was an impossible product to recommend to almost any consumer.
Just over a year later, Intel launched Skylake-X with the Core i9-7900X. Retaining the same core count as the 6950X, the 7900X would have been relatively unremarkable on its own. However, a $700 price drop and the promise of upcoming 12, 14, 16, and 18-core processors on the new X299 platform showed an aggressive new course for Intel's high-end desktop (HEDT) platform.
This aggressiveness was brought on by the success of AMD's Ryzen platform, and the then upcoming Threadripper platform. Promising up to 16 cores/32 threads, and 64 lanes of PCI Express connectivity, it was clear that Intel would for the first time have a competitor on their hands in the HEDT space that they created back with the Core i7-920.
Fast forward another year, and we have the release of the 2nd Generation Threadripper. Promising to bring the same advancements we saw with the Ryzen 7 2700X, AMD is pushing Threadripper to be even more competitive, with higher performance and lower cost.
Will Threadripper finally topple Intel from their high-end desktop throne?
Specification | Threadripper 2990WX | Core i9-7980XE | Threadripper 2950X | Core i9-7900X | Core i7-8700K | Ryzen 7 2700X |
---|---|---|---|---|---|---|
Architecture | Zen+ | Skylake-X | Zen+ | Skylake-X | Coffee Lake | Zen+ |
Process Tech | 12nm | 14nm+ | 12nm | 14nm+ | 14nm++ | 12nm |
Cores/Threads | 32/64 | 18/36 | 16/32 | 10/20 | 6/12 | 8/16 |
Base Clock | 3.0 GHz | 2.6 GHz | 3.5 GHz | 3.3 GHz | 3.7 GHz | 3.7 GHz |
Boost Clock | 4.2 GHz | 4.2 GHz | 4.4 GHz | 4.3 GHz | 4.7 GHz | 4.3 GHz |
L3 Cache | 64MB | 24.75MB | 32MB | 13.75MB | 12MB | 16MB |
Memory Support | DDR4-2933 (Quad-Channel) | DDR4-2666 (Quad-Channel) | DDR4-2933 (Quad-Channel) | DDR4-2666 (Quad-Channel) | DDR4-2666 (Dual-Channel) | DDR4-2933 (Dual-Channel) |
PCIe Lanes | 64 | 44 | 64 | 44 | 16 | 16 |
TDP | 250 watts | 165 watts | 180 watts | 140 watts | 95 watts | 105 watts |
Socket | TR4 | LGA-2066 | TR4 | LGA-2066 | LGA1151 | AM4 |
Price (MSRP) | $1799 | $1999 | $899 | $1000 | $349 | $329 |
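To put the pricing in perspective, the cost per core can be worked out directly from the table above. This is a minimal sketch that uses only the core counts and MSRPs listed there; the helper name `price_per_core` is our own, and cost per core deliberately ignores clock speed, platform cost, and street pricing.

```python
# Core counts and MSRPs (USD) copied from the specification table above.
cpus = {
    "Threadripper 2990WX": (32, 1799),
    "Core i9-7980XE": (18, 1999),
    "Threadripper 2950X": (16, 899),
    "Core i9-7900X": (10, 1000),
    "Core i7-8700K": (6, 349),
    "Ryzen 7 2700X": (8, 329),
}


def price_per_core(cores, msrp):
    """Return the MSRP divided by the number of physical cores."""
    return msrp / cores


# Print the lineup sorted from cheapest to most expensive per core.
for name, (cores, msrp) in sorted(cpus.items(), key=lambda kv: price_per_core(*kv[1])):
    print(f"{name:<20} ${price_per_core(cores, msrp):7.2f} per core")
```

By this rough measure, both 2nd Generation Threadripper parts land at around half the per-core cost of their Skylake-X counterparts.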
One thing that is clear from the specifications above is how stale Intel's HEDT lineup now seems compared to Threadripper. While Skylake-X and the original Threadripper were announced at the same event, Computex 2017, and the i9-7900X launched before Threadripper, we have yet to hear any information out of Intel about a refresh.
What's more, Intel's current top-of-the-line HEDT offering remains based on the Skylake microarchitecture. While Kaby Lake and Coffee Lake didn't provide IPC improvements, these high core count processors are still missing the frequency advancements and (slight) process node improvements made in subsequent generations.
On an architectural level, the 2nd Generation Threadripper processors are using the same Zen+ core that we first saw launched with the Ryzen 7 2700X. As a refresher, Zen+ is mostly a process node shrink, from 14nm to 12nm, but also provides some improvements to cache latency, supported memory frequencies (DDR4-2933 in this case), and IPC. For more details, you can check out our review of the Ryzen 7 2700X and Ryzen 5 2600X from earlier this year.
While 2nd generation Threadripper doesn't introduce any new architectural elements to the Zen family, it does have some stark changes compared to the original Threadripper in the form of segmentation between the Threadripper X and WX series.
The Threadripper X-series, consisting of the 12-core 2920X and 16-core 2950X processors, is targeted by AMD at more of a gamer/enthusiast audience, just like the previous Threadripper.
Constructed identically to the original Threadripper processors launched last year, the X-series consists of two Zen+ dies, each featuring 2 Core Complex (CCX) units. These two dies, each containing a dual-channel memory controller and 32 lanes of PCI Express, are connected through Infinity Fabric.
While this Infinity Fabric link does carry a latency penalty, the ramifications are well known at this point, as we discussed in our Threadripper 1950X review. This flexible configuration also allows the user to choose between a Uniform (UMA) or Non-Uniform Memory Access (NUMA) arrangement.
However, the real changes with 2nd generation Threadripper come with the introduction of the WX-series.
Geared towards workstation and professional users, the WX-series Threadripper is a hybrid between the Threadripper and AMD's server CPU offering, EPYC.
To hit core counts of 24 and 32, AMD needed to move to a 4-die configuration for the WX-series processors. However, this presents some interesting challenges. If AMD decided to move to a full EPYC configuration with an 8-channel memory controller and 128 lanes of PCI Express, then compatibility among different Threadripper CPUs would be shaky at best on the motherboard level.
Instead, AMD is simply connecting the additional two Zen+ dies through Infinity Fabric, and ignoring their memory controller and PCI Express capabilities. These new dies depend on the rest of the processor for both memory and PCI Express access.
Due to this, the WX-series Threadripper processors must remain in a NUMA configuration, and present themselves as four individual NUMA nodes to an operating system, akin to a quad-CPU system. Additionally, the Infinity Fabric link between each of these dies is effectively running at half the speed of the 2-die arrangement found with the X-series processors.
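For readers who want to see how such a processor presents itself, the sketch below lists each NUMA node along with its CPUs and local memory. It assumes a Linux system that exposes the standard sysfs node directory (`/sys/devices/system/node`); the function names `parse_cpulist` and `read_numa_nodes` are our own. On a WX-series part, we would expect the two dies without active memory controllers to show up as nodes with CPUs but no local memory, though we have not verified that output here.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: {"cpus": [...], "mem_total_kb": int}} read from sysfs."""
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes  # not Linux, or NUMA information is not exposed
    for node_dir in sorted(root.glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        mem_kb = 0
        meminfo = node_dir / "meminfo"
        if meminfo.exists():
            for line in meminfo.read_text().splitlines():
                # Lines look like: "Node 0 MemTotal:  16309876 kB"
                if "MemTotal:" in line:
                    mem_kb = int(line.split()[3])
        nodes[node_id] = {"cpus": cpus, "mem_total_kb": mem_kb}
    return nodes


if __name__ == "__main__":
    for node_id, info in sorted(read_numa_nodes().items()):
        local = "no local memory" if info["mem_total_kb"] == 0 else f"{info['mem_total_kb']} kB"
        print(f"node {node_id}: {len(info['cpus'])} CPUs, {local}")
```

The same information is available from the `numactl --hardware` command where that tool is installed.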
While the highly expandable nature of the Zen architecture affords AMD the ability to create the first consumer 32-core processor at a relatively affordable price, there are some notable potential drawbacks to this approach, namely memory latency.
Review Terms and Disclosure | All Information as of the Date of Publication |
---|---|
How product was obtained: | The product is on loan from AMD for the purpose of this review. |
What happens to the product after review: | The product remains the property of AMD but is on extended loan for future testing and product comparisons. |
Company involvement: | AMD had no control over the content of the review and was not consulted prior to publication. |
PC Perspective Compensation: | Neither PC Perspective nor any of its staff were paid or compensated in any way by AMD for this review. |
Advertising Disclosure: | AMD has purchased advertising at PC Perspective during the past twelve months. |
Affiliate links: | This article contains affiliate links to online retailers. PC Perspective may receive compensation for purchases through those links. |
Consulting Disclosure: | AMD is a current client of Shrout Research for products or services related to this review. |
“Due to this, the WX-series Threadripper processors must remain in a NUMA configuration, and present themselves as four individual NUMA nodes to an operating system, akin to a quad-CPU system. Additionally, the Infinity Fabric link between each of these dies is effectively running at half the speed of the 2-die arrangement found with the X-series processors.”
What are you referring to here? AFAIK, it is fully connected in the 4-die Threadripper, just like it is in EPYC. In the two-die variant, you only have one link between the two dies and that is it. In the 4-die variant, each die uses 3 links to connect to the other 3 dies with a single-hop latency. I don't think I would refer to anything as half speed other than the memory bandwidth. I suspect that Windows does not have the necessary NUMA optimizations to handle such a configuration properly anyway. I would be running Linux on such a system. It gets a lot of use in HPC and can handle, in some cases, thousands of processor cores with a wide variety of memory configurations.
The MP3 encode as a benchmark does seem a bit odd. The gaming benchmarks, while not really odd, are of little importance. If you are going to buy a $900 or $1700 processor for gaming at 1080p, unless you are using a software renderer, it would be a complete waste. For game developers, this might still be a good system, assuming you are a developer capable of making your game perform well with many cores available, or at least not crash on start-up. As noted, Windows looks like a problem here. It might have been good to test at 4K, just to see if it is graphics card limited, or whether the CPU is the bottleneck. It could hit Windows scaling issues though. Also, Nvidia's driver is probably a near worst-case scenario on any system that doesn't have a single last-level cache. It seems to have a lot of fine-grained, thread-to-thread communication. Maintaining a single last-level cache with good latency is a major bottleneck to scaling to more cores, so it would be better in most cases if it would just go away, and developers would optimize their software for multiple core clusters. They have to do that anyway for the consoles with similar 4-core cluster architectures. I wouldn't be surprised to see cell phones go with core clusters also, due to better power consumption.
Well, off to look for Linux compile benchmarks on Threadripper.
I stumbled on this customer review of the TR 1900X at Newegg:
“- Large 20% memory performance difference between NUMA and non-NUMA settings.
“On my system, the NUMA setting (memory interleave on) for some reason reduces CPU performance by
about 15%, while boosting RAM performance by 20%.”
…
“Wish amd could improve the memory controller and reduce CPU-RAM latency to competitive levels.”
Perhaps this is something to consider, and compare, when benching TR2 CPUs.
p.s. I believe der8auer at YouTube switched memory interleave ON when running 2 x ASUS Hyper M.2 x16 add-in cards with 8 x Samsung 960 Pro NVMe SSDs:
https://www.youtube.com/watch?v=9CoAyjzJWfw
fast-forward starting around 7:30 on the counter
for the BIOS setup in that video
@ 8:07 on the counter: “Memory Interleaving”
Paul Alcorn’s recent article is a fun and easy read:
“AMD Ryzen Threadripper 2 vs. Intel Skylake-X:
Battle of the High-End CPUs” (August 14, 2018)