Thread to Thread Latency and The X299 Platform

Our testing of thread-to-thread communication latency on Ryzen got a lot of attention during that product's launch (as did the related discussion around 1080p gaming performance). With a simple latency evaluation tool, we can measure the round-trip latency from one logical thread to every other logical thread, essentially giving us core-to-core ping times that help evaluate on-chip communication latency.
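For readers curious about the method, here is a minimal sketch of a ping-pong measurement between two pinned threads. It is not the tool we used. The function names are made up for illustration, the pinning call is Linux-only, and Python's interpreter and OS wake-up overhead will dominate the result, so expect microseconds rather than the nanosecond-scale figures discussed in this article. A real tool would spin on a shared atomic in a compiled language.

```python
import os
import threading
import time


def pin_current_thread(cpu):
    """Best-effort pin of the calling thread to one logical CPU (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        try:
            # On Linux, pid 0 refers to the calling thread.
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # CPU not available to this process; run unpinned.


def round_trip_seconds(cpu_a, cpu_b, iterations=2000):
    """Average time for a token to travel from thread A to thread B and back."""
    ping = threading.Event()
    pong = threading.Event()
    result = {}

    def responder():
        pin_current_thread(cpu_b)
        for _ in range(iterations):
            ping.wait()   # wait for the token from A
            ping.clear()
            pong.set()    # send it straight back

    def initiator():
        pin_current_thread(cpu_a)
        start = time.perf_counter()
        for _ in range(iterations):
            ping.set()    # send the token to B
            pong.wait()   # wait for it to come back
            pong.clear()
        result["elapsed"] = time.perf_counter() - start

    threads = [threading.Thread(target=responder),
               threading.Thread(target=initiator)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["elapsed"] / iterations


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # One row of the full grid: logical CPU 0 to every other logical CPU.
    for target in range(1, cpus):
        rt = round_trip_seconds(0, target, iterations=500)
        print(f"0 <-> {target}: {rt * 1e9:.0f} ns per round trip")
```

Repeating the inner call for every pair of logical CPUs, rather than only from CPU 0, would produce the full thread-to-thread grid.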

The full set of ping times is difficult to read, but we can get some very valuable information from it. The L2 latency measures around 11ns, as shown in the paired logical core ping times. (This is how long it takes two threads on the same core to communicate.) However, the L3/LLC ping times are much higher, nearing 100ns in several cases.

Let’s simplify this and add some comparative data. In the graph below, we are only showing the ping times from logical core 0 to the other logical cores on each processor.

The Core i9-7900X demonstrates the highest thread-to-thread communication times of any tested system through the first 8 cores. Both the Core i7-6950X (10-core) and the Core i7-5960X (8-core) are much lower, around 80ns, while still running on the ring bus architecture. The Core i7-7700K has a much smaller ring with only four cores and, as such, has the fastest thread-to-thread communication of all the Intel parts.

The AMD Ryzen 7 1800X processor has a significant jump at the halfway point, where the threads move to another CCX, making them dependent on the L3 cache for communication. This puts the 7900X in an interesting spot: it has a dramatically higher latency to the LLC than the previous generation Intel processors.

I also did some quick testing to see how DDR4 memory speed affected the latency on the 7900X. With the new mesh architecture, does increasing memory clock have any effect?

Moving from DDR4-2400 up to DDR4-2800 does show us some small gain, as does overclocking the cache frequency in the ASUS UEFI from 2400 MHz to 2800 MHz. These changes don’t bring the LLC latency down to the levels of Broadwell-E or Kaby Lake, but they do indicate that some tweaking can help alleviate any performance issues if they arise.

The X299 Platform and ASUS Prime X299-Deluxe

Though we stuck with the X99 chipset and socket for Broadwell-E, Skylake-X and Kaby Lake-X will be using a new chipset and a new LGA2066 socket. We have already seen motherboards leaking out and we should have numerous announcements through Computex this week (check back to pcper.com for all the news!) but now we have preliminary details on what changes it offers.

Fundamentally, it looks pretty much the same. The biggest change is one of connectivity: the X299 chipset will now offer up to 24 PCIe 3.0 lanes, mirroring the capability of the Z270 chipset. Compared to the X99 chipset, which included only 8 lanes of PCIe 2.0, this is a significant increase. The DMI connection between the chipset and the processor is also upgraded to DMI 3.0, giving us a doubling of peak throughput (4GB/s rather than 2GB/s). That helps to alleviate the bottleneck from the chipset to the CPU, though a highly saturated system utilizing chipset-based connectivity could still hit speed limitations.
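To see where the 2GB/s and 4GB/s figures come from, here is a quick back-of-the-envelope calculation. It assumes DMI behaves like a four-lane PCIe link of the matching generation (5 GT/s with 8b/10b encoding for DMI 2.0, 8 GT/s with 128b/130b encoding for DMI 3.0). The function name is ours, and the results are theoretical one-direction peaks, not measured throughput.

```python
def link_gbytes_per_s(lanes, gigatransfers_per_s, payload_bits, encoded_bits):
    """Peak one-direction throughput in GB/s after line-encoding overhead."""
    usable_gbits = lanes * gigatransfers_per_s * payload_bits / encoded_bits
    return usable_gbits / 8  # bits to bytes


if __name__ == "__main__":
    # Assumed link parameters: x4 width, PCIe 2.0-style and 3.0-style signaling.
    dmi2 = link_gbytes_per_s(4, 5.0, 8, 10)
    dmi3 = link_gbytes_per_s(4, 8.0, 128, 130)
    print(f"DMI 2.0: {dmi2:.2f} GB/s")  # DMI 2.0: 2.00 GB/s
    print(f"DMI 3.0: {dmi3:.2f} GB/s")  # DMI 3.0: 3.94 GB/s
```

The newer encoding wastes far fewer bits on the wire, which is why a 60% increase in transfer rate yields nearly double the usable bandwidth.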

I would expect that concern to be alleviated for a majority of consumers, though, as the 28 or 44 lanes of PCIe Gen 3.0 provided by the processors (Skylake-X) allow for multi-GPU configurations at x16 or x8 speeds with room for PCIe NVMe storage to boot. Either way, it’s impressive that an X299 system with a 10-core or higher processor would have access to 68 lanes of PCI Express in total: 44 from the CPU and 24 from the chipset.
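As a rough illustration of that lane math, the sketch below checks two hypothetical device layouts against the CPU lane counts. The layouts and the x4 width for the NVMe drive are our assumptions for the example, not configurations we tested, and real boards may wire their slots differently.

```python
def lanes_left(cpu_lanes, device_widths):
    """Return the CPU PCIe lanes still free after attaching the given devices."""
    used = sum(device_widths.values())
    if used > cpu_lanes:
        raise ValueError(f"{used} lanes requested, only {cpu_lanes} available")
    return cpu_lanes - used


if __name__ == "__main__":
    # Hypothetical layout on a 44-lane CPU: two GPUs at x16 plus one x4 NVMe drive.
    print(lanes_left(44, {"gpu0": 16, "gpu1": 16, "nvme": 4}))  # 8
    # Hypothetical layout on a 28-lane CPU: the second GPU drops to x8.
    print(lanes_left(28, {"gpu0": 16, "gpu1": 8, "nvme": 4}))   # 0
    # Platform total with a 44-lane CPU and the 24 chipset lanes.
    print(44 + 24)                                              # 68
```

Keep in mind that the 24 chipset lanes all share the DMI link back to the processor, so they are not equivalent to CPU-attached lanes under heavy load.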

Default memory speed gets a jump from 2133 MHz on the X99 systems to 2666 MHz on the new X299 systems, giving the new platform another potential advantage. We also have 8x SATA 3 channels, 10x USB 3.0 ports and support for 3-way RAID of PCIe and NVMe drives as a part of this system platform.

Morry is hard at work on reviews for several upcoming X299 motherboards, so I won’t spend too much time here going over it, but the Intel HEDT platforms have always been a pricey (yet impressive) spot in the market. ASUS sent us the Prime X299-Deluxe motherboard to help in our testing and it includes features like 802.11ad, Thunderbolt 3, and Intel VROC RAID support. I fully expect options from ASUS, MSI, and Gigabyte to have truly impressive feature sets, if you are willing to pay.

One advantage of a platform that is an evolution of the previous one is that coolers seem to be working without issue. The Corsair H100i GTX operated on the LGA2066 socket exactly as it did on the LGA2011 socket all these past years. Every once in a while, it’s nice when you don’t have to uproot your entire ecosystem.
