The Ryzen Chipsets and Storage Performance
We have already discussed in some detail the Ryzen chipset and how the I/O structure of this new AMD SoC differs from currently shipping Intel platforms in this space. While the Ryzen processor itself has some integrated connectivity, including SATA, USB, and PCIe for GPUs and NVMe SSDs, most currently shipping motherboards use either the X370 or B350 chipset and rely on connectivity that stems from the chipset itself. The only exception appears to be the 20 lanes of PCIe earmarked for x16 or x8+x8 PCIe slots for GPUs and a x4 PCIe Gen3 connection for high performance storage.
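For a rough sense of scale on that x4 Gen3 storage link, here is a back-of-envelope sketch in Python. It uses only the published line rates and encodings (8 GT/s per lane with 128b/130b for PCIe 3.0, 6 Gb/s with 8b/10b for SATA). The function names are made up for illustration, and real drives land lower once protocol overhead is accounted for.

```python
# Back-of-envelope link bandwidth after line-encoding overhead only.
# Function names are hypothetical; protocol overhead is NOT modeled,
# so treat these as upper bounds rather than expected drive speeds.

def pcie3_gbytes_per_s(lanes: int) -> float:
    """PCIe 3.0: 8 GT/s per lane, 128b/130b encoding."""
    bits_per_s = 8e9 * lanes * 128 / 130
    return bits_per_s / 8 / 1e9


def sata6g_gbytes_per_s() -> float:
    """SATA 6Gb/s: 6 Gb/s line rate, 8b/10b encoding."""
    bits_per_s = 6e9 * 8 / 10
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    print(f"PCIe 3.0 x4  : {pcie3_gbytes_per_s(4):.2f} GB/s")
    print(f"PCIe 3.0 x16 : {pcie3_gbytes_per_s(16):.2f} GB/s")
    print(f"SATA 6Gb/s   : {sata6g_gbytes_per_s():.2f} GB/s")
```

By this estimate a x4 Gen3 link has roughly six and a half times the ceiling of a single SATA 6Gb/s port, which is why the direct-to-CPU NVMe connection matters for fast SSDs.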
From our previous story on the Ryzen chipset setup:
Ryzen is a slightly different beast when it comes to its I/O functionality. It features a full x16 PCI-E implementation that can be split into 2 x8 for SLI and CrossFire. There are four more PCI-E lanes that connect this CPU to the chipset. Above that it features an I/O stack that can be configured in multiple ways. It supports 2 SATA6G ports that can be mixed and matched with PCI-E lanes supporting NVME. What is probably going to be the most popular implementation on X370 boards is the single x4 NVME connection, which will provide tremendous I/O bandwidth directly to the CPU. It also features four native USB 3.1 gen 1 ports.
What pops out to me here is that AMD is still supporting SATA Express. It can support up to two of these implementations (each of which is comprised of two SATA6G ports and two lanes of PCI-E 3.0). We may see a handful of boards with this implemented, but considering how nonexistent SATA Express drives are, this will not be a popular or much-used option. Instead, motherboard vendors will likely opt to simply include the raw SATA connections on board.
The setup that AMD is giving to its partners is extremely flexible in how it can be implemented. We will see a variety of boards sporting differing features that will hopefully fit the needs of a wide swath of consumers. AMD has finally caught up with Intel in base features, and they will again leverage other 3rd party chips to differentiate the loadouts of boards from their partners.
AMD had a very competitive SATA6G implementation with the SB850/950 chips that gave them a feature advantage over Intel. By the time Intel came out with similar numbers of parts we saw that Intel was only about 5% faster than AMD in overall performance. This was a far cry from the previous SATA3G implementations from AMD that were severely lacking in performance and had real compatibility and driver issues. Hopefully this chipset continues to provide good overall performance with SATA, NVME, and USB 3.1.
If there is one slightly disappointing thing with this release it is that the physical implementation of the AM4 socket is not compatible with previous AM3+ and FM2+ coolers. New coolers and brackets will be required to fit the new dimensions of the AM4 socket infrastructure. Some things like All-in-One liquid cooling kits will just require a new bracket while the Wraith coolers from AMD will have to be redesigned to fit. This should not be a significant problem for AMD as these changes can be done quickly with many products that do not have a set mounting mechanism. Popular parts like the Cooler Master 212 EVO will just need a new bracket to fit in with AM4.
It has been a long time since there was some real excitement around AMD’s CPUs from the enthusiast crowd. AMD looks to be providing a strong competitor with Ryzen and they are backing it up with a modern motherboard chipset implementation. It will be feature comparable to what Intel has in the market now and could provide a few more tricks with some clever engineering.
Measuring Storage Performance of the X370
Allyn here with an evaluation of the storage performance across the three major platforms. After Ryan was done with his testing, I snuck in and rearranged things a tad so that we could get as close a look as possible at the storage-specific performance of X370 and Ryzen. Platform configuration was as follows:
- (X370) AMD 1800X @ 3.5 GHz | ASUS Crosshair VI Hero
- (X99) Intel 6900K @ 3.5 GHz | ASUS X99-Deluxe II
- (Z270) Intel 7700K @ 3.5 GHz | ASUS Prime Z270-A
These are the same systems Ryan used in his testing, but I have adopted the methodology he used for his IPC tests. When I typically test SSDs for a review, everything on the test system is cranked to the max. Power management is disabled, CPU and RAM are overclocked as far as I can reliably push them, etc. The intention there is for the system itself to offer the least possible interference with the results of the SSD being tested. For these tests I needed to flip that mentality around, so I am going with the fastest SATA and PCIe-NVMe parts we have tested to date – the Samsung 850 PRO and 960 PRO. Instead of overclocking to the max, I kept BIOS power management settings at defaults and ran the tests twice per system. The first run used the default Windows power management setting of 'Balanced', while the second run was with power management set to 'High Performance'. The latter will be annotated by 'HP' in the charts below.
SATA
- X370: 8x 6Gb ports, all RAID capable
- X99: 6x 6Gb ports RAID capable + 4x 6Gb ports (no RAID)
- Z270: 6x 6Gb ports RAID capable
SATA, 4KB Random
Kicking things off with random access, we can see that everything stays in a mostly tight pack, including AMD, which is surprising given they have historically fallen behind Intel's SATA controllers by a considerable margin. We can easily tell that switching to High Performance mode does have an impact on storage performance, as the dashed lines run higher than the solid lines across all three platforms. In researching our new storage test suite, we realized that it was far more important to focus in on the lower Queue Depths, so let's get a precise look at QD=1 from the above results:
Remember that in Latency Percentile plots, we want the 'ramp up' to be as vertical and as far to the left as possible. Starting with Balanced power profile (solid lines), we see all three systems seem to trade blows as the plot progresses. What is happening is that at lower loads (Queue Depths), the CPU is idle while waiting for the requests to complete. The balanced power profile allows CPUs to clock down when idle, so when the kernel interrupts the CPU with a completed IO, it must spin back up to fetch the data, ultimately resulting in the longer latencies seen in the non-HP results above.
Shifting over to the High Performance power profile results (dashed lines), we can see a few points. Z270 (blue) enjoys both greater gains vs. Balanced compared to X99 (grey), and also turns in the quickest latencies overall. X370 (red) does see gains while running at the faster clock speed, but it can't match the lower latencies of the Intel platforms.
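For anyone new to Latency Percentile plots, the short Python sketch below shows one way such a curve can be built from raw per-IO completion times. This is not the code behind our charts; the function names are made up for illustration, and the nearest-rank method is just one common way to read off a percentile.

```python
import math

# Illustrative only: hypothetical helper names, not the review's test tool.

def latency_percentile_curve(latencies_us):
    """Return (latency, percent of IOs completed at or below it) points.

    Plotted with latency on the x-axis, a faster platform produces a
    curve that ramps up further to the left and more vertically.
    """
    ordered = sorted(latencies_us)
    n = len(ordered)
    return [(lat, 100.0 * (i + 1) / n) for i, lat in enumerate(ordered)]


def latency_at_percentile(latencies_us, pct):
    """Nearest-rank percentile: latency by which pct% of IOs had completed."""
    ordered = sorted(latencies_us)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample values in microseconds, purely to exercise the helpers.
    samples = [80, 82, 85, 85, 90, 95, 100, 110, 120, 300]
    for pct in (50, 90, 99):
        print(f"{pct}th percentile: {latency_at_percentile(samples, pct)} us")
```

A single slow outlier barely moves the median but dominates the top of the curve, which is why the shape of the whole ramp tells you more than an average does.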
A note on Balanced vs. High Performance. The code for my storage test is very lean as it is intended to get the most out of the storage device being tested and not bottleneck the CPU while handling very high IOPS. This means that running *only* this test on an SSD is not going to keep the CPU as 'awake' as it would normally be if it were actively doing something with the data being read. If the user had a task keeping the CPU busy (game engine, video encode, etc) while the storage IOs were taking place, the results should more closely represent our 'HP' (dashed line) figures. On the other hand, an idle system doing simple file copies would look more like the Balanced (solid line) results.
SATA, 128KB Sequential
I've cut off the high end at QD=8 here as all results converged at saturation beyond that point. Large block sequential transfers tend to fill and drain the queue in batches, meaning performance would be an equivalent of sliding back and forth from QD=1 to QD=8 on the plot lines above. While QD=2, 4, and 8 are all similar, QD=1 is what will show you differences caused by added spin-up delays due to the Intel CPUs waking from their lower power state. Note how the dashed lines are all higher than the solid lines.
PCIe (all systems can use additional lanes from CPU, but without bootable RAID support)
- X370: 4x 'spare' lanes for storage, NVMe support
- X99: Lanes can come from chipset or (more likely) direct from CPU
- Z270: 4x 'extra' lanes for M.2 / Optane + up to 3x M.2 NVMe bootable RAID*
* Z270 chipset M.2 RAID capable slots each displace a pair of SATA ports
PCIe, 4KB Random
Moving on to the faster NVMe SSDs, things got a bit more interesting, particularly when power savings came into play. The 6800K appeared to hit some sort of spin-down resonance with our workload right at QD=16, but more troubling was the 1800X, which seemed to hit a ceiling at ~110 kIOPS. This happened even in its high power state. Let us hold off on zooming in on QD=1 for a second and take a closer look at what was happening at that seemingly troublesome QD=16:
Starting from the left, the 7700K (blue) looks great regardless of mode. The 6800K did ok in HP mode but went sideways when allowed to clock down during idle (I even observed fluctuations in task manager / disk IO during this specific QD). Ryzen didn't share the 6800K's idle issue as both red plot lines are very closely matched, but the concern is that those lines should be over to the left with the others instead of running at nearly double the latency per IO.
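The link between that IOPS ceiling and the per-IO latency can be approximated with Little's Law: when the queue is kept full, mean latency is roughly queue depth divided by IOPS. The Python sketch below applies it to the ~110 kIOPS ceiling at QD=16. It is a steady-state approximation with hypothetical function names, not a value read off the charts.

```python
# Little's Law sketch: outstanding IOs = throughput x mean latency.
# Assumes steady state with the queue kept full; names are hypothetical.

def mean_latency_us(queue_depth: int, iops: float) -> float:
    """Approximate mean per-IO latency in microseconds."""
    return queue_depth / iops * 1e6


def iops_from_latency(queue_depth: int, latency_us: float) -> float:
    """Approximate sustained IOPS for a given mean per-IO latency."""
    return queue_depth / (latency_us * 1e-6)


if __name__ == "__main__":
    capped = mean_latency_us(16, 110_000)
    print(f"QD=16 at 110 kIOPS -> about {capped:.0f} us per IO")
    # Halving the latency at the same queue depth doubles the IOPS.
    print(f"Half that latency  -> about {iops_from_latency(16, capped / 2):.0f} IOPS")
```

Put another way, at a fixed queue depth a platform running at double the latency per IO can only sustain about half the IOPS, so the ceiling and the shifted latency curves are two views of the same limit.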
Getting back to QD=1, we see a picture very similar to the earlier SATA results, with AMD pulling ahead of the idling competitors but falling behind them when all were at speed.
PCIe, 128KB Sequential
These results are similar to the SATA sequential test, but Ryzen did a bit better this time and the 7700K's aggressive idling penalized it a bit at the lower QDs.
Storage Summary
Overall Ryzen / X370 did well in these tests. There was a troubling plateau past QD=8 with random access on NVMe, but remember that even power users operate at or below QD=8 99.9% of the time. At the lower Queue Depths, where the bigger consumer focus lies, results were very close to X99 and Z270, which is very impressive considering those competing parts have enjoyed multiple generations of refinement.
So, did it ever occur to the reviewer that the slightly slower performance in some software (games included) is actually due to poor optimization?
The industry used the last decade or so to specifically optimize for Intel.
Ryzen is fairly new by comparison, but it has already demonstrated performance increases of up to 30% through simple patches in games.
Audacity and many other programs like it are not optimized for the Ryzen architecture.
They are taking advantage of every possible trick in Intel’s hand, and yet barely anything or none of it actually benefits Ryzen performance-wise.
Plus, the Infinity fabric in Ryzen is sensitive to RAM speeds.
2400 MHz speed on RAM is simply inadequate for Ryzen… 3000 MHz would be better as that would raise its performance by about 10%.
Other than RAM speeds, software optimizations are required to take advantage of Ryzen’s capabilities.
Otherwise, you might as well be comparing apples and oranges right now.
It actually shows that Ryzen via ‘brute force’ is highly competitive with all of Intel’s products… just imagine what might happen if we get developers to actively support Ryzen. Of course, this will probably require time, as devs usually optimize for hardware they are paid to optimize for, and as we know, both Intel and Nvidia have deep pockets to sway devs to support their own hardware specifically and make AMD look bad (when in reality, it’s anything but).