Timings, timings, timings
My next step was to loosen the memory timings on the Intel Core i7-7700K platform in order to raise memory latency, and then measure performance before and after.
I took the same Corsair DDR4 memory to 20-20-20-36-3T and re-ran our SiSoft Sandra and IMLC tests.
The increases aren’t massive, but I found that full random results were 13.8% slower, in-page results were 15.8% slower and sequential results were 7.5% slower. The Intel memory tool reported memory latency that was 13.9% slower.
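To put the looser timings in absolute terms, the first number in a timing string (the CAS latency) can be converted from clock cycles to nanoseconds. The sketch below assumes the DDR4-2400 data rate mentioned later in this story. The tighter CAS value is a placeholder, since the original timings are not listed here.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the memory clock in MHz
    is half the data rate in MT/s. One cycle lasts 1000 / clock_mhz ns.
    """
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles * 1000.0 / clock_mhz


# CL20 is the loosened setting from the text; DDR4-2400 is the stated speed.
loose_ns = cas_latency_ns(20, 2400)

# ASSUMPTION: CL15 is only a placeholder for a tighter setting.
tight_ns = cas_latency_ns(15, 2400)

print(f"CL20 @ DDR4-2400: {loose_ns:.2f} ns")
print(f"CL15 @ DDR4-2400: {tight_ns:.2f} ns (placeholder)")
```

CAS latency is only one part of a full memory access, which also passes through the memory controller and cache hierarchy. That is why the measured latency changes above are smaller than the raw change in CAS time alone would suggest.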
Next I ran the same tests shown in our vTune measurements above several times, averaged the results and normalized them to the slower memory timings in order to see what impact the added memory latency had on each workload.
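The averaging and normalization step can be sketched roughly as follows. The function name, configuration labels and scores are all hypothetical placeholders for illustration, not measured results from this story.

```python
from statistics import mean


def normalize_to_baseline(runs_by_config, baseline, higher_is_better=True):
    """Average repeated runs per configuration, then express each average
    relative to the baseline configuration (baseline becomes 1.0).

    For scores such as frame rate, higher is better and we divide by the
    baseline. For times, lower is better, so the ratio is inverted so that
    values above 1.0 always mean "faster than baseline".
    """
    averages = {name: mean(runs) for name, runs in runs_by_config.items()}
    base = averages[baseline]
    if higher_is_better:
        return {name: avg / base for name, avg in averages.items()}
    return {name: base / avg for name, avg in averages.items()}


# PLACEHOLDER numbers only, chosen to make the arithmetic easy to follow.
example_runs = {
    "slow_timings": [100.0, 102.0, 98.0],
    "fast_timings": [110.0, 112.0, 108.0],
}
relative = normalize_to_baseline(example_runs, baseline="slow_timings")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x")
```

Normalizing to the slower configuration means each bar reads directly as the gain from tighter timings, so 1.10 is a 10% improvement.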
The results are incredibly compelling. In all our general applications, excepting WinRAR again, we see less than a 3% advantage despite ~13-15% lower memory latency. WinRAR was the exception in vTune for its memory latency sensitivity, and that shows itself here as a 13% improvement going from the slower timings to the faster, in line with the synthetic latency improvements. But all of the other non-gaming applications have memory access patterns that are well cached or easily prefetched.
The gaming tests show a more variable set of results, but clearly show advantages for the Core i7-7700K with tighter memory timings. Average frame rates increased by 5-15% apart from Deus Ex: Human Revolution and Ghost Recon Wildlands, which we will discuss later. One thing we did not see is a correlation between the presumed memory latency dependency, based on the percentages shown in vTune, and the before/after memory timing results above. In truth, gaming workloads are incredibly complex and it is difficult to narrow performance down to any single attribute. For example, I would point to the extremely high threading efficiency of Ashes of the Singularity, an application that is capable of hiding memory latency.
Ashes of the Singularity
Grand Theft Auto V cannot make that claim, with a thread level efficiency weighing heavily in the 3-4 range.
Grand Theft Auto V
Two games stand out from the graph above as being barely affected by the raised memory latency in our testing. Deus Ex and Ghost Recon Wildlands show just a 1-2% change. What makes them, and the larger outliers like Hitman, so different? While I can’t explain the why in this case, I can point to another data point that backs up the memory latency assertion of this story.
This graph shows the same data as before, but adds in results from the Ryzen 7 1800X processor. I have also normalized to the faster Intel platform configuration. As you can see, in both Deus Ex: Human Revolution and Ghost Recon Wildlands, the performance delta between the default (faster) Intel 7700K configuration and the Ryzen 7 1800X is small. By comparison, games like GTA V, Hitman and Rise of the Tomb Raider show much wider gaps in 1080p gaming performance.
Far Cry Primal is an interesting data point that shows a massive gap between the Ryzen and Intel processors. It is a glaring example that shows we don’t know everything about these workloads or the impact of the memory system (latency or otherwise).
Ashes of the Singularity: Escalation is a unique example in our sample set that indicates the Ryzen 7 1800X result is faster than the artificially-slowed Intel Core i7-7700K (outside of margin of error differences like we see in Deus Ex). As this is the application that AMD placed on a pedestal as being “optimized for Ryzen”, it would make sense that the higher thread utilization is in fact able to hide any inherent memory latency disadvantage Ryzen holds compared to Kaby Lake.
If you are wondering why I did not include the performance comparisons between Intel and AMD configurations here, the core and thread count difference made it more difficult to make reliable conclusions around the import of the before/after deltas. That discussion dives more into SMT implementation and efficiency.
Another Interesting Data Point in a Complex Discussion
That is a lot of testing, profiling and wordsmithing to come to what conclusion?
By using Intel’s vTune application profiler, I am confident in saying that PC gaming workloads are more sensitive to memory latency than most other non-gaming workloads. If that seems a little broad, we can narrow it to say gaming is more sensitive than most of the non-gaming workloads that are utilized by reviewers and analysts to inspect, measure and gauge the performance of a processor and platform. That is an important conclusion, even if it might seem obvious in retrospect. Workloads like Handbrake, CineBench, and Blender are streaming applications, meaning there is very little thread-to-thread variance in the memory read patterns. Games tend to have a threading pattern that lends itself to “core hopping” and a more random access pattern. The net result is more dependency on the memory latency of a platform.
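The difference between a streaming pattern and a random, dependent pattern can be illustrated with a pointer-chasing sketch. This is not how Sandra or IMLC are implemented. It is a minimal illustration with hypothetical function names, and CPython interpreter overhead means the absolute numbers are not true hardware latencies.

```python
import random
import time


def sequential_chain(n):
    """Each slot points at the next one, wrapping at the end.
    Walking it touches memory in order, which prefetchers handle well."""
    return [(i + 1) % n for i in range(n)]


def random_chain(n, seed=0):
    """Build a random permutation that forms one single cycle
    (Sattolo's algorithm), so a walk visits every slot exactly once."""
    rng = random.Random(seed)
    items = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i keeps everything in one cycle
        items[i], items[j] = items[j], items[i]
    return items


def chase(chain, hops):
    """Follow the chain for `hops` steps. Each load depends on the
    previous one, so the latency of every access is exposed."""
    idx = 0
    for _ in range(hops):
        idx = chain[idx]
    return idx


def ns_per_hop(chain, hops):
    """Average wall-clock time per dependent access, in nanoseconds."""
    start = time.perf_counter()
    chase(chain, hops)
    return (time.perf_counter() - start) * 1e9 / hops


def compare(n):
    """Return (sequential, random) ns per hop for chains of length n."""
    return ns_per_hop(sequential_chain(n), n), ns_per_hop(random_chain(n), n)
```

Calling `compare` with a chain much larger than the CPU caches (for example `compare(1 << 21)`) should generally show the random walk taking longer per hop than the sequential one, because every step has to wait for the previous load to return before the next address is known.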
Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.
Based on our previous work and testing, we see some alleviation of this problem by increasing memory frequency on Ryzen. I quickly ran the same synthetic memory tests on the 1800X at 2933 MHz to see what kind of decrease that gets us.
Increasing the memory frequency results in an 11% faster full random test, a 10.2% faster in-page test, and an 11.5% faster sequential test. The IMLC app shows around a 9% advantage. However, that still leaves Ryzen at a 3.2x disadvantage on the in-page result and 39% slower on the full random result compared to the faster settings on the Intel Core i7-7700K, which is still running at 2400 MHz. Increasing memory speeds on Ryzen definitely helps AMD, but Intel still has an edge for the foreseeable future.
Another avenue that can help remove the performance gap between Ryzen and Intel CPUs is more highly threaded and thread-aware game engines. Ashes of the Singularity is going to be the poster child for this going forward, but I hope that AMD is working with the other major vendors (UE, Unity) to implement similar changes. Threading your game well gives it the ability to adapt to slightly higher memory latency without adversely affecting performance, effectively “hiding” it. But in truth this is a significant ask for developers that are already strapped for time and resources. As we have seen NVIDIA resort to, AMD will likely need to seed engineers on-location at these development houses to put Ryzen, and the methods necessary to help it perform at its peak, at the forefront of developers’ minds.
Obviously one way to remove the dependence on CPU memory latency is to raise the resolution and image quality settings of the games in question. Though this does not remove the memory latency sensitivity of the game workload itself, moving the bottleneck more towards the GPU gives the CPU and the game threads more time to wait on memory accesses, effectively “hiding” the latency. How effective resolution increases are in removing the Intel/Ryzen performance gap is going to vary based on the specific engine and workload, but I have seen instances swinging in both directions in our testing thus far.
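A toy model makes the “hiding” effect easier to see. It assumes CPU and GPU work for a frame fully overlap, so the slower of the two sets the frame time. Real engines are messier than this, and all of the millisecond values are hypothetical.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when the slower of CPU and GPU sets frame time.
    ASSUMPTION: perfect overlap between CPU and GPU work."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def visible_cpu_penalty(cpu_fast_ms, cpu_slow_ms, gpu_ms):
    """Fraction of frame rate lost when CPU time per frame grows from
    cpu_fast_ms to cpu_slow_ms (for example, from added memory latency)."""
    return 1.0 - fps(cpu_slow_ms, gpu_ms) / fps(cpu_fast_ms, gpu_ms)


# HYPOTHETICAL frame costs: the CPU needs 8 ms with fast memory, 9 ms with slow.
low_res = visible_cpu_penalty(8.0, 9.0, gpu_ms=6.0)    # GPU finishes first
high_res = visible_cpu_penalty(8.0, 9.0, gpu_ms=12.0)  # GPU is the bottleneck

print(f"CPU-bound (low resolution):  {low_res:.1%} slower")
print(f"GPU-bound (high resolution): {high_res:.1%} slower")
```

In this simplified picture the extra CPU time costs frame rate only while the CPU is the limiting side. Once the GPU takes longer per frame, the same latency penalty disappears from the result. The model cannot reproduce the cases that swing the other way, which is one sign that real workloads are more complicated.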
Narrowing down the issues on Ryzen also leaves me wondering what other workloads might be impacted. I found one such example in WinRAR, a compression tool that is widely used and just so happens to have a strong dependence on memory latency. While the team is still exploring, it is possible that AMD will want to address this concern with the coming Naples platform launch and the enterprise workloads that may or may not behave very differently than the consumer testing we focused on today. AMD needs its push into servers to be a success and any red flags there are going to be just as important in the consumer space.
Even though we can’t make 100% assurances that our testing has solved the “Ryzen 1080p dilemma”, I feel confident we have made some significant strides. Memory latency is clearly an important factor for the current state of gaming on Ryzen, even if it is mainly exposed at lower resolutions. How AMD is able to work around it, through future architecture revisions and with game and application development initiatives, will be judged going forward.