We are up to two…

Rise of the Tomb Raider got a surprise performance update to improve Ryzen CPU frame rates!

UPDATE (5/31/2017): Crystal Dynamics was able to get back to us with a couple of points on the changes that were made with this patch to affect the performance of AMD Ryzen processors.

  1. Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.
     
  2. An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU.  Overhead was reduced by packing texture descriptor uploads into larger chunks.
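To make the second point concrete, here is a toy cost model, not the game's actual code, showing why packing uploads into larger chunks cuts overhead. It assumes every upload call pays a fixed cost plus a small per-descriptor cost; the function name `upload_cost` and every number are hypothetical, chosen only to show the shape of the trade-off.

```python
import math


def upload_cost(num_descriptors, chunk_size, per_call_overhead_us, per_descriptor_us):
    """Toy model: each upload call pays a fixed overhead plus a per-descriptor cost."""
    num_calls = math.ceil(num_descriptors / chunk_size)
    return num_calls * per_call_overhead_us + num_descriptors * per_descriptor_us


# Hypothetical numbers: 1024 descriptors, 5 us fixed cost per call, 0.1 us per descriptor.
small_chunks = upload_cost(1024, chunk_size=1, per_call_overhead_us=5.0, per_descriptor_us=0.1)
large_chunks = upload_cost(1024, chunk_size=64, per_call_overhead_us=5.0, per_descriptor_us=0.1)

print(f"one descriptor per call: {small_chunks:.1f} us")
print(f"64 descriptors per call: {large_chunks:.1f} us")
```

The per-descriptor work is identical in both cases; only the number of times the fixed cost is paid changes, which is the kind of saving the developer describes.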

There you have it, a bit more detail on the software changes made to help adapt the game engine to AMD's Ryzen architecture. Not only that, but it does confirm our information that there was slightly MORE to address in the Ryzen+GeForce combinations.

END UPDATE

Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved to be very competitive with Intel’s dominant CPUs in the market and took significant leads in areas of massive multi-threading and performance per dollar. One area where AMD has struggled, though, is 1080p gaming: performance there on both Ryzen 7 and Ryzen 5 processors fell behind comparable Intel parts, sometimes by significant margins.

Our team continues to watch the story to see how AMD and game developers work through the issue. Most recently I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, the memory latency differences are a significant part of the initial problem for AMD:

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then-current stance on the results, which backed up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur courtesy of Ashes of the Singularity: Escalation, where an engine update to utilize more threads resulted in as much as a 31% increase in average frame rate.

Quick on the heels of the Ryzen 7 release, AMD worked with the developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game was able to showcase as much as a 30% increase in average frame rate on the integrated benchmark. While this was only a single use case, it does prove that through work with the developers, AMD has the ability to improve the 1080p gaming positioning of Ryzen against Intel.

Fast forward to today and I was surprised to find a new patch (Patch #12, v1.0.770.1) for Rise of the Tomb Raider, a game that was actually one of the worst-case scenarios for AMD with Ryzen. The patch notes mention the following:

The following changes are included in this patch

– Fix certain DX12 crashes reported by users on the forums.

– Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.

While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.

We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!

Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”

Remember how the situation appeared in April?

The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K – a dramatic difference for a processor that should only have been ~8-10% slower in single threaded workloads.

How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X benchmark platform from previous testing, including the ASUS Crosshair VI Hero motherboard, 16GB of DDR4-2400 memory and a GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.

The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant boost in performance at 1080p while still running at the Very High image quality preset, indicating that the developer (and likely AMD) were able to find substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K only sees a 2.4% increase from this new game revision. This tells us that the changes to the game were specific to Ryzen processors and their design, but that no performance was taken away from the Intel platforms.
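For readers who want to check the uplift themselves, the arithmetic is a simple percentage change. The sketch below uses the two 1800X frame rates quoted above; the helper name `pct_change` is just for illustration. Because the pre-patch figure is quoted as a round 107 FPS, the result comes out at about 18%, slightly above the 17% quoted, which was presumably computed from the unrounded result.

```python
def pct_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Average frame rates quoted above for the Ryzen 7 1800X.
# The pre-patch value is the rounded figure from the text.
uplift = pct_change(107.0, 126.44)
print(f"{uplift:.1f}% higher average frame rate")
```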

This patch thus narrows the gap between the Ryzen 7 1800X and the Core i7-6900K (and I assume the 7700K, etc.). Prior to today’s update the AMD CPU was 22.8% slower than the Intel part; now that gap is around 9%.

I was curious to see if we could find an easily discoverable “reason” for the performance differences on Ryzen, so we ran the same benchmarks on Rise of the Tomb Raider while running Windows Performance Monitor in the background, monitoring per-core CPU utilization. Running the older version of the game (v1.0.767.2) the thread utilization looks like this:

It’s kind of a mess, but rather than focus on any single line in this graph, instead observe the overall pattern. A single thread (orange, core 1) is leveraged heavily throughout the ~100 seconds of testing, while a second thread (yellow, core 15) also stands out from the rest, but slightly less so.

If we distill this down to the average CPU utilization for each core on the Ryzen 7 1800X, we can see the behavior is pronounced. Cores 2 and 16 (sorry for Excel’s inability to understand starting at 0) are averaging 80% and 68% utilization respectively.
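The distillation step is nothing more than averaging each core's samples and sorting. Here is a minimal sketch of that process, assuming the utilization log has already been loaded into a dictionary keyed by 0-based logical core index. The sample values are made up for illustration and are not our measured data; the function names are hypothetical.

```python
from statistics import mean


def average_utilization(samples_by_core):
    """Map each logical core index to its mean utilization (percent)."""
    return {core: mean(samples) for core, samples in samples_by_core.items()}


def busiest(averages, count=2):
    """Return the `count` cores with the highest average, busiest first."""
    return sorted(averages, key=averages.get, reverse=True)[:count]


# Hypothetical samples (percent busy), keyed by 0-based logical core index.
samples = {
    0: [30, 34, 38],
    1: [78, 80, 82],
    14: [36, 40, 44],
    15: [66, 68, 70],
}

averages = average_utilization(samples)
for core in busiest(averages):
    # A chart that counts from 1 labels each core as its 0-based index plus one.
    print(f"core {core} (chart label {core + 1}): {averages[core]:.0f}%")
```

The `core + 1` in the label is the same off-by-one that turns core 1 and core 15 in the line graph into Core 2 and Core 16 in the bar chart.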

How does this change with the new patch?

Honestly, the pattern looks similar, with two threads doing more of the work than the others, but it seems to be more balanced.

And in fact, it is. Core 16 averages 71% utilization and Core 3 averages 55%, while the rest of the processing cores are centered on ~35%. In the pre-patch graph above, the other cores skewed closer to 40% utilization and above. Looking at the total average CPU utilization, we found that the new RoTR patch showed LOWER CPU workload than the older version, indicating a more efficient use of processor cores in the game.

Per Segment Details

The Rise of the Tomb Raider benchmark is broken up into three segments: Mountain Peak, Syria, and Geothermal Valley. (Though, confusingly, the log files call them Spine of the Mountain, Prophet’s Tomb, and Geothermal Valley.) We quickly graphed the frame times on the Ryzen 7 1800X in both pre- and post-patch versions of the game.

All three sub-tests show lower frame time variance, indicating a smoother gaming experience across the board. The Geothermal Valley test shows the biggest difference in frame variance while the Mountain Peak test is where the largest raw performance gap is seen, giving Ryzen a 20-30% decrease in frame times.
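For anyone wanting to run the same comparison on their own frame time logs, here is a rough sketch of the two numbers discussed above: average frame rate and frame time variance. The frame times below are invented for illustration and are not our results; `summarize` is a hypothetical helper name.

```python
from statistics import mean, pvariance


def summarize(frame_times_ms):
    """Return (average FPS, frame-time variance in ms^2) for one run."""
    avg_ms = mean(frame_times_ms)
    return 1000.0 / avg_ms, pvariance(frame_times_ms)


# Hypothetical frame times in milliseconds, not measured data.
pre_patch = [9.0, 12.0, 8.0, 11.0]
post_patch = [8.0, 8.5, 7.5, 8.0]

pre_fps, pre_var = summarize(pre_patch)
post_fps, post_var = summarize(post_patch)

print(f"pre-patch:  {pre_fps:.1f} FPS, variance {pre_var:.3f} ms^2")
print(f"post-patch: {post_fps:.1f} FPS, variance {post_var:.3f} ms^2")
```

Lower average frame time means higher raw performance, while lower variance means the frames arrive more evenly, which is what reads as "smoother" in play.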

Another Data Point to Favor AMD

Though our CPU utilization analysis doesn’t point to wider multi-threading on the Ryzen 7 1800X as the source of the gaming performance gains, it does indicate that the engine was adapted to architecture specifics, and thus we see a more efficient use of the threads that Rise of the Tomb Raider implements. Regardless of how it does it, Rise of the Tomb Raider becomes the second major example of a game developer working with AMD to improve the state of gaming on the Ryzen processor platform, complementing the previous work done on Ashes of the Singularity: Escalation.

I have heard, though it has not been confirmed by the developer, that the biggest gains with this patch will come on Ryzen + GeForce configurations. If you asked me WHY it might be more heavily weighted to that configuration, I must admit that I don’t yet know, though I have put in questions with all the appropriate parties.

I am obviously not ready to declare the 1080p gaming performance issue with Ryzen “solved” but it is clear that AMD has a mission and direction. The pace may be slow and steady, but I continue to be told internally that it is ramping up. With the pending release of Threadripper processors this summer, AMD would love nothing more than to put to bed this nagging issue that has been the most painful of thorns.