3 Different Performance Measurements
Several odd things happened during our benchmarking and testing process with Civilization: Beyond Earth. First, because this game has Mantle support, our now-standard Frame Rating capture-based performance analysis is only partially valuable. We have no way to universally run an overlay on Mantle, which is required for the capture testing to work. So, instead, we are using capture-based testing on the DX11 results for both the NVIDIA GeForce GTX 980 and the Radeon R9 290X, but we have to rely on Beyond Earth's reported CSV file for frame times and average performance during Mantle testing.
But that wasn't the only concern we had. It seemed that the performance results using the in-game output file and our capture-based solution were at odds to some degree. Take a look at the graph below.
What you are seeing is Civilization: Beyond Earth running at 2560×1440 using the Ultra preset and the 8xMSAA setting on the GeForce GTX 980. But the three different results were all gathered with different reporting methods. The blue line shows the results found from the in-game CSV file generated by Beyond Earth directly. The orange line shows the results from a FRAPS performance file. And the grey line shows the results of our Frame Rating capture-based (using FCAT) system. Clearly there are some major differences that need to be looked at.
Both the in-game test results and the FRAPS test results essentially mirror each other, and that makes sense based on what we know about how the game and FRAPS measure frame times. Firaxis has confirmed that it measures frame times at the beginning of the update loop but before the engine simulation step. The in-game and FRAPS results show a wide portion of variable frame times starting at around 18s and going until 38s; during this time the game benchmark is in a zoomed out state, at its most stressful on the CPU and GPU. That high frame time variance is bad news (potentially) as it would mean there is stutter to be found. Keep in mind this is with a SINGLE GTX 980!
The Frame Rating / FCAT results look very different though – the grey line shows a very smooth and consistent set of frame times through that exact same time span. The rest of the benchmark run matches up pretty well (pre-18s and post-38s) with the normal differences expected between FRAPS and capture-based performance testing. Something is very odd about that 18-38s window of reporting – the game engine frame times and FRAPS see results in a VERY different way than what is being shown on the screen (and thus being captured and analyzed by Frame Rating).
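To put a number on how differently two reporting methods treat the same stretch of a run, one option is to cut each frame time log down to the same window and compare the spread. The following is a minimal sketch, not our actual Frame Rating tooling: the function name `window_stats` is hypothetical, and because the layout of the in-game CSV is not covered here, it assumes the frame times have already been loaded into a plain list of milliseconds.

```python
from itertools import accumulate
from statistics import mean, pstdev


def window_stats(frame_times_ms, start_s, end_s):
    """Return (mean, standard deviation) in ms for frames that start
    inside the time window [start_s, end_s), measured from the first frame.

    Assumption: frame_times_ms is a list of per-frame durations in
    milliseconds, in display order.
    """
    # A frame's start time is the sum of all frame times before it.
    starts_ms = [0.0] + list(accumulate(frame_times_ms))[:-1]
    lo, hi = start_s * 1000.0, end_s * 1000.0
    selected = [ft for t, ft in zip(starts_ms, frame_times_ms) if lo <= t < hi]
    if not selected:
        raise ValueError("no frames start inside the requested window")
    return mean(selected), pstdev(selected)
```

Running this over the 18 to 38 second window for each log would be expected to give a much larger standard deviation for the in-game and FRAPS data than for the capture data on the GTX 980, if the graph is any guide.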
Also interestingly, the span of time between 7s and 14s shows some frame time variation, but it does so on all three reporting methods. So how is what is happening there any different than what is happening between 18s and 38s?
I will also note that while the animation and movement are slow in that 18-38s time frame, my experience watching the game with my own two eyeballs is not indicative of a gaming environment with large, alternating swings between 12ms and 25ms frame times. In my years of testing GPU hardware with an eye towards frame rate consistency, I believe that would stand out dramatically and immediately.
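A quick bit of arithmetic shows why that kind of alternation matters even when the average looks fine. The sketch below is illustrative only: `pacing_summary` is a hypothetical helper, and the input is a synthetic series that flips between 12ms and 25ms rather than data from our test runs. It reports the average frame time, the equivalent average frame rate, and the mean frame-to-frame change.

```python
def pacing_summary(frame_times_ms):
    """Summarize a frame time series (milliseconds, at least two frames)."""
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frames")
    avg = sum(frame_times_ms) / len(frame_times_ms)
    # Absolute change in frame time from each frame to the next.
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return {
        "avg_frame_time_ms": avg,
        "avg_fps": 1000.0 / avg,
        "mean_frame_to_frame_delta_ms": sum(deltas) / len(deltas),
    }


# Synthetic example: strict alternation vs. a perfectly steady series
# with the same average frame time.
alternating = pacing_summary([12.0, 25.0] * 100)
steady = pacing_summary([18.5] * 200)
```

Both series average 18.5ms per frame, or roughly 54 FPS, but the alternating one changes by 13ms on every single frame while the steady one does not change at all. An average frame rate alone would not tell the two apart.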
Why does this come up in this story though? Because the AMD Radeon R9 290X DX11 results do not show any of that performance data difference:
With the R9 290X, all three sets of results are essentially the same, or at least differ in the ways I would expect. Notice that the Frame Rating / FCAT results are still showing a smoother, better-paced experience than the FRAPS or in-game results between 0-19s and from 37s to the end. That 18s to 38s window has some more frame time variance than the rest of the Beyond Earth test run, but the results are consistent across the three frame time sets we recorded.
So what does this mean? Which results are the best, or the most accurate? What does each set of results indicate is happening inside the game engine or inside the GPU driver? The truth is, I'm not quite sure yet. The game engine results match FRAPS (as I would expect) and clearly the game developer should know how to get accurate results from various hardware. But my eyes tell me that the wild oscillation in the frame times just isn't accurate. And if it isn't accurate, then why would AMD's results match across all three test methods? I just don't know yet. I have questions in to NVIDIA, AMD, and Firaxis to attempt to solve it, but I can say that my results have been duplicated, so we should have answers very soon.
What I do know is that our Frame Rating and capture methodology has been working for a while now and I have seen more errors and problems fixed and resolved via that performance measurement system than I have even written about on this site. So I trust it, and I will continue to trust it, along with my eyes and what I see on the screen. So, for our performance testing with Civilization: Beyond Earth I will be including Frame Rating based results for both the GTX 980 and the R9 290X under DX11.
Hopefully, very soon, we'll get AMD to help us build the necessary overlay for Mantle so that we no longer have to have a separation of performance data – some results from capture and some from specific in-game log files. In order to truly validate a smooth and seamless gaming experience, we need capture testing to work across the board.