VR Performance Evaluation
With some help from NVIDIA, VR evaluation just got a whole lot simpler.
Even though virtual reality hasn’t taken off with the momentum that many in the industry had expected on the heels of the HTC Vive and Oculus Rift launches last year, it remains one of the fastest growing aspects of PC hardware. More importantly for many, VR is also one of the key inflection points for performance moving forward; it requires more hardware, scalability, and innovation than any other sub-category including 4K gaming. As such, NVIDIA, AMD, and even Intel continue to push the performance benefits of their own hardware and technology.
Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts. But Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we were able to measure the smoothness of a gaming experience and communicate it clearly enough to help gamers make purchasing decisions.
VR pipeline when everything is working well.
For VR though, those same tools just don’t cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. Not only that, but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.
VR pipeline with a frame miss.
NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.
NVIDIA FCAT VR consists of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. To generate performance data, it uses Oculus event tracing (part of Windows ETW) and SteamVR’s performance API, along with NVIDIA driver stats when used on NVIDIA hardware. It works perfectly well on any GPU vendor’s hardware, though, because it still has access to the VR vendor-specific timing results.
The second part of the tool handles the processing of the results. NVIDIA built the FCAT VR Analyzer tool to import the data output by the capture software and show frame time, dropped frames, warped frames, synthesized frames (space warp), reprojection, etc. Charting these data points makes a technical analysis of performance possible and supports reasonable, accurate assertions about the user experience.
Compared to the previous FCAT software, which required a dedicated hardware capture system, the new FCAT VR can be run in a software-only environment. This should help it gain considerably more traction with the online media, and it should prove useful for end users who thrive on performance comparisons and online discussions. (Be prepared for the graphs to show up a lot on your favorite forum!)
I’ve been helping test and improve FCAT VR for a number of months now, and one of the key areas where I pushed NVIDIA was hardware capture validation of the software-based performance metrics. It was a surprisingly difficult task: splitting and capturing the video going to a VR headset was an unknown, creating the overlays required for post-process analysis was complex due to the API variances, and accurately portraying minute performance differences in an always-on-VSync environment is tricky. But we were able to do it, and in doing so independently validate the results that FCAT VR shows and match them with end-user experiences.
I don’t imagine many people will be looking to duplicate the methods, as the process is complex and sensitive to specific hardware, but here is how it works.
- The HDMI output from the graphics card to the VR headset is divided with a high bandwidth HDMI splitter with one output going to the headset itself and the other to a Datapath capture card.
- The capture card duplicates the EDID of the headset in order to trick the system into giving us full resolution, full refresh rate video.
- We use VirtualDub (or another compliant capture tool with low overhead) to capture either full resolution video (2160×1200) or cropped video.
- An overlay runs on the system being tested that shows two varying color segments – one that is inserted during game engine render, and one that is inserted after final warp and shifts from the VR API.
- That captured video is post-process analyzed by a custom extractor tool to look for skipped colors in the patterns, multiple frames with the same color, etc. (This process is basically identical to previous FCAT.)
- A data file is generated that can be imported and used in the same FCAT VR Analyzer software to be compared.
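To give a feel for what the extractor is looking for, here is a minimal Python sketch of the color-pattern check. It is not the actual extractor tool. It assumes the overlay steps through a fixed palette one color per rendered frame, and that each captured video frame has already been reduced to a palette index; the function name, the palette size of 16 and the sample data are all hypothetical.

```python
from typing import List, Tuple


def classify_color_sequence(indices: List[int], palette_size: int) -> Tuple[int, int, int]:
    """Count (new, repeated, skipped) frames from overlay color indices.

    Assumption: the overlay advances by exactly one palette entry per
    rendered frame and wraps around at palette_size.
    """
    new_frames = repeated = skipped = 0
    previous = None
    for idx in indices:
        if previous is None:
            new_frames += 1  # first captured frame is always new
        else:
            step = (idx - previous) % palette_size
            if step == 0:
                repeated += 1  # same color twice: the frame was shown again
            else:
                new_frames += 1
                skipped += step - 1  # colors jumped over were never displayed
        previous = idx
    return new_frames, repeated, skipped


# Hypothetical capture: color 1 repeats once, color 3 is skipped, color 5 repeats twice.
captured = [0, 1, 1, 2, 4, 5, 5, 5, 6]
print(classify_color_sequence(captured, palette_size=16))  # (6, 3, 1)
```

A repeated color means the headset was shown the same rendered frame again, while a gap in the sequence means a rendered frame never made it to the display.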
That process is significantly more time consuming than the software-only methods, and in truth it provides less information due to the Vsync-enabled nature of VR gaming.
FCAT VR Analyzer allows us to easily shift and crop the data sets in the tool to better align captures. This is useful for GPU-to-GPU comparisons and necessary for comparing two captures of the same run. I was able to run both the software and hardware capture at the same time and look at the results from nearly 100 different benchmark tests.
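In the Analyzer the shifting is done by hand, but the idea can be sketched in a few lines of Python. The example below is only an illustration, not how the tool works internally: it assumes two frame time series sampled at the same rate and tries every offset within a range, keeping the one with the smallest mean absolute difference. The function name and sample values are made up.

```python
from typing import Sequence


def best_shift(reference: Sequence[float], other: Sequence[float], max_shift: int) -> int:
    """Return the offset s so that other[i + s] best matches reference[i]."""
    best_error = float("inf")
    best_offset = 0
    for shift in range(-max_shift, max_shift + 1):
        # Compare only the samples where the two series overlap at this offset.
        pairs = [
            (reference[i], other[i + shift])
            for i in range(len(reference))
            if 0 <= i + shift < len(other)
        ]
        if not pairs:
            continue
        error = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if error < best_error:
            best_error = error
            best_offset = shift
    return best_offset


# Hypothetical frame times in ms; the second capture started two samples earlier.
software = [11, 11, 22, 11, 11, 11, 22, 22, 11, 11]
hardware = [11, 11] + software
print(best_shift(software, hardware, max_shift=3))  # 2
```

Once the offset is known, cropping both series to the overlapping region leaves two data sets whose peaks line up.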
In the top graph of this image, the brighter green line represents the hardware captured data while the darker green line is the software-based data. By properly aligning the peaks of the data sets, we can determine that the two results match up. The key here is that when the frame render time crosses ~11ms (or in this case dips below it), we switch between running at 45 FPS and running at 90 FPS, which shows up as a very dramatic spike in the hardware captured data. While accurate and useful, the hardware capture doesn’t tell us the whole story and makes it hard to see how the GTX hardware is behaving. From what we see in the software-based data, the render time is well maintained at under 20ms, but sits above the ~11ms level required for smooth 90 FPS gaming. When the frame time drops below 11ms, we see the frame rate hit 90 FPS and then eventually cycle back to 45 FPS as we move around in the game area.
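The ~11ms figure is simply the time between refreshes on a 90 Hz headset (1000 / 90 ≈ 11.1ms). As a rough sketch, assuming that under Vsync every new application frame occupies a whole number of refresh intervals, the relationship between render time and the rate of new frames looks like this in Python. The function names are mine, and real runtimes are more nuanced than this model.

```python
import math


def frame_budget_ms(refresh_hz: float = 90.0) -> float:
    """Time available to render one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def vsync_locked_fps(render_ms: float, refresh_hz: float = 90.0) -> float:
    """New application frames per second under the whole-interval assumption."""
    # A frame that misses its refresh has to wait for the next one.
    intervals = max(1, math.ceil(render_ms / frame_budget_ms(refresh_hz)))
    return refresh_hz / intervals


print(round(frame_budget_ms(), 2))  # 11.11
print(vsync_locked_fps(10.0))       # 90.0
print(vsync_locked_fps(15.0))       # 45.0
```

Under this model any render time between roughly 11ms and 22ms lands on the same 45 FPS step, which is why the hardware capture shows a sharp jump while the software data shows how close the GPU actually is to the 11ms line.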
Which brings us to the bottom two graphs presented by the FCAT VR Analyzer tool. These are individual and more detailed looks at each data set, attempting to show the gaming experience of each by quantifying the impact of dropped frames and warped frames. For each point in time, the graph looks at the previous second and shows how many of the frames shown in the headset were “real” frames as rendered and expected by the game, how many were synthesized (Oculus ASW) and how many were dropped. The scale on the left is 0-90, representing the 90 frames available to be rendered each second. The amount of yellow or red in the graph at each time period tells you how many of those frames were dropped, or had to be warped by the VR APIs.
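The previous-second accounting can be sketched as a rolling window. This Python example assumes each headset refresh has already been labeled as "real", "synth" or "dropped" with a timestamp in seconds; those labels, the function name and the sample events are hypothetical and do not reflect the tool's actual data format.

```python
from collections import Counter, deque
from typing import Iterable, List, Tuple


def rolling_counts(events: Iterable[Tuple[float, str]], window_s: float = 1.0) -> List[Tuple[float, int, int, int]]:
    """For each event, count real/synth/dropped frames in the preceding window."""
    window = deque()
    counts = Counter()
    out = []
    for t, kind in events:
        window.append((t, kind))
        counts[kind] += 1
        # Expire everything that is a full window or more older than this event.
        while window and window[0][0] <= t - window_s:
            _, old_kind = window.popleft()
            counts[old_kind] -= 1
        out.append((t, counts["real"], counts["synth"], counts["dropped"]))
    return out


# Hypothetical, heavily thinned-out event stream (a real one has ~90 entries per second).
events = [(0.0, "real"), (0.5, "synth"), (0.75, "dropped"), (1.25, "real"), (1.5, "real")]
for row in rolling_counts(events):
    print(row)
```

Plotting the three counts stacked on a 0-90 scale gives the same kind of picture as the Analyzer's lower graphs: the taller the synthesized and dropped portions, the worse that second felt in the headset.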
Beyond just the graphs, the tool gives us access to a host of information like the number of frames rendered, the delivered frame rate and, my favorite, the unconstrained frame rate. While the delivered FPS metric tells you the frame rate as presented to the user with the 90 FPS Vsync lock in place, unconstrained tells us what the frame rate WOULD BE if the limits of vertical sync were not in place. While this does not translate directly into performance on current VR systems, it does tell us how much headroom a GPU has for higher quality settings, higher refresh screens or future VR hardware, and it allows us to make interesting comparisons between hardware configurations.
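To illustrate the difference between the two metrics, here is a small Python sketch. The formulas are my assumption of a reasonable definition, not taken from the tool: delivered FPS as new frames shown divided by elapsed time, and unconstrained FPS as 1000 divided by the average render time in milliseconds. The sample numbers are invented.

```python
from typing import Sequence


def delivered_fps(new_frames_shown: int, duration_s: float) -> float:
    """Frame rate as presented to the user with the Vsync lock in place."""
    return new_frames_shown / duration_s


def unconstrained_fps(render_times_ms: Sequence[float]) -> float:
    """Frame rate the GPU could sustain if it never had to wait for Vsync."""
    return 1000.0 * len(render_times_ms) / sum(render_times_ms)


# Hypothetical 10 second run: every frame took 8ms, so the headset got its full 90 FPS.
print(delivered_fps(900, 10.0))        # 90.0
print(unconstrained_fps([8.0] * 900))  # 125.0
```

In this made-up case both a GPU rendering in 8ms and one rendering in 10.5ms would deliver 90 FPS, but the unconstrained number separates them and shows which one has room for higher settings.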
The tool has grown quite a bit over the last few months and now has the ability to import and filter large numbers of source data files, helping users (like us) who have massive amounts of benchmarks and comparisons to build. You can adjust colors, set regions, zoom and export the graphs all from within the tool. We still have some requests in for further customization, but the root tool works amazingly well for the timeline it was developed on.
While I am not permitted to share any of our extensive data set at this point, that is coming soon. I think you’ll be surprised and impressed by the granularity that we can now dive into with VR performance analysis, something that was sorely missing from coverage across the world of GPUs. We’ll dive more into which hardware performs best and in which scenarios in March, so stay tuned.