We are starting down the rabbit hole of performance evaluation of VR gaming.
It has been an interesting past several weeks, and I find myself in a curious spot. Clearly, and without a shred of doubt, virtual reality, more than any other gaming platform that has come before it, needs an accurate measure of performance and experience. With traditional PC gaming, if you dropped a couple of frames or saw a slightly out-of-sync animation, you might notice and get annoyed. But in VR, with a head-mounted display just inches from your face taking up your entire field of view, a hitched frame or a stutter in motion can completely ruin the immersive experience the game developer is aiming to provide. Even worse, it can cause dizziness and nausea, defining your VR experience negatively and likely killing the excitement of the platform.
My conundrum, and the one that I think most of our industry rests in, is that we don’t yet have the tools and the ability to properly quantify the performance of VR. In a market and a platform that so desperately need to get this RIGHT, we are at a point where we are just trying to get it AT ALL. I have read and seen some other glances at the performance of VR headsets like the Oculus Rift and the HTC Vive released today, but honestly, all of them miss the mark at some level. Tools built for traditional PC gaming environments just don’t work, and experiential reviews talk about what the gamer can expect to “feel” but lack the data and analysis to back it up and to help point the industry in the right direction to improve in the long run.
With final hardware from both Oculus and HTC / Valve in my hands for the last three weeks, I have, with the help of Ken and Allyn, been diving into the important question of HOW do we properly test VR? I will be upfront: we don’t have a final answer yet. But we have a direction. And we have some interesting results to show you that should prove we are on the right track. But we’ll need help from the likes of Valve, Oculus, AMD, NVIDIA, Intel and Microsoft to get it right. Based on a lot of discussion I’ve had in just the last 2-3 days, I think we are moving in the correct direction.
Why things are different in VR performance testing
So why don’t our existing tools work for testing performance in VR? Things like Fraps, Frame Rating and FCAT have revolutionized performance evaluation for PCs – so why not VR? The short answer is that the gaming pipeline changes in VR with the introduction of two new SDKs: Oculus and OpenVR.
Though the two have differences, the key is that they intercept the path a rendered frame takes from the GPU to the screen. When you attach an Oculus Rift or an HTC Vive to your PC, it does not show up as a display in your system; this is a change from the first developer kits from Oculus years ago. Instead, the headsets are driven by what’s known as “direct mode.” This mode offers an improved user experience and lets the Oculus and OpenVR systems handle quite a bit of functionality for game developers. It also means there are actions being taken on the rendered frames after the last point at which we can monitor them. At least for today.
Think of it simply. In traditional PC gaming, a game sends commands to DirectX (or any API) that then communicates with the GPU that then works with Windows to determine when to flip to a new frame on the monitor. With VR gaming, the game sends commands through DirectX and the VR API, and it’s up to the Oculus / OpenVR SDK to determine if and when that frame should be sent directly to the HMD. The VR software stack helps to do time warps, image shifting and last second position updates. Without tools that can get relevant information from these SDKs and APIs, much of what we can record and report stops at the game engines’ Present() commands.
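That per-refresh decision can be sketched with a toy model; the function name, return values and the timewarp-fallback policy here are illustrative assumptions, not actual Oculus or OpenVR SDK calls:

```python
# Hypothetical sketch of the decision a VR compositor makes each refresh
# interval. Names and logic are illustrative, not real SDK behavior.

REFRESH_INTERVAL_MS = 1000.0 / 90.0  # ~11.11 ms per frame at 90 Hz

def compositor_step(render_time_ms, last_frame):
    """Decide what reaches the HMD for one refresh interval.

    If the new frame finished inside the budget, reproject (timewarp) it
    with the latest head pose and display it. If it missed the deadline,
    reproject the previous frame instead so the display never goes stale.
    """
    if render_time_ms <= REFRESH_INTERVAL_MS:
        return ("new_frame_warped", render_time_ms)
    return ("previous_frame_warped", last_frame)

# A frame that renders in 9 ms makes the 90 Hz deadline...
print(compositor_step(9.0, "frame_41"))
# ...while a 14 ms frame forces the compositor to re-warp the old frame.
print(compositor_step(14.0, "frame_41"))
```

The point of the toy model is that the final flip decision lives inside the VR runtime, which is exactly the step our existing tools cannot see.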
While part of our process down below is using PresentMon to get a data point, we need to work with Valve and Oculus to open up some more access to key timers and counters to bring VR performance evaluation up to the same echelon as standard PC gaming.
Our Current VR Performance Testing Process – A Work in Progress
Let’s walk step by step through how we are starting to measure VR gaming performance.
- Use PresentMon to measure Present() calls from the OpenVR SDK to the viewport on the companion monitor.
- Report TotalGPURenderMs time from the SteamVR performance tools.
- Compare above results to HMD user experience and try to match up shifts in frame rate, if any.
First, it’s important to know what tools like PresentMon and Fraps are actually measuring, because the game engine submits the frame to the VR SDK through a function like “ovr_SubmitFrame”, and the VR system handles the DirectX integration and display output. What we use PresentMon to measure is the Present() command through DXGI that the SDK uses to draw out to the companion monitor. With all the VR games we have tested to date, a subset of the image being shown to the user in the HMD is shown on the monitor as well. This is useful for other people in the room to watch, for troubleshooting, and, in our case, for tying performance measurement back to something we can actually capture.
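As a rough illustration of what we do with that data, here is how a PresentMon-style CSV could be reduced to per-frame intervals and suspected drops; the MsBetweenPresents column follows PresentMon's logging, but treat the exact schema and the sample values as assumptions:

```python
import csv
import io

# Minimal sketch: derive per-frame intervals, average FPS, and suspected
# dropped frames from a PresentMon-style CSV (illustrative sample data).
SAMPLE_CSV = """\
Application,TimeInSeconds,MsBetweenPresents
thegallery.exe,1.000,11.1
thegallery.exe,1.011,11.2
thegallery.exe,1.022,22.3
thegallery.exe,1.044,11.0
"""

def frame_stats(csv_text):
    intervals = [float(row["MsBetweenPresents"])
                 for row in csv.DictReader(io.StringIO(csv_text))]
    avg_ms = sum(intervals) / len(intervals)
    return {
        "frames": len(intervals),
        "avg_ms": avg_ms,
        "avg_fps": 1000.0 / avg_ms,
        # An interval near 2x the 11.1 ms refresh suggests a dropped frame.
        "suspect_drops": sum(1 for ms in intervals if ms > 1.5 * 11.1),
    }

print(frame_stats(SAMPLE_CSV))
```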
I think it is VERY IMPORTANT to reiterate that PresentMon is not measuring the submission of frames to the VR SDKs. Nothing we have available to us today will do that unless the game itself decides to log that for us (and we are pushing for it). But, with some precise system setup, we think we have found at least one example where the output on the screen in the viewport (companion monitor) can, in fact, match the experience of the HMD.
Our second data point comes from the SteamVR tool kit. If you dive into the options and look for the performance category, you will find a section that lets you copy logs of performance heuristics. These logs offer a LOT of data, with two different ways of viewing it.
I won’t try to go into detail on everything being presented here today, but take a look at the negative values you get for items like “WaitGetPoses Called” and “New Poses Ready”. Those refer to the OpenVR platform receiving new readings on the location of the headset and controllers in space before handing that data off to the game so it can begin compiling its next frame.
The metric I was most interested in was Total GPU, which tells us, in milliseconds, the total amount of time the GPU worked to render the current frame. While there are other processes at play, both on the GPU and on the CPU, GPU render time is often going to be the bottleneck for performance and makes up the majority of the non-idle frame time in more graphically intensive games.
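A simple way to use that number is to check it against the 90 Hz frame budget and count how often the GPU misses it; the sample values below are invented for illustration, not measured data:

```python
# Sketch: flag frames whose GPU render time exceeds the 90 Hz budget.
FRAME_BUDGET_MS = 1000.0 / 90.0  # ~11.11 ms per refresh

def over_budget(gpu_times_ms, budget=FRAME_BUDGET_MS):
    """Return the fraction of frames whose render time misses the budget."""
    misses = [t for t in gpu_times_ms if t > budget]
    return len(misses) / len(gpu_times_ms)

# Hypothetical Total GPU samples (ms) from a SteamVR performance log.
samples = [9.8, 10.4, 12.6, 10.9, 13.1, 10.2]
print(f"{over_budget(samples):.0%} of frames missed the 11.1 ms budget")
```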
Combining these two data points shows a strong correlation that I’ll detail below; looking at one without the other proves to be incomplete.
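One way to quantify how well the companion-monitor Present() intervals track GPU render times is a Pearson correlation between the two series; the data below is invented for illustration:

```python
import math

# Sketch: Pearson correlation between two illustrative timing series.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

present_ms = [11.1, 11.2, 22.3, 11.0, 11.3]  # PresentMon intervals (made up)
gpu_ms = [9.8, 10.1, 13.4, 9.9, 10.2]        # Total GPU samples (made up)
print(f"correlation: {pearson(present_ms, gpu_ms):.2f}")
```

A coefficient near 1.0 would suggest the companion-monitor output is a usable proxy for the HMD; a weak one would mean it is not.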
The game we selected for this initial testing process is The Gallery – Ep. 1: Call of the Starseed. I asked Ben over at Polygon, who has seen more VR games than likely anyone else, which game appeared to his eye to be the most graphically intensive for the HTC Vive – he pointed to The Gallery. It didn’t disappoint.
Developer Cloudhead Games helped me get some early access to the title and walked me through some of the performance decisions they made for VR. This is also one of the few games at launch giving gamers the flexibility they demand in terms of in-game image quality settings: you have Low, Medium and High presets, with further controls for users wanting to adjust anti-aliasing, effective resolution and more.
The game is beautiful and easily the most impressive title I have seen in a VR headset to date. In the short time I had with it, the adventure-style gameplay was great, and the traversal method makes perfect sense for room-scale gaming.
Seems like a proper test rig for VR will have to include a head on a stick with fast cameras for eyes and microphones in the ears. Then the “stick” will need six-axis motion control, three for the neck and at least three for the torso, to provide repeatable movements to test tracking and that timewarp you described. And then you’ll need a pair of robot hands for games that track the controllers as well. Otherwise you risk arguments over whether Reviewer L moves faster than Reviewer P, or can’t bend over enough to demonstrate the glitch that others have reported.
Maybe NASA has a spare Robonaut to loan you.
Ryan – will any of the headsets ‘fall back’ to 75 Hz (like the Oculus DK2) before switching to a Vsync of 1/2 (and I’m assuming applying something like Async Timewarp) as your first couple of graphs show?
This is a great beginning. Looking forward to hearing about how USB controllers and hubs affect latency as well in the future.
Okay, when does the GSYNC/FreeSync Vive come out?
The Vive will do the half-framerate drop. The Rift instead keeps timewarp running ALL the time, without changing the framerate. This means that EVERY frame gets warped before display, so every frame has the lowest possible orientation latency (effectively the hardware latency). If a frame does not complete rendering in time, the previous frame is warped in its place. By not messing with the rendered framerate, you avoid some odd bugs some developers are experiencing with SteamVR (certain rendertime values will suddenly change when they should otherwise be static, for example).
Keep up the good work pcper, we need you guys to keep them honest! Very interesting stuff this.
Agree, keep on this, as I haven’t come across any other tech site that is doing this deep analysis!
Awesome work Ryan & Allen. Thanks for the great info. Keep the VR benchmarks coming.
I remember during the Kickstarter that Oculus had a “latency” module that you could place within the socket of one of the lenses. Unsure if it is available for the newer version. And I think games would have to implement that feature, whereas the unit is more for developers….
And unsure how useful the latency results would be…
Glad it was you guys that released a methodology for “benchmarking” VR. When you try this out later, could you include dual-GPU configurations? Would those be measurable with the current methodology?
Are you planning to discuss with the developers having a dev/reviewer mode for the SDK that would let you benchmark as ideally and easily as possible?
Very interested to see how SLI/CrossFire play in; hope to see some benchmarks of older cards, especially ones just outside of the minimum requirements (R9 280X / GTX 960, etc.).
Keep up the great work!