A start to proper testing
PresentMon is the beginnings of a new tool for testing DX12 titles and even those on the UWP.
During all the commotion last week surrounding the release of a new Ashes of the Singularity DX12 benchmark, Microsoft's launch of Gears of War Ultimate Edition on the Windows Store and the company's supposed desire to merge Xbox and PC gaming, a constant source of insight for me was one Andrew Lauritzen. Andrew is a graphics guru at Intel with extensive knowledge of DirectX, rendering, engines, etc., and he has always been willing to educate me on topics that crop up. The whole DirectX 12 and Universal Windows Platform (UWP) situation was definitely one such instance.
Yesterday morning Andrew pointed me to a GitHub release for a tool called PresentMon, a small sample of code written by a colleague of Andrew's that might be the beginnings of being able to properly monitor performance of DX12 games and even UWP games.
The idea is simple and its implementation even simpler: PresentMon monitors the Windows event tracing stack (ETW) for present commands and records data about them to a CSV file. Anyone familiar with the kind of ETW data you can gather will appreciate that PresentMon removes nearly all of the headache of data gathering by reducing the results to application name/ID, Present call deltas and a bit more.
Gears of War Ultimate Edition – the debated UWP version
The "Present" method in Windows is what produces a frame and shows it to the user. PresentMon looks at the Windows events running through the system, takes note of when those present commands are received by the OS for any given application, and records the time between them. Because this tool runs at the OS level, it can capture Present data from all kinds of APIs including DX12, DX11, OpenGL, Vulkan and more. It does have limitations though – it is read only so producing an overlay on the game/application being tested isn't possible today. (Or maybe ever in the case of UWP games.)
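To make "the time between them" concrete, here is a minimal sketch of the arithmetic, not PresentMon's actual code. The function name and the timestamps are made up for illustration; the only assumption is that each Present call has a timestamp in seconds.

```python
def present_deltas_ms(present_times_s):
    """Return the time between consecutive Present calls, in milliseconds."""
    return [
        (later - earlier) * 1000.0
        for earlier, later in zip(present_times_s, present_times_s[1:])
    ]


# Hypothetical timestamps (in seconds) for five Present calls from one application.
timestamps = [0.000, 0.016, 0.033, 0.055, 0.066]
deltas = present_deltas_ms(timestamps)

for delta in deltas:
    print(f"{delta:.1f} ms")
```

Five Present calls produce four deltas, and each delta is one frame time in the graphs that follow.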
What PresentMon offers us at this stage is an early look at a Fraps-like performance monitoring tool. In the same way that Fraps was looking for Present commands from Windows and recording them, PresentMon does the same thing, at a very similar point in the rendering pipeline as well. What is important and unique about PresentMon is that it is API independent and useful for all types of games and programs.
PresentMon at work
The first and obvious question for our readers is how this performance monitoring tool compares with Frame Rating, the FCAT-based capture benchmarking platform we have used on GPUs and CPUs for years now. To be honest, it's not the same and should not be considered an analog to it. Frame Rating and capture-based testing look for smoothness, dropped frames and performance at the display, while Fraps and PresentMon look at performance closer to the OS level, before the graphics driver really gets the final say in things. I am still targeting universal DX12 Frame Rating testing with exclusive full screen capable applications and expect that to be ready sooner rather than later. However, what PresentMon does give us is at least an early universal look at DX12 performance, including games that are locked behind the Windows Store rules.
So let's look at some data provided by PresentMon and how it compares to other tools in the market.
- Intel Core i7-5960X + X99
- NVIDIA GTX 980 Ti (364.00)
- AMD Fury X (16.3)
If you want to use PresentMon for yourself, you can download the source code at GitHub today. You'll need to run it in an elevated command prompt in Windows 10 and the output is clean enough to sort through with Excel.
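If Excel isn't your thing, a few lines of Python can pull one application's frame times out of a CSV like this. This is only a sketch: the column names `app` and `delta_ms` and the sample rows are hypothetical stand-ins, so check the header row of your own output and pass in the real column names.

```python
import csv
import io


def frame_times_for_app(csv_text, app_name, app_column, delta_column):
    """Collect the per-frame delta values (as floats) for one application."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        float(row[delta_column])
        for row in reader
        if row[app_column] == app_name
    ]


# Made-up sample in a hypothetical layout; real PresentMon output may differ.
sample = """app,delta_ms
game.exe,16.1
dwm.exe,16.7
game.exe,17.3
"""

game_deltas = frame_times_for_app(sample, "game.exe", "app", "delta_ms")
print(game_deltas)
```

Filtering by application matters because an OS-level capture sees Present calls from every process on the system, not just the game being tested.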
Ashes of the Singularity (DX12)
This graph shows a 60 second segment of frame times as produced by PresentMon. In this test, Ashes of the Singularity was set to Vsync off with uncapped frame rendering, so it shows frame times faster than the 16.7ms / 60Hz refresh interval of the display it was connected to. There appears to be quite a bit of variance in the frame times as they are shown, though the average is in the 60-65 FPS range. Specific performance of the system aside, this shows us that gathering data on Present commands in DX12 is possible through the PresentMon tool.
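For anyone who wants to put numbers on "average" and "variance" from a list of frame times, a rough sketch follows. The sample values are invented and are not from our Ashes run, and the function name is just for illustration. Note that average FPS comes from the mean frame time, not from averaging per-frame FPS values.

```python
import statistics


def summarize_frame_times(frame_times_ms):
    """Summarize a run: average FPS, worst frame time, and spread."""
    mean_ms = statistics.mean(frame_times_ms)
    return {
        "average_fps": 1000.0 / mean_ms,  # frames per second over the run
        "worst_ms": max(frame_times_ms),  # the single longest frame
        "stdev_ms": statistics.pstdev(frame_times_ms),  # frame-to-frame spread
    }


# Invented frame times in milliseconds, for illustration only.
summary = summarize_frame_times([15.0, 16.0, 17.0, 16.0])
print(summary)
```

Two runs can share the same average FPS while one has a much larger spread, which is exactly the kind of difference a frame time graph exposes and a single FPS number hides.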
Gears of War Ultimate Edition (DX12/UWP)
In a similar vein, this result shows that we can gather frame time data for the Gears of War Ultimate Edition UWP based game, something that was impossible until yesterday! Also note that we have gone with a higher refresh rate display (120Hz) to give this game that caps at the maximum display refresh rate more room to show performance deltas. No frame time ever goes below 8.33ms but the margin between that and 16.6ms gives a much more granular view of performance.
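The 8.33ms and 16.6ms figures come straight from the refresh rates. A quick sketch of the conversion, with function names of my own choosing:

```python
def refresh_interval_ms(refresh_hz):
    """Time between display refreshes, in milliseconds."""
    return 1000.0 / refresh_hz


def fps_from_frame_time(frame_time_ms):
    """Instantaneous frame rate implied by a single frame time."""
    return 1000.0 / frame_time_ms


floor_120 = refresh_interval_ms(120)  # shortest frame time when capped at 120Hz
floor_60 = refresh_interval_ms(60)    # shortest frame time when capped at 60Hz

print(f"120Hz: {floor_120:.2f} ms, 60Hz: {floor_60:.2f} ms")
```

On a 60Hz panel a game capped at the refresh rate could never report a frame time under about 16.7ms, so any GPU fast enough to hit the cap would look identical; the 120Hz panel moves that floor down to about 8.33ms.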
But how does PresentMon compare to other tools? To check on that I ran Fraps, PresentMon and our Frame Rating / FCAT-based system through the DX11 version of Rise of the Tomb Raider.
Rise of the Tomb Raider (DX11) – PresentMon vs. Frame Rating
Here is the same run of the game being compared between the PresentMon data that looks at the Present calls from the OS and our Frame Rating / FCAT / capture-based testing with an overlay, etc. Interestingly, the blue line of Frame Rating shows a much smoother experience with more consistent frame times than the green line from PresentMon. It's obvious that something occurs between the OS present commands and the image being displayed on the screen, something we have posited from the very beginning, and thus the results are somewhat contentious.
Rise of the Tomb Raider (DX11) – PresentMon vs. Fraps
Maybe it's not a surprise to anyone then that the data from PresentMon looks very similar to results from Fraps, more or less the consumer standard for performance evaluation. The frame time swings are much larger in both cases, though it appears that, at least in this test run, the Fraps results have even wider swings.
Rise of the Tomb Raider (DX11) – PresentMon vs. Fraps vs. Frame Rating
A quick look at all three results overlaid on each other shows the differences in data between capture-based testing and the OS-level Present call data. There are arguments for the value of each data set to be sure, and maybe even for the relationship between them on a per-game level, but I can tell you that my "feeling" of how Rise of the Tomb Raider played in this testing matches the Frame Rating result more closely than the PresentMon or Fraps results.
All of that being said, for today, the best and only way to measure frame time performance of UWP apps is with PresentMon or with other tools built off of the open source PresentMon code. Let's look at what Gears of War Ultimate Edition shows when comparing a GTX 980 Ti and a Radeon R9 Fury X.
Gears of War Ultimate Edition – PresentMon
I made the red line representing the Fury X semi-transparent to help with visualizing the results, but otherwise we are looking at PresentMon data as provided and described above. The green team is consistently running at lower frame times (higher frame rates) with the GTX 980 Ti and it shows significantly fewer spikes in frame times than the Fury X. It should be noted that the benchmark and game play of Gears feel SIGNIFICANTLY better with the recently released 16.3 driver from AMD than with 16.2 even though the overall advantage still lies with NVIDIA.
This is just some sample data we have been gathering over the last 24 hours with PresentMon and I am excited to continue playing with the application to measure performance and driver improvements on DX12 and UWP games going forward. That being said, I am hopeful that the community will take the code provided by Andrew and his team to build applications with additional features and perhaps a UI that can improve usability. I know we are working on some early changes and BAT files to work around this application; I expect many others in forums are on the same path.
The Windows Store and UWP
PresentMon is a great tool that gives us a better look at DX12 and UWP applications previously unavailable to us, but we are still dedicated to the capture-based testing that has brought about such significant change in the industry over the last few years. Hopefully I will soon be able to combine results from this application and an updated suite of capture-based tools to really dive into the differences between reported results, getting closer to the holy grail of performance and animation measurement.
I’ll have to test PresentMon on my exotic hardware.
It looks very interesting.
Gears of War: Ultimate Edition DirectX 12 // 2.0GHz VIA QuadCore E C4650 + 4GB NVIDIA GeForce GTX 960
What possessed you spend money on a CPU like that?
Because I like quality stuff (embedded hardware, not consumer): motherboards with PC-grade components. I am conservatively loyal to the founder of the Mini-ITX form factor as an alternative to Intel's and AMD's chips. I want to stand out from the mainstream. And of course there is the performance of the new generation quad-core processors, which are designed for high performance computing with SIMD up to AVX2, VT CPU virtualization technology and hardware security, and which for many years have delivered industry leading performance-per-watt and improved multi-tasking ability without consuming more power.
And that’s enough for me.
I’m intrigued. Would you like to participate in an interview about your PC tastes?
I guess you know that CPU is a massive bottleneck to a GTX960 though right?
Single core performance (Passmark) of 527. Compare that to an FX-4300 which is a bottleneck already in plenty of games with a GTX960 and that has a score of 1411.
So under 40% the performance of a CPU that is already a bottleneck seems pretty bad.
(it’s weird you mention “high performance computing”; maybe performance-per-watt is good, I don’t know, but regardless it’s still a massive bottleneck to any gaming you might do)
Are you kidding me?
You compare AMD FX-4300 3.8GHz @ 4.0GHz (95W TDP, 2x 2MB L2 cache, 4MB L3 cache) with VIA QuadCore C4650 2.0GHz (18W TDP, 2MB L2 cache) ?!?
Yes, the 2.0GHz QuadCore CPU is a bottleneck, but for the 128-bit NVIDIA GTX 960, a PCIe 3.0 x16 GPU, there is another big bottleneck: the slot here is only PCIe 2.0 x4.
PCI-Express 2.0 x16 @ x4: 2.0 GB/s (20 GT/s)
• 2 GB/s for 1-way
• 4 GB/s for 2-way
PCI-Express 3.0 x16: 15.75 GB/s (128 GT/s)
• 16 GB/s for 1-way
• 32 GB/s for 2-way
How do you build PresentMon? There’s no exe, it’s all source code
You would have to compile it yourself with a C++ compiler, using something like http://www1.cmc.edu/pages/faculty/alee/g++/g++.html