We got to spend some time with the new Fable Legends benchmark courtesy of Microsoft to continue our look at DX12 performance.
When Microsoft approached me a couple of weeks ago with the opportunity to take an early look at an upcoming performance benchmark built on a DX12 game due for release later this year, I was of course excited. Our adventure into the world of DirectX 12 and performance evaluation started with the 3DMark API Overhead Feature Test back in March and was followed by the release of the Ashes of the Singularity performance test in mid-August. Both of these tests pinpointed one particular aspect of the DX12 API: the ability to improve CPU throughput and efficiency with higher draw call counts, thus enabling higher frame rates on existing GPUs.
This game and benchmark are beautiful…
Today we dive into the world of Fable Legends, an upcoming free-to-play game based on the world of Albion. This title will be released on the Xbox One and for Windows 10 PCs, and it will require the use of DX12. Though the game is scheduled for release in Q4 of this year, Microsoft and Lionhead Studios allowed us early access to a specific performance test using the UE4 engine and the world of Fable Legends. UPDATE: It turns out that the game will have a fall-back DX11 mode that will be enabled if the game detects a GPU incapable of running DX12.
This benchmark focuses more on the GPU side of DirectX 12 – on improved rendering techniques and visual quality rather than on the CPU scaling aspects that made Ashes of the Singularity stand out from other graphics tests we have utilized. Fable Legends is more representative of what we expect to see with the release of AAA games using DX12. Let's dive into the test and our results!
Fable Legends is a gorgeous looking game based on the benchmark we have here in-house, thanks in some part to the modifications that the Lionhead Studios team has made to the UE4 DX12 implementation. The game takes advantage of Asynchronous Compute Shaders, manual resource barrier tracking and explicit memory management to help achieve maximum performance across a wide range of CPU and GPU hardware.
One of the biggest improvements found in DX12 is in CPU efficiency and utilization, though Microsoft believes that Fable Legends takes a more common approach to development. During my briefings with the team I asked Microsoft specifically what its expectations were for CPU versus GPU boundedness with this benchmark and with the game upon final release.
One of the key benefits of DirectX 12 is that it provides benefits to a wide variety of games constructed in different ways. Games such as Ashes were designed to showcase extremely high numbers of objects on the screen (and correspondingly exceedingly high draw calls). These are highly CPU bound and receive large FPS improvement from the massive reduction in CPU overhead and multi-threading, especially in the most demanding parts of the scene and with high-end hardware.
Fable Legends pushes the envelope of what is possible in graphics rendering. It is also particularly representative of most modern AAA titles in that performance typically scales with the power of the GPU. The CPU overhead in these games is typically less of a factor, and, because the rendering in the benchmark is multithreaded, it should scale reasonably well with the number of cores available. On a decent CPU with 4-8 cores @ ~3.5GHz, we expect you to be GPU-bound even on a high-end GPU.
That's interesting – Fable Legends (and, I agree, most popular PC titles) will see more advantages from the GPU feature and performance improvements in DX12 than from the CPU-limited scenarios that Ashes of the Singularity touches on. Because this benchmark would essentially be maxed out in the CPU performance department by a mainstream enthusiast class processor (even a Core i7-3770K, for example), the main emphasis is on how the GPUs perform.
With this feedback, I decided that rather than run tests on 5+ processors and platforms as we did for Ashes of the Singularity, I would instead focus on the GPU debate, bringing in eight different graphics cards from all price ranges on a decently high end CPU, the Core i7-6700K.
The Fable Legends benchmark is both surprisingly robust in the data it provides and also very limited in the configurability the press was given. The test could only be run in one of three different configurations:
To simplify comparing across hardware classes, we’ve pre-selected three settings tiers (Ultra @ 4K, Ultra @ 1080p, Low @ 720p) for the benchmark. The game itself allows much finer-grained settings adjustment to enable the game to be playable on the largest set of hardware possible.
I wasn't able to run a 2560×1440 test, and I wasn't able to find a way to turn specific features on or off to get finer-grained results on which effects affect GPUs in different ways. I'm sure we'll have more flexibility once the game goes live with a public beta later in the fall.
The test is built to be dead simple and idiot proof to run: launch a .bat file and then click start. You are then presented with 3939 frames of scenery that look, in a word, stunning. Check out the video of the benchmark below.
The benchmark runs at a fixed time step, so the number of frames does not differ from GPU to GPU or resolution to resolution. Instead, the amount of time it takes the test to run will change based on the performance of the system it is running on. Takes me back to the days of the Quake III timedemo… Microsoft claims that writing the test in this manner helps to reduce variability, so that the game is always rendering the exact same frames and data sets.
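To make the fixed-frame-count idea concrete, here is a rough sketch of how run time would vary with performance. It assumes the average frame rate is simply total frames divided by elapsed time, which the benchmark's documentation does not confirm, and `estimated_run_seconds` is a name invented for illustration.

```python
# Frame count reported for the benchmark run in this article.
TOTAL_FRAMES = 3939


def estimated_run_seconds(average_fps: float, total_frames: int = TOTAL_FRAMES) -> float:
    """Estimate wall-clock run time for a fixed-frame-count benchmark.

    Assumption (not confirmed by the benchmark itself):
    average FPS = total frames / elapsed seconds.
    """
    if average_fps <= 0:
        raise ValueError("average_fps must be positive")
    return total_frames / average_fps


if __name__ == "__main__":
    # Hypothetical frame rates, chosen only to show the trend.
    for fps in (30.0, 60.0, 120.0):
        print(f"{fps:6.1f} FPS -> {estimated_run_seconds(fps):6.1f} s")
```

Under that assumption, a card that doubles the frame rate should finish the same 3939 frames in half the time.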
Results are provided in both simple and complex ways depending on the amount of detail you want to look at.
At the conclusion of the benchmark you'll be greeted by this screen with a Combined Score that can be directly compared to other graphics cards and systems when run at the same resolution and settings combination. That score is simply the average frame rate multiplied by 100, so this screenshot represents a run that came back at 27.95 average FPS over the entire test.
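For anyone comparing scores by hand, the conversion is trivial. A minimal sketch follows, with function names that are my own and not part of the benchmark:

```python
def score_to_fps(combined_score: float) -> float:
    """Convert a Combined Score to average FPS (score = average FPS * 100)."""
    return combined_score / 100.0


def fps_to_score(average_fps: float) -> float:
    """Convert average FPS to the Combined Score the results screen would show."""
    return average_fps * 100.0


if __name__ == "__main__":
    # The run described above: 27.95 average FPS.
    print(round(fps_to_score(27.95)))
    print(f"{score_to_fps(2795):.2f}")
```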
The GPU timings breakdown is interesting though: it provides six buckets of time (averaged in milliseconds throughout the whole test) that represent the amount of time spent in each category of rendering work.
- GBuffer Rendering is the time to render the main materials of the scene. (UE4 is a deferred renderer, so all the material properties get rendered out to separate render targets at the start, and then lighting happens in separate passes after that.)
- Dynamic lighting is the cost of all the shadow mapping and direct lights.
- Dynamic GI is the cost of our dynamic LPV-based global illumination (see http://www.lionhead.com/blog/2014/april/17/dynamic-global-illumination-in-fable-legends/ ). Much of this work runs with multi-engine, which reduces the cost.
- Compute shader simulation and culling is the cost of our foliage physics sim, collision and also per-instance culling, all of which run on the GPU. Again, this work runs asynchronously on supporting hardware.
- Transparency is alpha-blended materials in the scene, which are generally effects. We light these dynamically using Forward Plus rendering.
- Other is untracked GPU work. It represents the difference between the GPU total time and the sum of the tracked categories above.
It will be interesting to see how this breakdown favors NVIDIA or AMD for different workloads.
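Since "Other" is defined as a remainder, it can be recomputed from the other numbers. The sketch below shows that arithmetic. The millisecond values are made up purely for illustration and are not results from any card tested here, and the function name is hypothetical.

```python
# The five tracked buckets from the list above; "Other" is derived from them.
TRACKED_CATEGORIES = (
    "GBuffer Rendering",
    "Dynamic Lighting",
    "Dynamic GI",
    "Compute Shader Simulation and Culling",
    "Transparency",
)


def other_gpu_time_ms(gpu_total_ms: float, tracked_ms: dict) -> float:
    """Return untracked GPU time: total GPU time minus the tracked buckets."""
    missing = [name for name in TRACKED_CATEGORIES if name not in tracked_ms]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    return gpu_total_ms - sum(tracked_ms[name] for name in TRACKED_CATEGORIES)


if __name__ == "__main__":
    # Illustrative values only (milliseconds per frame, averaged over a run).
    example = {
        "GBuffer Rendering": 8.0,
        "Dynamic Lighting": 10.0,
        "Dynamic GI": 3.5,
        "Compute Shader Simulation and Culling": 2.0,
        "Transparency": 1.5,
    }
    print(other_gpu_time_ms(30.0, example))
```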
For those of us that want even more, a CSV is created for each test run that goes into extraordinary detail of timings on a per-frame basis. I'm talking crazy detail here.
That's less than HALF the columns of information provided! Everything from frame time to GPU thread time to GPU time spent rendering fog is in here and it honestly warrants more attention than I am able to spend on it for this story. Once the game is released and we have access to the full version of the game (and hopefully still this kind of benchmark detail) we can dive more into how the CPU and GPU threads cooperate.
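For readers who want to poke at per-frame data like this themselves, here is a minimal sketch of summarizing a frame-time column. The column name `frame_time_ms` and the sample rows are assumptions for illustration; the real CSV's headers and layout are not documented in this article and will differ.

```python
import csv
import io

# Stand-in data with a hypothetical column name; not the benchmark's real format.
SAMPLE_CSV = """frame,frame_time_ms
0,30.0
1,40.0
2,50.0
3,40.0
"""


def summarize_frame_times(csv_file, column: str = "frame_time_ms") -> dict:
    """Summarize per-frame times (in milliseconds) read from a CSV file object."""
    times = [float(row[column]) for row in csv.DictReader(csv_file)]
    if not times:
        raise ValueError("no frames found")
    total_ms = sum(times)
    return {
        "frames": len(times),
        "average_frame_time_ms": total_ms / len(times),
        # Average FPS over the run: frames rendered divided by total seconds.
        "average_fps": 1000.0 * len(times) / total_ms,
        "worst_frame_time_ms": max(times),
    }


if __name__ == "__main__":
    print(summarize_frame_times(io.StringIO(SAMPLE_CSV)))
```

To use it on a real file, pass an object opened with `open(path, newline="")` and set `column` to whatever the frame-time header turns out to be.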
As I mentioned above, my focus for Fable Legends lies with the GPU performance rather than scaling capability across CPUs and platforms. Also, because this test can ONLY run on DirectX 12, rather than both DX11 and DX12, it's not possible for me to demonstrate vendor to vendor scaling from one API to another.
- Processor: Intel Core i7-6700K (Skylake, 4-core)
- Graphics Cards
  - NVIDIA GeForce GTX 980 Ti
  - NVIDIA GeForce GTX 980
  - NVIDIA GeForce GTX 960
  - NVIDIA GeForce GTX 950
  - AMD Radeon R9 Fury X
  - AMD Radeon R9 390X
  - AMD Radeon R9 380
  - AMD Radeon R7 370
- Drivers
  - NVIDIA: 355.82
  - AMD: 15.201.1151.1002B2