A very early look at the future of Catalyst
AMD let us spend some time with a very early prototype driver that attempts to implement a software frame pacing algorithm.
Today is a very interesting day for AMD. It marks both the release of the reference design of the Radeon HD 7990 graphics card, a dual-GPU Tahiti behemoth, and the first sample of a change to the CrossFire technology that will improve animation performance across the board. Both stories are incredibly interesting and as it turns out both feed off of each other in a very important way: the HD 7990 depends on CrossFire and CrossFire depends on this driver.
If you already read our review (or any review that is using the FCAT / frame capture system) of the Radeon HD 7990, you likely came away somewhat unimpressed. The combination of two AMD Tahiti GPUs on a single PCB with 6GB of frame buffer SHOULD have been an incredibly exciting release for us and would likely have become the single fastest graphics card on the planet. That didn't happen though, and our results clearly state why that is the case: AMD CrossFire technology has some serious issues with animation smoothness, runt frames and giving users what they are promised.
Our first results using our Frame Rating performance analysis method were shown during the release of the NVIDIA GeForce GTX Titan card in February. Since then we have been in constant talks with the folks at AMD to figure out what was wrong, how they could fix it, and what it would mean to gamers to implement frame metering technology. We followed that story up with several more that used Frame Rating to show the current state of performance on the GPU market, and they painted CrossFire in a very negative light. Even though some outlets accused us of being biased or claimed that AMD wasn't doing anything incorrectly, we stuck by our results and, as it turns out, so does AMD.
Today's preview of a very early prototype driver shows that the company is serious about fixing the problems we discovered.
If you are just catching up on the story, you really need some background information. The best place to start is our article published in late March that goes into detail about how game engines work, how our completely new testing methods work and the problems with AMD CrossFire technology very specifically. From that piece:
It will become painfully apparent as we dive through the benchmark results on the following pages, but I feel that addressing the issues that CrossFire and Eyefinity are creating up front will make the results easier to understand. As we showed you for the first time in Frame Rating Part 3, AMD CrossFire configurations have a tendency to produce a lot of runt frames, and in many cases nearly perfectly in an alternating pattern. Not only does this mean that frame time variance will be high, but it also tells me that the performance gained by adding a second GPU is completely useless in this case. Obviously the story would become then, “In Battlefield 3, does it even make sense to use a CrossFire configuration?” My answer based on the below graph would be no.
An example of a runt frame in a CrossFire configuration
NVIDIA's solution for getting around this potential problem with SLI was to integrate frame metering, a technology that balances frame presentation to the user and to the game engine in a way that enabled smoother, more consistent frame times and thus smoother animations on the screen. For GeForce cards, frame metering began as a software solution but was actually integrated as a hardware function on the Fermi design, taking some load off of the driver.
Until today, AMD did not integrate any kind of frame metering on multi-GPU solutions and simply rendered frames as quickly as possible when the game engine asked them to. That might seem like the best answer without doing any analysis and that is likely the same conclusion AMD came to. But as it turns out, as we have proven in our various benchmark results and video comparison, that just isn't true. All animations are not created equal.
AMD came to me last week with a prototype driver that integrates a software frame metering or frame pacing technology. What is important here is that AMD is having to rebuild the driver pipeline around this software model and as such it is going to take some time to get it 100% correct. Also, because the company started work on this over a month ago, the base driver version for this prototype driver is something in the 13.2 stack – not the 13.5 used in our Radeon HD 7990 review.
What changes in the new driver? A new algorithm is being implemented that measures frame render times on a continuous basis to determine how long that frame should be displayed on the screen. AMD is calling this measurement the game's "heartbeat" and that information is used to insert a delay into the Present() call return going back to the game. The Present() call is used by the game to know when a frame has been rendered and another is ready to be taken by the GPU for work.
Previously, in GPU-bound instances, AMD was actually sending Present() complete calls at almost the same time, to which the game replied with data that was similarly close together. When both GPUs rendered the frames, they rendered them at about the same speed (since the scenes are so similar), and thus the frames were presented in a nearly completely overlapped way, resulting in the very small slivers of frames shown on the screen: runts. The new driver essentially adds an offset between the frames being rendered.
In this diagram, the unmetered display output shows runts because of unevenly paced frames. The metered output adds a little delay but produces a better overall animation.
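To make the runt pattern concrete, here is a minimal Python sketch of two GPUs alternating frames with no pacing at all. Every number in it (the 30 ms render time, the 2 ms stagger between GPUs, the 5 ms runt threshold) is a made-up assumption for illustration, not a measurement from our testing or a definition used by the capture tools.

```python
def unmetered_present_times(num_frames, render_ms, stagger_ms):
    """Present times for 2-GPU alternate-frame rendering with no pacing.

    Each GPU finishes one frame every render_ms, and GPU 1 runs
    stagger_ms behind GPU 0. All values are hypothetical.
    """
    times = []
    for i in range(num_frames):
        gpu = i % 2               # frames alternate between GPU 0 and GPU 1
        frame_on_gpu = i // 2     # how many frames this GPU finished before
        times.append((frame_on_gpu + 1) * render_ms + gpu * stagger_ms)
    return times


def frame_intervals(present_times):
    """Time each frame stays on screen before the next one replaces it."""
    return [b - a for a, b in zip(present_times, present_times[1:])]


def count_runts(intervals, runt_threshold_ms):
    """Count frames displayed for less than a hypothetical threshold."""
    return sum(1 for dt in intervals if dt < runt_threshold_ms)


times = unmetered_present_times(num_frames=8, render_ms=30.0, stagger_ms=2.0)
intervals = frame_intervals(times)
print(intervals)                                     # alternating short/long gaps
print(count_runts(intervals, runt_threshold_ms=5.0))
```

With these inputs the gaps alternate between 2 ms and 28 ms, so every other frame is on screen for only a sliver of time even though the average frame rate looks healthy.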
As the workload changes, AMD is able to update the frame delay offset in real time. If frames begin to take longer to render due to a change in scenery, then the driver will add more delay into the next Present() call in preparation to have balanced frame presentation on the screen. It can seem counter-intuitive to introduce latency into the game engine pipe to make things smoother, but in truth we are oversimplifying the problem in our explanation.
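The idea of a continuously updated delay can be sketched in a few lines of Python. To be clear, AMD has not published its algorithm, so everything here is an assumption: the `FramePacer` class, the exponential moving average used as the "heartbeat", and the smoothing factor are all hypothetical. The sketch also delays the presentation time directly, which is a simplification of delaying the Present() return to the game.

```python
class FramePacer:
    """Toy software frame pacer (hypothetical, not AMD's implementation)."""

    def __init__(self, num_gpus=2, smoothing=0.2):
        self.num_gpus = num_gpus
        self.smoothing = smoothing    # weight given to the newest sample
        self.heartbeat_ms = None      # running estimate of per-GPU render time
        self.last_present_ms = None

    def observe_render_time(self, render_ms):
        """Fold a new render-time measurement into the heartbeat estimate."""
        if self.heartbeat_ms is None:
            self.heartbeat_ms = render_ms
        else:
            a = self.smoothing
            self.heartbeat_ms = (1 - a) * self.heartbeat_ms + a * render_ms

    def schedule_present(self, ready_ms):
        """Return (present_time, added_delay) for a frame ready at ready_ms."""
        if self.last_present_ms is None or self.heartbeat_ms is None:
            present = ready_ms        # nothing to pace against yet
        else:
            # Aim for evenly spaced frames: one heartbeat shared by all GPUs.
            target_gap = self.heartbeat_ms / self.num_gpus
            present = max(ready_ms, self.last_present_ms + target_gap)
        self.last_present_ms = present
        return present, present - ready_ms


pacer = FramePacer(num_gpus=2)
ready_times = [30.0, 32.0, 60.0, 62.0, 90.0, 92.0]   # bunched pairs of frames
presents = []
for ready in ready_times:
    pacer.observe_render_time(30.0)   # pretend every frame took 30 ms
    present, delay = pacer.schedule_present(ready)
    presents.append(present)
print(presents)
```

Fed the bunched pairs of frames above, the pacer holds back the second frame of each pair so that presents land about 15 ms apart. If the observed render times grow, the heartbeat estimate and the target gap grow with them, which mirrors the "more delay when the scenery gets heavier" behavior described above.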
I asked AMD about a "polling time" associated with this new measurement and was told that in fact it was continuous because of its complete integration into the rendering pipeline. This will likely add some CPU overhead in the driver but it would appear pretty minimal compared to the work that a typical GPU driver is handling already.
There is still a lot of work to be done on the prototype driver that AMD is showing here today that includes tweaking the algorithm for individual games and fine tuning of the implementation. But for a first attempt and a very quick turnaround, we are pretty impressed with the results on the following pages.
Many more results on the coming pages…
AMD is still planning on releasing this driver in beta form in the summer, but I wouldn't be surprised to see the schedule moved up a bit given the pressure of the Radeon HD 7990 release and the better-than-expected results thus far. AMD continues to promise the ability to enable and disable this feature in the control panel as well as to enable it on a per-game basis, something that NVIDIA hasn't done yet. There are debates on whether or not there are actually input latency benefits to AMD's current method, and we are still working out a way to test that at PC Perspective.
Reports from most users are telling us that you NEED to download these files for a solid comparison!
Crysis 3 – 13.5 beta vs Prototype 2 Comparison
One thing to note: this fix does not yet address Eyefinity + CrossFire problems. The prototype and the current implementation of the fix are only going to address single monitor configurations due to the differences in how the multiple rendered images are composited. Resolutions up to 2560×1600 are handled by a hardware compositor, while Eyefinity resolutions of 5760×1080 and above use a software implementation that is apparently much more complex (and causes quite a few graphical issues we'll dive into later).
How We Tested
Our testing was done with the exact same setup as our recently published Radeon HD 7990 review, except that this time I have dropped the results from the Radeon HD 7970s in CrossFire in favor of the new HD 7990 results with the Prototype 2 driver. Due to limited time and the fact that the Eyefinity results were unaffected, you are only going to see 2560×1440 results for now!
|Test System Setup|
|CPU||Intel Core i7-3960X Sandy Bridge-E|
|Motherboard||ASUS P9X79 Deluxe|
|Memory||Corsair Dominator DDR3-1600 16GB|
|Hard Drive||OCZ Agility 4 256GB SSD|
|Graphics Card||AMD Radeon HD 7990 6GB; NVIDIA GeForce GTX TITAN 6GB; NVIDIA GeForce GTX 690 4GB|
|Graphics Drivers||AMD: 13.5 beta (HD 7990); AMD: Frame Pacing Prototype 2 (HD 7990)|
|Power Supply||Corsair AX1200i|
|Operating System||Windows 8 Pro x64|
What you should be watching for
- HD 7990 13.5 beta vs HD 7990 Prototype 2 – Here's the big question – what changes and by how much? Ideally we want to see more consistent frame times in our Frame Rating system.
- HD 7990 Prototype 2 vs GTX 690 – If the driver works as we were promised, how does it affect the performance compared to the GTX 690?
- HD 7990 Prototype 2 vs GTX Titan – Same here for the Titan!