The New Cayman Architecture
The new Cayman architecture from AMD is the first big change to the GPU in some time, introducing a new VLIW4 design as well as improved tessellation and AA performance. But can the HD 6970 and HD 6950 stand up to the power of the GF110-based GTX 580 and GTX 570 cards, and how does PowerTune technology fit into the mix?
Introduction
In October AMD released a refresh of the Radeon brand called the HD 6800 series, built around the Barts GPU, a slightly modified version of the Cypress GPU used in the Radeon HD 5000 series last year. The reviews of the Radeon HD 6870 and HD 6850 were positive but not glowing, simply because the markets they addressed were crowded with other worthy options from AMD’s own previous releases as well as newer cards from NVIDIA. At the time of our initial briefing about Barts we also started to hear about Cayman, the true next-generation architecture that would be the successor to the Cypress design.
After a couple of small delays, Cayman is finally here as the Radeon HD 6900-series of graphics cards and represents a modestly dramatic shift in GPU computing architecture compared to previous cards. The results are likely not exactly what AMD had expected, thanks to some issues beyond its control, but the Radeon HD 6970 and HD 6950 update AMD’s lineup of graphics cards for the holidays and continue the recently-renewed march-step of competition between two bitter rivals.
A Somewhat Completely New Architecture
Cayman originally wasn’t supposed to be. It is built on 40nm process technology and based heavily on the designs used in the Evergreen architecture, but AMD’s leadership had hoped that this winter we would be talking about a much different GPU: one built on a 32nm process and designed from the ground up to be very different. Due to process limitations at TSMC, the process node promised years ago was no longer going to be available, so AMD reverted to a backup plan and focused all of its attention on it. The result is the Cayman design we are seeing today, which includes some of the changes AMD had wanted to integrate at the 32nm level while reserving much of the rest for another time.
While interesting from a politics perspective, what really matters to gamers are the results we are seeing today and what products are going to be available on the shelf as you read this.
With the release of the HD 6900 series of cards only a single HD 5000-series option remains on the playing field for the official lineup. The dual-GPU HD 5970 stakes a claim as the fastest single graphics card in the market though we will likely see a dual-GPU variant of the 6000-series sooner rather than later. Also note that AMD has mysteriously left the GTX 470 out of this comparison on the NVIDIA side though we will include it in our benchmarks on the many pages of this review.
AMD had some lofty design goals coming into the follow-up to Evergreen. Whether or not they were able to meet them with the relatively sudden shift in technology remains to be seen, but the primary focus here was on efficiency, both in terms of the architecture (performance per area) and in terms of power (performance per watt). AMD also needed to increase their geometry performance, as it was the one area where NVIDIA had a dominant performance lead. New image quality features with enhanced AA were also included on the HD 6900 series feature list.
The design of the Cayman architecture should look very familiar to you at a high level when compared to either Barts or Cypress. Cayman includes a set of 24 SIMD engines, each housing 64 ALUs, for a total shader count of 1536 on the full die. You might notice that this is 64 fewer shader processors than the Cypress GPU in the HD 5870, which was listed at 1600 SPs. That is due to the biggest shift in this architecture: a move to a VLIW4 design. Cayman also includes dual graphics engines that essentially double the geometry performance, as well as upgraded ROPs for AA performance enhancements and a faster GDDR5 memory bus.
The new core design on the HD 6900-series of GPUs moves from a complex 5-instruction design to a simplified 4-instruction design. The previous architecture included 4 simple ALUs and 1 complex “T” unit that could perform other functions as well. The new VLIW design essentially replicates the “T” unit across all four stream processors, making the design easier to schedule and manage while improving the overall performance / mm^2 by about 10% according to AMD’s numbers.
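The shader counts quoted above are easy to sanity-check with a little arithmetic. The short Python sketch below uses only the figures from the text (24 SIMD engines, 64 ALUs each, 4-wide versus 5-wide instruction words, 1600 SPs on Cypress); the variable names are ours and purely illustrative.

```python
# Shader-count arithmetic using only the figures quoted in the article.
CAYMAN_SIMD_ENGINES = 24
CAYMAN_ALUS_PER_SIMD = 64
CAYMAN_VLIW_WIDTH = 4    # VLIW4: four ALUs share one instruction word

CYPRESS_ALUS = 1600      # HD 5870, as listed by AMD
CYPRESS_VLIW_WIDTH = 5   # VLIW5: four simple ALUs plus one "T" unit

cayman_alus = CAYMAN_SIMD_ENGINES * CAYMAN_ALUS_PER_SIMD
cayman_vliw_units = cayman_alus // CAYMAN_VLIW_WIDTH
cypress_vliw_units = CYPRESS_ALUS // CYPRESS_VLIW_WIDTH

print(cayman_alus)                  # 1536
print(CYPRESS_ALUS - cayman_alus)   # 64
print(cayman_vliw_units)            # 384
print(cypress_vliw_units)           # 320
```

In other words, Cayman gives up 64 ALUs in total but ends up with more (and narrower) VLIW units to schedule work onto: 384 four-wide units against 320 five-wide ones.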
ROP improvements include double speed on 16-bit integer ops and an even more dramatic gain in 32-bit ops.
Attempting to address the GPU computing arena, AMD also spent some time updating the compute engine with features like asynchronous dispatch, which allows multiple compute kernels to spawn threads at the same time, independently. Double precision operations get an automatic boost in performance here as well, since the rate moves from 1/5 of the single precision numbers in the previous design to 1/4 thanks to the VLIW4 design.
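To put the 1/5 versus 1/4 double precision rates in perspective, here is a rough Python sketch of peak throughput. It assumes the usual peak-FLOPS convention of one multiply-add (2 FLOPs) per ALU per clock, and it runs both chips at the same illustrative 800 MHz so that only the ALU count and the DP ratio differ; neither assumption comes from AMD's materials, and the function name is hypothetical.

```python
def peak_gflops(alus, clock_mhz, dp_ratio=1.0):
    """Theoretical peak GFLOPS.

    Assumes one multiply-add (2 FLOPs) per ALU per clock, the common
    convention for quoting peak numbers. dp_ratio scales the single
    precision figure down to the double precision rate.
    """
    return alus * 2 * clock_mhz / 1000.0 * dp_ratio

CLOCK_MHZ = 800  # illustrative: same clock for both chips, not a shipping spec

cypress_dp = peak_gflops(1600, CLOCK_MHZ, dp_ratio=1 / 5)
cayman_dp = peak_gflops(1536, CLOCK_MHZ, dp_ratio=1 / 4)

print(f"Cypress DP: {cypress_dp:.1f} GFLOPS")
print(f"Cayman DP:  {cayman_dp:.1f} GFLOPS")
print(f"Gain at equal clocks: {cayman_dp / cypress_dp:.2f}x")
```

Under these assumptions Cayman comes out roughly 20% ahead in peak double precision at equal clocks despite having 64 fewer ALUs, because the better ratio more than makes up for the lower shader count.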
As I noted above, with dual graphics setup engines the Cayman architecture essentially is more than doubling the tessellation performance of the previous generation. This is done by simply duplicating the graphics engine and also adding the off-chip buffer support.
AMD targeted what they are calling the sweet spot of tessellation usage in games for performance improvement. In the graphic above you can see that AMD decided that the most frequently used tessellation factors (used to calculate the amount of tessellation done at any particular LOD) range from 6-11, and we see a spike of nearly 3x performance there. The performance gains that AMD notes here are impressive if they hold up in testing.
Introduced with the Barts architecture, the Cayman design also sees the inclusion of Enhanced Quality AA (EQAA) with new modes that allow up to 16 coverage samples per pixel but with very little performance hit over standard MSAA levels. This method will also introduce custom sample patterns and filters, though for now only presets are available in the driver. To enable this feature, all you have to do is select “Enhance application settings” in the Catalyst Control Center AA options.
Also, the post-process based morphological AA mode enabled in the Barts design will be available on the HD 6900 cards.
Probably the most interesting change to the HD 6900-series of cards is the inclusion of a new technology called PowerTune, whose goal is to clamp the GPU TDP to a pre-determined level. By integrating control processors to monitor GPU activity in real time, the GPU has the ability to dynamically adjust the clock rate to enforce a TDP. This gives the driver (and potentially end users) the ability to directly control the GPU power draw via an algorithmic approach to guarantee consistent performance across various products.
The reason for this technology is this: it allows AMD to push the clock speeds of its GPUs higher without having to worry about what they call “outlier” applications that push up power draw to dangerous levels. Applications like FurMark, 3DMark 03 Game Test 4, the Perlin Noise test in 3DMark Vantage, etc. push GPUs to a higher power draw level than even the toughest of available consumer games. Because of this, in previous generations, AMD has been forced to keep the stock frequencies at a low enough level that even if a user really stresses the design with “OCCT SC8”, they wouldn’t fry their board. By constraining those cases, lowering the clock speeds as TDP limits are reached, AMD can set the default clock speed higher for better performance in games without the risk of killing cards. Or so the theory goes.
This graph demonstrates how the theory behind the power management works and it seems to share a lot with the Intel technology utilized in primary system processors. The power containment keeps the GPU from pulling more power than it would have otherwise allowed (the red line) for a scaled total power draw seen by the dotted line.
This could and does have an effect on some applications available today. The Perlin Noise test in 3DMark Vantage causes the GPU clock frequency to scale up and down as power draw exceeds the threshold set by AMD at the factory. AMD wants to point out, though, that even though the frequency scales, performance is very consistent over time, so even when this happens it shouldn’t be noticed by the consumer.
For enthusiast users that are worried, AMD is going to give us some control over this feature in the CCC by allowing gamers to increase the maximum power draw by as much as 20%. In the example above, you can see that by increasing maximum power by only 5% the clock rate fluctuates quite a bit less and at 10%, the GPU stays at 800 MHz consistently.
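For readers who want a feel for how a power clamp with adjustable headroom behaves, here is a toy Python model. To be clear, this is not AMD's PowerTune algorithm: the linear power estimate, the 200 W cap, the 10 MHz step, the 500 MHz floor and the load values are all made-up numbers chosen for illustration, and `simulate_clock` is a hypothetical helper. It only sketches the general idea of stepping the clock down when estimated power exceeds a cap and back up when there is room.

```python
def simulate_clock(activity, headroom_pct=0, max_clock_mhz=800,
                   tdp_watts=200.0, watts_per_mhz=0.25,
                   step_mhz=10, min_clock_mhz=500):
    """Toy power-capped clock controller (illustrative, not AMD's method).

    activity: per-interval load factors (1.0 = a load that exactly hits
              the stock cap at full clock in this made-up model).
    headroom_pct: how far the user has raised the power cap, in percent.
    Returns the clock chosen after each interval.
    """
    cap = tdp_watts * (100 + headroom_pct) / 100
    clock = max_clock_mhz
    trace = []
    for load in activity:
        # Crude estimate: power scales linearly with load and clock.
        power = load * watts_per_mhz * clock
        if power > cap:
            clock = max(min_clock_mhz, clock - step_mhz)   # over the cap: throttle
        elif clock < max_clock_mhz:
            clock = min(max_clock_mhz, clock + step_mhz)   # room to spare: recover
        trace.append(clock)
    return trace

# A few "game-like" intervals followed by an "outlier" load.
workload = [0.9] * 5 + [1.08] * 20

for headroom in (0, 5, 10):
    trace = simulate_clock(workload, headroom_pct=headroom)
    print(f"+{headroom}% cap: lowest clock {min(trace)} MHz")
```

With these invented numbers the pattern mirrors what AMD's slide describes: at the stock cap the clock dips furthest under the outlier load, a 5% bump shrinks the dip, and at 10% the clock never leaves 800 MHz. The game-like load never triggers throttling at any setting.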
AMD is adamant that they have not seen a case with either the HD 6970 or HD 6950 where a real-world game has caused this feature to become enabled but it is definitely something for us to watch out for as newer titles stretch these cards down the road.
With the new Cayman architecture dissected, let’s take a look at the new HD 6900-series graphics cards and see what detailed specifications they have for us.