AMD Details Carrizo Further

AMD uncovers more of Carrizo, with new details about its design and efficiency.

Some months back AMD introduced us to their “Carrizo” product.  Details were slim, but we learned that this would be another 28 nm part with improved power efficiency over its predecessor.  It would be based on the new “Excavator” core, which will be the final implementation of the Bulldozer architecture.  The graphics will be based on the latest iteration of the GCN architecture as well.  Carrizo would be a true SOC in that it integrates the southbridge controller.  The final piece of information that we received was that it would be interchangeable with the Carrizo-L SOC, which is an extremely low power APU based on the Puma+ cores.

A few months later we were invited by AMD to their CES meeting rooms to see early Carrizo samples in action.  These products were running a variety of applications very smoothly, but we were not informed of clock speeds or actual power draw.  All that we knew was that Carrizo was working and able to run fairly demanding workloads such as high quality 4K video playback.  Details were yet again very scarce, other than the expected timeline of release, the TDP ratings of these future parts, and the promise of a significant jump in energy efficiency over the previous Kaveri based APUs.

AMD is presenting more information on Carrizo at the ISSCC 2015 conference.  This information dives a little deeper into how AMD has made the APU smaller, more power efficient, and faster overall than the previous 15 watt to 35 watt APUs based on Kaveri.  AMD claims that this product will increase power efficiency to a degree never before seen from the company.  This is particularly important considering that Carrizo is still a 28 nm product.

New CPU Core, New Design Platform, and a New GCN Core

The Bulldozer architecture has not panned out exactly as AMD planned.  While Intel was focusing on high levels of IPC combined with decent frequency scaling, AMD went for a design with shared resources aimed at high clock speeds and highly parallel workloads.  While the architecture has kept AMD afloat, it was not nearly the success that they had hoped for.  So far we have seen Bulldozer, Piledriver, and Steamroller cores hit the market with varying levels of success.  Now it is time for Excavator.

The last new core based on the Bulldozer architecture has been named Excavator and is being introduced with the Carrizo APU.  Excavator is designed to provide around 5% greater IPC than the previous Steamroller, but will do so at less power.  Excavator has doubled the L1 cache size from previous implementations, and it also adds the latest instructions to the mix.

AMD does not have the resources to do hand layouts on every core out there.  They instead rely on a lot of automated place and route.  This typically causes transistor budgets to inflate due to the inefficiencies of the software and the use of standard cell designs.  It also causes the die size to be larger due to wasted space from using the more regularly shaped standard cell libraries.  AMD has done two things that have positively impacted their design of Excavator to make it more competitive and more power efficient.

The CPU guys have worked closely with the GPU engineers to utilize a High Density Cell Library.  GPUs have historically been characterized by running slower, but having very dense designs that can do a lot of work per cycle.  CPUs have traditionally been faster, but leakier designs with more cache rather than more logic.  AMD gambled on using these high density libraries to design Excavator, and it appears to have paid off.  AMD has shrunk each Excavator module by an impressive 23% as compared to the previous Steamroller implementation.  They have also utilized a more GPU-centric metal stack, which not only enables greater density but also shortens the interconnects between functional units.  Previous generations of CPUs have used a tapered metal stack to improve transistor switching speed.

While AMD did not say this up front, these design changes will limit the overall top speed of the Carrizo parts.  This is not necessarily a bad thing.  It is unlikely that an Excavator based 4 module CPU will ever make it to the desktop, and if there were one it would certainly have a hard time reaching 4 GHz.  In a mobile application, which does not reach such speeds anyway, the change will actually have a positive impact on overall performance and battery life.  The power/speed curves provided by AMD show better power scaling relative to transistor switching speed than the previous “high speed optimized” Steamroller cores.  We see a crossover in power/speed at around the 20 watt point, but consider that this is the TDP “per module” rather than for the entire chip.  In a 35 watt TDP mobile APU, each module/core pair will be in the 10 to 13 watt TDP range, so the scaling at those power ratings is superior to that of the previous Steamroller.  The modules can either run faster at the same TDP, or they can run more efficiently at the same speed.
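The per-module budget reasoning above can be sketched in a few lines of Python.  This is only a rough illustration, not AMD's method: the two-module count and the fraction of the package TDP handed to the CPU modules are assumptions picked so that a 35 watt part lands in the 10 to 13 watt range quoted above, and the 20 watt crossover is simply the approximate figure from the text.

```python
# Rough per-module power budget for a mobile APU.
# All defaults below are illustrative assumptions, not AMD figures.

CROSSOVER_WATTS = 20.0  # approximate per-module crossover point cited in the article


def per_module_budget(package_tdp_watts, modules=2, cpu_fraction=0.6):
    """Split a package TDP into an equal budget per CPU module.

    modules      -- assumed number of CPU modules in the APU
    cpu_fraction -- assumed share of the package TDP given to the CPU
                    modules (the rest goes to GPU, memory controller, etc.)
    """
    if modules <= 0:
        raise ValueError("modules must be positive")
    if not 0.0 < cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in (0, 1]")
    return package_tdp_watts * cpu_fraction / modules


def favors_high_density(module_watts):
    """True when the module budget sits below the crossover point, where the
    article says the new design scales better than Steamroller."""
    return module_watts < CROSSOVER_WATTS


if __name__ == "__main__":
    for tdp in (15, 35):
        budget = per_module_budget(tdp)
        print(f"{tdp} W package: {budget:.1f} W per module, "
              f"below crossover: {favors_high_density(budget)}")
```

Even with a generous CPU share, a 35 watt package leaves each module well under the crossover, which is why losing some top-end clock speed costs little in this market.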

The GPU portion of the APU has also seen a lot of work.  It will be based on the latest GCN architecture design, so it has all the latest bells and whistles that AMD has put into cores such as the desktop “Tonga” GPU.  In this case the CPU guys helped the GPU engineers with their design flows to improve overall frequency while consuming less power.  This can also go the other way: they can keep the same frequency but decrease power consumption by some 20%.  This change allows AMD to enable all 8 GCN cores even in the lowest power Carrizo APUs.  This was not possible with the previous Steamroller-based Kaveri parts, which topped out at 6 GCN cores in the 15 watt TDP range.  This will improve graphics performance dramatically in those parts without breaking the TDP budget.
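It is worth translating the "same frequency, some 20% less power" figure into performance per watt.  The short sketch below does that arithmetic; it assumes performance is unchanged when frequency is held constant, and the function name is purely illustrative.

```python
# Performance-per-watt gain when power drops at unchanged performance.
# Assumption: holding frequency constant also holds performance constant.


def perf_per_watt_gain(power_reduction):
    """Return the perf/W multiplier for a fractional power reduction.

    power_reduction -- fraction of power saved, e.g. 0.20 for 20%
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    # Same work done with (1 - reduction) of the power.
    return 1.0 / (1.0 - power_reduction)


if __name__ == "__main__":
    gain = perf_per_watt_gain(0.20)
    print(f"20% less power at the same speed: {gain:.2f}x performance per watt")
```

A 20% power cut at the same speed works out to roughly a 1.25x improvement in performance per watt, which gives a sense of the headroom that makes the two extra GCN cores affordable at 15 watts.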
