Hot Chips 2012

At Hot Chips 2012 AMD has given us our first glimpse of Steamroller


Ah, the end of August.  School is about to start.  American college football is about to get underway.  Hot Chips is now in full swing.  I guess the end of August caters to all sorts of people.  For the people who are most interested in Hot Chips, the amount of information on next-generation CPU architectures is something to really look forward to.  AMD is taking this opportunity to give us a few tantalizing bits of information about their next-generation Steamroller core, which will be introduced with the APU codenamed “Kaveri”, due out in 2013.

AMD is seemingly on the brink of releasing the latest architectural update with Vishera.  This is a Piledriver+ based CPU that will find its way into AM3+ sockets.  On the server side it is expected that the Abu Dhabi processors will also be released in a late September timeframe.  Trinity was the first example of a Piledriver based product, and it showed markedly improved thermals as compared to previous Bulldozer based products, and featured a nice little bump in IPC in both single and multi-threaded applications.  Vishera and Abu Dhabi look to be Piledriver+, which essentially means that there are a few more tweaks in the design that *should* allow it to go faster per clock than Trinity.  There have been a few performance leaks so far, but nothing that has been concrete (or has shown final production-ready silicon).

Until that time when Vishera and its ilk are released, AMD is teasing us with some Steamroller information.  This presentation is featured at Hot Chips today (August 28).  It is a very general overview of improvements, and AMD gives very few details about how it is achieving increased performance with this next-generation architecture.  So with that, I will dive into what information we have.


Hot Chips 2012: An Introduction to Surround Computing

At Hot Chips today we get our first glimpse of what is in store from AMD.  The entire presentation is very general, with the first portion touching upon what AMD considers the “Surround Computing Era”.  In a nutshell, surround computing means that nearly every aspect of a person’s life will be in contact with some kind of processing solution.  Keyboards and mice will no longer be the primary method of interaction with the computing world; interaction will instead move towards gestures, voice, location recognition, facial recognition, and pattern/behavior anticipation and prediction.  These will be combined with rich graphics and representations of people and environments superimposed over reality.  AMD expects this to become a reality in the next 20 years, and they are tailoring their technology to meet these needs.

The next portion covers what they are expecting to do with HSA.  The primary goal is to make the CPU and GPU equal partners in computing.  To achieve this they must make the transition seamless and transparent to programmers and users.  Instead of relying on APIs like OpenCL to expose the GPU functionality, AMD is working to make the hardware directly accessible to programmers through high level languages like C, C++, Python, JavaScript, and HTML5.  The GPU portion will natively support shared virtual memory, coherency, and context switching.  This is not new information, but rather a recap of what we learned at last year’s Fusion Developer Summit.  Because AMD is combining the CPU and GPU on one piece of silicon, they have complete control over how these pieces communicate not only with each other, but also with the outside world.  This integration will be much tighter in future generations of products.

That is all well and good, but we are far more interested in what is coming up next.  With that, AMD shares with us a very brief overview of what they intend to deliver with Steamroller.

Looking back over the past year we see that Bulldozer is not a bad architecture; it just is not all that great.  If viewed in a vacuum it provides an interesting solution to an increasingly parallel software environment.  The ability to handle many threads effectively without inflating die size to a significant degree is the hallmark of the Bulldozer architecture.  Unfortunately for AMD, there were enough downsides to the design that it was viewed as a failure not just against Intel’s Sandy Bridge and Ivy Bridge processors, but also against the previous generation AMD Phenom II X6 series of products.  The primary issues that we see deal with power consumption and heat, effective thread handling, and lower than expected IPC.

Some of these issues were addressed with the Piledriver update.  The biggest fixes involved power and heat.  Bulldozer was really rushed to market, and as such it was not fully optimized.  To achieve competitive yields and bins, the design was a bit more “loose” than what was aimed for.  More transistors were used than would have been necessary if more time had been given to the design process.  But AMD was against a wall, and they needed to get Bulldozer out the door.  Piledriver improves upon IPC and, probably most importantly, addresses the power and clocking issues that Bulldozer suffers from.  Trinity shows us that the design can achieve very good power savings vs. clock speed, and the small performance bump that it exhibits is very welcome.
