Bulldozer to Vishera

AMD’s new Vishera based processors promise better performance and thermals


Bulldozer is the word.  Ok, perhaps it is not “the” word, but it is “a” word.  When AMD let that little codename slip some years back, AMD enthusiasts and tech journalists started to salivate about the possibilities.  Here was a unique and very new architecture that promised excellent single thread performance and outstanding multi-threaded performance all in a package that was easy to swallow and digest.  Probiotics for the PC.  Some could argue that the end product for Bulldozer and probiotics are the same, but I am not overly fond of writing articles containing four letter colorful metaphors.

The long and short of Bulldozer is that it was a product that was pushed out too fast, it had specifications that were too aggressive for the time, and it never delivered on the promise of the architecture.  Logically there are some very good reasons behind the architecture, but implementing these ideas into a successful product is another story altogether.  The chip was never able to reach the GHz range it was supposed to and stay within reasonable TDP limits.  To get the chip out in a timely manner, timings had to be loosened internally so the chip could even run.  Performance per clock was pretty dismal, and the top end FX-8150 was only marginally faster than the previous top end Phenom II X6 1100T.  In some cases, the X6 was still faster and a more competent “all around” processor.

There really was not a whole lot for AMD to do about the situation.  It had to have a new product, and it just did not turn out as nicely as they had hoped.  The reasons for this are legion, but simply put AMD is competing with a company that is over ten times its size, with the R&D budgets that such size (and margins) can afford.  Engineers looking for work are a dime a dozen, and Intel can hire as many as they need.  So, instead of respinning Bulldozer ad nauseam and releasing new speed grades throughout the year by tweaking the process and metal layer design, AMD let the product line sit and stagnate at the top end for a year (though they did release higher TDP models based on the dual module FX-4000 and triple module FX-6000 series).  Engineers were pushed into more forward looking projects.  One of these is Vishera.

Bulldozer as it Should Have Been

Piledriver is the overall code name for the architecture behind Vishera.  We have previously seen Piledriver in the current Trinity APUs that were initially offered this past Spring/Summer.  We also finally were able to see Trinity on the desktop earlier this month.  On the Bulldozer side, the desktop product with four modules and 8 MB of L3 cache was code named Zambezi.  Vishera is essentially its Piledriver counterpart: up to four Piledriver modules with up to 8 MB of L3 cache.  Trinity on the other hand is a maximum of two Piledriver modules with no L3 cache, though it does prominently feature the VLIW4 based GPU.

Vishera is a heavy redesign of Zambezi.  It is not exactly a new chip, but the differences between it and Zambezi are significant.  Nearly every aspect of the design has been addressed, and a lot more time has been spent on layout and timings.  Looking at the image below, we see exactly how much has been added to Vishera to improve overall performance.  There are a lot of little additions throughout the entire design, and the hope is that all of these little changes will add up to a far better performing product.

Not to oversimplify, but AMD had to make Zambezi a bit more leaky when it comes to transistors.  They did this to get the chip working close to the design specifications.  I believe the original target for the 4 module/8 core top end chip was 4 GHz, but in the end the FX-8150 was released at 3.6 GHz with a 125 watt TDP rating.  So right off the bat relative performance of this part is going to be lower than original expectations.  Per clock performance took a further hit when AMD had to loosen up timings between different components and caches to get the CPUs to work on a more consistent basis.  These two factors allowed AMD to improve overall yields and bins, but the price exacted for these changes was performance and power consumption/heat production.

AMD essentially went over Zambezi with a fine-toothed comb.  Not only did they implement the improvements as listed above, but they also fixed a lot of the timing issues.  On top of that they were able to reduce the overall leakage (though I am again being way too general here) by replacing soft-edge flip-flops (jitter tolerant designs which increased power consumption) with hard-edge flip-flops (more hand tuned designs which show improved power characteristics).  All indications point to the original Bulldozer being highly automated in design while Piledriver shows a greater amount of hand tuning.

The changes in Piledriver are not limited to improving IPC, multi-core efficiency, and power consumption.  AMD added in support for FMA3 (Intel’s response to AMD’s FMA4) as well as the F16C extensions (which convert between 16-bit and 32-bit floating point).  AMD now covers all of the major new extensions that are currently available.
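To give a feel for what a 16-bit to 32-bit floating point conversion involves, here is a minimal Python sketch.  It does not use the F16C instructions themselves; it only emulates the IEEE 754 half precision round trip in software with the standard `struct` module.  The function names are made up for illustration, and Python floats are 64-bit rather than 32-bit, so treat this as a picture of the precision loss, not of the hardware.

```python
import struct


def float_to_half_bits(value: float) -> int:
    """Round a float to IEEE 754 half precision and return its raw 16 bits."""
    # The "e" format packs a half precision float; "H" reads it back as uint16.
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits: int) -> float:
    """Expand raw half precision bits back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_trip_through_half(value: float) -> float:
    """Show what survives when a value is stored in 16 bits and read back."""
    return half_bits_to_float(float_to_half_bits(value))


if __name__ == "__main__":
    for sample in (1.0, 0.5, 0.1):
        bits = float_to_half_bits(sample)
        print(f"{sample!r}: bits=0x{bits:04X} back={round_trip_through_half(sample)!r}")
```

Values such as 1.0 and 0.5 survive the trip unchanged, while 0.1 comes back slightly off because half precision only keeps 10 bits of mantissa.  The appeal of the narrow format is that it takes half the storage of a 32-bit float.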

Looking over it all, I think we can view Vishera as what Zambezi should have been.  While Zambezi was not exactly stillborn, it was more than a tad under-cooked.  Vishera is a much more competent design, and it hits all of the original specifications set for Zambezi.  It is just unfortunate that it is essentially a year later than what many had hoped for.  As such, it now has to compete with the latest Intel Ivy Bridge products.  We could probably go on for ages about all of the changes that AMD made, but in the end we have a part that is much closer to expectations than what we have seen so far.

Each Piledriver module in Vishera is composed of four x86 decode units, two integer execution units, a single shared 2 x 128-bit floating point/MMX/SSE/AVX unit, and 2 MB of L2 cache.  In the fully functional Vishera CPU, there will be four modules, for a total of 8 MB of L2 cache available to the units.
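For those who like to see the arithmetic, the module counts above can be tallied with a small Python sketch.  The class and field names here are purely illustrative and are not anything AMD publishes; the numbers are the ones quoted in this article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PiledriverModule:
    """Per-module resources as described in the article (names are illustrative)."""
    integer_units: int = 2    # two integer execution units per module
    shared_fp_units: int = 1  # one shared 2 x 128-bit FP unit per module
    l2_cache_mb: int = 2      # 2 MB of L2 per module


@dataclass(frozen=True)
class VisheraChip:
    """A fully enabled Vishera part: four modules plus a shared L3."""
    modules: int = 4
    l3_cache_mb: int = 8
    module: PiledriverModule = field(default_factory=PiledriverModule)

    def total_integer_units(self) -> int:
        return self.modules * self.module.integer_units

    def total_fp_units(self) -> int:
        return self.modules * self.module.shared_fp_units

    def total_l2_mb(self) -> int:
        return self.modules * self.module.l2_cache_mb


if __name__ == "__main__":
    chip = VisheraChip()
    print("integer units:", chip.total_integer_units())
    print("shared FP units:", chip.total_fp_units())
    print("L2 cache (MB):", chip.total_l2_mb())
    print("L3 cache (MB):", chip.l3_cache_mb)
```

The tally makes the sharing explicit: a four module part is sold as eight cores, but those eight integer units lean on only four floating point units.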
