I am pretty sure I am not the only person who has read these Bulldozer reviews (including Ryan’s here at PC Perspective) and had that particular reaction.  Bulldozer was supposed to bulldoze the competition.  It turns out it barely outpaces its own predecessor, the Phenom II X6 1100T.  In fact, in terms of IPC, the older Thuban architecture gives it a sound thrashing when both are clocked at 3.3 GHz.  So why should I be impressed with this processor?

I guess the answer is… you shouldn’t.  At least not yet.  I distinctly remember being invited to Lake Tahoe back in November of 2007 to test and report on the first Phenom samples that were available for limited testing.  We were not allowed to take the samples home with their new AM2+ based motherboards.  When going over the results of the tests with Ryan (I was not part of PCPer at the time), we quickly saw that the 2.6 GHz Phenom was unable to keep up with the Core 2 Quad Q6600 from Intel.  This was a little surprising, as we expected the original Phenom to clean house due to its very forward looking architecture (HyperTransport, an integrated memory controller, beefier FP/SIMD units, etc.).  The original Phenom had its fair share of problems, to say the least.  TDPs were very high, there was the revision B2 bug that was solved in B3, and due to the 65 nm process it did not have nearly as much cache as it needed to be a more efficient product.


Time passed and we were eventually introduced to the Phenom II products, which fixed all of those issues.  AMD finally had a product that could match the high end Core 2 Quad CPUs of the time in nearly every aspect.  Unfortunately for AMD, Intel then released the Nehalem/i7 based processors to the market.  AMD could not hold parity against the new architecture from Intel, and it has been scrambling to keep up ever since.

We see a few similarities with the Bulldozer launch, but it does not seem quite so dire.  There are no major bugs like the B2 problem with Phenom.  TDPs are not out of control (though they are not all that great).  Overall performance falls around that of the i5 2500 and they are offered at around the same price point.  There are a lot of interesting aspects to the architecture, and it is quite forward looking.  Unfortunately for AMD, there is a lot more tuning that needs to be done to achieve the potential of this architecture.

One big highlight of this release is the large L2 and L3 caches.  In previous generations AMD and GLOBALFOUNDRIES could not shrink the SRAM cell as effectively as Intel could with their process.  With the 32 nm HKMG/SOI process from GF, this is no longer an issue.  In fact, the geometry of the SRAM cells is overall slightly smaller than what Intel can currently achieve with their 32 nm process.  This is why we see a total of 16 MB of cache onboard a fully functional Bulldozer chip.  The L2 caches are clocked at core processor speeds while the L3 cache is clocked at the same speed as the Northbridge (2.2 GHz in this case).  This is a big boost from the previous quad core Phenom II (8 MB total cache with 512 KB of L2 per core).  This doubling of onboard cache should be more than adequate to feed the four modules and keep the execution units from starving for data.  AMD also did a lot of work on the memory controller, and it now supports memory speeds up to DDR3-1866.  Bandwidth to main memory should not be an issue with this processor.  The downside is that there is less L1 cache for each module, and each integer unit has a pretty paltry 16 KB of L1D cache (Phenom II had 64 KB of L1D per core).  Also add into the equation that per clock latencies for these caches were increased.  While this is somewhat offset by higher core clock speeds, the differences do have a major impact on IPC.
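For anyone who wants to check the cache math above, here is a quick back-of-the-envelope sketch.  The 16 MB and 8 MB totals, the 512 KB of L2 per Phenom II core, and the L1D sizes come from the paragraph above.  The per-module L2 size (2 MB), the Bulldozer L3 size (8 MB), and the Phenom II L3 size (6 MB) are the commonly published figures and are my assumptions here, as are the function and variable names.

```python
# Rough cache arithmetic for the two chips discussed above.
# Assumed (not stated in the text): 2 MB L2 per Bulldozer module,
# 8 MB Bulldozer L3, 6 MB Phenom II L3.
KB_PER_MB = 1024


def total_l2_l3_mb(units, l2_kb_per_unit, l3_kb):
    """Return combined L2 + L3 capacity in MB for a chip."""
    return (units * l2_kb_per_unit + l3_kb) / KB_PER_MB


# Bulldozer: four modules, each with its own shared L2.
bulldozer_mb = total_l2_l3_mb(units=4, l2_kb_per_unit=2048, l3_kb=8192)

# Quad core Phenom II: four cores, 512 KB of L2 each.
phenom_ii_mb = total_l2_l3_mb(units=4, l2_kb_per_unit=512, l3_kb=6144)

# L1D per integer core shrinks from 64 KB to 16 KB.
l1d_ratio = 16 / 64

print(f"Bulldozer L2+L3: {bulldozer_mb} MB")
print(f"Phenom II L2+L3: {phenom_ii_mb} MB")
print(f"L2+L3 growth:    {bulldozer_mb / phenom_ii_mb}x")
print(f"L1D per core:    {l1d_ratio}x of Phenom II")
```

The takeaway is the lopsided trade: the big, slower caches double while the small, fast L1D drops to a quarter of its former size.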

So why exactly is it not performing up to spec?  We are somewhat baffled by it, as the previous “new” product that AMD released garnered rave reviews.  The “Bobcat” core which powers the quick and energy efficient Ontario and Zacate products did everything that was expected of it.  Low power consumption, high performance as compared to competing devices, and a very competent graphics portion all wrapped up into one outstanding product.  Why did Bulldozer fail to impress?

I think there are several reasons for the disappointing performance of this part.  The design is very forward looking and complex.  Data management is likely the overarching reason for the results.  The integer and FP/SIMD units are simply not being utilized to their full potential.  The front end, namely the prefetch, predict, and decode units, is simply not optimized for the workloads we have tested.  There are some corner cases where Bulldozer simply blows away the competition, but these are far from common.  The smaller L1D cache and the much higher latencies throughout the cache system also have a deleterious effect on overall processor performance at the clock speeds we are currently seeing.

Happily, AMD is not done with the architecture.  We have been hearing that AMD is aggressively moving up the “Piledriver” refresh, which promises to improve general x86 performance by 10% to 15% per core per clock.  Such a boost in IPC should allow this next product to compete more adequately with Intel and their latest Sandy Bridge parts.  Unfortunately for AMD, it will be Ivy Bridge that will be on the market by the time the desktop Piledriver CPUs hit.
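To put that per clock figure in perspective, here is a minimal sketch using the usual first order approximation that single threaded performance scales with IPC times clock speed.  That approximation ignores memory bottlenecks and turbo behavior, and the function name and the normalized baseline of 1.0 are just illustrative assumptions on my part, not anything AMD has published.

```python
# First order model: performance ~ IPC x clock speed.
# Everything is normalized so that today's Bulldozer scores 1.0.


def relative_performance(ipc_gain, clock_ratio=1.0):
    """Return performance relative to the baseline chip.

    ipc_gain    -- fractional IPC improvement (0.10 means +10%)
    clock_ratio -- new clock divided by baseline clock
    """
    return (1.0 + ipc_gain) * clock_ratio


# The rumored 10% to 15% IPC range at an unchanged clock speed.
low = relative_performance(0.10)
high = relative_performance(0.15)

print(f"Same clock, +10% IPC: {low:.2f}x")
print(f"Same clock, +15% IPC: {high:.2f}x")
```

In other words, at the same clock speed the refresh would land somewhere between 1.10x and 1.15x of today's part under this simple model, and any clock speed bump would multiply on top of that.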

Bulldozer is not a bad product, and it certainly is a big step up from where the original Phenom was at the time of its release.  It just is not a world beater.  The architecture has a lot of promise, and these performance kinks will be worked out.  Unfortunately, it is going to be a while before we see AMD in a position to leapfrog Intel for the performance crown.  It will not be until GLOBALFOUNDRIES reaches 22 nm in a few years that AMD has another window of opportunity to release a product that could overshadow what Intel has to offer.  Then again, when the 22 nm shift occurs for AMD, Intel will be introducing their 2nd generation 22 nm parts (Haswell).

Some competition in the marketplace is better than none.  This will not sound the death knell of AMD, but it is not going to give AMD a significant boost.  No, the next boost will hopefully come from Trinity, the next generation APU that AMD plans to release in Q1 2012.  Hopefully that particular design will more adequately deliver on the promises of this architecture.