While much has been made of Phenom’s lack of performance compared to Intel’s Core 2 based products, I was curious how much faster the Phenom was than the older Athlon X2. It turns out that AMD had been quite busy under the hood, and the results were more impressive than I had originally expected.
It is not much of a stretch to say that AMD may have been a tad aggressive in pursuing a monolithic quad core architecture on its aging 65 nm process. The chips are big (280+ mm²), hot, and power hungry for their clockspeed. Perhaps a case of “A Bridge Too Far”, but one that was forced upon AMD by continued competition from Intel’s Core 2 Duo and Core 2 Quad families. While the Phenom comes close to catching Intel in per core performance, it falls short in most applications. This has led most reviewers and quite a few enthusiasts to pronounce that AMD is “toast”.
While it certainly is disappointing that AMD could not overcome the performance mark that Intel has set, I think it is actually quite interesting to see how far AMD has come with the performance of the Phenom compared to its previous Athlon X2 architecture. It is generally accepted that Phenom is faster than Athlon, but exactly how much faster is it? I decided to take three different parts and measure their performance and power consumption to see exactly how much AMD has improved with its new architecture.
At one time this chip gave Intel that sinking feeling…
From 10,000 feet, the differences between the core architectures do not appear that great. Luckily for AMD, the devil is in the details. The basic core layout has not changed: it still features three-wide decode, the same amounts of L1 and L2 cache as the 65 nm X2 variants, an integrated memory controller, and HyperTransport links. AMD has gone through and tweaked much of the design, doubling quite a few aspects of the processor. The engineers have doubled the bandwidth to the L1 and L2 caches, doubled the TLB entries, and widened the SSE units from 64 bits to 128 bits. The memory controller also received a lot of work: depending on the motherboard BIOS setting, it can run either as two independent 64-bit channels (unganged mode) or as a single 128-bit channel (ganged mode).
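To put the two memory controller modes in perspective, here is a rough peak-bandwidth calculation. It is a sketch that assumes DDR2-1066 memory (the `TRANSFERS_PER_SEC` value is an assumption, not a figure from this article) and ignores real-world overheads such as refresh and bank conflicts.

```python
# Peak theoretical DRAM bandwidth for the two memory controller modes.
# Assumption: DDR2-1066 DIMMs, i.e. 1066 million transfers per second per channel.

TRANSFERS_PER_SEC = 1066e6  # hypothetical DDR2-1066 data rate


def peak_bandwidth_gbs(channels: int, channel_width_bits: int) -> float:
    """Return peak bandwidth in GB/s (10^9 bytes/s) for a given channel layout."""
    bytes_per_transfer = channel_width_bits // 8
    return channels * bytes_per_transfer * TRANSFERS_PER_SEC / 1e9


unganged = peak_bandwidth_gbs(channels=2, channel_width_bits=64)  # 2 x 64-bit channels
ganged = peak_bandwidth_gbs(channels=1, channel_width_bits=128)   # 1 x 128-bit channel

print(f"unganged: {unganged:.2f} GB/s, ganged: {ganged:.2f} GB/s")
```

Both layouts move the same 128 bits per transfer, so both work out to about 17.06 GB/s under this assumption. The mode changes how that bandwidth is used rather than how much there is: unganged mode lets the two channels service independent requests, while ganged mode spreads each access across both channels.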
The monolithic quad core is a marvel of CPU design… but unfortunately it might have been brought out on the wrong process node.
The addition of the L3 cache is the largest difference between the old and the new, and 2 MB of extra cache shared among the four cores is supposed to be a great boon for performance. The best part is that the caches are exclusive, so cache memory is not wasted on replicated data. The only downside of an L3 cache is that main memory accesses have another level of latency to deal with, as a request goes from L1 to L2 to L3 before going to main memory. This should not affect overall performance all that much, because most frequently used instructions and data will be found somewhere in those three levels of on-die cache.
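A simple average memory access time (AMAT) model shows why an extra lookup on the way to DRAM need not hurt. This is a sketch that assumes serial lookups as described above; every latency and miss rate in it is a hypothetical placeholder, not a measured Phenom figure.

```python
# Average memory access time (AMAT), evaluated from the last cache level inward:
#   AMAT(level) = hit_latency + local_miss_rate * AMAT(next level)
# All latencies (core cycles) and local miss rates are hypothetical placeholders.


def amat(levels, memory_latency):
    """levels: list of (hit_latency, local_miss_rate) tuples, ordered L1 first."""
    time = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        time = hit_latency + miss_rate * time
    return time


L1 = (3, 0.05)   # hypothetical: 3 cycles, 5% of accesses miss
L2 = (15, 0.40)  # hypothetical: 40% of L1 misses also miss in L2
L3 = (45, 0.50)  # hypothetical: half of L2 misses hit in the shared L3
DRAM = 200       # hypothetical main memory latency in cycles

without_l3 = amat([L1, L2], DRAM)
with_l3 = amat([L1, L2, L3], DRAM)
# Same hierarchy, but with an L3 that never hits: only the extra lookup cost remains.
useless_l3 = amat([L1, L2, (45, 1.0)], DRAM)

print(f"no L3: {without_l3:.2f}  L3: {with_l3:.2f}  L3 that never hits: {useless_l3:.2f}")
```

With these placeholder numbers a full miss costs 45 more cycles, yet the average drops from 7.75 to 6.65 cycles because half of the L2 misses are caught by the L3. The third case shows the flip side: if the L3 rarely hits, the extra lookup is a pure loss (8.65 cycles).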
The Shoehorn Theory
While AMD is not exactly a small company, it is dwarfed by the size and resources of Intel. As such, a few simple truths must be recognized. Truths such as how the Episcopal Church does not recognize the Papacy, Baptists don’t recognize each other in the liquor store, and AMD recognizes it cannot afford to have wildly differentiated products addressing multiple spaces. The Phenom that we see today tries to cover two major areas for AMD, and it addresses one of those areas very well. The other… not as well as Intel does with its Core 2 architecture.
Barcelona was built from the ground up to be a competitive and scalable server processor, and in that role it succeeds very well. In 2P and 4P applications, the scalability and performance of the 2.5 GHz Barcelona match or exceed those of the 2.93 GHz quad core Xeons based on the Core 2 architecture. The large exclusive caches, integrated memory controller, and HyperTransport links, combined with a solid and performant core design, make the Barcelona a strong competitor in the multi-socket server market. These features all take up die space and create a much more complex product. Intel, on the other hand, still uses a front-side bus architecture, which cuts down on transistor count and complexity and lets Intel concentrate more on core performance and its highly flexible L2 cache. The downside to Intel’s approach is that multi-socket scalability is greatly reduced, because all data traffic, whether core to core or core to memory, has to share the same FSB. Attaching two dies to the same FSB, as the quad core Xeons do, congests that traffic even more. As a result, efficiency drops off sharply when going from 2P to 4P servers compared to what AMD can do with Barcelona.
I wonder which one runs at 2.6 GHz?
This leaves AMD in the slightly unpalatable position of having to use these large server chips to support its desktop aspirations. This is not to say that Intel does not do the same thing with its lower end Xeons, but that is certainly not the case with its larger L2 cache products and its top end Itanium series. At 280+ mm², Phenom is a large chip. Even on 300 mm wafers, AMD can fit only about 200 complete dies per wafer, and using Murphy’s yield model, only about 60 good dies come out of each wafer. Each wafer costs about $3000 to $5000 to finish, so before packaging costs each good die is worth between $50 and $83 ($3000 ÷ 60 to $5000 ÷ 60). Once you add finishing costs and company overhead, it is no wonder that AMD’s margins are pretty low. Not all is lost, though: AMD can salvage some bad dies, and it sells partially defective dies as its triple core parts.
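The per-die arithmetic above can be reproduced with a common dies-per-wafer approximation and Murphy’s yield model, Y = ((1 - e^(-AD)) / (AD))^2, where A is die area and D is defect density. This is a sketch under stated assumptions: the 283 mm² die area stands in for “280+ mm²”, and the defect density of 0.5 defects/cm² is a hypothetical value back-solved to land near 60 good dies, not a published AMD figure. Real die counts also depend on die shape, scribe lines, and edge exclusion, which the formula ignores.

```python
import math

WAFER_DIAMETER_MM = 300.0
DIE_AREA_MM2 = 283.0                 # assumption standing in for "280+ mm^2"
DEFECT_DENSITY_PER_CM2 = 0.5         # hypothetical, back-solved to give ~60 good dies
WAFER_COST_RANGE = (3000.0, 5000.0)  # finished wafer cost in US dollars, from the text


def gross_dies_per_wafer(diameter_mm: float, die_area_mm2: float) -> int:
    """Common approximation: wafer area / die area, minus partial dies lost at the edge."""
    radius = diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(area_term - edge_loss)


def murphy_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Murphy's yield model: Y = ((1 - exp(-A*D)) / (A*D))^2, with A in cm^2."""
    ad = (die_area_mm2 / 100.0) * defects_per_cm2
    return ((1 - math.exp(-ad)) / ad) ** 2


gross = gross_dies_per_wafer(WAFER_DIAMETER_MM, DIE_AREA_MM2)
yield_fraction = murphy_yield(DIE_AREA_MM2, DEFECT_DENSITY_PER_CM2)
good = math.floor(gross * yield_fraction)
cost_per_good_die = [wafer_cost / good for wafer_cost in WAFER_COST_RANGE]

print(f"gross dies: {gross}, yield: {yield_fraction:.1%}, good dies: {good}")
print("cost per good die: " + ", ".join(f"${c:.2f}" for c in cost_per_good_die))
```

Under these assumptions the sketch gives about 210 gross dies, a yield of roughly 28.6%, 60 good dies, and $50.00 to $83.33 per good die, in line with the figures above. Because the defect density was chosen to match, the model illustrates the arithmetic rather than independently confirming AMD’s yields.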
AMD has no option but to soldier on with what it has and take its lumps for presenting a server grade design as a desktop performance part. Now that the pesky L3 TLB erratum has been fixed in the B3 stepping, AMD is finally shipping significant numbers of processors to server OEMs. The B3 stepping is also being aggressively sold into the desktop marketplace, and so far the reaction to the 2.5 GHz Phenom at $235 US has been, well, pretty phenomenal. Sales have not exactly rocketed AMD into first place, but the margins on these chips are still decent enough to keep the company afloat, and user response has been overwhelmingly positive. We really have to look past most of the benchmarks and peer into the user experience. Most users can buy a feature-comparable motherboard for an AMD processor for around $20 to $50 less than an equivalent Intel board, and the competing Intel quad core retails for about $50 more than the Phenom while offering only a small performance advantage. Put together, AMD is carving out a niche for itself by being smart with pricing on both CPUs and chipsets. Then again, it could be less about smarts and more about necessity at this point.
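Combining the price gaps quoted above gives a rough sense of the platform savings. This sketch uses only the figures from the text ($235 Phenom, an Intel quad core about $50 more, and a $20 to $50 cheaper motherboard) and ignores memory, cooling, and other components, which the text does not compare.

```python
PHENOM_PRICE = 235        # US dollars, 2.5 GHz Phenom (from the text)
INTEL_QUAD_PREMIUM = 50   # competing Intel quad core costs "about $50 more"
BOARD_SAVINGS = (20, 50)  # AMD motherboard is $20 to $50 cheaper

intel_cpu_price = PHENOM_PRICE + INTEL_QUAD_PREMIUM
# Total CPU + motherboard savings for the AMD platform, low and high estimates.
savings_low, savings_high = (INTEL_QUAD_PREMIUM + s for s in BOARD_SAVINGS)

print(f"Intel quad core: ~${intel_cpu_price}; AMD platform saves ${savings_low} to ${savings_high}")
```

That works out to roughly $70 to $100 saved on CPU and board together, which is the trade-off buyers weigh against Intel’s small performance lead.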
The Phenom is a pretty significant upgrade to the X2 series of processors, so let us explore exactly how much of a difference there is between these products.