Failing to Live Up to Expectations
In the fall of 2007, journalists were invited to Lake Tahoe to test the latest and greatest CPU from AMD. The Phenom was a monolithic-die quad-core part with 2 MB of shared L3 cache. This was supposed to be the next big step for AMD: Intel still used dual-die packages for their quad-core parts and still had not introduced their own integrated memory controller. Many thought that this would be another slam dunk for AMD in their fight with Intel. They were incorrect.
The original Phenom had many problems. First of all, it could not adequately clock above 2.6 GHz without a tremendous amount of power applied. It also did not have the IPC to compete with Intel in either single-threaded or multi-threaded loads. While it was faster than the previous Athlon X2 parts in most instances, it was not the boost that AMD needed. It also suffered from a bug (the infamous TLB erratum) that caused errors in certain workloads and was not discovered until shortly before launch. AMD had to disable some functionality to work around it, which caused a significant performance hit. AMD fixed the issue with the B3 revision of the Phenom, but by then a lot of damage had been done.
AMD diversified their Phenom lineup to offer quad-, triple-, and dual-core products based on the same quad-core die. Disabling units gave AMD the flexibility to salvage chips that could not meet TDP targets with all four cores active, or that had defective cores from manufacturing. Seemingly the only bright spot of this release was that AMD finally had a mostly competitive chipset: the 790 series combined with the SB750 southbridge, which provided a much better experience than the previous SB600 series with its less-than-robust SATA support and speed.
This was still not enough to stop the erosion of market share caused by a mostly non-competitive part that simply could not get past some basic limitations. Intel continued to improve upon the Core 2 products with the Penryn core, leaving AMD further behind. AMD’s Phenom on their 65 nm PD-SOI process was just not going to cut it, so AMD went back to the drawing board.
The next chip was far more competitive and could actually clock high enough to give Intel a run. The Phenom II processors fixed all of the major issues of the original Phenom and added a new 45 nm process with immersion lithography and low-k dielectrics. The initial Phenom II 920 and 940 chips were clocked at 2.8 GHz and 3.0 GHz respectively. The 940 very nearly matched the performance of the Intel Q9650, but could not get close to the QX9770. Sadly for AMD, Intel had already released the first Nehalem CPUs, which marked Intel’s own first foray into integrating northbridge features into the CPU. Nehalem would be the basis for the next several generations of Intel parts, stretching into 2017.
AMD could not compete with Nehalem, but they were able to carve out a niche for themselves with the Phenom II. They then released the six-core Thuban, which gave the Phenom II brand a boost in heavily multithreaded software. They also implemented basic clock boost control that allowed all six cores to run at 3.2 GHz while boosting three or fewer active cores to 3.7 GHz in lightly threaded applications.
These were some of the last truly competitive chips from AMD in terms of server and desktop performance. AMD was about to make a very aggressive move with an architecture that was unproven and new to the marketplace. Sadly, it would not reap the benefits they were hoping for.
The Bulldozer that Dozed Very Little
The concept of CMT (clustered multithreading) is a logical one that in theory could provide a great amount of additional performance for a very small increase in die area. Sadly, this did not turn out to be the case. Bulldozer came out hot and underperforming. While it did excel in multi-threaded applications that could leverage eight threads or more, it simply ran slower than expected, and its single-thread IPC was sub-par even compared to the older Phenom II. Intel just smiled and kept producing their Sandy Bridge and then Ivy Bridge products, which consistently trounced Bulldozer in performance and power consumption.
AMD made some pretty hefty improvements with Piledriver, but it still was not enough to catch up to Intel and their stable of high-performing products. Intel further enlarged the gap and took away nearly every advantage AMD had with their Xeon-derived Ivy Bridge-E cores, charging between $450 and $1,000 for the privilege of owning one. Piledriver would be the last dedicated desktop CPU from AMD; it has powered the FX series on the AM3+ platform since late 2012. An amazing 4.5 years have gone by with only one minimal performance bump during that time. AMD did release the highly clocked FX-9000 series of chips, but these sat at a very, very toasty 220 watt TDP. Not every motherboard could power them, and cooling requirements were stringent. They offered performance roughly on par with the i7-3770K overall, but the downsides were just too much for most users.
On the APU side, AMD continued to evolve and improve upon the Bulldozer architecture. Steamroller and then Excavator slowly overcame many of its downsides, offering better IPC and lower power consumption at every step of the way. It was too little, too late for the CMT-based architecture, though. Instead of continuing along this path, AMD brought in veteran designer Jim Keller to do a clean-sheet design that borrowed from previous architectures sparingly, but where necessary. Keller had helped provide the magic that powered the Athlon 64, and AMD was hoping he would bring that touch to their next generation of products.
AMD cancelled all further development on CMT based parts after Excavator and went full speed on Zen.
Great article Josh – one of the best I’ve read on PCper in a while. One for the old gits to reminisce, methinks…
I originally got into computers in high school in the 80s, but it was really expensive so I couldn't afford anything. It really wasn't until 1996 that I had the funds to start exploring hardware. That is when I bought my first machine myself, and within about 5 months I had started to fiddle with it. Adding the 3DFX Voodoo Graphics card supercharged my interest; I've been hooked ever since. I wish I'd had the chance to play with some of the older AMD parts pre-95.
Pretty much the same as me. Amigas till the early 90’s, then onto 486 > P120 > Orchid Righteous 3D yadda yadda. I was 40 the other day which is depressing!
Love it. I was around for a lot of this, but it was before I started building; very cool to know that this CPU race at least used to be a very close one. Can only hope that becomes the case again.
These things seem to cycle around. The only thing really different this time is that while Intel hasn't been aggressively pushing the industry, it is certainly not in a weaker position architecturally than it was in the Pentium III and Pentium 4 days.
hey josh and guys thanks for the history lesson.
In my article I thought I was using k6 but I guess it was an Athlon.
Anyway thanks again.
Thanks for reading!
Great write up josh
thank you for this
“Going with a x86 decode with a “risc-y” core solved a lot of problems and we have essentially have had that solution ever since.”
I think using statements like this causes confusion. The micro-ops are not the equivalent of RISC instructions. They are probably quite long and complicated because they embed a lot of information about the original AMD64 instruction, and they may include a lot of run-time data as well, like register-renaming state. In my opinion, RISC and CISC are obsolete terms. Modern processors are closer to CISC with a few RISC-like features. The main thing you want is fixed-length instruction encoding to allow for easier pipelining and superscalar, out-of-order execution. You also don’t want a large number of complicated addressing modes. Even with an old CISC ISA, those can mostly be worked around: the compilers simply avoid the complex addressing modes, and the complex, irregular-length instruction encoding is converted to micro-ops that the back end can pipeline. I don’t consider even the ARM ISA to be anywhere close to a traditional RISC ISA. It has a huge number of very specialized instructions, which is the exact opposite of a RISC design. It is cleaner and simpler to decode than x86, but it is not RISC.
I seem to remember discussions back in the day about this way of decoding x86 instructions, and people would often term it "RISC-y". It certainly is not RISC, but you can see why they would use such a term to describe it back in 1995.
Very good article covering the major CPU milestones for AMD. I hope the younger readers who may not be that familiar with past AMD successes take the time to understand the advances made by AMD and the effect they had on keeping Intel R&D moving forward at a more rapid pace. As in any market, competition brings out the best in everything: better products, better pricing, and more rapid advances. Let’s hope AMD continues with the initial success of Ryzen.