Bulldozer Architecture
The long-awaited Bulldozer cores are finally here for consumers, but are they worth the wait?
Introduction
Bulldozer. Since its initial unveiling and placement on the roadmap, many have called the Bulldozer architecture the savior of AMD, the processor that would finally turn the tide back against Intel and its dominance in the performance desktop market. After quite literally YEARS of waiting, we have finally gotten our hands on the Bulldozer processors, now called the AMD FX series of CPUs, and can report on our performance testing and benchmarking of the platform.
With all of the leaks surrounding the FX processor launch, you might be surprised by quite a few of our findings, both on the positive and the negative side of things. With all of the news in the past weeks about Bulldozer, we can now finally give you the REAL information.
- Bulldozer First Release and the State of 32nm AMD Parts
- AMD Bulldozer Processor hits 8.429 GHz – New World Record!
- AMD Bulldozer FX Processor Benchmarks Leaked
Before we dive right into the performance part of our story, I think it is important to revisit the Bulldozer architecture and describe what makes it different from the Phenom II architecture as well as Intel’s Sandy Bridge design. Josh wrote up a great look at the architecture earlier in the year with information that is still 100% pertinent, and we recount much of that writing here. If you are comfortable with the architecture’s design points, then feel free to skip ahead to the sections you are more interested in, but I highly recommend you give the information below a look first.
The below text was taken from Bulldozer at ISSCC 2011 – The Future of AMD Processors.
Bulldozer Architecture Revisited
Bulldozer brings very little forward from the previous generation of CPUs, except perhaps the experience of the engineers working on those designs. Since the original Athlon, the basic floor plan of AMD’s CPU architecture has remained relatively unchanged. Certainly there were significant changes throughout the years to keep up in performance, but the 10,000 foot view of the actual decode, integer, and floating point units stayed very similar. TLBs increased in size, more instructions were kept in flight, and other details were tweaked and improved upon. Aspects such as larger L2 caches, integrated memory controllers, and the addition of a shared L3 cache have all brought improvements to the architecture. But the overall data flow is still very similar to that of the original Athlon introduced over a decade ago.
As covered in our previous article about Bulldozer, it is a modular design which will come in several flavors depending on the market it is addressing. The basic building block of a Bulldozer processor is a 213 million transistor module which features 2 MB of L2 cache. This block contains the fetch and decode unit, two integer execution units, a shared 2 x 128 bit floating point/SIMD unit, L1 data and instruction caches, and the large shared L2 cache. All of this is manufactured on GLOBALFOUNDRIES’ 32 nm, 11 metal layer SOI process. The entire module, including its 2 MB of L2 cache, occupies approximately 30.9 mm² of die space.
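As a quick sanity check on those numbers, the per-module figures scale up like this for a four-module part. Only the 213 million transistors and ~30.9 mm² per module come from the article; the point of the sketch is that the modules alone are far from the whole die, since L3 cache, the northbridge, and I/O sit on top of this budget.

```python
# Back-of-the-envelope die budget for a four-module Bulldozer part.
# Per-module figures are from the article; the uncore (L3, northbridge,
# I/O) is deliberately left out, which is why the real die is much larger.
MODULES = 4
MODULE_AREA_MM2 = 30.9           # per module, its 2 MB L2 included
MODULE_TRANSISTORS = 213e6       # per module

module_area = MODULES * MODULE_AREA_MM2
module_transistors = MODULES * MODULE_TRANSISTORS

print(f"four modules alone: {module_area:.1f} mm^2, "
      f"{module_transistors / 1e6:.0f}M transistors")
```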
Continue reading our review of the AMD FX Processor (codenamed Bulldozer)!!
It is well known that Bulldozer embraces the idea of “CMT”, or chip multi-threading. Intel supports SMT on its processors, but that is not the most efficient way of doing things. SMT sends two threads to the same execution unit in an attempt to maximize the work being done by that unit; essentially, fewer cycles are wasted waiting for new instructions or resultant data. AMD instead chose to implement multi-threading in a different way. For example, a Bulldozer processor comprised of four modules will have eight integer execution units and four shared 2 x 128 bit floating point/SIMD units. This allows the OS to see the chip as an eight core unit.
CMT makes far better use of die space for the threading performance it delivers than SMT (it scales to around 1.8x the throughput of a single core, as compared to 1.3x using SMT) or CMP (chip multi-processing, where each core may not be entirely utilized and the die cost of replicating entire cores is much higher than with CMT). This balance of performance and die savings is the hallmark of the Bulldozer architecture. AMD has gone through and determined which structures can be shared and which need to be replicated in each module. CMT apparently increases overall die space by only around 5% in a four module unit.
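Those scaling figures can be turned into a rough throughput-per-die-area comparison. The thread-scaling numbers (1.8x for CMT, 1.3x for SMT, 2.0x for CMP) and CMT's ~5% area cost come from the text above; the assumption that SMT also costs on the order of 5% extra area is mine, for illustration only.

```python
# Two-thread throughput (relative to one core) divided by relative die
# area. Scaling figures come from the article; area multipliers for
# SMT/CMT are illustrative assumptions.
def throughput_per_area(thread_scaling, area_multiplier):
    return thread_scaling / area_multiplier

smt = throughput_per_area(1.3, 1.05)   # second thread via SMT
cmt = throughput_per_area(1.8, 1.05)   # second core via a CMT module
cmp_ = throughput_per_area(2.0, 2.0)   # second full core (CMP)

print(f"SMT {smt:.2f}  CMT {cmt:.2f}  CMP {cmp_:.2f}")
```

Under these assumptions CMT comes out well ahead of both alternatives per unit of silicon, which is exactly the trade-off the paragraph above describes.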
A closer look at the units reveals some nice details. Note the dual MMX (SIMD-Integer) units in the FP/SIMD block. A lot of work has been done on the front end to adequately feed the three execution units.
Gone is the three pipeline integer unit of the Athlon. Bulldozer uses a new four pipeline design which further divides the workloads being asked of it; the pipelines include multiply, divide, and two address generation units. Each integer unit is fed by its own integer scheduler. The decode unit which feeds the integer units and the float unit has also been significantly beefed up. And it had to be: it is now feeding more data to more execution units than ever before. The original Athlon had a decode unit comprised of three complex decoders. The new design features a four-wide decode unit, but we are so far unsure how the workload is managed. For example, the Core 2 had a four-wide decode unit, three of which were simple decoders and the fourth a complex decoder. My gut feeling here is that we are probably looking at three decoders which can handle 80 to 90% of the standard instructions, while the fourth will handle the more complex instructions which would need to be converted to more than one macro-op. While this sounds similar to the Core 2 architecture, it does not necessarily mean the same thing. It all depends on the complexity of the macro-ops being sent to the execution units, and how those are handled.
The floating point unit is also much more robust than it used to be. The Phenom had a single 128 bit unit per core; Bulldozer now has 2 x 128 bit units per module. It can combine those units to act as a single 256 bit unit when running AVX code. There are some performance limitations there as compared to the Intel CPUs which support AVX, and in those cases Intel should be faster. However, AVX is still very new and not widely supported. AMD will have an advantage over Intel when running SSE based code: the unit can perform 2 x 128 bit operations, or up to 4 x 64 bit operations, while Intel looks to support only 1 x 128 bit operation and 2 x 64 bit operations. The unit officially supports SSE3, SSE 4.1, SSE 4.2, AVX, and AES. It also supports advanced fused multiply-add/accumulate operations, something that has not been present in previous generations of CPUs.
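The "fused" part of fused multiply-add matters because the product feeds the add without an intermediate rounding step, so only one rounding happens at the end. A small sketch of that difference, emulated here with Python's decimal module at an artificially low precision (the operand values are invented purely to show the effect):

```python
from decimal import Decimal, localcontext

def mul_then_add(a, b, c, prec=4):
    """Separate multiply and add: the product is rounded before the add."""
    with localcontext() as ctx:
        ctx.prec = prec
        return a * b + c

def fused_mul_add(a, b, c, prec=4):
    """FMA-style: keep the product exact, round once at the end."""
    with localcontext() as ctx:
        ctx.prec = 40              # wide enough to hold the exact product
        product = a * b
    with localcontext() as ctx:
        ctx.prec = prec
        return product + c

a, b, c = Decimal("1.234"), Decimal("5.678"), Decimal("-7.006")
print(mul_then_add(a, b, c))   # 0.001 (cancellation after early rounding)
print(fused_mul_add(a, b, c))  # 0.000652 (the true low-order digits survive)
```

Real hardware FMA does the same thing in binary floating point; the single rounding both improves accuracy in cancellation-heavy code and lets one instruction do the work of two.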
In terms of overall performance, a Bulldozer based core should be able to outperform a similarly clocked Intel processor featuring the same number of threads when fully utilized. Unfortunately for AMD, very few workloads will max out a modern multi-core processor, and Intel should have a slight advantage in single threaded and lightly threaded applications. AMD looks to offset that advantage by positioning higher clocked processors against slower clocked Intel units. This could mean that a quad core i7 running at 3.2 GHz would be the price basis for a 4 module Bulldozer running at 3.5 GHz.
Exact specifications have not been released for the individual parts, but we can infer a few things here. First is the fact that it appears each module will utilize 2 MB of L2 cache. This is quite a bit of cache, especially considering that the current Phenom II processors feature 512 KB of L2 cache per core. Something that has allowed this to happen is buried in GLOBALFOUNDRIES’ 32 nm SOI process: they were apparently able to shrink the SRAM cell size significantly from that of the previous 45 nm process while also allowing it to clock quite a bit higher. This should leave more headroom for the individual cores. With the shrink, we should also expect to see at least 8 MB of shared L3 cache, with the ability to potentially clock higher than the 2 GHz the current L3 caches run at.
Another great article, Mr. Shrout. I really appreciate the honesty you provide in your assessment of this processor. Hearing different things on the web, I was very interested in this processor. It’s a great new design for the future, but it just doesn’t seem to compete as well as I would like. I currently have an AMD Phenom II X4 965 and this review has me not really wanting to upgrade to it. I plan to get a new mobo and DDR3 RAM, so it looks like that and a new vid card will be my only purchases in the near future. Although, given this review, would you recommend the FX-8150 or Phenom X6 1090T/1100T?
It is hard to deny the value of the X6 processors now based on their price. If money is kind of tight, I have no qualms recommending the 1100T.
Ryan and his crew, please stop doing bogus articles on the FX-8150. Unless AMD, or someone at MS or in Linux, has benchmarks that test with the FX’s features in mind, all tests will be irrelevant. FMA4 alone isn’t supported, and you try to compare an actual FMA processor (Intel) vs. the FX. Comparing a CPU+GPU vs. a CPU? Core parking activated? Threading issues in Windows? Mobo BIOS issues that won’t be fixed until Windows 8? Come on, just stop benchmarking this CPU. Give a call to the mobo makers, MS, and AMD, and when they have fixes (about 6 months’ worth) you can revisit. Some silly tests seem to forget one very important fact: in 2 years Intel will have FMA4. 2 years! AMD is so far ahead they can’t even speak to anyone for fear of copying. It isn’t that they don’t want to; the technology is just plain further ahead than anyone expected. People wonder why the FX is selling like hotcakes. When was the last time you could buy 2-years-in-the-future technology today? It rarely happens. This is huge news for corporations in the software business, be it gaming or anything else. I bet a lot of them are hard at work optimizing, or trying to, for FMA4 and all the other less highlighted features. But like I mentioned, this processor was released 6 months too early.
I wonder what Trinity will be like. hmmm
Nice article, only the tables are hard to read 🙁
I understand the dilemma of sorting by name or rank.
But personally I really prefer ordering by rank; that’s just me.
A great solution would be to have mouse-over change the ordering so everyone can pick what’s best for them.
Next to that, some color coding of competing products would be nice:
light blue for the i5 2400, blue for the 2500K, and dark blue for the 2600K;
then dark green for the FX, green for the X6, and light green for the X4.
Just to suggest something.
Like I said, good read, so-so tables.
Yeah, that’s fair. We’ll see what we can do to improve that in the future!
It’s interesting to note that you guys came to a rather different conclusion than Anandtech did with regards to gaming performance with Bulldozer.
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/8
I’d definitely like to see some more testing done on this.
Notice they ran their games at resolutions like 1024×768 and at the highest, 1680×1050 while I ran my tests at 1080p. In truth, the higher the resolution the less important the CPU performance tends to be.
Some people just want to know the raw gaming power of the CPU, so running at low resolutions, sometimes even lower than a gamer is likely to use (who plays games at 1024×768 anymore??), will show the biggest differences.
In my case I thought it more pertinent to show the most “real world” cases and 1080p seemed to be the way to go.
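A toy frame-time model makes the point: the CPU cost of a frame is roughly resolution-independent, while the GPU cost grows with pixel count, so a slower CPU only shows when the GPU is not the bottleneck. Every millisecond figure below is invented for illustration; the structure, not the numbers, is what matters.

```python
# Whichever of CPU and GPU takes longer gates each frame.
def fps(cpu_ms_per_frame, gpu_ms_per_mpixel, width, height):
    mpixels = width * height / 1e6
    frame_ms = max(cpu_ms_per_frame, gpu_ms_per_mpixel * mpixels)
    return 1000.0 / frame_ms

FAST_CPU, SLOW_CPU = 6.0, 9.0   # ms of CPU work per frame (made up)
GPU_COST = 7.0                  # ms of GPU work per megapixel (made up)

# At 1024x768 the CPU is the bottleneck, so the gap is visible...
low_fast = fps(FAST_CPU, GPU_COST, 1024, 768)
low_slow = fps(SLOW_CPU, GPU_COST, 1024, 768)
# ...at 1920x1080 the GPU gates both systems and the gap disappears.
hi_fast = fps(FAST_CPU, GPU_COST, 1920, 1080)
hi_slow = fps(SLOW_CPU, GPU_COST, 1920, 1080)
print(f"low res: {low_fast:.0f} vs {low_slow:.0f} fps, "
      f"1080p: {hi_fast:.0f} vs {hi_slow:.0f} fps")
```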
You can’t argue the Civ V findings, but [H]ard|OCP used similar resolutions and found similar results to Ryan’s.
http://www.hardocp.com/article/2011/10/11/amd_bulldozer_fx8150_gameplay_performance_review
I would like to see more gaming benchmarks. Having only 2 games on there seems lazy to me. Where is Starcraft 2, Bad Company 2, Rage, The Witcher 2, and heck, put World of Warcraft on there. You know, games that people actually play. I don’t know anyone who plays Lost Planet 2.
Oh and of course Crysis, Crysis Warhead, Crysis 2. Come on!
Did I miss in the article where you explained why you used a 1090t instead of the top of the line 1100t for most of your benchmarks?
Ah, good point. We used the 1090T results from a previous article (Llano, I think) and didn’t have time to get the 1100T in to run the full allotment of tests before publication. Instead, with the time we had, I was able to run the 1100T through some of our architectural analysis tests (core scaling, etc.) and gaming.
So… how well does this CPU FOLD????
Wondering if it can handle bigadv folding…
You ain’t the only one! I suggested it to Ryan but he hasn’t done it since the PS3. Mind you I’m a BOINCer myself.
Based on this review, it’s hard to justify upgrading from my Phenom II 955, especially when my PC is used mostly for gaming. I was hoping for better power consumption numbers compared to Sandy Bridge.
The architecture is intriguing and has potential. It will be interesting to see what AMD comes out with in the next iteration.
I’ll wait for Piledriver for improvements…
I just finished reading all 3 reviews (Anand, Tom’s, and PCPer) and, just like Yangorang said, WTF?!
The tests show some consistencies, but there is still a rather big difference in attitude and benches towards the FX-8150.
I think there is a bit of fanboyism in the Anand and Tom’s reviews (you can see it in the comments as well). Granted, it may not be a 2600K, but it gets pretty close, somewhere between an i5 and an i7, so I feel those 2 reviews were much more biased towards the Intel chip in their writing, even when the BD came close.
There are some crazy things like the power usage, but really? Most of the people posting don’t really care about their light bill (multi-GPU, a plethora of fans, and 1100 watt PSUs), so why are people complaining that much?
I already bought my ASUS CH-V 990FX mobo yesterday, and my AMD HD 5970 (2 GB), so I think I will just push on through with the BD. My last build was a Core 2 Duo, so I think I will be good nonetheless.
I don’t think it is a BAD processor necessarily, but I find it hard to recommend the FX-8150 over the Core i5-2500k or even the i7-2600k if you are building a new system from scratch.
You have a 990FX Bulldozer-ready motherboard and want to get rid of that older CPU? Sure, the AMD FX will improve your system somewhat.
As I mentioned on my conclusion page, the primary issue is that AMD thinks its processors are worth more money than they probably are for MOST workloads.
Thanks for the review Ryan. I bought a Core i5-2500K and z68 mobo 2 weeks ago, and I’m not regretting my purchase one bit. We’ll have to see if that sentiment persists thru to when Ivy Bridge comes out. 😛
No wonder AMD is attempting to get Piledriver out as soon as possible; they probably knew Bulldozer wasn’t going to light the world on fire. Now the question is, do you wait for Trinity/Piledriver and FM2? Somehow I think most people will wait, unless they already bought an AM3+ board. Aren’t Piledriver and FM2 due by Q2 2012?
OH Dear,
It doesn’t even look as if it’s worth upgrading from an X6.
Hope the 7000 series cards are good, because AMD could be in trouble.
Ryan,
Thanks for including an older Intel proc (the Q9650). I have a QX9650 and I’ve been looking to upgrade, and I was hoping to head back to AMD with this Bulldozer release. Sadly, I see an i7-2600 in my future.
Thank you as always for the great review!
Thanks for reading!
Thorough, comprehensive, objective, and very informative review. Well done.
Well, this makes me wish I hadn’t already bought it, since I have an 1100T… So much anticipation, and I suppose I’m about to be let down.
I couldn’t care less where AMD goes from here in their lineup. I’m done waiting for their next “fast” cpu, which is only going to be a pathetic 10-15% improvement anyway. I’ll have a 2500k under my hood now, and AMD will unfortunately be in my rear view, broken down on the side of the road overheating.
What is the deal with the performance?
Doesn’t it look strange that a 2 billion transistor chip (FX-8150) is a tad slower than a 0.9 billion transistor chip (i7 2600K), of which a quarter is a GPU?
There are a few major improvements, but still.
Is that just unoptimized code, or a task scheduler communication mishap? Some people speak of improvements in Windows 8.
Could that be it? That Bulldozer is a year early, and not late at all?
Indeed. It seems ridiculous that 2 billion transistors nets them a slightly slower chip than even their last generation.
It doesn’t make sense. I think AMD needs Hyper-Threading bolted on to extract more performance or something. All those transistors are going to waste, or it’s just insanely inefficient.
The module design was to be AMD’s take on HyperThreading, but better.
Yes, Windows 8 will help some, but even in AMD’s best case scenarios we are talking about a 4-10% improvement there.
In reality, we are just as confused as to how 2 billion transistors loses to 1.16 billion transistors this regularly.
The only other redeeming answer could be that production techniques for Bulldozer wafers must be dirt cheap and fast paced. Some Interlagos 16-core processors were mentioned to be around 85 W at 1.6 GHz, so in some “lucky wafer” cases it could be considered an efficient chip. Hard to fathom if that is worth anything.
But really, if software wasn’t ready for proper Bulldozer computing scenarios, they could just have made an advanced 8-core Thuban based on Llano cores, with the additional L3 cache and the already enlarged 1 MB L2 cache per core. That shouldn’t have taken more than 1.5 billion transistors. There’s just the question of production cost…
Could it be because most applications are compiled using Intel’s compiler and are therefore optimized for that architecture?
I’ve read about this before: removing something from Intel’s compiler boosted a VIA CPU to almost a 50% performance gain.
Kindly explain, Ryan. Thanks.
I was wondering the same; for a long time Intel was purposefully limiting SSE optimization to CPUs which returned Intel’s manufacturer string (instead of using CPU feature flags as intended).
I believe Intel agreed to end this practice last year, but it depends on when they actually implemented the change and when the affected benchmarks were released…
Unfortunately, as end users may also be running software compiled with the bogus compilers, the results shown may still be representative (until people stop using old software).
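The behavior being described, dispatching on the CPUID vendor string rather than on feature bits, can be sketched like this. This is a simplified illustration of the two dispatch policies the commenters contrast, not Intel's actual dispatcher logic:

```python
def pick_path_by_vendor(vendor: str, has_sse2: bool) -> str:
    """Vendor-string dispatch: non-Intel parts get the slow path even
    when they advertise the same SSE2 feature bit."""
    if vendor == "GenuineIntel" and has_sse2:
        return "sse2"
    return "scalar"

def pick_path_by_flags(has_sse2: bool) -> str:
    """Feature-flag dispatch: any CPU reporting SSE2 gets the fast path."""
    return "sse2" if has_sse2 else "scalar"

# An AMD chip with SSE2 falls to scalar code under vendor dispatch...
print(pick_path_by_vendor("AuthenticAMD", True))
# ...but gets the optimized path when the feature bit decides.
print(pick_path_by_flags(True))
```

The vendor strings ("GenuineIntel", "AuthenticAMD") are the real CPUID values; everything else here is schematic.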
No guys, that is not the issue here. And anyone that says the compiler is giving a 50% performance advantage is probably lying.
I would imagine a large chunk of the transistor difference is from the difference in L2 cache sizes.
I have a system with a GTX 580 and an older Intel i5-750 processor. I can run all of the games that I’ve seen tested with almost exactly the same frame rate (within 4 FPS, on games running 60 FPS or less) as systems with better processors. (Although, if you want SLI/CrossFire, older processors like mine may not keep up.)
If you are running a system just for gaming, it seems more useful to have a beefy GPU. I think that the way the games were tested in this review is perfectly acceptable, because it shows REAL WORLD gaming performance.
(I run all my games at 1080p. My i5-750 is clocked at 3.8 GHz with turbo on (air cooling). I use an EVGA P55 FTW motherboard. My GTX 580 is running at stock clocks.)
Nice review, keep it up guys!
Not everyone is running 1080p monitors yet. I run at 1680×1050 and will do so for quite a while yet. So to me at least, CPU performance in a game definitely matters. When I saw just how pathetically the 8150 did in these benchmarks, I couldn’t believe it.
Ok, so for my next computer, I have all the parts except the CPU and motherboard. I was planning to go Bulldozer instead of Sandy Bridge, but now I’m wondering if that’s the best decision. All things being equal, and price not an issue, would one want to go top of the line FX or top of the line i7? All around machine; some games; some digital processing.
Then there’s the SSD issue. About a year ago I put an SSD into an HP Core i7 desktop and reinstalled Windows on the 64 GB drive with everything else on a 1 TB drive. I was thinking of going the same way with the new computer (a larger 3rd generation SSD), but now, with the new Intel chipset and motherboards capable of caching with a small SSD, that enters into the equation of deciding between Intel and AMD. Again, which would be the ‘better’ machine? For gaming? For video processing?
Perhaps a good discussion topic for This week in Computer Hardware?
Mike Ungerman
Orlando, Fl