The long-awaited Bulldozer cores are finally here for consumers, but are they worth the wait?
Bulldozer. Since its initial unveiling and placement on the roadmap many have called the Bulldozer architecture the savior of AMD, the processor that would finally turn the tide back against Intel and its dominance in the performance desktop market. After quite literally YEARS of waiting we have finally gotten our hands on the Bulldozer processors, now called the AMD FX series of CPUs, and can report on our performance and benchmarking of the platform.
With all of the leaks surrounding the FX processor launch you might be surprised by quite a bit of our findings – both on the positive and the negative side of things. With all of the news in the past weeks about Bulldozer, now we can finally give you the REAL information.
- Bulldozer First Release and the State of 32nm AMD Parts
- AMD Bulldozer Processor hits 8.429 GHz – New World Record!
- AMD Bulldozer FX Processor Benchmarks Leaked
Before we dive right into the performance part of our story I think it is important to revisit the Bulldozer architecture and describe what makes it different from the Phenom II architecture as well as Intel’s Sandy Bridge design. Josh wrote up a great look at the architecture earlier in the year with information that is still 100% pertinent, and we recount much of that writing here. If you are comfortable with the architecture design points, then feel free to skip ahead to the sections you are more interested in – but I highly recommend you give the data below a look first.
The below text was taken from Bulldozer at ISSCC 2011 – The Future of AMD Processors.
Bulldozer Architecture Revisited
Bulldozer brings very little from the previous generation of CPUs, except perhaps the experience of the engineers working on these designs. Since the original Athlon, the basic floor plan of the CPU architecture AMD has used is relatively unchanged. Certainly there were significant changes throughout the years to keep up in performance, but the 10,000 foot view of the actual decode, integer, and floating point units stayed very similar. TLBs increased in size, more instructions were kept in flight, and so on; all of these were tweaked and improved upon. Aspects such as larger L2 caches, integrated memory controllers, and the addition of a shared L3 cache have all brought improvements to the architecture. But the overall data flow is very similar to that of the original Athlon introduced 14 years ago.
As covered in our previous article about Bulldozer, it is a modular design which will come in several flavors depending on the market it is addressing. The basic building block of the Bulldozer core is a 213 million transistor unit which features 2 MB of L2 cache. This block contains the fetch and decode unit, two integer execution units, a shared 2 x 128 bit floating point/SIMD unit, L1 data and instruction caches, and a large shared L2 unit. All of this is manufactured on GLOBALFOUNDRIES’ 32nm, 11 metal layer SOI process. This entire unit, plus 2 MB of L2 cache, is contained in approximately 30.9 mm squared of die space.
It is well known that Bulldozer embraces the idea of “CMT”, or cluster-based multi-threading. While Intel supports SMT on their processors, it is not the most efficient way of doing things. SMT sends two threads to the same execution unit in an attempt to maximize the work being done by that unit; essentially, fewer cycles are wasted waiting for new instructions or resultant data. AMD instead chose to implement multi-threading in a different way: each module gets two full integer execution units, one per thread, and shares the rest. For example, a Bulldozer processor made up of four modules will have eight integer execution units and four shared 2 x 128 bit floating point/SIMD units. This allows the OS to see the chip as an eight core unit.
CMT seemingly makes much better use of die space and threading performance than SMT (a two-thread module scales to around 1.8x the throughput of a single core, compared to around 1.3x with SMT) or CMP (chip multi-processing, where each core may not be entirely utilized and the die cost of replicating entire cores is much higher than in CMT). This balance of performance and die savings is the hallmark of the Bulldozer architecture. AMD has gone through and determined what structures can be shared, and what structures need to be replicated in each module. CMT apparently only increases overall die space by around 5% in a four module unit.
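The scaling figures quoted above can be turned into a quick back-of-the-envelope comparison. The sketch below is only an illustration: it assumes every design has the same single-thread speed and scales perfectly across units, neither of which the article claims, and the 2.0x figure for CMP is an idealized assumption rather than a number from the text.

```python
# Scaling factors for two threads on one unit, relative to one thread.
CMT_SCALING = 1.8  # from the text: one Bulldozer module running two threads
SMT_SCALING = 1.3  # from the text: one SMT core running two threads
CMP_SCALING = 2.0  # assumption: two fully replicated cores, idealized


def relative_throughput(units: int, scaling_per_unit: float) -> float:
    """Throughput in single-thread equivalents.

    Assumes identical single-thread speed and perfect scaling across
    units, which real workloads will not achieve.
    """
    return units * scaling_per_unit


if __name__ == "__main__":
    designs = [("CMT", CMT_SCALING), ("SMT", SMT_SCALING), ("CMP", CMP_SCALING)]
    for name, scaling in designs:
        total = relative_throughput(4, scaling)
        print(f"{name}: 4 units, 8 threads -> {total:.1f}x one thread")
```

Under these assumptions a four module CMT part lands much closer to eight real cores than a four core SMT part does, which is the trade-off the paragraph describes.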
A closer look at the units reveals some nice details. Note the dual MMX (SIMD-Integer) units in the FP/SIMD block. A lot of work has been done on the front end to adequately feed the three execution units.
Gone is the three pipeline integer unit of the Athlon. Bulldozer uses a new four pipeline design which further divides the workloads being asked of it. These include multiply, divide, and two address generation units. Each integer unit is fed by its own integer scheduler. The decode unit which feeds the integer units and the float unit has also been significantly beefed up. And it had to be: it is now feeding a lot more data to more execution units than ever before. The original Athlon had a decode unit comprised of 3 complex decoders. The new design features a 4-wide decode unit, but we are unsure so far how the workload is managed. For example, the Core 2 had a 4-wide decode unit in which three decoders were simple and the fourth was complex. My gut feeling here is that we are probably looking at three decoders which can handle 80 to 90% of the standard instructions, while the fourth will handle the more complex instructions which would need to be converted to more than one macro-op. While this sounds similar to the Core 2 architecture, it does not necessarily mean the same thing. It all depends on the complexity of the macro-ops being sent to the execution units, and how those are handled.
The floating point unit is also much more robust than it used to be. The Phenom had a single 128 bit unit per core, and Bulldozer now has 2 x 128 bit units per module. It can combine those units when running AVX and act as a single 256 bit unit. There are some performance limitations there as compared to the Intel CPUs which support AVX, and in those cases Intel should be faster. However, AVX is still very new and poorly supported by software. AMD will have an advantage here over Intel when running SSE based code. It can perform 2 x 128 bit operations, or up to 4 x 64 bit operations. Intel on the other hand looks to only support 1 x 128 bit operation or 2 x 64 bit operations. The unit officially supports SSE3, SSE 4.1, SSE 4.2, AVX, and AES. It also supports advanced multiply-add/accumulate operations, something that has not been present in previous generations of CPUs.
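To make the width arithmetic above concrete, here is a minimal sketch that counts how many elements fit through the floating point pipes in one issue. The function names are made up for this example, and the model is idealized: it ignores scheduling, latency, and everything else that decides real throughput.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one SIMD operation of the given width."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits


def elements_per_issue(pipes: int, pipe_bits: int, element_bits: int) -> int:
    """Elements processed when every pipe issues one operation (idealized)."""
    return pipes * lanes(pipe_bits, element_bits)


if __name__ == "__main__":
    # Two independent 128 bit pipes on SSE code, 64 bit elements
    print("2 x 128 bit, 64 bit elements:", elements_per_issue(2, 128, 64))
    # A single 128 bit pipe, 64 bit elements
    print("1 x 128 bit, 64 bit elements:", elements_per_issue(1, 128, 64))
    # Both pipes fused into one 256 bit AVX operation, 32 bit elements
    print("1 x 256 bit, 32 bit elements:", elements_per_issue(1, 256, 32))
    # The same two pipes running separate 128 bit operations, 32 bit elements
    print("2 x 128 bit, 32 bit elements:", elements_per_issue(2, 128, 32))
```

The last two cases come out equal, which is one way to see why fusing the two pipes for AVX does not by itself raise the peak width of a module.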
In terms of overall performance, a Bulldozer based core should be able to outperform a similarly clocked Intel processor featuring the same number of threads when being fully utilized. Unfortunately for AMD, very few workloads will max out a modern multi-core processor. Intel should have a slight advantage in single threaded/lightly threaded applications. AMD does look to offset that advantage by offering higher clocked processors positioned against the slower clocked Intel units. This could mean that a quad core i7 running at 3.2 GHz would be the price basis for a 4 module Bulldozer running at 3.5 GHz.
Exact specifications have not been released for the individual parts, but we can infer a few things here. First off is the fact that it appears as though each core will utilize 2 MB of L2 cache. This is quite a bit of cache, especially considering that the current Phenom II processors feature 512 KB of L2 cache per core. Something that has allowed this to happen is buried in GLOBALFOUNDRIES’ 32 nm SOI process. They were apparently able to get the SRAM cell size down significantly from that of the previous 45 nm process, and allow it to also clock quite a bit higher. This should allow more headroom for the individual cores. With the shrink, we should also expect to see at least 8 MB of shared L3 cache, with the ability to potentially clock higher than the 2 GHz we see the current L3 caches running at.
I’m more interested in the performance with a virtual environment. How does it perform with VM, VirtualBox, Hyper-Visor, etc? From a cost point of view, will it handle my needs or should I spend the extra $$$$ for Intel? Thanks
1st, thanks for testing and showing a Core2Quad in your review… many people still have the Core2Duo/Quads as they pretty much put Intel on the map again a few years ago and are still to this day very good CPU’s.
I have a Q9550 @ 4GHz on air and it’s perfect.
2nd, but disappointed with the gaming benchmarks and reviewing.
Because you used very few games and you also used only 1x video card.
What the results show is a GPU limitation and are not really testing the CPU.
This kind of testing only shows 1 thing which is pretty damn obvious, that at high resolutions and settings in games even a single GTX580 is limited, the CPU is idling.
These tests do not show the strengths and weaknesses of a CPU as the CPU is not working hard at all (gpu limit).
You either need to lower resolution to show how well the various games use the cores and respond to different CPU’s or use SLI/Crossfire cards/setups which DO often put A LOT more stress on the CPU and separate the sheep from the lions 🙂
Please do SLI or Crossfire testing and lets see how this CPU holds up!
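The GPU-limit argument in this comment can be checked mechanically: if very different CPUs all land within a few percent of each other in a game test, the graphics card is probably setting the frame rate. The sketch below is one hypothetical way to flag that. The 5% tolerance is an arbitrary assumption, and the frame rates in the example are made-up placeholders, not results from this review.

```python
def looks_gpu_bound(fps_by_cpu: dict[str, float], tolerance: float = 0.05) -> bool:
    """Return True when every CPU is within `tolerance` of the fastest one.

    A small spread across very different CPUs suggests the graphics card,
    not the processor, is the limiting factor in that test.
    """
    if not fps_by_cpu:
        raise ValueError("need at least one result")
    best = max(fps_by_cpu.values())
    worst = min(fps_by_cpu.values())
    return (best - worst) / best <= tolerance


if __name__ == "__main__":
    # Placeholder numbers for illustration only
    high_res = {"cpu_a": 60.0, "cpu_b": 59.0, "cpu_c": 58.5}
    low_res = {"cpu_a": 120.0, "cpu_b": 90.0, "cpu_c": 100.0}
    print("high resolution GPU bound?", looks_gpu_bound(high_res))
    print("low resolution GPU bound?", looks_gpu_bound(low_res))
```

Tests flagged this way say little about the CPUs themselves, which is the case for rerunning them at a lower resolution or with SLI/Crossfire.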
Did you use the ASUS motherboard that was supplied as part of a kit from AMD? If so, there might be some issues related to the motherboard. Post below.
There is some information that the ASUS Crosshair is not performing as well. Two sites used other motherboards, from ASRock as well as Gigabyte, and showed a much different picture of performance.
I would really like to see a verification from my trusted site.
I guess I’ll ask around, but I am about 99.99% sure that the motherboard isn’t making a big difference here. If the large majority of sites saw the same results and none of us thought anything fishy was going on, chances are it wasn’t.
But like I said, I can test another board from MSI or Gigabyte after the weekend when I return home.
I believe that AMD might be holding out a little here.
Think about it long term: AMD (unless I am misinformed) has stopped production on everything that doesn’t use the new Bulldozer design. I think they did it a while ago.
Now they have these Bulldozers coming in equal to the Phenom II X6s. Piledriver is due out Q1 next year. I’m thinking that either:
Bulldozer is meant to replace all current AMD chips. This brings all AMD users up to the same platform (AM3+), and all their factories can focus on streamlining the manufacturing of these new FX models. Then Piledriver will come in, replacing the FX-8150 as the flagship, and be so far up Intel’s smoke pipe that they sit there and think, WTF just happened here.
AMD scrapped all previous AM3/AM2+/AM2 manufacturing and started making these, then realized something was up and they were not performing. So, to buy themselves some time, they released these (which aren’t bad; they’re not great, but not bad either) and are now working their asses off perfecting it with Piledriver, letting Intel snigger for the moment, as AMD has another Athlon 64 up their sleeve and just needs to fix the kinks.
Or option 3
AMD’s CPU division is now run by trained chimps and is about to sink.
AMD realises the real sustainable money is in servers. Everything about the Bulldozer points to AMD migrating slowly from client to server. You really think AMD’s plan to bring out a great CPU for gamers was to go for an 8 core model when games just aren’t that well threaded and that’s likely to be the case for years to come?
The marketing spin from AMD is transparent to anyone that knows tech. They’re trying to sell you A CPU that’s transitioning over to being a full on server design. Massively threaded, just what the server world wants.
I think they have a good few years yet of trying to squeeze every last bit of profit out of the value market, the gamers and the enthusiasts but their plan appears to be simple. Slowly increase the clock speed of Bulldozer over the next x years and make a real play for the high end server market.
Think about it this way, AMD is a small company compared to Intel. It doesn’t have the resources to develop CPU’s that will win big in all the different markets CPU’s play in. So why not try and sell server CPU’s to the clueless, use the bad parts to sell to the value market and with all real resources focused on making the best server CPU’s. They can beat Intel on price and Intel can do nothing but lower prices to compete, something they’ve never wanted to do in the server space.
It’s interesting watching this play out. All this nonsense about compilers and Windows 8 unleashing the true power of Bulldozer. The real story for me here is how AMD has managed to convince at least two markets that it’s making CPUs for them with just a little spin, when in reality it’s passing off its R&D to gamers and enthusiasts. (AMD knew it would take several years to build up to really fast server class CPUs that are massively parallel, so why not sell that research along the way as Bulldozer FX-8150, the 8 core super CPU for a new generation!) Finally, it’s using its failures at the factory to supply the value market with a few cores; they don’t need more, and it’s pretty much free money to keep the server machine fueled.
It’s pure genius really if you stop and look at the big picture. Pretty crappy for the AMD fans that have supported them all these years, but maybe the moral there is, don’t think of huge corporations as your best friend, heh.
these guy gained some!
ASUS boards are AM3 and FX is AM3+.
Also, it’s a CPU+GPU against a CPU (i7-2600K vs FX-8150).
Come on, if they are not both plain CPUs then websites shouldn’t benchmark them against each other. Geez, don’t compare a CPU+GPU with a CPU. It doesn’t make sense.
I can’t enumerate all the stuff, but suffice it to say that most of the benchmarks on the web are bogus. HardwareHeaven didn’t use the AMD kit but they still compared it to the i7-2600K, lol, so in the end even if they got better numbers it is still useless data.
I sure hope websites compare apples with apples, geez, like the i7 960 to 990 series: they are CPUs, not CPU+GPUs!
I suspect it’s merely a problem of the software having to catch up with the hardware. It took quite some time before the AMD64 even had 64-bit software to run, and initial tests had 32-bit equivalents spanking the 64-bit systems.
Bad hardware? No, just bad programming.
AMD went out on a limb with a completely new architecture. Intel is just squeezing what’s left out of Core 2.
The way I look at it is this. Soon your pc will be gone and you will be running your monitors off of thin clients. So if amd can beef up their cpu’s to run several thin clients even in a gaming way, then they will be way ahead and everyone will be looking back at them thinking, wow amd was right on the money with switching to 8+ core CPUs especially since most games get their speed from gpus anyway these days. Dunno just a thought.
I can just imagine everyone having a main server in their house. I am already in the process of setting that up as we speak. Still in the planning stage, but I think it only makes sense, outside of my gaming rig that is. Just need to figure out a few details. But I am thinking I may use the Bulldozer as the CPU in the server unless something else comes out that’s better by then. My house is already hardwired with Cat5 in every room so it makes sense to me unless anyone else has a better suggestion.
It’s just a matter of time before AMD regains the king-of-the-hill spot that Intel has held since C2D. But will the tide turn if Intel has already washed every shore of opportunity with its vast resources? Let’s face it, even though this chip seems to be a failure, it has opened up a whole new thing in the computer world. Multi-threading is a thing of the past; “multi-core” functionality is soon to rise. Let us be thankful that a company such as AMD has the guts to restructure the processor, so that we can see new insight coming out of it. Bulldozer may not compete with the SB i5 and i7, but it will give software developers, especially Microsoft, the idea to utilize that monstrous 8 core chip for a better performing computer. Remember “two is better than one”; the time will come when computers will recognize that 1 is not 2, which is more sensible.
But something we need to keep in mind: 8 cores will only be fully supported by Windows 8. Right now AMD and Microsoft are working hand in hand on a patch to at least boost the 8 cores in Windows 7.
There’s a patch for all Windows versions out already: it’s called Linux!
Well, I can say one thing after looking at how many programs are compiled: most are optimized with Intel’s instruction set and not AMD’s. AMD has its own set of CPU instructions for the FX chip, and as of yet no programs or benchmarks have been written and compiled with them. With the new MSVC 2011 the AMD instructions will be available to devs, but it will take some time for them to get on the market.
Need to correct myself: the Windows 8 preview does have some of the AMD instructions precompiled.
People bash on AMD too much. It really isn’t that bad; it’s mainly for multi-purpose use, doing many things all at once on one computer. AMD is so much better than Intel in that area.
Understand that Intel has a set standard, so it’s easy to work with. You can overclock, but it’s not really meant for that.
AMD is meant for overclocking; I don’t know a single AMD product that’s not overclocked. What I’ve noticed when I’ve done tests with an AMD product is that the more I have on my screen, the faster it gets. AMD is a product that needs to be worked, while Intel has that set standard. In my tests with Intel, the more I have and do on my screen, the slower it gets. But I do consider Intel to have the advantage, because people want that set standard: if Intel is working at 80% it’ll stay there, while AMD will be at 70% and needs to be worked to get there. So if you want AMD to work faster, open up a lot of pages and start working it.
If you’re looking for just gaming, Intel is the way to go, but if you’re cool and do a bunch of other stuff, AMD is for you.
And I would recommend GTX graphics cards. They’re the best in my opinion, but I haven’t worked with AMD graphics cards so I can’t comment on that. It’s just what I use.
I want a new PC for animation and graphic design. I am not that technically sound. Someone suggested the FX-8150 to me. Can you help me?
Just to throw in a comment that is a bit special-case, but certainly matters to me. I’ve been writing a 3D game/simulation engine for a while now, and all of a sudden I notice my linux computer (with FX8150) was much faster than my windoze computer (with slightly older 4-core phenom2 CPU at same clock speed).
When I tracked down the reason, it was because the older phenom2 cannot execute the 256-bit AVX/FMA instructions. I have 32-bit and 64-bit versions of key SIMD assembly-language routines, and the 256-bit AVX/FMA versions are almost twice as fast! Since they are fairly key routines, this one advantage of the FX8150 (AVX/FMA) makes a huge difference to me!
I just bought a new motherboard and another FX8150 for my windoze computer, so it is on a level playing field with my linux system.
PS: From my perspective, 64-bit SIMD with 16 ymm registers and AVX/FMA instructions is a BIG deal. True, many people couldn’t care less, and many applications that could benefit haven’t been rewritten to take advantage of these new instructions.
Oh, and BTW, the speed comparison on these routines between my assembly-language routines and compiled C code with optimization turned up to maximum is hilarious — as in 6 to 12 times faster!
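The situation this commenter describes, where the same program is fast on one machine and slow on another because only one CPU has AVX/FMA, is usually handled by checking CPU features at startup and picking a code path. Below is a minimal sketch of that idea for Linux, which lists feature flags in `/proc/cpuinfo`. The path labels (`avx_fma`, `sse2`, `scalar`) are hypothetical names for this example and not the commenter's actual routines. The check accepts either `fma` or `fma4` because the FX-8150 exposes its fused multiply-add as FMA4.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags: set[str]) -> str:
    """Choose the widest routine the CPU can run (labels are hypothetical)."""
    if "avx" in flags and ("fma" in flags or "fma4" in flags):
        return "avx_fma"
    if "sse2" in flags:
        return "sse2"
    return "scalar"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # On systems without /proc/cpuinfo this falls back to the scalar path.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print("selected code path:", pick_code_path(cpu_flags(text)))
```

With a check like this, a Phenom II would fall through to the SSE2 routines instead of faulting on an unsupported instruction.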
Comparisons are invalid unless you use 1866 memory with the 8150. The 1090 does not support 1866. Why would you dumb down for a comparison when you could show the 8150 with 1866 vs a 1090 with 1333… why not show the best they both can do?
A major advantage of the 8150 is the ability to run faster memory.