The desktop version of Llano is on the testing blocks and we can tell you what its strengths, and weaknesses, are – finally.
Just a couple of weeks ago we took the cover off of AMD’s Llano processor for the first time in the form of the Sabine platform: Llano’s mobile derivative. In that article we wrote in great detail about the architecture and how it performed on the stage of the notebook market – it looked very good when compared to the Intel Sandy Bridge machines we had on-hand. Battery life is one of the most important aspects of evaluating a mobile configuration with performance and features taking a back seat the majority of the time. In the world of the desktop though, that isn’t necessarily the case.
Desktop computers, even those meant for the low-cost and mainstream markets, don’t treat power consumption as crucial and instead focus almost exclusively on the features and performance of the platform. There are areas where power and heat are more scrutinized, such as the home theater PC market and small form-factor machines, but in general you need to hit a home run with performance per dollar in this field. Coming into this article we had some serious concerns about Llano and its ability to address this specifically.
How did our weeks with the latest AMD Fusion APU turn out? There is a ton of information that needed to be addressed including a look at the graphics performance in comparison to Sandy Bridge, how the quad-core "Stars" x86 CPU portion stands up to modern options, how the new memory controller affects graphics performance, Dual Graphics, power consumption and even a whole new overclocking methodology. Keep reading and you’ll get all the answers you are looking for.
We spent a LOT of time in our previous Llano piece discussing the technical details of the new Llano Fusion CPU/GPU architecture and the fundamentals are essentially identical from the mobile part to the new desktop releases. Because of that, much of the information here is going to be a repeat with some minor changes in the forms of power envelopes, etc.
The platform diagram above gives us an overview of what components will make up a system built on the Llano Fusion APU design. The APU itself is made up of 2 or 4 x86 CPU cores that come from the Stars family released with the Phenom / Phenom II processors. They do introduce a new Turbo Core feature, which we will discuss later, that is somewhat analogous to what Intel has done on its processors with Turbo Boost.
A large portion of the chip is of course the "Radeon Core Array", the GPU-based SIMD units that will handle the graphics computing tasks and the GPU-based portions of heterogeneous software. This is a DirectX 11 class GPU, though obviously with fewer stream processors running at a lower frequency than we have seen in discrete cards. A new UVD (unified video decoder) is included for improved visual quality and efficiency.
The memory controller on the APU is a dual-channel DDR3 design that has been redesigned quite a bit in order to improve performance on the combined CPU/GPU workload. On discrete graphics cards, even low-end GPUs will have access to hundreds of GB/s of bandwidth, while on the Llano design the entire chip has less than 30 GB/s for all tasks. We will go over some of the physical and architectural changes a bit later.
On the chipset side of the "Lynx" platform, which is what the desktop derivative of Llano is dubbed, AMD has two options for you, the A75 and the A55. The A75 offers SATA 6G ports and USB 3.0 support while the A55 doesn’t but will cost you a bit less. We’ll go into some more details on the chipset-specific features on the following page along with some CPU (Socket FM1) and motherboard images.
This labeled diagram of the Llano APU shows the die space given to each of these different components. The array of graphics processing units dominates the design, taking up about 50% of the space, a fact that AMD likes to point out in comparison to the ~25% on Intel’s Sandy Bridge. The four x86 CPU cores don’t take up nearly as much physical space if you don’t include the hefty 4MB of L2 cache. The DDR3 memory controller is the other dominant physical feature, followed by the PCIe channels and display connections at the bottom of the image.
I mentioned earlier that the memory controller had gone through some changes with the Llano design in order to attempt to make up for the memory bandwidth deficiencies seen moving from a discrete controller to an integrated one. Mike Goddard of AMD, when speaking at the Llano Tech Day in Abu Dhabi, described a "Radeon Memory Bus" that allows the GPU SIMD array to access system memory at a "very high bandwidth" and that is given priority access to system memory. The fact is that memory bandwidth is the single biggest bottleneck for integrated graphics performance on processors found in cell phones, notebooks and desktops. Graphics performance will scale nearly linearly with memory bandwidth increases and the first company to really figure this problem out will take a dramatic lead. Even with Llano, it still hasn’t happened: no matter how much "priority" is given to the GPU for memory access, you are still limited to the 29.8 GB/s that the dual-channel DDR3 memory controller can provide.
The "Fusion Compute Link" provides a way for the GPU portion of the APU to access memory shared with the CPU, allowing for improved performance on applications that use coherent memory. OpenCL and other GPGPU applications can benefit quite a bit from hardware that doesn’t need to spend time copying data around the APU, and this internal pathway avoids those copies in some cases. There is no shared cache between the CPU and GPU portions of the APU though, which is in contrast to the shared L3 cache on the Sandy Bridge processor from Intel.
The x86 CPU cores on the Llano APU are based on the same "Stars" architecture as the current generation of Phenom processors, though with some minor tweaks to improve the IPC (instructions per clock) performance by ~6%. These are the first Stars cores built on the 32nm process technology at GLOBALFOUNDRIES, so there is a bit more of a question about their performance and efficiency. The target TDPs for the mobile market are 35W and 45W, while the desktop market will see at least 65W and 100W versions later in the year. CPU frequencies will scale from 1.4 GHz to 2.9 GHz, with the lower end finding its way into notebooks.
The memory controller on the Llano APU is likely the most modified portion of the design. With a maximum bandwidth of only 25.6 GB/s on notebooks and 29.8 GB/s on desktop designs, AMD claims that the GPU on the Llano chip still sees a 4x bandwidth increase over previous generations. Considering AMD’s previous generation was a chipset-based integrated graphics solution, this statistic doesn’t sound nearly as impressive on its own, though Llano also brings reduced latency, lower power and a smaller footprint, which is a drastic improvement for mobile system designers. AMD’s claims of "discrete level graphics on a chip" do hold up, but without a doubt the memory bandwidth constraints of standard CPU-class memory controllers are still holding graphics technology back.
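For readers who want to see where bandwidth numbers like these come from, here is a rough back-of-the-envelope sketch. It assumes the standard 64-bit (8-byte) DDR3 channel width and guesses at DDR3-1600 for notebooks and DDR3-1866 for desktops based on the figures quoted above; those speed grades and the function name are our assumptions, not something taken from AMD's documentation.

```python
# Theoretical peak bandwidth of a DDR3 memory interface.
# Assumptions (not from AMD): each channel is 64 bits (8 bytes) wide, and the
# speed grades below are inferred from the bandwidth figures in the article.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_sec, channels=2):
    """Return theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return mega_transfers_per_sec * BYTES_PER_TRANSFER * channels / 1000


for label, rate in (("notebook, DDR3-1600", 1600), ("desktop, DDR3-1866", 1866)):
    print(f"{label}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

Under those assumptions the notebook case works out to the 25.6 GB/s quoted above and the desktop case lands just under 30 GB/s. These are theoretical peaks that the CPU cores and the GPU have to share, so real-world graphics throughput will be lower.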
AMD Turbo Core Technology
After the first generation of Turbo Boost technology on the Intel Nehalem processors it was obvious that AMD needed to offer a similar implementation on its processors to stay current. The theory of being able to combine a multi-core processor at lower frequencies and a single-core processor at higher frequencies into a single TDP has really made the consumer’s life much better.
As we have come to see over the last few years with the changing workloads on processors, power consumption and active core count vary quite a bit based on the task the PC is focused on at the time. The above diagram that AMD created gives us a general view of how web, productivity, 3D creation and video creation workloads affect the active core count. You can see that for the web and productivity scenarios all four cores are in use less than a few percent of the time, and even two cores are in use at most 20% of the time. When we get into 3D and video production though, the capability of software to take advantage of multiple cores expands, and 3-4 cores are used nearly 50% of the time during video creation.
With this power consumption and core utilization information it is easy to see then why finding a way to take advantage of the TDP headroom is so essential to designing the most efficient processor.
AMD’s method to monitor and take advantage of this headroom is different than the analog method that Intel has integrated on its processors. AMD Turbo Core digitally measures the activity of the CPU on a per-core basis, using integrated power monitoring logic to estimate the power consumption / TDP being used, and then passes that information to the APU north bridge. The NB sums all the power and TDP information and passes it to a third piece of logic, the P-state manager, which dithers clock speed in order to stay within the pre-determined TDP of the APU.
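To make that flow a little more concrete, here is a minimal sketch of an activity-based turbo controller. Every name and number in it (the TDP budget, the clock steps, the watts-per-core table, the uncore allowance) is hypothetical and chosen only for illustration; this is not AMD's implementation. It also simplifies the "dithering" step to picking one clock step per sample, so moving between steps only shows up when it is called repeatedly with fresh activity readings.

```python
# Hypothetical sketch of a digital, activity-based turbo controller.
# None of these numbers or names come from AMD; they are illustrative only.

TDP_WATTS = 100.0                  # assumed package power budget
P_STATES_MHZ = [2900, 2600, 2300]  # assumed clock steps, fastest first
WATTS_PER_CORE_AT_FULL_ACTIVITY = {2900: 32.0, 2600: 24.0, 2300: 18.0}


def estimate_core_power(activity, mhz):
    """Per-core power estimate from a digital activity value in [0.0, 1.0]."""
    return activity * WATTS_PER_CORE_AT_FULL_ACTIVITY[mhz]


def pick_p_state(core_activity, uncore_watts=10.0):
    """Sum the per-core estimates (the north bridge's role in the text) and
    return the fastest clock step whose estimated total fits inside the TDP."""
    for mhz in P_STATES_MHZ:
        total = uncore_watts + sum(
            estimate_core_power(a, mhz) for a in core_activity
        )
        if total <= TDP_WATTS:
            return mhz
    return P_STATES_MHZ[-1]  # nothing fits: fall back to the slowest step


print(pick_p_state([1.0, 0.0, 0.0, 0.0]))  # one busy core
print(pick_p_state([1.0, 1.0, 1.0, 1.0]))  # all four cores busy
```

With one busy core the estimate leaves plenty of headroom and the sketch picks the top step; with all four cores busy it has to settle on a lower one. Because the input is a counted activity value rather than an analog reading, two chips fed the same workload would pick the same step.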
AMD’s version of Turbo differs from Intel’s by relying on digitally measured activity that maps to very specific power steppings. The Turbo mode on Llano will thus be much more consistent from processor to processor than Intel’s Turbo Boost Technology, which relies on analog measurements and even ambient temperature, both of which vary from system to system and chip to chip. As a reviewer, the consistency is nice, but there are definitely advantages to Intel’s approach, which allows each piece of silicon to theoretically reach its own peak performance.