The Future is Still Fusion
So where am I going with all this? Well, the answer is obvious and has been staring us in the face. Up and down the line, AMD is going with the APU. We just have not gotten to the point where this is applicable to every marketplace, and there are still some very large hurdles in the way.
Kaveri will be the first true incarnation of the Fusion idea. It will feature hUMA support, which gives the CPU and GPU portions a shared memory address space, as well as other tricks and features that will allow the GPU portion to handle greater workloads outside of graphics when necessary. Kaveri will be hampered at the beginning by the lack of software support. The HSA Foundation still has not finalized the specification, and that is expected next year. Software coding is still in the relative dark ages and will stay that way until tools like HSAIL (HSA Intermediate Language) are implemented. C++ and Java are on their way to natively supporting hUMA-type architectures, but again, they are not quite there yet. Most of the heavy lifting on the software side will appear in late 2014.
The Gigabyte F2A88X-UP4 is a very full-featured board aimed directly at enthusiasts. It will likely be quite affordable as well.
This situation is actually quite similar to the transition to 64-bit processing. AMD released the Athlon 64 series into a marketplace without a functional 64-bit OS on the desktop side. Microsoft eventually came out with WinXP 64, but it was far from a runaway success, and support for many peripherals was often quite lacking. It was not until Vista and Win7 that we had a fully functioning 64-bit OS. By the time these were released, Intel had regained the lead in 64-bit performance. AMD will be the first out of the gate with a fully functional hUMA APU. Intel does have a heterogeneous processing product in Knights Corner, but it is not a desktop product. In fact, Knights Corner sits in a position relative to the consumer similar to the one Itanium occupied back in 2004. This is not to say that Intel has no plan for heterogeneous computing, as we have seen its integrated graphics become much more serious in terms of performance and compatibility. So far Intel has not laid out plans to implement a solution like HSA/hUMA, but as with desktop 64-bit computing in the early 2000s, Intel is holding its cards close to its chest.
AMD is moving away from AM3+ as the enthusiast platform, and we are seeing the first stages of this with FM2+. Just as Intel moved away from socket 1366 to socket 1156/1155/1150, AMD will move the enthusiast over to FM2+. The first signs of this actually come from Gigabyte with its latest A88X/FM2+ announcements. These are most certainly enthusiast-level products which will leverage the advantages of Kaveri over the previous Trinity/Richland parts (namely PCI-E 3.0 and hUMA). The only real problem here is that GLOBALFOUNDRIES’ 28 nm process is late to market for AMD and its CPUs. Not only that, but the die shrink from 32 nm to 28 nm is not enough for AMD to economically combine a full GCN unit with a four-module (8-thread) CPU within a 100 to 125 watt TDP envelope. This is the primary reason we will continue to see AM3+ Vishera processors sold: AMD needs a higher-performance, high-thread-count CPU to offer the market. This could potentially be offset by a little clue that AMD dropped on a roadmap some time back; there is a distinct possibility of a three-module Kaveri part arriving next year, well after the two-module units have been on the market for a while. If this three-module part comes to market in a timely manner, it could be a viable enthusiast-level part in terms of thread count and performance.
The G1.Sniper A88X is the very top-of-the-line FM2+ board that Gigabyte is showing. This board is very much over the top and will be one of the first truly high-end FM2+ boards available. We have yet to see whether companies like Asus will release high-end boards of their own to compete in this space.
Non-APU designs still make up 30% of AMD’s business. This means that AMD has to support these platforms, which require higher performance and thread counts than upcoming APUs can deliver. One of the hurdles AMD has yet to clear is effectively integrating APUs into a server environment. In theory these parts are ideal for high performance computing, but AMD has a ways to go before it has the infrastructure in place to launch the APU into this space. The building blocks are there, as evidenced by its purchase of SeaMicro. Not only does AMD have a leg up in the micro-server market, but it also has some significant IP that it can leverage with larger chips. The Fabric that SeaMicro uses to communicate between nodes is high speed and features relatively low latency. AMD could beef up the design to work with the larger server chips and provide enough bandwidth and low-latency communication to render HyperTransport obsolete. AMD could then base all of its designs on PCI-E 3.0 interconnects rather than maintain two families of CPUs/APUs, one requiring PCI-E 3.0 and the other HyperTransport.
AMD still has a lot of low-level design work to do if it is planning a full-scale assault on the server market with APUs. Cache coherency over the SeaMicro Fabric, balancing NUMA vs. hUMA memory support, validating new southbridge products for the server market, and a variety of other issues will keep AMD from releasing new Opterons based on Steamroller/GCN APUs over the next year. Excavator looks to be the architecture that will pave the way toward a true Opteron APU that can compete against the Xeons of that time.
Over the next several years, core counts should not go up dramatically in the desktop and notebook markets. Tablets and handhelds will follow suit and not go crazy with core counts (except for MediaTek and that pesky 8-core unit it is pushing). Multi-core-aware software is still not entirely common, and applications that can utilize more than four threads are rarer still. Instead, I think we will see both sides continue to dedicate die space to more graphics functionality. More CPU cores do not necessarily mean better overall performance and value in most workloads. Intel certainly seems to see it that way, as evidenced by its Haswell desktop parts. The majority of Intel’s desktop CPUs support a maximum of four threads; only the enthusiast-level parts support eight to twelve threads.
HSA/hUMA has the potential to really change the landscape of personal computing (desktops, notebooks, and tablets/handhelds). Many things have to fall into place before it can really take off, though. It is hard to say when that will happen and whether Intel will catch up quickly.
2014 will certainly be a year of transition for AMD. By the end of 2015 I fully expect its lineup of products to be APU-based from top to bottom. For the higher module count parts, we have to wait for 22/20 nm process nodes to open up for AMD. This should happen in late 2014. This transition will not be seamless, nor will it be smooth. AMD has to keep convincing users that its higher-end offerings for AM3+ and C32/G34 are good enough to compete with Intel, all the while working on getting the APU above the $150 price point. Eventually, at 22/20 nm, AMD will have a four-module (or larger) APU available that will satisfy enthusiasts and power users alike, but that is some time off. Until then, AMD will stave off the competition by pushing Kaveri, keeping Kabini competitive, and keeping the royalties from the semi-custom group rolling in. AMD continues to have a solid foundation of products, but it has certainly lost mindshare with some very vocal groups. While many have criticized AMD for purchasing ATI when it did, the technology and expertise from ATI is really one of the major things keeping AMD afloat at this time. The future really is fusion, and while AMD struggles with CPU performance, it is a couple of steps ahead of the competition in creating a truly heterogeneous solution. The situation is eerily similar to the transition to 64-bit computing back in 2003/2004.