ARM Releases Top Cortex Design to Partners
The Cortex-A72 and Mali-T880 headline ARM’s latest release
ARM has an interesting history of releasing products. The company was once in the shadowy background of the CPU world, but with the explosion of mobile devices and its relevance in that market, ARM has had to adjust how it presents its technologies to the public. For years ARM has announced products and technology, only to see them ship one to two years down the line. It seems that with the increased competition in the marketplace from Apple, Intel, NVIDIA, and Qualcomm, ARM is now pushing to license out its new IP in a way that will enable its partners to achieve a faster time to market.
The big news this time is the introduction of the Cortex-A72. This is a brand new design based on the ARMv8-A instruction set. It is a 64-bit capable processor that is also backwards compatible with 32-bit applications written for ARMv7-based processors. ARM does not go into great detail about the product other than to say it is significantly faster than the previous Cortex-A15 and Cortex-A57.
The previous Cortex-A15 processors were announced several years back and made their first introduction in late 2013/early 2014. These were still 32-bit processors, and while they had good performance for the time, they did not stack up well against the latest A8 SoCs from Apple. The A53 and A57 designs were also announced around two years ago. These are the first 64-bit designs from ARM and were meant to compete with the latest custom designs from Apple and Qualcomm’s upcoming 64-bit part. We are only now seeing these parts make it into production, and even Qualcomm has licensed the A53 and A57 designs to ensure a faster time to market for this latest batch of next-generation mobile devices.
We can look back over the past five years and see that ARM has been shortening the gap between announcing its parts and having its partners ship them. ARM is hoping to accelerate the introduction of its new parts within the next year.
The latest Cortex-A72 design is part of a portfolio of technologies that will be based around TSMC’s 16 nm FinFET process. It looks to be using the 16 nm FF+ process, which is a performance-optimized version of the initial 16 nm FF release that is in risk production now. The A15 designs were primarily 28 nm HKMG, and the latest A53 and A57 products are based on planar 20 nm HKMG. 20 nm was not a big jump in efficiency and performance from 28 nm, but it did bring improvements. 16 nm FF+ is going to be a much larger jump over both of the others. The combination of design and advanced process manufacturing allows ARM to claim that the A72 will deliver around 3.5 times the sustained performance of the A15 at the same power draw. It also allows ARM and its partners to achieve the same level of performance with a 75% savings in power. Obviously there is flexibility in the design that can be leveraged by ARM’s partners depending on the usage scenario they are designing for.
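To put those two claims on a common footing, here is a minimal sketch that converts each into a relative performance-per-watt figure against the A15. The function name and the choice to normalize the A15 to 1.0 are assumptions made for illustration, not anything ARM has published, and the inputs are simply the marketing figures quoted above.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt versus a baseline normalized to 1.0.

    perf_ratio  -- new performance divided by baseline performance
    power_ratio -- new power draw divided by baseline power draw
    (Hypothetical helper; both ratios come from ARM's quoted claims.)
    """
    if perf_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / power_ratio


# Claim 1: about 3.5x sustained performance at the same power draw.
same_power = perf_per_watt_gain(perf_ratio=3.5, power_ratio=1.0)

# Claim 2: the same performance with a 75% power savings (25% of the power).
same_perf = perf_per_watt_gain(perf_ratio=1.0, power_ratio=0.25)

print(f"same power budget: {same_power:.2f}x perf/W")
print(f"same performance:  {same_perf:.2f}x perf/W")
```

The two figures need not match, since they describe different operating points on the power/performance curve, which is where the flexibility for partners comes from.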
The A72 can be used with the A53 in a big.LITTLE configuration to further improve power efficiency over a variety of workloads. The A53 will also be enhanced by the use of the 16 nm FF+ process technology to provide even lower power consumption (or higher performance). Part of this flexibility is provided by the new and improved “Northbridge” that ARM has designed to combine all of these pieces of IP. The CoreLink CCI-500 is a new design that succeeds the CCI-400 found in products such as the Exynos 5433. It is reportedly significantly faster than the CCI-400 and allows greater memory bandwidth to be delivered to the CPUs and GPU. CCI stands for “Cache Coherent Interconnect” for those interested.
Graphics also get a boost with the new Mali-T880 graphics processor. Naturally, ARM was pretty tight-lipped about actual specifications for this part. It shares the same 16 nm FF+ process technology as the CPU and CCI-500 portion (it is a SoC, after all). ARM claims that it has 1.8 times the performance of the previous Mali-T760 devices that made their appearance in 2014. The company also claims that it will reduce power consumption by 40% on similar workloads. Partners will of course choose whether they want faster or more power-efficient processors for their particular products.
ARM also features two extra components that can be added to a SoC. The Mali-V550 and Mali-DP550 are video and display processors respectively. The V550 looks to be a dedicated video accelerator that can handle up to 4K playback. It will also utilize the ARM TrustZone secure video path for that playback, which in all likelihood means that it provides easy content protection for manufacturers utilizing these parts in their devices. We also expect to see handheld and tablet-sized screens reach 4K in the 2016 time period. The DP550 looks to be taking care of communicating with the screen.
As I mentioned earlier, ARM is doing its best to improve time to market with its products. Several years back ARM introduced the POP (Processor Optimization Pack) with the Cortex-A9 series and above. The push here is to provide very specific designs and rules that allow partners to choose whatever level of power and performance they deem necessary for their products. This improves time to market by taking out a lot of the guesswork and custom engineering needed to get a decent product out the door. This design initiative will allow the Cortex-A72 to clock up to 2.5 GHz on TSMC’s 16 nm process. Eventually ARM may work with other pure-play foundries to implement ARM designs on their processes, but for now TSMC’s 16 nm FF+ is the go-to process node for this generation of products.
When we step back and look at what ARM is doing, it actually takes a step back away from the idea of cloud computing. ARM is attempting to move MORE of the workload back onto the device rather than having servers stored in vast data centers providing the compute power to handle complex programs and workloads. This is not a bad idea, as we still can have instances of shoddy connectivity in our so-called modern world. It also gives ARM a reason to keep providing faster products! It is really interesting to see the rise and fall of centralized computing resources (mainframes, PCs, mobile computing, cloud computing, etc.).
ARM continues to push forward with evolutionary designs that can be licensed by 3rd party manufacturers around the world. The A72 should be very competitive with the custom designs of Apple, NVIDIA, and Qualcomm. While those others will likely have faster custom implementations, none will have the flexibility and widespread use of the stock Cortex-A72. Even erstwhile competitors NVIDIA and Qualcomm have released recent SoCs which utilize the licensed A53/A57 cores. It is quite likely that in the future we will see these manufacturers yet again license these cores while waiting for their custom parts to come to market. So far ARM has 10 companies licensing the new core and IP. Competition is good, and ARM likes to try to stay a step ahead of the behemoth Intel. Expect 2016 to be as interesting as 2014 was and 2015 is turning out to be.
Yes, but it’s going to be a few months before the technical details are released. There are already people claiming that it will be as powerful as Apple’s custom Cyclone/Cyclone 2 core, but unless ARM added some extra execution ports, it’s still up in the air whether ARM Holdings has made a reference design core that can best Cyclone’s 6 IPC (per core), as the current ARM Holdings reference design A57/A53 can only do 3 IPC (per core). The coherent bus/fabric on the die can allow up to 16 cores with the CCI-500 interconnect family, along with finer-grained dynamic core scaling, so 2 cores could be running while the other 14 are idled or gated off in the new ARM offerings. Man, a 16-core tablet or Chromebook SKU would be nice, but I’m waiting on the PowerVR Wizard with the hardware ray tracing to get some design wins for Imagination Technologies.
If ARM Holdings wants to stay in the running against the custom offerings of its top-tier architectural licensees, it is going to have to get those per-core IPCs up to at least 6, or it will be at a disadvantage. ARM Holdings’ new Mali-T880 GPU family will increase the shader clusters to eight in the T880. Imagination Technologies’ PowerVR series is making the other mobile GPU players stay on an aggressive improvement schedule or risk falling behind. Imagination Technologies’ (IT) MIPS-based 64-bit processors will be targeted to compete as well, and some of IT’s IP also includes SMT as an option for its licensees, so ARM will have to be looking at SMT also. Well, at least ARM Holdings gets royalties whether the chips are Cyclone, Denver, or another custom microarchitecture, as well as from the ARM Holdings reference designs.
The jury is still out on the Power8 third-party licensees’ products, whether they use the Power8 reference design in the server arena or cut down the Power8 core count to derive products for the PC/laptop market. The Power8 is the heavy hitter of the RISC bunch, with 10+ IPC per core and 8 dynamically scalable processor threads per core. Any aspirations of both the ARM players and the MIPS players in the PC/laptop market may be dashed once the Power8 licensees begin to derive scaled-down SKUs targeting markets other than servers. At least ARM Holdings and IT still have their GPU IP to add to the equation. CPU designs are scalable in the same manner that GPU designs can be scaled from discrete to integrated, and the Power8 design, being RISC, is going to be easy for its many third-party licensees to scale down. The Power8s are a beast in the server room, scaled to beat the Xeon; it’s just a perfect example of the wide range of markets that RISC designs are able to perform in.
^^^ Eyes bleeding.