Improving Upon and Surpassing the Cortex-A57
ARM is not disclosing every single improvement and optimization, but they have provided us with a few examples. The first portion to get massaged is what we consider the front-end.
Branch prediction is often one of the first areas examined when developing a new design. It matters because better prediction cuts down on mispredictions and on the pipeline stalls and flushes they cause. Other front-end improvements likewise increase overall performance and reduce the time the core spends waiting for data. Perhaps, with aggressive power gating, some of these functions can also drop into a low-power state much sooner, but ARM has not provided us with anywhere near that level of detail.
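To make the idea concrete, here is a textbook 2-bit saturating-counter predictor in Python. ARM has not disclosed how the A72's predictor works, so this is only a generic sketch, assuming a single table of counters indexed by branch address; the class and function names are our own. It counts how often the predictor guesses wrong on a trace, and each of those misses is what would trigger a pipeline flush on real hardware.

```python
# A textbook 2-bit saturating-counter branch predictor, for illustration only.
# ARM has not disclosed the Cortex-A72's predictor design; this is NOT it.


class TwoBitPredictor:
    """Per-branch 2-bit counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        # Start every counter at 1 ("weakly not taken").
        self.counters = [1] * table_size

    def _index(self, pc: int) -> int:
        # Map the branch address into the table with simple modulo indexing.
        return pc % self.table_size

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(trace: list[tuple[int, bool]]) -> int:
    """Run a trace of (branch address, actual outcome) pairs and count misses."""
    predictor = TwoBitPredictor()
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1  # on real hardware, each miss means a pipeline flush
        predictor.update(pc, taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, repeated three times, this predictor misses only once while warming up and once on each loop exit: four misses out of thirty branches. The two-bit hysteresis is the point of the design; a single-bit "last outcome" predictor would also miss on the first iteration of every new pass through the loop.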
The next part after what is typically called the front-end is the decode/rename portion. ARM is a RISC-based architecture whose instructions can be converted into macro- and micro-ops more efficiently than those of Intel’s CISC-based x86 processors. ARM has also improved their SIMD/FP units, both to help push the design into the server market and to raise potential performance in mobile applications, which can lean heavily on floating point/SIMD instructions (image editing, effects, etc.).
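Register renaming is the "rename" half of that stage. The Python sketch below shows the general technique under simplifying assumptions (a flat mapping table and a free list, with no reclamation of old physical registers); the `RenameTable` class is hypothetical and is not a description of the A72's rename logic, which ARM has not detailed.

```python
# Minimal register-renaming sketch (illustrative; not ARM's actual rename logic).
# Architectural registers are mapped onto a larger pool of physical registers,
# so independent writes to the same architectural register can proceed
# without waiting on each other (removing write-after-write and
# write-after-read hazards). Freeing old physical registers is omitted.


class RenameTable:
    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Initially, architectural register r_i maps to physical register i.
        self.map = {f"r{i}": i for i in range(num_arch)}
        # Physical registers not currently mapped are available for new writes.
        self.free_list = list(range(num_arch, num_phys))

    def rename(self, dst: str, srcs: list[str]) -> tuple[int, list[int]]:
        # Look up sources through the current mapping *before* remapping the
        # destination, so "r1 = r1 + r2" reads the old r1.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical registers: rename must stall")
        new_dst = self.free_list.pop(0)
        self.map[dst] = new_dst
        return new_dst, phys_srcs
```

With four architectural and eight physical registers, renaming `r1 = r2 + r3` and then `r1 = r0 * 2` hands the two writes physical registers 4 and 5, so the second write does not have to wait for the first, and a later instruction that reads `r1` is steered to physical register 5.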
The dispatch/retire stage does multiple things. Dispatch enables out-of-order execution so that the execution units are kept as busy as possible. Retire then “graduates” instructions, committing their results in the original program order.
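A reorder buffer is the classic structure that lets instructions finish out of order while still retiring in order. The Python model below is a generic textbook sketch, not the A72's design; the class name is our own, and it assumes every in-flight instruction has a unique name.

```python
from collections import deque


class ReorderBuffer:
    """Generic textbook ROB model; not the Cortex-A72's actual design."""

    def __init__(self) -> None:
        # Each entry is [instruction name, finished?], kept in program order.
        self.entries = deque()

    def dispatch(self, name: str) -> None:
        # Dispatch appends entries in program order.
        self.entries.append([name, False])

    def complete(self, name: str) -> None:
        # An execution unit finished this instruction; this can happen in any order.
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True
                return
        raise KeyError(name)

    def retire(self) -> list[str]:
        # Retire ("graduate") from the head only while the oldest entry is done,
        # so results are committed strictly in program order.
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired
```

If `load`, `add`, and `mul` are dispatched in that order and `mul` finishes first, nothing can retire yet. Once `load` completes, only `load` retires; `add` and `mul` retire together after `add` finishes. The out-of-order work is still useful, because `mul`'s result is ready the moment its turn comes.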
ARM has not explained to us how they intend to compete with advanced functionality like Intel’s AVX instructions, but they are not standing still in that department. Richer content on mobile devices will inevitably leverage more floating point and SIMD capabilities (such as the aforementioned image manipulation). To stay ahead of the curve, ARM has improved their FP/SIMD units to handle common instructions much faster than before, and they have once again improved power efficiency along the way.
Caches are typically quite power hungry in a high-performance design. ARM has again gone over them with a fine-toothed comb, improving latency, performance, and power efficiency through a multitude of methods. Faster L1 and L2 caches directly improve the efficiency of the execution units, because those units spend less time waiting for the data and instructions they need.
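The payoff from faster caches can be sketched with the standard average memory access time (AMAT) relation for a two-level hierarchy, where the $t$ terms are hit latencies and the $m$ terms are local miss rates. ARM has not published these figures for the A72, so the symbols here are generic.

$$
\text{AMAT} = t_{\text{L1}} + m_{\text{L1}}\left(t_{\text{L2}} + m_{\text{L2}}\,t_{\text{mem}}\right)
$$

With purely hypothetical values of $t_{\text{L1}} = 4$ cycles, $m_{\text{L1}} = 0.05$, $t_{\text{L2}} = 20$ cycles, $m_{\text{L2}} = 0.2$, and $t_{\text{mem}} = 200$ cycles, AMAT is $4 + 0.05(20 + 40) = 7$ cycles; trimming the L1 miss rate to $0.04$ brings it down to $6.4$ cycles. A lower AMAT means the execution units spend less time stalled waiting on data.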
The Accelerator Coherence Port is a high-speed interface that communicates directly with the optional hardware accelerators that ARM offers. These DSPs and accelerators need coherency; without it, they would likely have to read and write main memory directly instead of accessing the relevant data in cache or, in a worst-case scenario, work within a separate memory address space that requires more reads and writes than would be desired.
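The toy Python model below illustrates the stale-data problem that a coherent port avoids. Everything in it (the `ToySystem` class, its single write-back cache, and the method names) is a hypothetical simplification, not ARM's ACP interface or any real coherence protocol. A non-coherent accelerator reading DRAM sees an old value until software flushes the CPU cache, while a coherent read picks up the current value straight from the cache.

```python
from typing import Optional


class ToySystem:
    """Hypothetical model: one write-back CPU cache, DRAM, and an accelerator."""

    def __init__(self) -> None:
        self.memory = {}     # address -> value held in DRAM
        self.cpu_cache = {}  # address -> (value, dirty flag)

    def cpu_write(self, addr: int, value: int) -> None:
        # Write-back cache: the new value stays in cache marked dirty;
        # DRAM is not updated yet.
        self.cpu_cache[addr] = (value, True)

    def flush(self) -> None:
        # Software-managed coherence: write every dirty line back to DRAM.
        for addr, (value, dirty) in list(self.cpu_cache.items()):
            if dirty:
                self.memory[addr] = value
                self.cpu_cache[addr] = (value, False)

    def accel_read_noncoherent(self, addr: int) -> Optional[int]:
        # Reads DRAM directly, so it can see stale data.
        return self.memory.get(addr)

    def accel_read_coherent(self, addr: int) -> Optional[int]:
        # Checks the CPU cache first, as a coherent port would allow,
        # then falls back to DRAM.
        if addr in self.cpu_cache:
            return self.cpu_cache[addr][0]
        return self.memory.get(addr)
```

If DRAM holds 1 at an address and the CPU then writes 42 there, the non-coherent read still returns 1 while the coherent read returns 42. The `flush()` step is the extra memory traffic the paragraph above describes: every dirty line has to be written out to DRAM before a non-coherent accelerator can safely read it.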
Summing It Up
The Cortex-A72 has all the hallmarks of a disruptive product in a competitive field. We will not count out Intel, or the likes of Apple, Qualcomm, and NVIDIA with their home-grown designs, but the A72 looks to be a very large jump in efficiency and performance over everything that has come before it. It appears to leapfrog the A57 by a significant amount, and it simply blows the doors off the older A15- and A17-based parts.
The combination of a massive redesign and the latest process technologies from TSMC and Samsung/GLOBALFOUNDRIES makes this part look very inviting. Improvements in the microarchitecture as well as the FP/SIMD units should allow it to compete in some server spaces. We still do not know how that will shake out, as Intel has a near stranglehold on that environment. Power efficiency has rarely been far from Intel’s plans since the Pentium 4 days, so ARM has a significant hill to climb in terms of corporate mindshare. If ARM has anything really going for them, it is the huge installed base of devices in the mobile marketplace, which will help with software development and acceptance.
(Editor's Note: During the tech day, ARM did point out to us the inroads they are making in server and corporate environments. While it is true that ARM-based servers have a significant hill to ascend, there are customers using ARM hardware in production today. The biggest example was PayPal, which is using HP Moonshot platforms to accelerate real-time data analytics at a 9x lower acquisition cost, 2x lower power consumption, and 7x higher rack/node density.)
ARM is also aiming to get these parts to market far sooner than in the past. Several years ago, ARM would announce a product, but we would not see it in the marketplace for at least a year and a half. They have cut that time down in recent releases, and because the A72 looks to be incredibly important to them, they are accelerating its time to market dramatically. That said, do not expect to see this product launched at Computex or in Q3 or Q4 of this year. ARM still relies on its partners and licensees to take the design, validate it, and schedule foundry capacity to produce the part.
The Cortex-A72 could be a serious game changer for ARM and its partners. I have not swallowed the Kool-Aid, but if their claims are even close to true, we will have very high performance parts spanning from around 500 milliwatts up to 15 to 35 watts. So far there is no plan to go above that and hit the desktop scene, but high-density servers using the higher-wattage ARM parts could deliver excellent performance and power efficiency for those applications.
Even if ARM does not make a significant impact in the server market with the A72, the improvements in performance and efficiency will let their mobile partners choose the balance between those characteristics when designing a phone. Do you want a phone that performs similarly to a Cortex-A15 design while improving battery life by 30% to 50%? They can do that. Keep battery life the same yet improve performance by a factor of three? This design can do that as well. ARM has a very compelling product with the A72. Partners can improve their power position further by pairing it with Cortex-A53 cores in a big.LITTLE design. There is a lot of potential flexibility here for ARM and its partners, but now we have to wait and see whether the A72 fulfills its promise.
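As a rough illustration of how a big.LITTLE system can trade performance for power, here is a simple task-placement heuristic in Python. The thresholds, the cluster labels, and the `place_task` function are all hypothetical; real schedulers track load and energy far more carefully. The gap between the two thresholds adds hysteresis, so a task with fluctuating load does not bounce between clusters on every sample.

```python
# Hypothetical big.LITTLE placement heuristic: light work stays on the
# efficient Cortex-A53 cluster, heavy work migrates to the Cortex-A72
# cluster. Thresholds are illustrative assumptions, not values from ARM
# or from any real scheduler.

UP_THRESHOLD = 0.80    # move to the "big" A72 cluster above this utilization
DOWN_THRESHOLD = 0.30  # move back to the "LITTLE" A53 cluster below this


def place_task(current: str, utilization: float) -> str:
    """Return 'A53' or 'A72' for a task, given its recent utilization (0.0-1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if current == "A53" and utilization > UP_THRESHOLD:
        return "A72"
    if current == "A72" and utilization < DOWN_THRESHOLD:
        return "A53"
    # Hysteresis: in the middle band, leave the task where it is.
    return current
```

Feeding it utilization samples of 0.1, 0.5, 0.9, 0.6, and 0.2 for a task that starts on the A53 cluster keeps it there for the first two samples, moves it to the A72 for the spike, holds it there at 0.6, and returns it to the A53 once the load drops to 0.2.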