28HPCU: Cost Effective and Power Efficient
ARM and UMC have announced that the latest 28HPCU process is available for A53/A7 designs
Have you ever been approached about something and upon first hearing about it, the opportunity just did not seem very exciting? Then upon digging into things, it became much more interesting? This happened to me with this announcement. At first blush, who really cares that ARM is partnering with UMC at 28 nm? Well, once I was able to chat with the people at ARM, it turned out to be much more interesting than I initially expected.
The new hotness in fabrication is the latest 14 nm and 16 nm processes from Samsung/GF and TSMC respectively. It has been a good 4+ years since we last had a new process node that performed as expected. The planar 22/20 nm products just were not entirely suitable for mass production. Apple was one of the few to develop a part for TSMC’s 20 nm process that sold in the millions. The main problem was a lack of power and speed scaling as compared to 28 nm processes. Planar was a bad choice, but 3rd party manufacturers did not have FinFET technology ready in time for that node.
There is a problem with the latest process generations, though. They are new, expensive, and production constrained. Also, they may not be entirely appropriate for the applications that are being developed. By comparison, 28 nm has several strengths. These are mature processes with an excess of line space. The major fabs are offering very competitive pricing structures for 28 nm as they see space being cleared up on the lines with higher end SOCs, GPUs, and assorted ASICs migrating to the new process nodes.
TSMC has typically been on the forefront of R&D with advanced nodes. UMC is not as aggressive with their development, but they tend to let others do some of the heavy lifting and then integrate the new nodes when it fits their pricing and business models. TSMC is on their third generation of 28 nm. UMC is on their second, but that generation encompasses many of the advanced features of TSMC’s 3rd generation so it is actually quite competitive.
ARM has ported over their 28 nm Cortex A53 and Cortex A7 designs for UMC. These designs are also supported by ARM’s POP program. This POP program is a design assistance technology that aims to shave 4 to 5 months off of a design using those components. Essentially ARM has a design library program that customers can leverage, and it handles a lot of the low-level implementation of design specifications. The customer then receives that design and does more “hardening” work to further optimize the device for the implementation they are aiming for. This does have a price, but when a product can be brought to market that much quicker, it is worth it. Just licensing the ARM components and then implementing them in a design requires a significant team. Having ARM do some of the initial design work through POP allows more players without as many resources to be competitive in the ARM SOC marketplace. For example, an A53 with design direct from POP can achieve around 200mv per GHz in the 1.7 to 2.0 GHz range. Further hardening and development will improve these numbers.
Material improvements are not the only thing that has been brought to the table with the latest 28HPCU process from UMC. Design rules and EDA software have also advanced over the years. EDA software suites take time to be fully developed for individual processes, and of course the advantage with this latest 28 nm node is that the software is fully caught up with the physical characteristics of 28HPCU.
Improvements that were created for the latest 14/16nm FinFET nodes have migrated to 28 nm products. While these 28 nm nodes are still planar and the transistor designs cannot be ported over, the layout strategies and design decisions that were developed for the newer nodes can be utilized in the older 28 nm products. This means better electrical characteristics as well as a denser packing of transistors.
The A53 and A7 parts are obviously budget products that still offer good performance for their price/complexity. These will be powering midrange cellphones (A53) and wearables (A7). By porting these particular products over to a highly optimized, last generation process node we can achieve very good performance, battery life, and cost for a wide range of products that will enhance the user experience.
Some will question why these SOC companies would utilize this older, but still solid node as compared to 14/16FF. As mentioned above, price and availability are big factors. These products will not be considered budget after spending that much money per good die. Also, there are size constraints at play here. A two or four core Cortex A7 SOC might actually be too small on 14/16 and the die will not have enough space for all the necessary pins. The economics of a midrange and budget SOC just do not fit into the constraints of cutting edge processes. For the short term we will only see the larger, more powerful SOCs that carry a high end price utilizing these next-generation processes.
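To make the "money per good die" point a bit more concrete, here is a minimal sketch using the common first-order dies-per-wafer estimate and a simple Poisson yield model. Every number in it (wafer prices, die areas, defect densities) is a made-up placeholder and not a figure from ARM, UMC, or any fab; the function names are hypothetical as well. Plug in real quotes to get anything meaningful.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross dies on a round wafer (ignores scribe lines)."""
    radius = wafer_diameter_mm / 2
    gross = (math.pi * radius ** 2) / die_area_mm2
    # Subtract partial dies lost around the wafer edge.
    edge_loss = (math.pi * wafer_diameter_mm) / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies expected to be defect free under a Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

def cost_per_good_die(wafer_cost: float, wafer_diameter_mm: float,
                      die_area_mm2: float, defects_per_mm2: float) -> float:
    """Wafer cost divided by the expected number of good dies."""
    good = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * \
        poisson_yield(die_area_mm2, defects_per_mm2)
    return wafer_cost / good

# Placeholder inputs only: a mature node with a cheap wafer and low defect
# density versus a newer node with a smaller die but a pricier wafer.
mature = cost_per_good_die(wafer_cost=3000, wafer_diameter_mm=300,
                           die_area_mm2=20, defects_per_mm2=0.001)
leading = cost_per_good_die(wafer_cost=8000, wafer_diameter_mm=300,
                            die_area_mm2=12, defects_per_mm2=0.003)
print(f"mature node:  {mature:.2f} per good die")
print(f"leading node: {leading:.2f} per good die")
```

The result depends entirely on the inputs: a smaller die on the newer node only wins if the wafer price and yield do not eat up the area savings.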
So what about something in between? GLOBALFOUNDRIES has recently brought their latest 22FDX product to market. This is a FDSOI based process node that improves upon die size and gives performance close to, or even better than, FinFET at that size. This improves top end performance and power consumption as compared to planar. Thanks to back-biasing, it also has the added benefit of extremely low power requirements at low power/off states. Leakage is at a minimum with this particular design and it exhibits all of the advantages of FinFET without having the design and manufacturing complexities of those advanced structures. Sounds perfect, right? Unfortunately GLOBALFOUNDRIES is the only group offering this product, and line space and pricing are going to be suboptimal for most midrange and budget SOCs. Where it could really take off is high end wearables which will benefit from the higher performance and lower power consumption (better battery life) that 22FDX can provide vs. 28HPCU. FD-SOI has been receiving more attention lately with Samsung and STMicro offering 28 nm variants based on the tech. I’m sure we will start hearing about the efficacy of these nodes in the near future as compared to traditional HKMG based lines.
While the ARM/UMC 28HPCU announcement is not all that exciting on the surface, we see that these larger process nodes are still very, very important for the market. Improvements to the base 28nm HKMG process throughout these past four to five years have extended its lifespan and usefulness. The overall slowing of advanced process technologies has also contributed to the longer than planned lifespan of these mainstream processes. ARM and UMC collaborating on this process and the A53 and A7 designs ensures that a cost effective avenue is available to IP customers that will provide mature yields and predictable bins. End users will benefit with more capable midrange and budget cellphone offerings as well as affordable wearables that provide all day battery life.
“A two or four core Cortex A7 SOC might actually be too small on 14/16 and the die will not have enough space for all the necessary pins.”
Mind=blown. That’s absolutely crazy to consider.
Thanks, Josh Walrus
I don’t know how big an A7 core is, but the sizes are getting ridiculously small, even for powerful Intel CPUs. Broadwell at 14 nm seems to only be around 10 square mm with a 2 MB L3 cache slice which is part of why you have few CPUs without integrated graphics. Even a 4 core without any GPU would only be about 60 square mm.
A7 is tiny. The core itself at 28 nm is ludicrously small. Can't remember offhand, but under 10 mm sq. I think.
The 10 square mm figure is for a large 14 nm core like Broadwell. The A7 is a super simple core by comparison.
A quick search turns up an old Anandtech article titled “arms-cortex-a7-bringing-cheaper-dualcore-more-power-efficient-highend-devices” (not sure if full links are allowed).
The article says “ARM claims a single Cortex A7 core will measure only 0.5mm2 on a 28nm process.” So the Broadwell core at 14 nm is still about 20 times larger than the A7 if those numbers are correct. An A7 will always be part of a larger SOC, but for simpler devices, they could easily be pad limited. Going with a more advanced, expensive process would not be worth it.
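For anyone who wants to check the arithmetic, here is a quick sketch. The two area figures are the ones quoted in this thread (roughly 10 square mm for a Broadwell core with its L3 slice, 0.5 square mm for an A7); the "ideal shrink" helper is a hypothetical illustration that assumes area scales with the square of the node name, which real processes do not achieve.

```python
# Figures quoted in this thread, not measured values.
BROADWELL_CORE_MM2 = 10.0   # 14 nm core plus 2 MB L3 slice (commenter's estimate)
CORTEX_A7_CORE_MM2 = 0.5    # ARM's claim for a single A7 core at 28 nm

def area_ratio(big_mm2: float, small_mm2: float) -> float:
    """How many times larger the first area is than the second."""
    return big_mm2 / small_mm2

def ideal_shrink(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Assumption: area scales with (to/from)^2. Real shrinks are less generous."""
    return area_mm2 * (to_nm / from_nm) ** 2

ratio = area_ratio(BROADWELL_CORE_MM2, CORTEX_A7_CORE_MM2)
a7_at_14 = ideal_shrink(CORTEX_A7_CORE_MM2, from_nm=28, to_nm=14)

print(f"Broadwell core vs A7 core: {ratio:.0f}x")
print(f"A7 at 14 nm under ideal scaling: {a7_at_14:.3f} square mm")
```

Under that optimistic scaling assumption the A7 core would drop to about an eighth of a square mm, which is the scale at which the logic stops being what sets the die size.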
Not “pad limited” with the (POP) Package On Package technology(1) that they have been using for a long time! And the ARMv8A and earlier ISAs from ARM Holdings are RISC based so they will always take fewer transistors to implement than the CISC x86 based ISA. The reference design A53, A57, and the newer A72 cores are the 64 bit reference designs with the Cortex A7 being a 32 bit ARM reference core design with a simpler ARMv7-A ISA! These ARM Holdings offerings are for those not having or wishing to design their own custom cores, so it’s good to see ARM Holdings, its licensees and the fab industry tweaking the 28nm process further to get more out of the mature 28nm process node.
If you want to see some small custom ARM cores wait until AMD gets fully into its K12 release with those ARM cores at 14nm and AMD maybe using its high density design libraries with its K12 custom ARM cores for even more ARM based CPU cores per unit area! AMD made the x86 based Carrizo core take up about 30% less space on the 28nm node just by using its high density design libraries normally used for GPU core layout, so AMD will be able to make its custom ARM cores take up even less space at 14nm relative to the other custom ARM core makers! The custom ARM core makers including AMD only license the ARMv8A ISA from ARM Holdings with each custom maker creating their own micro-architecture to run the ARMv8A ISA!
I hope that once Zen is to market and AMD begins generating revenues from both Server/PC Zen variants that AMD will double down on making some very nice custom ARM core based Tablet APUs with Polaris based graphics. The x86 designs are probably only going to be used for the higher end tablet/laptop market while the software ecosystem for the mainstream tablet market is already using mostly ARM custom/reference based designs. Do not discount AMD coming into its own for the custom ARM based market once K12 and AMD’s newest GCN GPU IP start to be available for mainstream custom ARM APU based tablets. AMD’s reference A57 Seattle server SKUs will eventually be replaced by its custom K12 designs, and with the high density design libraries AMD has a method for getting even more cores (x86 or ARM ISA based) onto any process node.
(1)
https://en.wikipedia.org/wiki/Package_on_package
I should have actually said "pad limited" rather than pin. Old terminology. These are essentially the pads on the bottom side of the actual die, which then connect to the organic substrate. That substrate can then be used in a PoP solution. If your die is too small, then you run out of space for all the necessary pads.
I did say pad limited, not pin limited. Stacking packages still requires pads of a certain size to get signals from one chip to another. Some of the newer technology uses very tightly spaced solder micro-balls but these will still take up some area. If you are talking about a 0.5 square mm die size (only one A7 core) that will only be 0.7 mm on a side. The whole SOC wouldn’t be that small, but you could easily be pushing the size limits with simpler chips. That size is on 28 nm; if this core was made on 14 nm, it would probably be literally microscopic. Unless a huge amount of other logic was included, the die size would be determined by the number of IO pads.
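A rough sketch of the pad-limited argument, for what it is worth. It assumes a full area array of pads on the underside of the die at a fixed pitch; the 0.15 mm pitch and the 100-pad requirement are placeholder values picked for illustration, not numbers from any real package, and the function names are made up.

```python
import math

def side_mm(area_mm2: float) -> float:
    """Edge length of a square die with the given area."""
    return math.sqrt(area_mm2)

def min_pad_area_mm2(pad_count: int, pad_pitch_mm: float) -> float:
    """Smallest die area that fits pad_count pads in a full square-grid array."""
    return pad_count * pad_pitch_mm ** 2

def is_pad_limited(logic_area_mm2: float, pad_count: int,
                   pad_pitch_mm: float) -> bool:
    """True when the pads, not the logic, set the minimum die size."""
    return min_pad_area_mm2(pad_count, pad_pitch_mm) > logic_area_mm2

# One A7 core at 28 nm, per the 0.5 square mm figure quoted above.
print(f"side of a 0.5 square mm die: {side_mm(0.5):.2f} mm")

# Placeholder requirement: 100 pads at an assumed 0.15 mm pitch.
needed = min_pad_area_mm2(pad_count=100, pad_pitch_mm=0.15)
print(f"area needed for pads alone: {needed:.2f} square mm")
print("pad limited:", is_pad_limited(0.5, pad_count=100, pad_pitch_mm=0.15))
```

With these placeholder inputs the pads alone need several times the area of the core, so shrinking the logic further would not shrink the die.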
Thanks Josh for this analysis. It is very informative and elucidating. You have mentioned the benefits of the FD-SOI not less than a year ago. You say at 22nm it is as good or better than FinFET. Does this mean that at process nodes smaller than 22nm FinFET wins out?
Don't know, but probably. An interesting thing about FinFET is that it utilizes a fully depleted layer, even though the wafer as a whole is not FDSOI. In other words, it uses a bulk Si wafer but dopes where needed.