Hitting the Wall Early and Often!
We as consumers have taken process advancements for granted. Moore’s Law observes that transistor densities double roughly every two years (often quoted as every 18 months), and that cadence held true for a long time. Companies like NVIDIA set an aggressive pace, with new products and refreshes often spanning around 14 to 16 months from start to finish. So here we are, some 22 months after the introduction of the HD 7970, and we see this same part refreshed as the R9 280X. During that time we saw some clock speed improvements as TSMC’s 28 nm process matured, but the basic performance of the chip is essentially unchanged. Some 14 months after the release of the first GK104 parts, NVIDIA likewise refreshed those exact same chips as the GTX 700 series. Again, we saw a small bump in performance due to higher clockspeeds, but there is no true next-generation part waiting in the wings.
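To put that cadence into rough numbers, here is a back-of-the-envelope sketch in Python. The 22-month gap is from the article; the doubling period and everything else is illustrative, not vendor data:

```python
# Illustrative only: project the density multiplier implied by a
# Moore's Law-style doubling cadence over a given stretch of time.
def density_multiplier(months_elapsed, doubling_period_months=24):
    """How many times transistor density 'should' have multiplied."""
    return 2 ** (months_elapsed / doubling_period_months)

# The 22 months between the HD 7970 and the R9 280X would historically
# have brought close to a full doubling; instead, the silicon is the same.
print(round(density_multiplier(22), 2))  # ~1.89x "expected"
```

The point of the arithmetic is the gap: nearly two years of elapsed time with essentially zero density improvement is exactly the stall the rest of the article tries to explain.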
What exactly has happened that has slowed the pace of advancement for graphics? There are two major factors seemingly at play: the rise of mobile computing and the chips powering this revolution, and the extreme slowdown of process migration as compared to historical trends.
I am still not entirely sure what voodoo ATI used at the time to get the basic R300 design to run on TSMC's 150 nm process as effectively as they did. The 9800 XT at 415 MHz is just sorta crazy, and it didn't break the bank when it came to TDP. The original R300 was a singular moment in GPU history.
Mobile computing is perhaps the most tenuous reason, but it makes sense in a variety of ways. Both AMD and NVIDIA have mobile graphics groups that take design resources away from their larger projects. While the modern Kepler and GCN architectures can scale from fairly low to really high TDPs, they are not entirely effective in the half-watt space where smartphones primarily sit. NVIDIA uses a totally different graphics architecture in Tegra as compared to the desktop. AMD does not currently have a graphics architecture that exists in that ultra-low TDP range; instead, they utilize GCN for products at 4 watts and up. NVIDIA is planning on bringing Kepler down to those areas, but they are not there yet.
Mobile computing is also a growth area for these companies, unlike desktop and laptop graphics. R&D resources now have to be spread across the different groups, and each has to field competitive products; otherwise the company will not be able to cash in on the growth numbers we have been seeing for the past several years. After a mobile chip has been developed, software and hardware support must then be carved off so the product can be integrated effectively into a third party's design. This is all money shifted away from desktop graphics. Remember, desktop graphics is actually a shrinking market due to the effective integration of graphics not just in the mobile space, but also into higher powered CPUs/APUs from Intel and AMD.
Finally, with mobile computing we are seeing a lot more pressure on advanced process lines in terms of wafer buys. These ARM based chips are thriving at the 32 nm and 28 nm nodes. The vast majority of users are quite pleased with the performance of these products across different workloads, and they have excellent power characteristics. These are relatively small chips, so quite a few of them can fit onto a single wafer. The problem here is the economics. Margins are thin on these chips, so the companies placing the orders are probably much more aggressive in pursuing contracts and in leveraging the pure-play foundries (TSMC, UMC, and GLOBALFOUNDRIES) against each other. Samsung throws another wrench into the mix by not just fabricating its own parts (Exynos), but also selling fab space to its competition in the form of Apple. If these companies can in fact negotiate lower priced wafers with promises of filling the lines with orders, then a foundry such as TSMC makes less money per wafer than it would on more complex products like GPUs. Less money means less R&D for advanced process features, and this behavior also maximizes the R&D investment already sunk into the current process. The end result is less money allocated toward advanced process development, so those advanced nodes take longer to arrive.
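The "small chips, many per wafer" point can be made concrete with the standard first-order die-per-wafer approximation. The die areas and wafer size below are hypothetical round numbers for illustration, not figures from any vendor:

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order approximation of gross die per wafer: the wafer's
    area divided by die area, minus an edge-loss term for the partial
    dies that fall off the round wafer's perimeter."""
    radius = wafer_diameter_mm / 2
    return int(math.pi * radius**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

# Hypothetical: a ~100 mm^2 mobile SoC vs. a ~350 mm^2 GPU, 300 mm wafer.
print(dies_per_wafer(300, 100))  # small SoC: several hundred gross die
print(dies_per_wafer(300, 350))  # big GPU: far fewer candidates per wafer
```

With several times as many candidate die per wafer (and better yields on small die to boot), the SoC vendors can tolerate thin per-chip margins while still filling the foundries' lines, which is exactly the negotiating leverage described above.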
The accountants at the foundries have some very complex equations to maximize manufacturing and minimize expenses. The risk of falling behind is always there, but these foundries are used to being a process node behind the industry leader (Intel) and still pulling good profits. They also get a significant cost break by adopting technology well after Intel has done the lion’s share of the work (think optics, lithography, wafer handling, deposition, etc.) and monetary investment. Their motivation is to stay close, but not risk the bleeding edge. This is the opposite of what AMD did when it owned its own fabs, since its primary product competed directly with Intel. Now that GLOBALFOUNDRIES is on its own, it has slowed its pace of next-generation process introductions, much to AMD’s chagrin.
Mobile computing has been a steady stream of income for the foundries as more and more products require advanced chips to power them. Again, maximizing the investment in a current process line makes the company more money and leverages the expenses much more effectively than trying to jump to the next node as soon as possible.
This leads us into the slowdown of process technology that we are seeing. While previous process nodes have had their issues (130 nm had void migration, the jump to copper interconnects was not without problems, etc.), it seems like the current 28 nm HKMG node was perhaps the last “easy” jump that the foundry industry will see. This is not to say that 28 nm HKMG was easy, but the obstacles on the way to 22/20 nm are pretty tremendous. Intel was able to get to 22 nm over a year and a half ago with very good results. This came about because of the billions that Intel invested in their fabrication technology. They were the first to implement Tri-Gate in mass produced parts, and it was not an inexpensive endeavor in terms of money or man-hours. Now, the reason Intel went with Tri-Gate was not about beating its chest and proclaiming the most advanced process available; the reason was that they had no real choice in the matter if they were going to produce high performance CPUs that would scale power effectively with clock speed.
Intel spent billions to get 22 nm Tri-Gate up and running. They are reaping the benefits of this technology each and every quarter that the rest of the industry lags behind.
22/20 nm processes can pack the transistors in, but such a process built on planar transistors will have some issues right off the bat. Speaking very generally, the power curve rises dramatically with clockspeed. For example, if we were to compare transistor performance between 28 nm HKMG and a planar 20 nm HKMG product, the 20 nm part might in fact be less power efficient per clock, per transistor. So while the designer can certainly pack more transistors into the same area, doing so can have some very negative effects on the resulting design. If a designer wants to create a chip with the same functionality as the old one but increase the number of die per wafer, the smaller process allows that; it may not be performance optimized, though. If the designer then specifies that the new chip has to run as fast as the older, larger version, they run a pretty hefty risk of the chip pulling just as much power (if not more) and producing more heat per square millimeter than the previous model.
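The shape of that power curve follows from the classic dynamic CMOS power relation, P = C·V²·f: near the top of a part's frequency range, voltage has to rise along with clock, so power grows much faster than the clock does. A minimal sketch, using made-up numbers rather than real process data:

```python
# Illustrative sketch, not real process data: dynamic CMOS switching
# power is approximately P = C * V^2 * f (capacitance, voltage, frequency).
def dynamic_power(cap_farads, volts, freq_hz):
    return cap_farads * volts**2 * freq_hz

# Hypothetical chip: same switched capacitance, but pushing the clock
# 25% higher requires ~15% more voltage to keep the transistors stable.
base   = dynamic_power(1e-9, 1.00, 1.0e9)
pushed = dynamic_power(1e-9, 1.15, 1.25e9)
print(round(pushed / base, 2))  # ~1.65x the power for 1.25x the clock
```

Because voltage enters squared, a modest voltage bump multiplies out to a disproportionate power increase, which is exactly the "more heat per square millimeter" trap that a naive planar shrink runs into. The specific voltage and frequency figures here are invented for illustration.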
Intel got around this particular issue by utilizing Tri-Gate. This technology allowed the scaling of performance and power that we are accustomed to with process shrinks, and it has worked out very well for Intel, but it is not perfect. As we have seen with Ivy Bridge and Haswell, these products do not scale in speed as well as the older, larger 32 nm Sandy Bridge processors. Both of the 22 nm architectures start pulling in more power than the previous generation when clockspeeds go past 4.0 GHz. Having said that, the Intel 22 nm Tri-Gate process is exceptionally power efficient at lower clockspeeds: the slower the transistors switch, the more efficient they are. These characteristics are very favorable to Intel when approaching the mobile sector. This is certainly an area that Intel hopes to clean up in, and it is the area that is finally scaring the other third-party SoC designers (Qualcomm, Samsung, NVIDIA, etc.) and potentially putting more pressure on the pure-play foundries to get it together.