Hitting the Wall Early and Often!
We as consumers have taken process advancements for granted. Moore’s Law, as it is popularly stated, says that transistor densities will double roughly every 18 months, and that has held true for a long time. Companies like NVIDIA set an aggressive pace in terms of new products and refreshes that would often span around 14 to 16 months from start to finish. So here we are some 22 months from the introduction of the HD 7970 and we see this same part refreshed as the R9 280X. During that time we saw some clock speed improvements as TSMC’s 28 nm process matured, but the basic performance of the chip is essentially unchanged. Some 14 months after the release of the first GK104 parts from NVIDIA, they too refreshed those exact same chips with the GTX 700 series. Again, we saw a small bump in performance due to higher clock speeds, but there is no true next-generation part waiting in the wings.
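To put rough numbers on that expectation, here is a minimal Python sketch of the doubling arithmetic. It assumes the popular 18-month doubling period quoted above, and the function name is purely illustrative.

```python
def expected_density_gain(months_elapsed, doubling_period_months=18.0):
    """Density multiplier implied by a fixed doubling period (an assumption)."""
    return 2.0 ** (months_elapsed / doubling_period_months)


# Gaps mentioned in the text: 22 months (HD 7970 to R9 280X)
# and 14 months (first GK104 parts to the GTX 700 series).
gain_22_months = expected_density_gain(22)
gain_14_months = expected_density_gain(14)

print(f"22 months: {gain_22_months:.2f}x expected density")
print(f"14 months: {gain_14_months:.2f}x expected density")
```

Under that assumption, a 22-month gap would historically have bought somewhat more than a doubling of density, which a refresh on the same 28 nm process cannot deliver.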
What exactly has happened that has slowed the pace of advancement for graphics? There are two major factors seemingly at play: the rise of mobile computing and the chips that are powering this revolution, and the extreme slowdown of process migration as compared to historical trends.
I am still not entirely sure what voodoo ATI used at the time to get the basic R300 design to run on TSMC's 150 nm process as effectively as they did. The 9800 XT at 415 MHz is just sorta crazy, and it didn't break the bank when it came to TDPs. The original R300 was a singular moment in GPU history.
Mobile computing is perhaps the most tenuous reason, but it does make sense on several fronts. Both AMD and NVIDIA have mobile graphics groups which take away design resources from their larger projects. While the modern Kepler and GCN architectures are able to scale from fairly low to really high in terms of TDP, they are not entirely effective in the half-watt range where smartphones primarily sit. NVIDIA has a totally different architecture for graphics in Tegra as compared to the desktop. AMD does not currently have a graphics architecture for that ultra-low TDP range; instead, they utilize GCN for products that are 4 watts and up. NVIDIA is planning on opening up Kepler to those areas, but they are not there yet.
Mobile computing is also a growth area for these companies as compared to desktop and laptop graphics. R&D resources now have to be spread out to the different groups and they have to have competitive products, otherwise the company will not be able to cash in on those growth numbers that we have been seeing for the past several years. Once the mobile chips have been developed, further resources are split off for software and hardware support so that these parts can be integrated effectively into third-party products. This is all money shifted away from desktop graphics. Remember, desktop graphics is actually a shrinking market due to the effective integration of graphics not just in the mobile space, but also with higher powered CPUs/APUs from Intel and AMD.
Finally with mobile computing, we are seeing a lot more pressure on advanced process lines in terms of wafer buys. These ARM based chips are thriving at the 32 nm and 28 nm nodes. The vast majority of users are quite pleased by the performance of these products across different workloads, and they have excellent power characteristics. These are relatively small chips, so quite a few of them can be fit onto a wafer. The problem here is the economics. Margins are thin on these chips, and so the companies making the orders are probably much more aggressive in pursuing contracts, and leveraging different pure-play foundries against each other (TSMC, UMC, and GLOBALFOUNDRIES). Samsung then throws another wrench into the mix by not just fabricating their own parts (Exynos), but also selling fab space to their competition in the form of Apple. If these companies can in fact effectively negotiate lower priced wafers with promises of filling up the lines with orders, then companies such as TSMC will make less money per wafer as compared to more complex products like GPUs. Less money is less R&D for advanced process features, and this behavior also maximizes the already spent R&D investment on the current process. The end result here is less money being allocated towards advanced process development, so these advanced nodes will take longer to develop.
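The point about how many small chips fit onto a wafer can be sketched with the commonly used gross-die-per-wafer approximation. The die areas below are hypothetical round numbers rather than measurements of any specific product, and the estimate ignores yield, scribe lines, and edge exclusion.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius_mm = wafer_diameter_mm / 2.0
    wafer_area_mm2 = math.pi * radius_mm ** 2
    # Partial dies lost around the circumference of the wafer.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area_mm2 / die_area_mm2 - edge_loss)


# Hypothetical die sizes, chosen only for illustration.
MOBILE_SOC_AREA_MM2 = 100.0
LARGE_GPU_AREA_MM2 = 350.0

mobile_dies = gross_dies_per_wafer(MOBILE_SOC_AREA_MM2)
gpu_dies = gross_dies_per_wafer(LARGE_GPU_AREA_MM2)

print(f"Mobile SoC candidates per 300 mm wafer: {mobile_dies}")
print(f"Large GPU candidates per 300 mm wafer:  {gpu_dies}")
```

With these assumed sizes, a single wafer carries several times as many mobile parts as large GPUs, which is why the price negotiated per wafer matters so much to both sides.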
The accountants at the foundries have some very complex equations to maximize manufacturing and minimize expenses. The risk of falling behind is always there, but these foundries are used to being a process node behind the industry leader (Intel) and still being able to pull good profits. These foundries also get a significant cost break by adopting technology well after Intel has done the lion’s share of work (think optics, lithography, wafer handling, deposition, etc.) and monetary investment. Their motivation is to stay close, but not risk the bleeding edge. This is the opposite of what AMD did when they owned their own fabs, as their primary product competed directly with Intel. Now that GLOBALFOUNDRIES is on its own, it has slowed down its pace of next generation process technology introduction, much to AMD’s chagrin.
Mobile computing has been a steady stream of income for the foundries as more and more products require advanced chips to power them. Again, maximizing the investment in a current process line makes the company more money and leverages the expenses much more effectively than trying to jump to the next node as soon as possible.
This leads us into the slowdown of process technology that we are seeing. While previous process nodes have had their issues (130 nm had void migration, the jump to copper interconnects was not without problems, etc.) it seems like the current 28 nm HKMG node was perhaps the last “easy” jump that the foundry industry will see. This is not to say that 28 nm HKMG was easy, but the obstacles in the way towards 22/20 nm are pretty tremendous. Intel was able to get to 22 nm over a year and a half ago with very good results. This came about because of the billions that Intel invested in their fabrication technology. They are the first to have implemented Tri-Gate in mass produced parts. This was not an inexpensive endeavor in terms of money and man-hours. Now, the reason why Intel went with the Tri-Gate technology was not about beating its chest and proclaiming that they had the most advanced process available; the reason was that they had no real choice in the matter if they were going to produce high performance CPUs that would scale power effectively with clock speed.
Intel spent billions to get 22 nm Tri-Gate up and running. They are reaping the benefits of this technology each and every quarter that the rest of the industry lags behind.
22/20 nm processes can pack the transistors in. Such a process utilizing planar transistors will have some issues right off the bat. This is very general, but essentially the power curve increases very dramatically with clockspeed. For example, if we were to compare transistor performance from 28 nm HKMG to a 20 nm HKMG product, the 20 nm might in fact be less power efficient per clock per transistor. So while the designer can certainly pack more transistors into the same area, there could be some very negative effects from implementing that into a design. For example, if a designer wants to create a chip with the same functionality as the old, but increase the number of die per wafer, then they can do that with the smaller process. This may not be performance optimized though. If the designer then specifies that the chips have to run as fast as the older, larger versions, then they run a pretty hefty risk of the chip pulling just as much power (if not more) and producing more heat per mm squared than the previous model.
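A rough way to see why the power curve climbs so steeply with clock speed is the textbook CMOS approximation, where switching power is proportional to capacitance, voltage squared, and frequency. The sketch below uses that approximation. The capacitance, voltages, and leakage current are made-up illustrative values, not measurements of any real process.

```python
def dynamic_power_w(switched_capacitance_f, voltage_v, frequency_hz):
    """Textbook CMOS switching power: P = C * V^2 * f."""
    return switched_capacitance_f * voltage_v ** 2 * frequency_hz


def total_power_w(switched_capacitance_f, voltage_v, frequency_hz, leakage_current_a):
    """Switching power plus a simple static term (V * I_leak)."""
    dynamic = dynamic_power_w(switched_capacitance_f, voltage_v, frequency_hz)
    return dynamic + voltage_v * leakage_current_a


# Hypothetical operating points: the same chip clocked 20% higher, which is
# assumed here to need 10% more voltage to remain stable.
CAPACITANCE_F = 1e-9
baseline_w = dynamic_power_w(CAPACITANCE_F, 1.0, 1.0e9)
faster_w = dynamic_power_w(CAPACITANCE_F, 1.1, 1.2e9)
power_ratio = faster_w / baseline_w

print(f"Switching power ratio for +20% clock: {power_ratio:.3f}x")
```

With these made-up numbers, a 20% clock increase costs about 45% more switching power, and any extra leakage from a leakier planar transistor is added on top of that through the static term.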
Intel got around this particular issue by utilizing Tri-Gates. This technology allowed the scaling of performance and power that we are accustomed to with process shrinks. This technology has worked out very well for Intel, but it is not perfect. As we have seen with Ivy Bridge and Haswell, these products do not scale in speed as well as the older, larger 32 nm Sandy Bridge processors. Both of the 22 nm architectures start pulling in more power than the previous generation when clockspeeds go past 4.0 GHz. Having said that, the Intel 22 nm Tri-Gate process is exceptionally power efficient at lower clockspeeds. The slower the transistors switch, the more efficient they are. These characteristics are very favorable to Intel when approaching the mobile sector. This is certainly an area that Intel hopes to clean up in. This is the area that is finally scaring all the other 3rd party SOC designers (Qualcomm, Samsung, NVIDIA, etc.) and potentially putting more pressure on the pure-play foundries to get it together.
All the “next-generation 14nm” nodes are very similar. They’re basically “20nm” metal (64nm pitch, double-patterned) with faster transistors. This applies to Intel “14nm” TriGate, TSMC “16nm” FinFET, GF “14nm” FinFET, ST “14nm” FDSOI, and Samsung “14nm” FinFET. There probably isn’t a single feature on any of the chips which is 14nm, but they had to call them something which was better than 20nm.
TSMC wouldn’t call theirs 14nm because “fourteen” sounds like “go towards death” in Chinese, and ST’s 14nm FDSOI used to be called 20nm (which was at least honest) until their marketing realised that everyone else was calling their similar processes 14nm, so they renamed it…
They’re all a big advance on standard “20nm” planar (with the same metal stack) because lower leakage and lower operating voltage means lower power.
The issues are the risk, production difficulties, and cost that come with new transistor structures, especially FinFET, where Intel certainly had (and has?) issues with process variability, in spite of the fact that they can sell both fast/leaky chips and slow/low power ones for more money than typical ones.
For all these processes (and 20nm bulk planar) the cost per gate is similar to or even higher than 28nm HKMG, which removes one of the big drivers for going to the next process node for many products. The industry was expecting EUV to come along and save its bacon; this not only hasn’t happened yet but will certainly miss the next node after these (“10nm”), which will need triple patterning, and good luck with that, both for design and cost.
So the lower power and higher density will mean that more functionality can be crammed onto one chip, but also that this will cost more, which is an alien concept to an industry that for the last 40 years has assumed that the next process node will deliver more bang for the same buck. Consumers may be in for a nasty shock when they find that their next super iGadget is even more expensive…
Thank goodness for marketing and superstition to drive process naming! Thanks for the info. So strange to see these "advanced" nodes with the 20 nm back end. Gonna be an interesting next few years of process tech. Now we wait and see if all that money the industry invested in EUV will ever come to fruition.
came to this site first time, very impressive article, great read, thanks for that!! will stop by more often 🙂
“It looks to compete with the GTX TITAN, but it will not leapfrog that part. It will probably end up faster, but by a couple of percentage points. It will not be the big jump we have seen in the past such as going from a GTX 580 or HD 6970 to a GTX 680 or HD 7970.”
That’s not really a fair comparison… you are comparing generational leaps to leaps between competing products.
The generational leap for the R290x is from the 7970. Similarly the GTX 780 is the generational leap from the GTX 680.
As for how the R290x compares to the 7970, it is about 59% faster, give or take the application. That’s the biggest leap generation to generation for as long as I can remember.
Well, those really aren't generational leaps. They are bigger products based on the same GCN and Kepler architectures that were introduced with the HD 7970 and GTX 680 respectively. Titan has been out around a year now, and only now does AMD have an answer for that. All of them are based on 28 nm. So, those big chips are nice jumps in performance, but they are not the big architectural leaps that we have seen from the GTX 580 to GTX 680 or the HD 6970 to the HD 7970.
http://www.cadence.com/Community/blogs/ii/archive/2013/04/14/tsmc-2013-symposium-progress-in-20nm-16nm-finfet-and-3d-ic-technologies.aspx
Are these claims mostly PR, or are people just assuming that a “30% lower power consumption” figure for an “SRAM array” will also hold at the same level on a 400mm2 GPU? Or both?
Some is a bit of marketing hype, but the basics of timelines and products seem to be in line with what is expected. Yes, there will be smaller chips, there will be more power efficient chips, but I think we will see some power/clock scaling issues with 20 nm planar. It will be a better overall process than 28 nm HKMG, but do not expect miracles at the high end with large chips. I could be out in left field, but it seems awfully positive and shiny in that blog.
Great article Josh
Thnx
Thanks!
Wonderful article!
I wish everyone could read this so we would stop hearing all the “wahhh Intel/AMD/Nvidia doesn’t care about enthusiasts anymore” nonsense. Transistors don’t just get smaller on their own.
You rock, Josh — great piece. A few clarifications. IBM is still using PD-SOI at 22nm in Power8 (see http://bit.ly/15saFUm). They’ve got SOI-FinFET lined up for 14nm. The FD-SOI crowd is skipping directly from 28nm to 14nm, which they say will be ready next year before 14nm (bulk) FinFET (see http://bit.ly/1cGjZgi). (Tho 28nm FDSOI is already pretty awesome in terms of power & perf — it’s what got 3GHz & an extra day of smartphone battery life – see http://bit.ly/1hPLvri). And ST’s capacity in France is much more than you’ve indicated — and now they’re in the process of doubling it (thank you, Europe!) so they’ll be at 4500 wafer starts/week by the end of 2014 (see http://bit.ly/1bdvMfr). Leti will have models available for 10nm FDSOI in a couple months, and PDKs in Q314 (see http://bit.ly/1bdwadP).
Really good info here! Thanks for joining in!
Thank you for this comprehensive and complete article on the technological limitations of the SC industry vs graphics maturity. I work in the ST fab that is developing 28 and then 14nm FDSOI right now, and this kind of article makes the effort (not to say the hard work!) we put into this technology worth it.
14 nm FDSOI looks very, very interesting. Can't wait to see how it progresses!