The Really Good Times are Over
Graphics advancement is slowed due to process issues.
We really do not realize how good we had it. Sure, we could apply that to budget surpluses and the time before the rise of global terrorism, but in this case I am talking about the predictable advancement of graphics due to both design expertise and improvements in process technology. Moore’s law has been exceptionally kind to graphics. When we look back and plot the course of these graphics companies, we find that they have actually outstripped Moore’s law in terms of transistor density from generation to generation. Most of this is due to better tools and the expertise gained in what is still a fairly new endeavor compared to CPUs (the first true 3D accelerators were released in the 1993/94 timeframe).
The complexity of a modern 3D chip is truly mind-boggling. To get a good idea of where we came from, we must look back at the first generations of products that we could actually purchase. The original 3Dfx Voodoo Graphics was composed of a raster chip and a texture chip. Each contained approximately 1 million transistors (give or take) and was made on the then-available .5 micron process (we shall call it 500 nm from here on out to give a sense of perspective with modern process technology). The chips were clocked between 47 and 50 MHz (though they could often be clocked up to 57 MHz by going into the init file and putting in “SET SST_GRXCLK=57”… btw, SST stood for Sellers/Smith/Tarolli, the founders of 3Dfx). This graphics card, revolutionary at the time, could push out 47 to 50 megapixels per second, had 4 MB of VRAM, and was released at the beginning of 1996.
My first 3D graphics card was the Orchid Righteous 3D. Voodoo Graphics was really the first successful consumer 3D graphics card. Yes, there were others before it, but Voodoo Graphics had the largest impact of them all.
In 1998 3Dfx released the Voodoo 2, and it was a significant jump in complexity from the original. These chips were fabricated on a 350 nm process. There were three chips on each card: one raster chip and two texture chips. At the top end of the product stack were the 12 MB cards. The raster chip had 4 MB of VRAM available to it while each texture chip had 4 MB of VRAM for texture storage. Not only did this product double performance over the Voodoo Graphics, it was able to run in single card configurations at 800×600 (as compared to the max 640×480 of the Voodoo Graphics). This was around the same time that NVIDIA started to become a very aggressive competitor with the Riva TnT and ATI was about to ship the Rage 128.
Process technology at this time improved in leaps and bounds. Intel was always at or near the lead with others like IBM and Motorola keeping pace. TSMC was the first Pure-Play foundry selling line space to 3rd parties and others such as Chartered and UMC were competitive across all of their lines. TSMC has traditionally been the go-to foundry for the graphics industry, but around this time UMC was a close second. Within one and a half years from the introduction of the Voodoo 2 and TnT class of graphics adapters, TSMC was offering 250 nm lines for willing customers. NVIDIA was one of the first with the TnT 2 products, followed closely by 3dfx and the Voodoo 3. ATI was a little bit behind with the Rage 128 Pro, but they were making progress in keeping up.
Right after this we were introduced to the half-step for process nodes. TSMC released their 220 nm process for production and NVIDIA jumped on board with the original GeForce 256. We did not see the big jump in power and die size benefits that a full process node can give, but it did provide a quick transition for designers going to the next advanced node. Moving along, we see the introduction of the 180 nm node and the GeForce 2 class of products. The GeForce 2 GTS was a 25 million transistor chip running at 200 MHz. Go back to the 2 million transistor Voodoo Graphics and we see that the GeForce 2 GTS is 12.5x more complex by transistor count while running at four times the clock speed. Only four years separate the Voodoo Graphics and the GeForce 2 GTS.
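As a quick sanity check on that comparison, here is a minimal sketch of the arithmetic in Python. The figures are the approximate ones quoted in this article (two chips of roughly 1 million transistors each at about 50 MHz, against 25 million transistors at 200 MHz); the variable names are just illustrative.

```python
# Approximate figures as quoted in the article text.
voodoo_transistors = 2_000_000   # raster chip + texture chip, ~1 million each
voodoo_clock_mhz = 50            # top of the stock 47-50 MHz range
gf2_transistors = 25_000_000     # GeForce 2 GTS
gf2_clock_mhz = 200

# Ratio of transistor counts and of clock speeds between the two products.
complexity_ratio = gf2_transistors / voodoo_transistors
clock_ratio = gf2_clock_mhz / voodoo_clock_mhz

print(f"Transistor count ratio: {complexity_ratio:.1f}x")
print(f"Clock speed ratio: {clock_ratio:.1f}x")
```

Transistor count is only a rough proxy for design complexity, but it is the measure used throughout this piece.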
The NVIDIA Riva TnT was the first serious competitor for 3Dfx's lineup of cards, including the then new Voodoo 2.
The pace did not slow down there. Next up was the 150 nm half node from TSMC and the GeForce 3 series. This chip was a monster for the time. It was one of the first consumer level products with a transistor count of around 57 million. The GeForce 4, which was released a year after the GeForce 3 and still used the 150 nm process, bumped that count up to around 67 million. Then came the monster from ATI. The R300, which powered the Radeon 9700 Pro, was an astonishing 107 million transistors on the same 150 nm process. In the two years between 2000 and 2002 we see another quadrupling of transistor counts with a move of less than one full process node (180 nm to the 150 nm half node), along with another 100 to 150 MHz of speed for a complex GPU.
Around 2004 things started to slow down a bit, but that is a relative term as compared to the first eight years in 3D graphics. I had written an article at my old site that covered what I had expected to be a problem in the years following. “Slowing Down the Process Migration” discussed the inevitable slowing of process node transitions due to issues in materials, design strategies, and plain old physics. Little did I know that some of the major issues that plagued the 130 nm jump (migrating voids, design rule changes midstream, etc.) would be solved and we would again return to a very regular cadence of process improvements. 130 nm led to 110, 90, 80, 65, 55, 45, 40, 32, and now 28 nm. Graphics products did not inhabit every node, but they hit all of the major ones (45 and 32 nm were absent from most graphics platforms).
So where are we at now? In 2003 the top end product was the Radeon 9800 XT, which ran at 412 MHz and was composed of 117 million transistors using TSMC’s highly optimized 150 nm process. Today we are looking at the GTX TITAN based on the NVIDIA GK110 processor, which weighs in at 7 billion transistors and around 850 MHz. This represents twice the raw clock speed and a chip roughly 60 times more complex by transistor count in the span of ten years. It is absolutely no wonder that we are spoiled by the constant stream of new products that advance the state of the art on a yearly basis, with a major process node improvement every 18 months or so.
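To put that ten-year jump in perspective, the transistor counts quoted above (117 million and 7 billion) can be turned into a growth ratio and an implied doubling period. This is a rough sketch in Python; the helper name `doubling_period_months` is made up for illustration, and it assumes smooth exponential growth between the two data points, which real product cycles do not follow.

```python
import math

def doubling_period_months(start_count, end_count, years):
    """Months per doubling, assuming steady exponential growth (an idealization)."""
    doublings = math.log2(end_count / start_count)
    return years * 12 / doublings

r9800xt_transistors = 117_000_000   # Radeon 9800 XT, 2003 (figure quoted above)
gk110_transistors = 7_000_000_000   # GTX TITAN / GK110 (figure quoted above)

ratio = gk110_transistors / r9800xt_transistors
period = doubling_period_months(r9800xt_transistors, gk110_transistors, 10)

print(f"Transistor count ratio: about {ratio:.0f}x")
print(f"Implied doubling period: about {period:.0f} months")
```

With these inputs the ratio comes out near 60x and the doubling period near 20 months, which is in the same neighborhood as the 18 month cadence mentioned above.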
With this highly aggressive pace from year to year, why are we in name-only refresh land for graphics right now? I am starting to see a lot of commenters voicing their displeasure at both NVIDIA and AMD for their lack of a true, next-generation GPU. The GK104 that originally powered the GTX 680 has morphed into a variety of products including the GTX 770 and GTX 760. The GTX TITAN based on GK110 was released last year and it has been repurposed for the GTX 780. AMD refreshed their lineups with last year’s Tahiti and Pitcairn chips, and the top end Hawaii chip (R9 290X) only reaches the complexity of last year’s GK110. These parts are all based on TSMC’s 28 nm process. Where exactly are the new chips and why aren’t we at 20 nm yet?
All the “next-generation 14nm” nodes are very similar. They’re basically “20nm” metal (64nm pitch, double-patterned) with faster transistors. This applies to Intel “14nm” TriGate, TSMC “16nm” FinFET, GF “14nm” FinFET, ST “14nm” FDSOI, and Samsung “14nm” FinFET. There probably isn’t a single feature on any of the chips which is 14nm, but they had to call them something which was better than 20nm.
TSMC wouldn’t call theirs 14nm because “fourteen” sounds like “go towards death” in Chinese, and ST’s 14nm FDSOI used to be called 20nm (which was at least honest) until their marketing realised that everyone else was calling their similar processes 14nm, so they renamed it…
They’re all a big advance on standard “20nm” planar (with the same metal stack) because lower leakage and lower operating voltage mean lower power.
The issues are the risk, production difficulties, and cost that come with new transistor structures, especially FinFET, where Intel certainly had (and have?) issues with process variability, in spite of the fact they can sell both fast/leaky chips and slow/low power ones for more money than typical ones.
For all these processes (and 20nm bulk planar) the cost per gate is similar to or even higher than 28nm HKMG, which removes one of the big drivers for going to the next process node for many products. The industry was expecting EUV to come along and save its bacon. This not only hasn’t happened yet, but EUV will certainly miss the next node after these (“10nm”), which will need triple patterning. Good luck with that, both for design and cost.
So the lower power and higher density will mean that more functionality can be crammed onto one chip, but also that this will cost more, which is an alien concept to an industry that for the last 40 years has assumed that the next process node will deliver more bang for the same buck. Consumers may be in for a nasty shock when they find that their next super iGadget is even more expensive…
Thank goodness for marketing and superstition to drive process naming! Thanks for the info. So strange to see these "advanced" nodes with the 20 nm back end. Gonna be an interesting next few years of process tech. Now we wait and see if all that money the industry invested in EUV will ever come to fruition.
Came to this site for the first time, very impressive article, great read, thanks for that!! Will stop by more often 🙂
“It looks to compete with the GTX TITAN, but it will not leapfrog that part. It will probably end up faster, but by a couple of percentage points. It will not be the big jump we have seen in the past such as going from a GTX 580 or HD 6970 to a GTX 680 or HD 7970.”
That’s not really a fair comparison… you are comparing generational leaps against competing products.
The generational leap for the R290x is from the 7970. Similarly the GTX 780 is the generational leap from the GTX 680.
As for how the R290x compares to the 7970… it is about 59% faster, give or take, depending on the application. That’s the biggest leap generation to generation for as long as I can remember.
Well, those really aren't generational leaps. They are bigger products based on the same GCN and Kepler architectures that were introduced with the HD 7970 and GTX 680 respectively. Titan has been out around a year now, and only now does AMD have an answer for that. All of them are based on 28 nm. So, those big chips are nice jumps in performance, but they are not the big architectural leaps that we have seen from the GTX 580 to GTX 680 or the HD 6970 to the HD 7970.
Are these claims mostly PR related, or are people just assuming that a “30% lower power consumption” figure on an “SRAM array” will hold at the same level on a 400 mm² GPU? Or both?
Some of it is a bit of marketing hype, but the basics of timelines and products seem to be in line with what is expected. Yes, there will be smaller chips, there will be more power efficient chips, but I think we will see some power/clock scaling issues with 20 nm planar. It will be a better overall process than 28 nm HKMG, but do not expect miracles at the high end with large chips. I could be out in left field, but it seems awfully positive and shiny in that blog.
Great article Josh
I wish everyone could read this so we would stop hearing all the “wahhh Intel/AMD/Nvidia doesn’t care about enthusiasts anymore” nonsense. Transistors don’t just get smaller on their own.
You rock, Josh — great piece. A few clarifications. IBM is still using PD-SOI at 22nm in Power8 (see http://bit.ly/15saFUm). They’ve got SOI-FinFET lined up for 14nm. The FD-SOI crowd is skipping directly from 28nm to 14nm, which they say will be ready next year before 14nm (bulk) FinFET (see http://bit.ly/1cGjZgi). (Tho 28nm FDSOI is already pretty awesome in terms of power & perf — it’s what got 3GHz & an extra day of smartphone battery life – see http://bit.ly/1hPLvri). And ST’s capacity in France is much more than you’ve indicated — and now they’re in the process of doubling it (thank you, Europe!) so they’ll be at 4500 wafer starts/week by the end of 2014 (see http://bit.ly/1bdvMfr). Leti will have models available for 10nm FDSOI in a couple months, and PDKs in Q314 (see http://bit.ly/1bdwadP).
Really good info here! Thanks for joining in!
Thank you for this comprehensive and complete article on the technological limitations of the SC industry vs graphics maturity. I work in the ST fab that is developing 28 nm and then 14 nm FDSOI right now, and this kind of article makes the effort (not to say the hard work!) we put into this technology worth it.
14 nm FDSOI looks very, very interesting. Can't wait to see how it progresses!