The Really Good Times are Over
Graphics advancement is slowed due to process issues.
We really do not realize how good we had it. Sure, we could apply that to budget surpluses and the time before the rise of global terrorism, but in this case I am talking about the predictable advancement of graphics driven by both design expertise and improvements in process technology. Moore's law has been exceptionally kind to graphics. Looking back and plotting the course of these graphics companies, they have actually outstripped Moore in terms of transistor density from generation to generation. Most of this is due to better tools and the expertise gained in what is still a fairly new endeavor compared to CPUs (the first true 3D accelerators were released in the 1993/94 timeframe).
The complexity of a modern 3D chip is truly mind-boggling. To get a good idea of where we came from, we must look back at the first generations of products that we could actually purchase. The original 3Dfx Voodoo Graphics consisted of a raster chip and a texture chip, each containing approximately 1 million transistors (give or take) and made on a then-available .5 micron process (we shall call it 500 nm from here on out to give a sense of perspective against modern process technology). The chips were clocked between 47 and 50 MHz (though they could often be pushed to 57 MHz by adding "SET SST_GRXCLK=57" to the init file… incidentally, SST stood for Sellers/Smith/Tarolli, the founders of 3Dfx). Revolutionary for its time, this card could push out 47 to 50 megapixels, carried 4 MB of VRAM, and was released in the beginning of 1996.
My first 3D graphics card was the Orchid Righteous 3D. Voodoo Graphics was really the first successful consumer 3D graphics card. Yes, there were others before it, but Voodoo Graphics had the largest impact of them all.
In 1998 3Dfx released the Voodoo 2, and it was a significant jump in complexity from the original. These chips were fabricated on a 350 nm process. There were three chips on each card: one raster chip and two texture chips. At the top end of the product stack were the 12 MB cards, where the raster chip had 4 MB of VRAM available to it and each texture chip had 4 MB of VRAM for texture storage. Not only did this product double the performance of the Voodoo Graphics, it was able to run in single card configurations at 800×600 (as compared to the 640×480 maximum of the Voodoo Graphics). This was around the same time that NVIDIA started to become a very aggressive competitor with the Riva TnT and ATI was about to ship the Rage 128.
Process technology at this time improved in leaps and bounds. Intel was always at or near the lead, with others like IBM and Motorola keeping pace. TSMC was the first pure-play foundry, selling line space to third parties, while others such as Chartered and UMC were competitive across all of their lines. TSMC has traditionally been the go-to foundry for the graphics industry, but around this time UMC was a close second. Within a year and a half of the introduction of the Voodoo 2 and TnT class of graphics adapters, TSMC was offering 250 nm lines for willing customers. NVIDIA was one of the first with the TnT 2 products, followed closely by 3dfx and the Voodoo 3. ATI was a little behind with the Rage 128 Pro, but they were making progress in keeping up.
Right after this we were introduced to the half-step for process nodes. TSMC released its 220 nm process for production and NVIDIA jumped on board with the original GeForce 256. We did not see the big jump in power and die size benefits that a full process node can give, but it did provide a quick transition for designers heading to the next full node. Moving along, we saw the introduction of the 180 nm node and the GeForce 2 class of products. The GeForce 2 GTS was a 25 million transistor chip running at 200 MHz. Compare that to the 2 million transistor Voodoo Graphics: the GeForce 2 GTS was 12.5x more complex and ran at four times the speed, with only four years separating the two designs.
The NVIDIA Riva TnT was the first serious competitor for 3Dfx's lineup of cards, including the then-new Voodoo 2.
The pace did not slow down there. Next up was the 150 nm half node from TSMC and the GeForce 3 series. This chip was a monster for its time, one of the first consumer-level products with a transistor count of around 57 million. The GeForce 4, which was released a year after the GeForce 3 and still used the 150 nm process, bumped that count up to around 67 million. Then came the monster from ATI: the R300, which powered the Radeon 9700 Pro, was an astonishing 107 million transistors on the same 150 nm process. In the two years between 2000 and 2002 we see another quadrupling of transistor counts across two process nodes (and a half node at that), plus another 100 to 150 MHz of speed for a complex GPU.
Around 2004 things started to slow down a bit, but that is a relative term compared to the first eight years of 3D graphics. I had written an article at my old site that covered what I expected to be a problem in the years following. "Slowing Down the Process Migration" discussed the inevitable slowing of process node transitions due to issues in materials, design strategies, and plain old physics. Little did I know that some of the major issues plaguing the 130 nm jump (migrating voids, design rule changes midstream, etc.) would be solved, and we again returned to a very regular cadence of process improvements. 130 nm led to 110, 90, 80, 65, 55, 45, 40, 32, and now 28 nm. Graphics products did not inhabit every node, but they hit all of the major ones (45 and 32 nm were absent from most graphics platforms).
So where are we at now? In 2003 the top end product was the Radeon 9800 XT, running at 412 MHz and composed of 117 million transistors on TSMC's highly optimized 150 nm process. Today we are looking at the GTX TITAN based on the NVIDIA GK110 processor, which weighs in at 7 billion transistors running at around 850 MHz. That represents twice the raw clockspeed and a design nearly 60 times more complex in the span of ten years. It is absolutely no wonder that we are spoiled by the constant stream of new products that advance the state of the art on a yearly basis, with a major process node improvement every 18 months or so.
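For readers who like to check the math, a quick back-of-envelope script (using only the figures quoted above) shows how these numbers compare against a plain Moore's-law doubling:

```python
# Transistor counts (millions) and core clocks (MHz) as quoted in the article.
chips = {
    "Voodoo Graphics (1996)":   (2, 50),
    "GeForce 2 GTS (2000)":     (25, 200),
    "Radeon 9800 XT (2003)":    (117, 412),
    "GTX TITAN / GK110 (2013)": (7000, 850),
}

base_t, base_clk = chips["Radeon 9800 XT (2003)"]
t, clk = chips["GTX TITAN / GK110 (2013)"]

print(f"Transistor ratio 2003 -> 2013: {t / base_t:.1f}x")  # roughly 60x
print(f"Clock ratio 2003 -> 2013: {clk / base_clk:.2f}x")   # roughly 2x

# Moore's law read as "doubling every two years" predicts far less over a decade.
years = 2013 - 2003
moore = 2 ** (years / 2)
print(f"Moore's-law prediction over {years} years: {moore:.0f}x")  # 32x
```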
With this highly aggressive pace from year to year, why are we stuck in name-only refresh territory for graphics right now? I am starting to see a lot of commenters voicing their displeasure at both NVIDIA and AMD for the lack of a true, next-generation GPU. The GK104 that originally powered the GTX 680 has morphed into a variety of products, including the GTX 770 and GTX 760. The GTX TITAN based on GK110 was released last year, and it has been repurposed for the GTX 780. AMD refreshed their lineups with last year's Tahiti and Pitcairn chips, and the top-end Hawaii chip (R9 290X) only reaches the complexity of last year's GK110. These parts are all based on TSMC's 28 nm process. Where exactly are the new chips, and why aren't we at 20 nm yet?
Thanks Josh. Let's hope there is that one guy who says "how about trying this?" and changes everything.
Make no mistake, there is a lot of research in a LOT of different areas to overcome the issues that the industry is running into. The challenges have always been there (breaking the 1 micron barrier was seemingly huge), but now the challenges are just bigger, more complex, and more expensive.
Or maybe one of the foundries will have a happy accident, Bob Ross style.
They'll come in one morning to find their equipment had slipped to a new nm during the night and everything is a little out of whack. They're about to toss out the batch when someone grabs a wafer for the fun of it, runs a test, and BAM! Breakthrough!
A guy can dream, can't he?
Thanks Josh. Fantastic article.
Very awesome and informative article Josh. What implications could this have with Moore’s law? Does this effectively stop it before the theoretical quantum limit in 2036? These will be an interesting few years for pure-play foundries and their clients indeed.
Well, things will necessarily be slowing down. There simply are hurdles that need lots of time and lots of money to solve. 10 nm shouldn't be that bad, 7 nm is hitting some interesting limits, and sub-7 nm is going to be really rough. Litho, materials, and electrical characteristics at that size will be sorta crazy.
Just to add a few points to this excellent article:
– The 14nm/16nm nodes for GloFo and TSMC, respectively, are going to be utilizing a 20nm back-end-of-line. This means that while density won’t increase, they’ll improve power characteristics (these are the two FinFET nodes).
– The time-to-market for the above two nodes from both foundries should be more painless than if they were to attempt a shrink + FinFETs. As a result, if I were to guess I’d say we see the 14nm/16nm nodes a bit earlier than some had anticipated. Early tape-outs for 14nm and 20nm have been close together so that definitely adds some credence to that line of thought. Though not certainty ;P
– These node names (e.g., 14nm) don't actually accurately describe the half-pitch. Unless I'm recalling incorrectly, the current tools would only allow something like 18nm(?). Intel's current 22nm FinFETs have been described in papers as 26nm. Whether that's true or not, I have no idea, but the point is that the half-pitch is only a single detail in a long list of attributes that defines a new "node." The takeaway is that you shouldn't get too caught up in the XXnm numbers; remember that it's the power, leakage, density, and performance of the node that actually matters.
Thanks for the comments.
About the node names… Intel's 22 nm describes the smallest feature, but you are correct in that a certain other feature (I think it has to do with SRAM) is 26 nm. There was some thought that AMD with GF's 28 nm would be able to get fairly close to the transistor density of Intel's 22 nm in certain aspects due to this size variance.
Nice summary Josh. As the person above noted, the node numbers are not strictly related to feature size (e.g. TSMC 16nm is FinFET transistors on a 20nm backend).
Nvidia likes to talk about how GPUs have better than Moore’s Law scaling, but with die sizes already at 550mm2 (GK110), that will not be true going forward – die sizes are already close to the limits of fab reticles (~600mm2).
I just had this same conversation with AMD's Raja Koduri. Raja's response is that it will take new architectures to improve performance, not just process shrinks and die area growth. It's going to take improvements in architecture efficiency and effectiveness. It also means that the GPU designers need to work more closely with game engine developers to find efficiency improvements; Mantle is one example.
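To put some rough numbers behind the reticle point above, here is a common first-order approximation for gross dies per wafer (the formula and the 300 mm wafer assumption are illustrative, not from the comment itself); it shows how punishing 550mm2 dies already are compared to a mid-size GPU:

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order gross-die estimate: wafer area over die area, minus an
    edge-loss term proportional to circumference over the die diagonal."""
    r = wafer_diameter_mm / 2.0
    wafer_area = math.pi * r * r
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

# GK110-class die (~550 mm^2) vs a hypothetical mid-size GPU (~300 mm^2).
print(gross_dies_per_wafer(550))  # far fewer candidate dies per wafer
print(gross_dies_per_wafer(300))
```

Before yield losses even enter the picture, the big die gets roughly half as many candidates per wafer, which is why reticle-limited designs are so expensive.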
If you have some spare minutes, you should read that old article I linked. Some interesting stuff there (considering it was written in 2004 and issues at 130 nm were just being solved).
Thanks for reading! The next few years are going to be very interesting considering the challenges ahead!
The reason I found it interesting is that last year AMD replaced its CPU architects with the creators of the Athlon. We all really need AMD to do well, to drive pricing down for everyone and to push technology forward.
I wish I could edit 🙂 They replaced the Bulldozer architect. They have hired people who worked on the Athlon projects. I am sorry :).
Yeah, some of the old guys came back. Jim Keller is the big name, and Raja Koduri is back on the GPU side. There is a lot of uplift in what they are trying to do, and I think overall they are heading in the right direction. I like Dirk Meyer, but while he was a great CPU architect, he nearly missed the major mobile transition, which his product stack would not have been able to address.
Wonderful and informative post Josh. As a theoretical physicist with some background in solid state physics, I've been aware of a few of the issues facing the industry, especially the lithography. I cannot even begin to imagine how hard it's going to be to get 7-5nm process nodes operational. I expect quantum effects to come in earlier; even 10nm will be very tough. Quantum tunnelling will no doubt be a huge issue when line traces are so small.
Interesting times ahead. It looks like either an R9 290x or a GTX 780 Ti will be my friend until well into 2015, but that’s okay, as they are still going to be pretty darn good cards.
I find it interesting that pretty much the entire industry is heavily invested in EUV… and from what I understand, there is still a very real risk that it will not work out.
The industry has been working on EUV for over 10 years already and still seems quite far from its target (which has been moving during that time too).
Those are interesting times indeed at the process level.
The question you have not broached is the economics of it. We have been seeing a lot of consolidation in the semiconductor industry over the past ten years, and it is accelerating. Each process node costs exponentially more than the last one, and THIS is the reason for pure-play foundries: few companies can afford their own fabs anymore. Intel is of course the exception, but even they are starting to open their fabs to other companies (which nobody would have expected just a few years ago). That means that even Intel has too much capacity and cannot fill its fabs anymore.
The semiconductor industry is so far the pinnacle of human ingenuity, taking so much effort from so many people to keep on track and follow Moore's Law. All those people are hard at work on EUV and backup plans (multiple patterning, where they are doing double today but triple or quadruple are definitely possible, and 3D transistors, which are coming first in flash at Samsung). We have not yet seen the end of semiconductor growth.
Looks like someone got influenced by their trip to Montreal.
Heh, I didn't go to Montreal with Ryan and Ken. Oddly enough, I started researching and writing this before that event. I was sorta cranky when Carmack started talking about this subject… day late and a dollar short for me (or rather many millions of dollars short).
Your article really reminds me of the past, seeing the names of all the cards and players in the market. There was so much more excitement back then. Thanks for the read o/
This is the first time I have just sat down and read a long article with my full focus on it. Very nice article, very informative, especially for me as someone new to this kind of stuff. Thanks for this.
Thanks for reading the entire thing! Ryan will thank you as well!
Great writing Josh. Also it was nice to visit your archives for the first time. I enjoyed both articles.
Some great articles from the PC Per staff this week. Keep up the good work.
Can you run that thing in SLI? Also, did it cause you to lose all your hair?
I bet the prices stayed the same but not with inflation. Don’t tell marketing people about inflation. Once they learn about it we are all screwed.
I lost my hair because I got married and had kids.
This is a stunning article, this is why I visit Pcper every day. Ryan give the man a bonus!
This article is up there with Scott’s “The Windows You Love is Gone” stunner a year ago.
https://pcper.com/reviews/Editorial/Windows-You-Love-Gone
These kinds of delays are to be expected. The smaller you go, the more pronounced the individual effects become. Instead of treating the design as a whole, or in smaller but still relatively large units, more research needs to be done examining each and every change occurring within the system. Very time consuming and expensive. I will not be surprised if there are further delays. The break-neck speed of development had to come to an end some time.
Not disappointed about the delays; they were very much expected. Can't keep throwing money at the problem and expect it to pay dividends immediately. My 2 cents.
Yup, you are likely correct. What we often don't hear about is how closely the fab engineers work with the designers. The amount of back and forth work and information they do is pretty staggering, especially with these next generation nodes. This simply isn't a "we are finished with the design, send it to the Fab guys and they can figure out the rest!" situation anymore.
We are also seeing the pure-play guys working to amortize their investments in current process nodes… because the next-gen stuff is so expensive. Gotta pay those bills. They can only hope that Intel will slow down, because those guys don't clear $3 billion a quarter like Intel does.
Amazing article Josh, thank you for an informative read. I feel thoroughly educated.
Great article Josh! Thank you from cold mother Russia! 😀
Thanks! I woke up to -3C weather this morning! Happy cold days to you as well!
Wow Josh, great article. I'm with you on this. You really have a passion for this.
Great article!
I do think you got a bit speculative on the impact of mobile chips and on the supposed decline of the desktop graphics market… there has been some research recently showing that the desktop graphics segment is actually healthy and growing.
I think this needs substantiation and cannot be assumed:
“Remember, desktop graphics is actually a shrinking market due to the effective integration of graphics not just in the mobile space, but also with higher powered CPUs/APUs from Intel and AMD.”
Desktop graphics are not growing, they are shrinking. But they are not shrinking that much. Intel and AMD now have such good integrated graphics that a large portion of the people who would previously have been bundled with a low-end card are now just using integrated.
The sky is not falling on discrete graphics though, it just is not growing anymore. Mobile IS growing, and that is where a lot of the R&D is going.
I agree that a lot of R&D is going into mobile. However, things like this:
http://www.techpowerup.com/188572/global-pc-gaming-hardware-sales-shrug-off-pc-market-decline-jpr.html
suggest that there is growth occurring in the discrete graphics segment.
That’s why I said that there needs to be some substantiation of the idea that mobile + integrated GPUs are detrimental to discrete GPU growth.
Discrete isn't growing though. Take a look at some of the J Peddie numbers over the past few years. Sure, gaming systems are not being affected by the PC slowdown, but there are fewer shipments now than there were 3 years ago for discrete graphics. It isn't plummeting, and it is a healthy market, but it just isn't growing. All of the growth is mobile right now.
HAHA, I still have my Voodoo 2s and SLI cable!
The man is a repository of knowledge. Ryan is truly lucky.
And I bathe regularly!
Wow cool article.
As for the innovations in GPUs on 20nm: well, there is Hybrid Memory Cube-style or stacked GDDR5 memory. You will get a marginally better GPU die, but because the bandwidth is going to skyrocket it will seem like Christmas again…
Also, AMD had a stacked DRAM prototype spotted in the wild in 2011. Why didn't it hit the market earlier? Maybe it needs time to be introduced to the market, or maybe they saved this architectural revelation for the tough times that 20 nm without FD-SOI is going to be…
Also, come to think of it, should a GPU have 1 TB/s of bandwidth to main memory with improved latency at the same time, a lot of the on-die caches could be removed to pave the way for more computational resources in the same die area. Of course the engineers will have to do their job and pick the right choices, but this is a possible outcome of new architectural breakthroughs that are orthogonal to the silicon production process.
So my point is, that the 290X/Titan replacement may in fact be a massive performance leap forward irrespective of the 20nm problems.
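As a rough sanity check on bandwidth figures like the 1 TB/s mentioned above: peak memory bandwidth is just bus width in bytes times the effective transfer rate. The stacked-DRAM interface numbers below are assumptions for illustration, not a real product spec:

```python
def bandwidth_gb_s(bus_width_bits: int, effective_rate_gt_s: float) -> float:
    """Peak memory bandwidth = bus width in bytes x effective transfer rate."""
    return (bus_width_bits / 8) * effective_rate_gt_s

# GTX TITAN-style GDDR5 setup: 384-bit bus at 6 GT/s effective.
print(bandwidth_gb_s(384, 6.0))   # 288 GB/s

# Hypothetical stacked-DRAM part: a very wide 4096-bit interface at a
# modest 2 GT/s already reaches the 1 TB/s ballpark.
print(bandwidth_gb_s(4096, 2.0))  # 1024 GB/s
```

The takeaway is that stacked memory gets its headline numbers from interface width rather than raw signaling speed, which is why it can coexist with a merely "marginally better" GPU die.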