28 nm Continued
AMD still has not started fleshing out its refreshed lineup. The 300 series of cards has not yet seen the light of day, but the situation is becoming a little clearer every day. AMD gave us a glimpse of the future when it released the “Tonga”-based R9 285 last fall. Tonga is still a 28 nm part, and it features the GCN 1.2 architecture. GCN 1.2 brings lower power consumption and color compression to increase usable bandwidth. It also supports all the latest technologies such as TrueAudio and xDMA, and it doubles the primitive throughput of earlier parts such as Tahiti. This will be the baseline for the upcoming “Fiji” product, which will be the first to implement the GCN 1.3 architecture.
HBM1 will be used with Fiji. AMD is the first out of the gate with a large, complex, mass-produced part that will use a 2.5D HBM implementation. (Photo courtesy of diit.cz)
It looks as if a fully enabled Tonga (with 2048 stream units and a 384-bit memory bus) will become the R9 370X, while the current R9 285 will probably be rebranded as the R9 370. A slightly cleaned-up and faster Hawaii (R9 290X/290) is being introduced as Grenada and will come out as the R9 380X/380. That chip is unlikely to adopt the GCN 1.2 or 1.3 architecture, but we should see some power and clockspeed improvements from refinements to the design. None of this is confirmed, and the details are confusing. Tonga may also be rebranded as Antigua, but again there is no confirmation of this.
This is the point where I could be absolutely wrong. I do not believe Fiji will be a 20 nm part, as was rumored some months ago. I just do not see a chip as large as Fiji is rumored to be, and potentially as power hungry, being produced successfully on a 20 nm process that is quickly becoming obsolete. Why develop a complex product such as Fiji on 20 nm when the gains in transistor density would be offset by some very nasty power and heat issues? It seems better to design a larger part on the mature and well-understood 28 nm HKMG process. Tonga's release as a 28 nm part lends some credence to this assumption.
My best guess is that Fiji will come in at around 520 mm² to 550 mm². The territory above 550 mm² is a scary one, and certainly one AMD has never entered with its GPUs. Even the R600-based HD 2900 XT was only around 420 mm². Hawaii is one of the larger chips AMD has released in a while, and it is still a reasonable 438 mm². At 550 mm², a 300 mm (12 inch) wafer will hold a maximum of around 98 dies. A 520 mm² product would obviously fit more dies per wafer. Also remember that these numbers are entirely theoretical, as AMD does not disclose die counts and could arrange and space the dies very differently.
This is a decent representation of what 98 complete dies would look like on a wafer. Too bad we never hear about actual yields on such products. (Wafer estimate courtesy of Silicon Edge)
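The ~98 figure can be sanity-checked with the common first-order die-per-wafer approximation, which divides wafer area by die area and subtracts the partial dies lost along the round edge. This is a minimal sketch that ignores scribe lines, edge exclusion and die aspect ratio. `gross_dies_per_wafer` is just an illustrative helper, not a tool from Silicon Edge or AMD.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """First-order estimate of whole dies on a round wafer.

    Term 1: wafer area / die area.
    Term 2: approximate number of partial dies lost around the edge.
    Scribe lines and edge exclusion are ignored, so real layouts land a bit lower.
    """
    d = wafer_diameter_mm
    s = die_area_mm2
    return math.pi * d ** 2 / (4 * s) - math.pi * d / math.sqrt(2 * s)

# Hawaii's size from the text, plus the two guessed Fiji sizes
for area in (438, 520, 550):
    print(f"{area} mm^2: ~{gross_dies_per_wafer(area):.0f} dies per 300 mm wafer")
```

The formula gives roughly 100 candidates at 550 mm², about 107 at 520 mm², and about 130 at Hawaii's 438 mm². The small gap against the 98-die layout comes from the effects the formula leaves out.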
Yields on any part that exceeds 200 mm² typically start to go downhill fast, and anything above 550 mm² gets really bad. This does not take into account design features and processes that can recover defective dies. NVIDIA is getting around this sticky problem by offering a very expensive Titan X, while a cut-down/recovered version of that chip will probably be designated the GTX 980 Ti. AMD looks to do something similar with the R9 390X and the cut-down R9 390.
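Why yields fall off so quickly with area can be illustrated with the simplest textbook model. It treats defects as randomly scattered, so a die survives only if it catches none: Y = exp(-A * D0). The defect density D0 = 0.2 per cm² below is a made-up placeholder, not a TSMC number, and real fabs use more refined models. The second function adds an optimistic assumption: any die with exactly one defect can be salvaged as a cut-down part, which is roughly the idea behind recovered chips like the GTX 980 Ti and R9 390.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects: Y = exp(-A * D0)."""
    mean_defects = die_area_mm2 / 100.0 * defects_per_cm2  # 100 mm^2 per cm^2
    return math.exp(-mean_defects)

def yield_with_one_defect_tolerated(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with at most one defect (Poisson P(0) + P(1)).

    Optimistic assumption: every single-defect die can be salvaged by
    disabling the affected unit and selling it as a cut-down part.
    """
    mean_defects = die_area_mm2 / 100.0 * defects_per_cm2
    return math.exp(-mean_defects) * (1 + mean_defects)

D0 = 0.2  # hypothetical defects per cm^2, not a published foundry figure

for area in (200, 359, 438, 550, 600):
    print(f"{area} mm^2: {poisson_yield(area, D0):.0%} defect-free, "
          f"{yield_with_one_defect_tolerated(area, D0):.0%} with one-defect salvage")
```

With these placeholder numbers, only about a third of 550 mm² dies come out defect-free. Tolerating a single defect lifts that to roughly 70%, which is why harvested parts matter so much for big chips.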
AMD recently disclosed its use of High Density Libraries in its upcoming Carrizo APUs. This gave AMD not only a more densely populated design but also better overall efficiency. Improvements in metal layer design and power delivery have helped a great deal here. We can look at Tonga to see what kind of improvements AMD has already netted. Tonga comprises around 5 billion transistors yet takes up only around 359 mm². The older Tahiti (which powers the current R9 280 series) has 4.3 billion transistors but takes up more space, at 365 mm². AMD was able to pack in another 700 million transistors AND take up less space. Tonga also delivers performance similar to the older Tahiti, but at a lower overall TDP. We must also remember that Carrizo is being produced on a 28 nm process as well.
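The Tonga versus Tahiti comparison is easiest to see as transistor density. This sketch uses only the transistor counts and die sizes quoted above, which are themselves rounded, so treat the result as approximate. `density_mtr_per_mm2` is just an illustrative helper.

```python
# Figures quoted in the text: (transistor count, die area in mm^2)
chips = {
    "Tahiti": (4.3e9, 365),
    "Tonga": (5.0e9, 359),
}

def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6

densities = {name: density_mtr_per_mm2(*spec) for name, spec in chips.items()}
for name, d in densities.items():
    print(f"{name}: {d:.1f} M transistors/mm^2")

# Relative density gain of Tonga over Tahiti, both on 28 nm
gain = densities["Tonga"] / densities["Tahiti"] - 1
print(f"Tonga is ~{gain:.0%} denser than Tahiti")
```

That works out to roughly 11.8 versus 13.9 million transistors per mm², or about 18% denser on the same process.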
It looks like AMD can integrate the rumored 4096 stream units into Fiji without blowing its transistor budget or die size. Fiji will also feature the new memory controller design that can access High Bandwidth Memory, a topic Joe Macri should have plenty of interesting things to say about. We do not yet know what other features the new chip will have, but one random idea just floated through my head: I wonder if we will see hardware support inside the GPU to address FreeSync. Ryan's recent article exposing the differences between FreeSync and G-Sync with sub-40 fps content might be an interesting place to start. Are there plans for an on-chip video out buffer that can store and resend a frame when a low-framerate condition occurs? That would certainly be one way to fix that particular characteristic of FreeSync.
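To make the frame-resend idea concrete, here is a hypothetical sketch of the scheduling such a buffer might do. When a frame takes longer than the panel's slowest supported refresh, the stored frame is scanned out several times at an interval that stays inside the variable refresh window. `repeat_schedule` is an invented helper, not an AMD, VESA Adaptive-Sync or driver API, and the 40 to 144 Hz window is an assumed example panel.

```python
import math

def repeat_schedule(frame_time_ms: float, min_hz: float = 40.0, max_hz: float = 144.0):
    """Hypothetical low-framerate frame repeat: (times to show frame, interval in ms).

    The defaults describe an assumed 40-144 Hz variable refresh panel.
    """
    if max_hz < 2 * min_hz:
        raise ValueError("window must span at least 2x for clean frame repeats")
    max_interval = 1000.0 / min_hz  # longest the panel can wait between refreshes
    min_interval = 1000.0 / max_hz  # fastest the panel can refresh
    if frame_time_ms <= max_interval:
        # Inside the window: show the frame once (the panel caps at max_hz).
        return 1, max(frame_time_ms, min_interval)
    # Below the window: resend the buffered frame n times so each
    # scan-out interval is no longer than max_interval.
    n = math.ceil(frame_time_ms / max_interval)
    return n, frame_time_ms / n

for fps in (90, 35, 30, 20):
    n, interval = repeat_schedule(1000.0 / fps)
    print(f"{fps} fps: show each frame {n}x, every {interval:.1f} ms ({1000.0 / interval:.0f} Hz)")
```

In this sketch a 30 fps game would have each frame shown twice at an effective 60 Hz. The window-width check matters: repeating only works cleanly when the panel's maximum refresh rate is at least twice its minimum; otherwise some frame times cannot be split into intervals that fit.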
NVIDIA has certainly proven that it can produce a design that hits TDP targets while packing 8 billion transistors into a die of roughly 600 mm². What has not been addressed is the cost of manufacturing AMD's entire package. We do not know the details of the interposer (it is silicon-based) or its cost of fabrication. We have a general idea of what HBM costs, but no specifics. What kind of defect rate does AMD expect to see when marrying all three components into a working device? Can these chips be reused if the finished product is defective? AMD is moving into uncharted territory with a very aggressive product that could certainly be class-leading for the next year. There are significant risks in this direction, but if it works out as expected, AMD will have a leg up on NVIDIA for quite some time.
HBM2 will be coming to GPUs (and other products) in 2016. (Photo courtesy of WCCF Tech)
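One way to frame the defect-rate question about Fiji's package is compound yield. A finished package only works if the GPU, the interposer, every HBM stack and the bonding step all come out good. This is a minimal sketch assuming independent failures. Every probability in it, and the choice of four HBM stacks, is a placeholder for illustration, not AMD data.

```python
def package_yield(gpu: float, interposer: float, hbm_stack: float,
                  stacks: int, assembly: float) -> float:
    """Chance a finished GPU + interposer + HBM package works.

    Assumes failures are independent: every component must be good
    and the assembly (bonding) step must also succeed.
    """
    return gpu * interposer * hbm_stack ** stacks * assembly

# Placeholder probabilities for illustration only, NOT AMD figures.
# Four HBM stacks is also an assumption.
y = package_yield(gpu=0.98, interposer=0.99, hbm_stack=0.99, stacks=4, assembly=0.97)
print(f"package yield: {y:.1%}")
print(f"packages scrapped per 1000 assembled: {1000 * (1 - y):.0f}")
```

Even with every step at 97% to 99%, roughly one package in ten fails in this example. Each failure can take good silicon with it, which is why the question of reusing chips from defective packages matters.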
We may scoff at companies when we think they are not progressing at the rate they should be, or when they release products that are not revolutionary. This is extremely unfair. It is amazing that any of these products work in the first place. Converting sand and copper into moving pictures on a screen is honestly mind-bending. We can look back at the dawn of true 3D acceleration and see the 3Dfx Voodoo Graphics. It had two chips, each with around 1 million transistors, produced on a then leading-edge 500 nm (0.5 micron) process and clocked at 50 MHz (it could be overclocked to 59 MHz!!!). Now we have single-chip solutions with 8 billion transistors that produce realtime-rendered images that are simply stunning.
It seems that for graphics we are stuck at 28 nm for 2015. I could be wrong and AMD might produce a 20 nm Fiji part, but I believe the evidence presented here makes a strong case that it will stay with 28 nm. Samsung has shown that it has a working, viable 14 nm FinFET process on which it is producing adequate numbers of Exynos 7420 SoCs. TSMC has its 16 nm FF process node coming online, and it promises a 16 nm FF+ line that could address large, power-hungry GPUs. Evidence points to these 14/16 nm processes being the basis for GPUs in 2016. 20 nm planar looks like a short-lived node that will not have anywhere near the impact 28 nm had for the fabless semiconductor firms. Even if Fiji is produced on 20 nm, there will be few follow-up parts on that node due to its very nature. It could very well be that we have seen the last of the 250 W+ TDP parts on cutting-edge process nodes. If we are relegated to process technologies that will not tolerate high-TDP parts, we may yet again see companies take different and unexpected routes to address the performance computing scene. This is not necessarily an unwelcome situation.