PC Perspective Podcast #348 – 05/07/2015
Join us this week as we discuss DirectX 12, New AMD GPU News, Giveaways and more!
You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.
The URL for the podcast is: https://pcper.com/podcast – Share with your friends!
- iTunes – Subscribe to the podcast directly through the iTunes Store
- RSS – Subscribe through your regular RSS reader
- MP3 – Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano
Program length: 1:27:38
Week in Review:
News item of interest:
Hardware/Software Picks of the Week:
- Ryan: Samsung SM951 NVMe SSD
- Jeremy: GoatZ!
- Josh: Give it a Go… or GoG
Closing/outro
Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!
Waiting for DX12, and to see how AMD takes full advantage of it??
If you look at Wikipedia, they list a full line of 8xxx GPUs after the 7xxx GPUs that were OEM-only. They are also listing a full line of 3xx GPUs that are OEM-only and a full lineup of 4xx GPUs. Only the top-end 4xx GPUs are HBM, though, and the lower-end GPUs may be rebrands/updates; I haven't looked at it that closely yet.
If Wikipedia is correct (?), then the 470/470X are still GCN 1.0 GPUs. I guess the GPUs for the 490 cannot easily be cut down for the 470 market segment.
There are a lot of misconceptions and plainly wrong assertions being made about HBM.
1. The interposer and yields. Josh is kind of right about the interposer, but also very wrong. It is most certainly made on a 65nm process (GF), but that doesn't mean it is easy to manufacture. You have to etch the TSV channels, place the micro bumps, etc., and that is certainly not a small feat to achieve. The bigger the interposer (ready for assembly), the lower the yields. The DRAM stacks are less complicated and can be produced as easily as any other memory chip. The stacking isn't the biggest problem either; you just have to get it all together on the interposer.
And no, the “HBM” memory yield rumors are just bullshit. The interposer yields could be a problem, but I'll go out on a limb and say that's not a problem either.
2. The power consumption savings. While it remains to be seen whether the 50% lower power consumption holds true, you have to factor in the memory interface, which is much wider but also simpler. Have a look at the “Die Stacking is Happening” presentation (p. 45):
http://www.microarch.org/micro46/files/keynote1.pdf
The power consumption of the memory chips alone is only cut in half, BUT the power consumption of the PHY is reduced to a third while retaining the same bandwidth in comparison to GDDR5.
That is the most interesting thing about HBM: the same or higher bandwidth at close to 1/3 the interface power consumption.
The memory controller and RAM consume up to 1/3 of a card's total power consumption (or even more; exact numbers are hard to come by), so we could very well see a drop of 50 watts or more in comparison to a GDDR5 configuration.
3. The max bandwidth should be 640 GB/s, not 1 TB/s (HBM2) and not 512 GB/s either.
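As a rough illustration of where those figures land, here is a back-of-the-envelope sketch in Python; the GDDR5 configuration, the per-pin data rates, and the 80 W memory power budget are assumptions for the sake of the example, not published numbers:

```python
# Back-of-the-envelope HBM vs. GDDR5 numbers, using the figures quoted above.
# Clock speeds and the power split are illustrative assumptions only.

def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: pins * bits per pin per second / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# GDDR5 on a 512-bit bus at 5 Gbps per pin (R9 290X-class card):
print(bandwidth_gbs(512, 5.0))    # 320.0 GB/s

# HBM1: 4 stacks x 1024 bits at 1 Gbps per pin:
print(bandwidth_gbs(4096, 1.0))   # 512.0 GB/s

# ...and at 1.25 Gbps per pin, which is where a 640 GB/s figure would come from:
print(bandwidth_gbs(4096, 1.25))  # 640.0 GB/s

# Power: assume an ~80 W memory subsystem on a GDDR5 card, split roughly
# half DRAM, half PHY. Cut DRAM power in half and PHY power to a third:
gddr5_dram, gddr5_phy = 40.0, 40.0
hbm_total = gddr5_dram / 2 + gddr5_phy / 3
print(gddr5_dram + gddr5_phy - hbm_total)  # ~46.7 W saved
```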
Is there any specific information on the processes used to make the interposer? It isn't simple, but it isn't the same thing as etching a 20 nm device either. I would think there would be a low chance of etching defects due to the large feature size. Also, it is (AFAIK) a passive device, so it would be metal interconnect layers only, and possibly some capacitors. It will not have that many layers, since it does not have transistor device layers. I was assuming only a small number of metal layers, similar in number to what you would have on a PCB.
HBM is a very wide interface, so the number of micro solder balls is going to be very large. I would expect most defects to come from this soldering process, both for the GPU and the memory stack. I do not know what the exact die-stacking process is for the stacked memory. I would assume that they create a stack from known-good memory dies and a logic die, and then test this stack before soldering it to the interposer. If this is the case, then defects during stacking of memory chips are not that important, since a bad stack will not make it to the final interposer. You would only waste the memory chips, which isn't good, but it is much better than scrapping a full GPU and interposer; the memory dies are very small.
I would be interested in knowing the actual micro bump count. If it has a 4096-bit memory interface, then I would assume that the micro bump count is close to 5000. Routing on the PCB should be relatively trivial, though, since it is just the PCI Express interface, power/ground, and the video outputs. Using the interposer adds quite a few manufacturing steps, all with the possibility of introducing a defect, so interposers may not be usable for anything but the high end until the process matures. We still do not know whether Fiji is 28 nm or 20 nm. If yields are low, it could be because of the 20 nm process for the GPU, rather than issues with the interposer.
As I said, it's going to be 65nm@GF/Amkor. They (GF) will produce the silicon and etch the TSVs, and then Amkor takes over, forms the land pads for the bumps, and so on…
You can take a look here: http://www.amkor.com/go/Kalahari-Brochure
They have already produced several test vehicles with a 65nm interposer and 28nm chips (e.g. ARM CPUs), and Amkor has lots of PDFs on their website going into detail.
I'm not sure about the number of bumps, but I would think it's going to be somewhat higher. Compare the Virtex-7 2000T, which had about 200,000 micro bumps (50,000 per “die slice”), with 1,600 bumps for the ARM core test vehicle, IIRC. You have five dies in total; the stacks need to be connected to the main die, and the main die needs to be routed through to the substrate.
And yes, I think the DRAM stacks are one of the lesser problems. Sure, they need TSVs, but they are produced on the same process as other memory chips; that's why I think the HBM yield story is not true.
And of course one has to understand that “HBM Gen1” is actually a process limitation on SK hynix's side, not some kind of technology advancement per se. With HBM Gen2 they will jump to a new process, density therefore increases, and you get more capacity per die.
Thank you for the link to real information. The Kalahari technology demonstrator is 26×32 mm: 832 mm². The link also talks about this being reticle-sized, so it seems that the size of the reticle is a limitation, but this is quite large. It seems Josh is pretty far off in his description.
The Virtex-7 2000T is a multi-chip FPGA; this requires significantly more interconnect than a memory interface, and it is very expensive. It would be similar to making a multi-chip GPU that acts like a single-chip GPU. If it is a 4096-bit interface, then you obviously need 2 bumps per bit: one on the GPU and one on the memory. That would put the GPU in the 5k range by itself, and around 10k for the whole interposer. I don't think it is going to be anywhere near the number required for the multi-chip FPGA.
That’s what I meant. It’s certainly not going to be 200k bumps but I suspect it’s more than 5k.
About the interposer size: you have to remember that there are 4 HBM stacks plus the GPU die, and you need some space between the dies, because of thermal constraints for example.
Each stack is close to 42 mm² (5.48 mm × 7.29 mm ≈ 40 mm², plus a safety margin) per SK hynix (http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-3-Technology-epub/HC26.11.310-HBM-Bandwidth-Kim-Hynix-Hot%20Chips%20HBM%202014%20v7.pdf)
So the stacks eat away at the area available for the GPU die. You want the GPU die to be as near to a square as possible, otherwise you lose too much space on the wafer.
All in all the GPU die should be under 600mm² considering safety margins, space left between the dies and so on.
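As a sanity check on that budget, a quick sketch; the reticle size comes from the Kalahari brochure linked above and the stack size from SK hynix, but the die spacing figure is an assumption:

```python
# Rough interposer area budget, using the numbers from this thread.
# The die spacing margin is a guess for illustration.
reticle_limit = 26 * 32          # ~832 mm², per the Kalahari brochure
hbm_stack     = 5.48 * 7.29      # ~40 mm² per stack, per SK hynix
stacks        = 4
spacing       = 60               # assumed mm² lost to gaps between dies

gpu_budget = reticle_limit - stacks * hbm_stack - spacing
print(f"Area left for the GPU die: ~{gpu_budget:.0f} mm^2")  # ~612 mm²
```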
Speaking of just the GPU, not the entire interposer: you have 4k bumps for the memory interface. It will take some extra bumps for the memory interface beyond just the 4k data lines, but I wouldn't think it would be that many. The PCI Express link and the DisplayPort outputs are both relatively low pin-count interfaces. There could be a lot of power and ground bumps; I don't know if this number goes up significantly due to the small size of micro bumps vs. regular bumps on the exterior package. The entire interposer would have at least around 10k bumps, since more than 8k would be needed for the memory interface alone. Anyway, it is unclear what the cost structure is like. I know the FPGAs using these interposers are very expensive, but those require a massive amount of interconnect to allow programming a single design across multiple chips as if it were a single FPGA. If the maximum is 200k, would there be a meaningful cost difference between 5k and 10k?
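To put rough numbers on that tally: the command/address and miscellaneous I/O counts below are placeholder assumptions, and power/ground bumps are left out entirely, so the real total would be higher:

```python
# Rough micro bump tally for the interposer, per the reasoning above.
# Overhead and misc I/O counts are placeholder assumptions; power/ground excluded.
data_bits = 4096
gpu_side_data   = data_bits   # one bump per data line on the GPU side
stack_side_data = data_bits   # matching bumps on the memory-stack side
signal_overhead = 500         # assumed: command/address/clock lines per side
misc_io         = 300         # assumed: PCIe link, display outputs, etc.

total = gpu_side_data + stack_side_data + 2 * signal_overhead + misc_io
print(total)  # 9492 — roughly the "around 10k" figure above
```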
Doing some ratios with the number of shaders (4096 vs. 2816) and the die size of Hawaii (438 mm²), it seems like Fiji could be close to 600 mm² on 28 nm. They could have decreased the size of the shaders, though, by reducing 64-bit FP support or through other mechanisms. If the water cooling requirement is true, then they may not be quite as subject to thermal constraints.
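For reference, the arithmetic behind that estimate; it assumes die area scales linearly with shader count, which the next comment rightly questions:

```python
# Naive linear scaling from Hawaii to Fiji by shader count.
# Assumes area scales with shaders and ignores the different memory controllers.
hawaii_shaders, hawaii_area = 2816, 438   # shaders, mm²
fiji_shaders = 4096

fiji_area_est = hawaii_area * fiji_shaders / hawaii_shaders
print(f"~{fiji_area_est:.0f} mm^2")  # ~637 mm² by this naive scaling
```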
You can't just ratio Hawaii to Fiji. They have completely different memory controllers.
That would just make Fiji smaller, which means that the 600 mm² limitation is probably not an issue.
If there is a yield issue, I would expect it to be on the GPU side, especially if AMD actually did go with a 20 nm process for this GPU.
AMD will use 28nm. No one will manufacture big-die 20nm GPUs (let alone any other GPU, except those found in SoCs). GF's 20nm process is M.I.A., or may only be used to manufacture die-shrunk console chips.
AMD also cancelled their 20nm chips, so that process is pretty much dead.
And no, there should not be any yield issues. That process is 4 years old now. You might have binning issues, but yield should not be a problem at this point.
AMD has made risky moves before, pushing both design and process tech. The decision to target 28 or 20 nm would have been made quite some time ago, when it may not have been as clear how much trouble 20 nm was going to be. The high-end chips in the $500 range are actually low volume, so AMD may have taken the risk. Given the rumored specs, they may have needed to go 20 nm to fit it on the interposer. Anyway, unless you have inside information, I don't think we can be sure yet.
I don't have insider information, but since Nvidia abstained from manufacturing 20nm GPUs, and given that TSMC's process is pretty much fully booked by Apple and Qualcomm, we can safely assume that no such chips will be coming from TSMC.
GlobalFoundries' 20nm process, on the other hand, has been MIA for like… forever. The last signs of life were from last year, when a GF manager claimed 20nm would not be a high-volume node. And now AMD has cancelled their 20nm APU chips.
A few weeks ago, bitsandchips.it came out with an article claiming as much (via Google Translate):
https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=de&ie=UTF-8&u=http%3A%2F%2Fwww.bitsandchips.it%2Fhardware%2F50-enterprise-business%2F5522-20nm-di-globafoundries-missing-in-action&edit-text=
So there is lots of evidence that 20nm@GF is dead, and therefore it's highly unlikely that we will see high-performance GPU chips, let alone any standalone GPU chip, from a 20nm process.
Oooh, thanks for the link. I have obviously been speaking off the cuff about interposers, as I hadn't run across anything with these kinds of details. All I knew is that it is a piece of silicon that has been processed at a larger/older node. Cool to see real info that goes into some detail.
From my understanding, the yield problem isn't with any individual piece of silicon… it is the "bonding" process of the GPU, memory, and interposer. That apparently is quite hard to get right.
Given the size of the solder micro-bumps, the alignment would need to be incredibly precise. I would be surprised if they would actually start shipping a product like this if the yields of the final bonding process are so low. I assume that the entire package is garbage, including an expensive GPU, if there is a defect in the bonding process.
I would be very curious whether they could reflow the chips off the interposer and then re-ball them. They could salvage all the dies that way, but would it be worth it? What kind of damage would we expect? So many questions with this particular technology.
See!
You've implemented an ad in the video without Google needing to stick their nose in, and without worrying about it being blocked.
And besides, it was nice and neat, on topic, and filmed by you. 🙂 Even two years from now it will still be worth watching and comparing with their newer models. 🙂
With regard to FreeSync: the BenQ monitors offer 40-144 Hz, so I think the constraint is with the display. It has also been recalled, so maybe that is part of why it's not running the full range.
GoatZ turned out to be DLC for Goat Simulator, $5.49 CDN. Boooooooooooooo.
Synergy has been around for years as a free Microsoft download called Remote Desktop Connection Manager.