Sorry for all of these single-item news posts I keep making, but this is how the information is coming out about AMD's upcoming Fiji GPU with its new HBM (high bandwidth memory) technology. (And make no mistake, this is exactly how AMD marketing dreamed it would happen.) Below we have an image of Fiji: the GPU die, the interposer, and the four stacks of HBM.
Look familiar?
Quite simply, that package is massive, measuring about 70mm x 70mm based on the information presented during our HBM technical session last month. That is gigantic compared to a GPU die alone, but it is smaller than a previous-generation GPU plus its required memory chips laid out separately on the PCB.
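For a rough sense of what that buys you, here is a back-of-the-envelope sketch. The Hawaii die size, GDDR5 footprint, and especially the routing/keep-out factor are assumptions for illustration, not AMD-supplied figures:

```python
# Back-of-the-envelope board-area comparison: Fiji's ~70mm x 70mm package
# vs. a previous-generation GPU plus its GDDR5 chips spread out on the PCB.
# The die size and chip footprint are approximations; the routing factor
# covering trace escape and keep-out spacing is a pure guess.

fiji_package_mm2 = 70 * 70                 # ~4900 mm^2, per the article

hawaii_die_mm2 = 438                       # R9 290X (Hawaii) die, approximate
gddr5_chip_mm2 = 12 * 14                   # typical GDDR5 BGA footprint (assumed)
gddr5_chips = 16                           # 16 x 32-bit chips for a 512-bit bus
routing_factor = 2.0                       # assumed overhead for traces/spacing

discrete_mm2 = (hawaii_die_mm2 + gddr5_chips * gddr5_chip_mm2) * routing_factor

print(f"Fiji package:              {fiji_package_mm2} mm^2")
print(f"Discrete GPU + GDDR5 area: {discrete_mm2:.0f} mm^2")
```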
In case you missed it earlier today, AMD also released a teaser video of a CG-rendered Radeon card using Fiji. We'll know everything (maybe?) about AMD's latest flagship on June 16th.
USA vs. Canada
Bacon vs. Canadian Bacon
GTX 980Ti (USA) vs. Fiji XT (Canada)
I love America, bacon, and GPUs that don’t suck. Now AMD might get bonus points because you could likely cook bacon on that toasty GPU.
HBM 2.0 will be exciting with speed AND capacity.
Whichever company relies less on compression techniques and gets more bandwidth through brute force out of their GPUs will win.
That is the stupidest thing I have read in a while. The 970 alone has FUBARed AMD and completely destroyed their sales figures and profit margins. Hint: the 970 doesn't require brute power, or even high power. The 980 and 980 Ti have more brute power and compression. Fury and Nano sales just aren't there. Sadly, I think we may lose AMD in its current form. Hopefully someone buys them out and does something with that company.
What they should have done was hold up a previous generation's mainboard with its GPU and memory next to this interposer module with its stacked HBM, and let the audience note the space savings. What can't be seen are the four massively wide 1024-bit etched buses leading to each HBM stack; if those were laid out the old way, the traces would completely cover the PCB and require so many added layers as to make the board prohibitively thick and impossible to reasonably manufacture. HBM is definitely the way to go, and wide data buses are now a reality for more than just GPUs. I'm waiting for HBM to hit the APU market and give the CPU some of that extra bandwidth goodness.

Wait until the dual-GPU monsters on even wider interposer modules begin appearing: imagine multi-thousand-wide trace connections between two of these GPUs, alongside the individual 1024-bit-wide buses to each GPU's HBM stacks. The old-fashioned way, with dual GPUs in their own sockets and a much narrower semi-direct GPU-to-GPU connection, is going to give way to an actual direct GPU-to-GPU connection with no reduction in the number of traces connecting the two. With the introduction of the silicon interposer, it will be effectively as if they were made from one giant monolithic die, for all intents and purposes. I can't wait to see the dual-GPU interposer photo when it leaks, and AMD probably already has some very nice images of the dual-GPU follow-up to this single-GPU introductory SKU.
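A quick sketch of the trace-count argument above; the counts are data lines only (command/address/clock signals ignored), and the 512-bit GDDR5 bus is just one representative high-end configuration:

```python
# Signal-count comparison behind the "traces would cover the PCB" argument:
# data lines for four 1024-bit HBM stacks vs. a wide GDDR5 bus.
# Data lines only; command/address/clock signals are ignored.

hbm_stacks = 4
hbm_width_per_stack = 1024                 # data bits per HBM stack
hbm_data_lines = hbm_stacks * hbm_width_per_stack

gddr5_bus_width = 512                      # e.g. an R9 290X-class aggregate bus

print(f"HBM data lines:   {hbm_data_lines}")      # 4096
print(f"GDDR5 data lines: {gddr5_bus_width}")     # 512
print(f"HBM routes {hbm_data_lines // gddr5_bus_width}x the data traces")
```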
There are limitations you do not seem to be taking into account. The size of the interposer is limited, and the one AMD is making for Fiji is probably close to the limit. It is limited by the reticle size, which is around 830 mm², I believe. They can be made bigger, but the price would go up significantly. For two giant GPUs (~600 mm² each), the interposer would need to be gigantic. Also, if something goes wrong during production, you could lose two GPUs at once. There could also be issues with supplying 500 to 600 W to such a device.

You will probably not see any such multi-GPU interposer. Using the interposer does free up almost all of the off-package interconnect, though. This will allow for very fast connections between multiple interposers. It will most likely be something like PCIe signaling, but it could be significantly wider, since the memory interface does not need to be routed out of the package. Nvidia has already released info about their plans for such an interconnect; it is called NVLink.

It may be plausible to make a multi-chip GPU, but each die would need to be much smaller. Yields may be better if they made a ~150 mm² die and placed four of them on the interposer. This may be expensive to design, though, since the individual die may not be usable as an independent GPU. It may still be economical if yields on 20 nm and 14 nm are not good enough for large GPU die.
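For what it's worth, the yield intuition here can be sketched with the classic Poisson defect model, yield ≈ exp(−defect density × area). The defect density below is an assumed value for illustration, not a real foundry figure:

```python
import math

# Classic Poisson yield model: P(zero defects) = exp(-defect_density * area).
# The defect density is an assumed value for illustration, not a published
# figure for any particular process node.

defects_per_mm2 = 0.002                    # assumed: 0.2 defects per cm^2

def die_yield(area_mm2: float) -> float:
    """Probability that a die of the given area has no random defects."""
    return math.exp(-defects_per_mm2 * area_mm2)

print(f"One 600 mm^2 die:  {die_yield(600):.1%}")   # ~30%
print(f"One 150 mm^2 die:  {die_yield(150):.1%}")   # ~74%
```

Since each small slice can be tested and binned before mounting, the usable-silicon yield of the sliced approach is far better than one monolithic roll of the dice.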
The interposer is not limited by the reticle size, as covered here: http://wccftech.com/fiji-xt-limited-4gb-memory/
Also, a link to the Hot Chips video with Bryan Black talking about interposers being made to any size required by the package: https://youtu.be/9KjtVjis8Ps?t=8130
“They can be made bigger, but the price would go up significantly.”
This is what I said in my post. While the reticle size isn't a hard limit, it is still a limitation.
In the linked video, he says “Interposers are going to be very, very large, right, and they are going to be larger than the reticule size. So we’re going to see interposers as big as 40×40, 50×50. Whatever it takes […].”
A 600 mm² die would be close to 25 mm on a side; even a 400 mm² die would be 20 mm on a side. Most recent high-end GPUs have been over 400 mm². The reticle size seems to be around 26×32 mm (Amkor's reticle-sized demonstrator). Two high-end GPUs would definitely be pushing the limits, since the interposer would need to be at least 40 mm on a side. Just because it is technically possible doesn't mean it is economical to actually make such a product at this time.

I have no idea what the cost increase for going over the reticle size will be. If the cost goes up to thousands of dollars (or more) per device, then you are effectively limited to the reticle size for any consumer-level product. There may be people willing to pay $1,000, but not so many willing to pay $10,000. Bottom line: I don't expect any multi-GPU interposers anytime soon, and when we do see them, they may carry much smaller die, mostly to increase yields. Yields drop very quickly with increased die size, so using multiple smaller die could increase yields significantly.

All of the info I have seen indicates multi-GPU will use board-level interconnect for at least the next few generations. This can still be high bandwidth with high-speed serial links (NVLink, for example), but it will not have the low latency and low power of interconnect through the interposer.
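A quick sanity check of the geometry being argued here, using the approximate figures from this thread (reticle field ~26×32 mm, die areas ~400–600 mm²):

```python
import math

# Geometry check for two big GPUs on one interposer, using the approximate
# figures from this thread (reticle field ~26 x 32 mm).

reticle_mm = (26, 32)

for area_mm2 in (600, 400):
    side = math.sqrt(area_mm2)
    print(f"{area_mm2} mm^2 die is ~{side:.1f} mm on a side")

# Two ~600 mm^2 dies side by side, before adding any HBM stacks:
two_gpus_mm = 2 * math.sqrt(600)
print(f"Two such dies span ~{two_gpus_mm:.0f} mm "
      f"vs. a ~{reticle_mm[1]} mm reticle field")
```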
Well, how else is there a way to connect the two GPUs with sufficiently wide traces so that the two dies don't have to communicate over some slower protocol across narrower buses? You just cannot get massively large numbers of super-thin traces made with standard PCB technology, and we are talking about connecting the two GPUs over links thousands of bits/traces wide, to give them inter-GPU bandwidth of terabits per second, with the silicon interposer becoming the de facto mainboard hosting the two GPUs and their respective HBM stacks. The trace-size limit on silicon is potentially as small as the actual circuits on any processor. I am talking about two separate dies interconnected with direct connections, wire for necessary wire, allowing the GPUs to act as one giant monolithic-die GPU. And no, just because one potential component is not functional does not imply that the rest of the components stacked/attached to the interposer would be forfeit; that's the beauty of the interposer, with various components able to be placed close to each other and connected with massively wide direct links, as if they were made from one die. Hell, three or four smaller GPUs could be connected this way to equal the processing power of two big fat GPU dies, with the interposer allowing even more modular scalability. Specialized modular GPU units could be fabricated separately and attached via the interposer's massively wide fabric to give gaming whole complexes of GPUs acting as one coherent unit, and the potential for HPC/workstation APUs, with CPU blocks attached to the interposer's massively wide interconnects alongside the necessary GPU(s) and HBM stacks, should give an indication of where things are heading for interposer technology.

We are not even talking yet about intelligent interconnect and circuitry for inter-processor communication and cache-coherent networks etched into the interposer itself, to assist the multitude of HBM, GPU, CPU, and other functional blocks that can be fabricated separately, each on the fab process that best suits its workload, then dropped in and connected via the interposer's massively wide interconnect. Interposers will eventually supplant the mainboard as the main host for the processing and other units currently attached over much narrower PCB interconnects/fabrics. The old way requires more encoding/decoding through standard communication protocols, a latency-inducing step, while with interposers the sheer number of traces and the closeness of the various GPU, CPU, HBM, and other units will allow more direct wire-for-wire connections between them, at the speed of a main internal bus rather than the slower, narrower off-chip buses currently used on PCBs.

Eventually all of the various functionality of computing platforms will be on the silicon interposer module; the low-latency bandwidth cannot be achieved with old PCB connection technology. Expect that eventually the only wires coming directly off the interposer will be power and optical communication lines. The PCB will be there mostly as an attachment point and power-trace distribution for external flash/other memory, peripherals, and the various standard optical/electrical external sockets, and as the attachment to the PC/laptop/mobile device's case. PCBs are going to get much smaller and very simplified, with all the major inter-chip connection complexity taken over by the interposer.
First paragraph: I think that is close to what I said. I am not saying we will never get multi-GPU interposers, just that multi-GPU solutions for the next few generations will almost certainly not be multiple GPUs on an interposer. In the short term, you will not see two big die (>400 mm²) on an interposer, since that will not fit on a reticle-sized device. As the technology matures, I would expect larger devices to become easier and cheaper to make, but this is the first generation. Also, to take advantage of interposer interconnect and have multiple GPUs act more like a single GPU, you would need a completely new GPU design. They may be working on this, but since there is no information that I know of, it is probably years out.

If they were going to design a GPU in slices (as is being done with FPGAs), you obviously need to design the slice completely differently from a monolithic GPU. The amount of interconnect may go up significantly. Right now, Fiji is actually on the low side compared to the amount of interconnect possible through the interposer; most of the micro-bumps will not actually be carrying any signals. We do not know the characteristics of the process yet. Obviously, as die get larger, the chance of a defect in any one die goes up significantly. As the number of signal-carrying micro-bumps goes up, I would assume the chance of defects also rises significantly; I would expect a big difference in yield between a device with 10k active micro-bumps and one with 100k. This means that a "sliced" GPU may have very low yields, making it a ridiculously expensive device. As far as I know, once you attempt to solder the GPU to the interposer, it is not recoverable in the case of defects. I have heard that the sliced FPGA devices cost over $10,000 each, but I don't know for sure.
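The micro-bump point can be sketched the same way: if every active bump has some small, independent chance of a bad joint, assembly yield falls geometrically with bump count. The per-bump failure rate below is an assumed value, purely for illustration:

```python
# If each active micro-bump has a small, independent failure probability,
# assembly yield falls geometrically with the number of active bumps.
# The per-bump failure rate is an assumed value, purely for illustration.

per_bump_fail = 1e-5

for active_bumps in (10_000, 100_000):
    assembly_yield = (1 - per_bump_fail) ** active_bumps
    print(f"{active_bumps:>7} active bumps -> {assembly_yield:.1%} assembly yield")
```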
Interposer technology will make a big difference in a lot of markets, but it still has limitations. It has great potential for mobile, where you could technically move most of the chips on the motherboard onto the interposer, reducing power consumption while increasing bandwidth. This will not happen overnight, though. Every chip that you move onto the interposer has to be redesigned to take advantage of the different type of interconnect.
I wonder if they could have put 6 stacks of memory. There seems to be a bit of wasted space.
The memory controllers on the GPU itself most probably can't handle more than 4 HBM stacks. It probably has 4 MCs (one on each side), so 6 stacks would make for an unbalanced memory config, even if you could share one controller between two stacks.
They can, but they would have to halve the memory speed. So if they wanted 8GB on HBM1, they would have to add more channels, and it would drop from ~600 GB/s to 300, back down to GDDR5 standards, making it pointless. But then again, 4GB is pointless for such a big chip; no point in buying this card at all.
Ideally, if they went to 6 GB, they would add another memory interface. It seems that the HBM interface is actually quite a bit smaller than the GDDR5 interface. I suspect that 4GB will be fine with the optimization work they have done.
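For reference, the published HBM1 numbers work out as follows (peak theoretical, 1 Gb/s per pin, 1024 bits per stack), which puts a four-stack Fiji at 512 GB/s rather than the ~600 quoted above:

```python
# Published HBM1 figures: 1024-bit bus per stack at 1 Gb/s per pin
# (500 MHz DDR). All numbers are peak theoretical bandwidth.

stacks = 4
bus_width_bits = 1024
pin_rate_gbps = 1.0

per_stack_gbs = bus_width_bits * pin_rate_gbps / 8    # 128 GB/s per stack
total_gbs = stacks * per_stack_gbs                    # 512 GB/s for four stacks

print(f"Per stack: {per_stack_gbs:.0f} GB/s, four stacks: {total_gbs:.0f} GB/s")
# Halving the per-pin clock, as described above, would halve this to 256 GB/s,
# back in the neighborhood of a fast GDDR5 card.
```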
If you pull the heatsink off of a card, will it actually look like this? I would assume that it will have a lid covering the entire package, but it is unclear how delicate the interposer is after it has been mounted to a package.
I wonder if the HBM modules are taller than the GPU core. How would that affect the cooler design, and could you continue to use CPU coolers like the NZXT Kraken G10?
One of the reporters must be very happy he brought his telephoto to the press conference, and glad he knows how to focus that camera pretty quickly.
It is amazing that those four little HBM chips are the size of just one GDDR5 chip. AMD is one year ahead of Nvidia, considering that Pascal is coming in the 3rd quarter of 2016. Of course, if Fiji is faster than Titan X, Nvidia could accelerate Pascal development. If Fiji is not faster than Pascal, I guess it wouldn't really matter.
Video from the moment Su shows Fiji
https://www.youtube.com/watch?t=322&v=QQ92qWdVLsM
[quote]AMD is one year ahead of Nvidia considering that Pascal is coming at the 3rd quarter of 2016.[/quote]It's not quite so clear-cut. Pascal is aimed at HBM2 and has a whole heap of architectural changes (e.g. NVLink), whereas all signs point to the Big AMD Mystery Card being GCN 1.2 at its core with the GDDR5 interfaces swapped out for HBM1 interfaces. This will give AMD a head start in functional testing and experience with HBM1, but depending on how much HBM2 differs in terms of interface constraints (and how much testing Nvidia is doing with demi-Pascal devkits), this may or may not be useful for GCN2 (or whatever AMD's next architectural change turns out to be).
So huge, so big and sexy, I want it inside my socket. Oh it doesn’t fit? MAKE IT FIT!!!!!
OK, now I've had a good look: THAT IS FUCKING SICK. The next evolutionary step in our offspring's history.
Knowing that HBM is 5x7mm, it seems that Fiji is as big as Titan, but with no transistors wasted on complex cache. Hopefully all these water-cooling rumors indicate the chip can be clocked at 1.5GHz or more. Fingers crossed.
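That "HBM as a ruler" estimate can be made concrete. The pixel measurements below are hypothetical placeholders, not taken from the actual photo; the method is just scaling the known 5×7 mm stack footprint against the die:

```python
# "HBM as a ruler": scale the known 5 x 7 mm stack footprint against the
# GPU die in the photo. The pixel measurements below are hypothetical
# placeholders, not actual measurements from the image.

hbm_short_side_mm = 5.0                    # known HBM stack dimension

hbm_short_side_px = 50                     # hypothetical: stack width in photo
gpu_px = (245, 240)                        # hypothetical: die size in photo

mm_per_px = hbm_short_side_mm / hbm_short_side_px
gpu_w, gpu_h = (p * mm_per_px for p in gpu_px)

print(f"Estimated die: {gpu_w:.1f} x {gpu_h:.1f} mm = {gpu_w * gpu_h:.0f} mm^2")
# ~588 mm^2 with these placeholders, i.e. GM200/Titan-class (601 mm^2).
```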
Spoiler alert: IT'S FAKE. Roll with me, Ryan and PC Per.
Fact 1: they sent working cards (plural) to DICE engineers, who then "leaked photos".
Fact 2: AMD showed up to Computex essentially empty-handed.
Fact 3: AMD has been quoted saying they haven't even finalized a BIOS, and drivers are still being tweaked 10 days out from launch.
So that wouldn't matter if they had built cards; there would be no reason not to show one. Instead they show off this chip. Now, this chip I have questions about: there are no markings. This is a PROP. Every chip has a part number, where it's made, where in the wafer it came from, all of that. Every AMD GPU/CPU has it, and every NV chip has it too!
Again, no card at Computex, but cards (plural) are at EA/DICE in Sweden. That makes a lot of sense, doesn't it? NO, it doesn't. That was a prop as well.
AMD doesn't even have working cards. That is the truth!
Which makes sense: either AMD sent its only working GPUs to Sweden, or they don't have any real cards to show. It's the latter.
AMD promised cards available for E3. THAT IS NOT GOING TO HAPPEN.
They are having trouble with HBM and with drivers, and the release of the 980 Ti has made AMD try to push the clocks even higher, since AMD was planning on charging way more than the 980 Ti while performing worse!
AMD did the same thing to Ryan Shrout at CES, when they had that next-gen card nobody could see. Well, that was revision A1, Titan had just been released, and AMD pooped themselves!
No fanboy here, I want AMD to win, but they have made so many errors and been so terrible to AMD fans that I'm really over this.
Couple that with 4GB and only 4GB, and there's a good chance I'll skip this, get the Ti, and wait until HBM2 comes from both NV and AMD for a good comparison.
That's if AMD stays open. Broadwell just SMOKED AMD, and now NV has smoked them with the Ti in price and performance. AMD didn't see this coming, and it's going to be another rough 6 months minimum!
Yes/No question.
Do you work at AMD?
bro, the story.. is it cool?
Oh please, dude really?? Utter tosh…
Your "facts" are based on speculation; maybe you should go work for the National Enquirer and yell idiocy with those clowns.
Obvious FUD. I have no way to prove anything about the readiness of AMD's drivers, other than to say that this is still a GCN architecture, just bigger and with a faster memory interface. I don't think drivers are an issue.
This one can easily be refuted though:
“So that wouldn’t matter. if they had built cards. there is no reason not to show it. instead they show off this chip. Now this chip I have questions about. there are no markings. this is a PROP. every chip has a part number, where its made, where in the wafer it came from. all of that. every AMD GPU/CPU has it, every NV has it also!”
The markings go on the external package. The individual chips on the interposer are not external packaging. You are looking at the bare silicon of the gpu and memory die.
“AMD were planing on charging way more than the 980ti”
Just about the only part of that rant that made any sense.
The 980 Ti launch has me convinced that:
(a) Fiji is close to the Titan X in performance.
(b) AMD could’ve priced Fiji at $799 or $849, but was denied by Nvidia going aggressive with their 980 Ti pricing.
I wish AMD showed up with more than one competitive product every once in a while.
I am hoping that AMD will have more than one HBM part. Wikipedia, which is probably mostly incorrect, is listing a part with one HBM stack disabled. That would still be a 3072-bit interface. It would be interesting if they had a cut-down part at the $650 price range, with the full part at $850.
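The arithmetic behind such a cut-down part is straightforward (standard HBM1 figures: 1024 bits and 1 GB per stack, 1 Gb/s per pin):

```python
# Cut-down part arithmetic: standard HBM1 gives 1024 bits, 1 GB, and
# 128 GB/s per stack, so disabling one of four stacks scales all three.

gbs_per_stack = 1024 * 1.0 / 8             # 128 GB/s at 1 Gb/s per pin

for stacks, label in ((4, "Full part"), (3, "Cut part ")):
    print(f"{label}: {stacks * 1024}-bit, "
          f"{stacks * gbs_per_stack:.0f} GB/s, {stacks} GB")
```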
Are the HBM modules the same height as the GPU core? Is it going to require hella custom water blocks? No more NZXT Kraken G10, I guess.
So, basically, judging by the picture alone, you can deduce that if they move the lower two chips up slightly, they could cram two more in there. Eight would probably not fit (until a new process/another die shrink, that is), but it's pretty obvious by looking at that picture that they can AT LEAST do 6GB.