Samsung is ready to roll out the next generation of High Bandwidth Memory, aka HBM2, for your desktop and not just your next generation of GPU. They have already begun production on 4GB HBM2 DRAM and promise 8GB DIMMs by the end of this year. The modules will provide double the bandwidth of HBM1, up to 256GB/s, which is very impressive compared to the roughly 70GB/s that DDR4-3200 theoretically offers.
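For a rough sense of where those theoretical numbers come from, here is a quick back-of-the-envelope calculation (peak figures only; the ~70GB/s DDR4 number presumably assumes a multi-channel configuration):

```python
# Peak bandwidth = bus width (bytes) x transfers per second.
# Theoretical figures only; real-world throughput is lower.

def peak_gbps(bus_width_bits, transfer_rate_gtps):
    """Return peak bandwidth in GB/s."""
    return bus_width_bits / 8 * transfer_rate_gtps

# One HBM2 stack: 1024-bit interface at 2 GT/s per pin
print(peak_gbps(1024, 2.0))   # 256.0 GB/s per stack

# One DDR4-3200 channel: 64-bit interface at 3.2 GT/s
print(peak_gbps(64, 3.2))     # 25.6 GB/s per channel
```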
Not only is this technology going to appear in the next generation of NVIDIA and AMD GPUs, it could also work its way into main system memory. Of course these DIMMs are not going to work with any desktop or mobile processor currently on the market, but we will hopefully see new processors with compatible memory controllers in the near future. You can also expect this to come with a cost, not just in expensive DIMMs at launch but also a comparable increase in CPU prices, as they will cost more to manufacture initially.
It will be very interesting to see how this affects the overall market; will we see a split similar to what currently exists in mainstream GPUs, with a lower cost DDR version and a standard GDDR version? The new market could see DDRx and HBMx models of CPUs and motherboards, and could do the same for the GPU market, with the end of DDR on graphics cards. If so, will it spell the end of DDR5 development? Interesting times to be living in; we should be hearing more from Samsung in the near future.
You can read the full PR below.
Samsung Kick-Starts Next Generation of Memory with High Bandwidth Memory DRAM
Market leading Samsung has started the race to bring the next generation of RAM memory to market, with the innovative High Bandwidth Memory (HBM2) DRAM now coming out of its factories.
The new DRAM is being heralded as the ‘latest and greatest’, and will offer twice the bandwidth of the old HBM1 memory. Users will get data in and out at 256GBps, and there will be various capacities available. Though the memory modules have already gone into production, the Korean manufacturer isn’t saying when the industry-first 4GB DRAM modules will hit the shelves – but they have guaranteed an 8GB version before the end of 2016.
Data Memory Systems, one of the US’ leading providers of computer memory solutions, is renowned for keeping its finger on the pulse of all the latest goings-on in the world of tech. The experts at the firm are looking forward to welcoming the next generation of DRAM to their stock.
A spokesman for Data Memory Systems says, “Technology moves faster than ever nowadays, and we’re always poised for the latest developments and breakthroughs. This announcement by Samsung marks the beginning of next-generation DRAM, with speeds up to seven times faster than its predecessors.”
The spokesman adds, “The memory has only just been put into production and will likely become available over the course of the next year – we eagerly anticipate being able to sell these modules on our site, which is renowned for being one of the US’ premier memory suppliers for business and individuals alike.”
The new DRAM has been produced with an eye on high-performing computing, advanced graphics and network systems, and enterprise services. According to Samsung themselves, the rapid adoption of high-performance computing systems by IT companies across the globe was the key factor in them deciding to produce a memory module that’s suitable for the demands of these systems.
The new modules will also be effective for those who seek components for high-end gaming computers. The DRAM will enable graphics-intensive applications to run flawlessly, which will be ideal for the die-hard PC gamers out there that don’t want to compromise when it comes to their tech.
A further announcement from Samsung on the HBM DRAM modules is anticipated over the coming months. Until then, those seeking a computer memory upgrade will find a multitude of options at Data Memory Systems, from external hard drives to RAM modules for any device.
“Of course these DIMMs are not going to work with any desktop or mobile processor currently on the market, but we will hopefully see new processors with compatible memory controllers in the near future. You can also expect this to come with a cost, not just in expensive DIMMs at launch but also a comparable increase in CPU prices, as they will cost more to manufacture initially.”
HBM/HBM2 requires an interposer, so it doesn’t have much to do with DIMMs! Those four 1024-bit HBM memory connections are etched onto the interposer’s silicon substrate, with the HBM placed right next to the processor (CPU, GPU, or other) and attached at the point of interposer assembly, so it’s not user accessible, or gimping-OEM accessible, once AMD begins to make APUs on an interposer package! That HBM2 with its 8GB will come in handy for APUs, and is sure to stop any laptop OEM from gimping the primary RAM subsystem of any future interposer based APUs or SOCs!
The DIMMs are an assumption on my part, to go with the hypothesis that HBMx will come to desktops at some point. I really think Intel and AMD will have to find a way to continue to sell upgradable RAM. It would be hard to go to market with the new AMD 7777GQXP APU, available in 8GB, 16GB and 32GB models … or take it to the max with the AMD 9999XXXX which runs at a higher frequency and comes in 32GB, 64GB and 128GB models.
For mobile maybe, though even then, buying a laptop with an amount of RAM fixed at purchase … well, that's for Apple-lovers, not PC users!
Not original poster, but just wanted to say that you cannot route a 1024-bit connection through a DIMM slot. We don’t really have a next generation of upgradeable memory. I suspect we will see a lot of systems with fixed size memory. Even if they use HBM stacks on a memory module, it would still be limited by the DDR4 interface. HMC is a partial solution, but as far as I know, it is not meant to be routed through a socket or slot. A solution similar to HMC could work, but I haven’t seen anything about making an HMC memory module. It is all board mounted BGA packages.
I’d love to see the PCB crowded with the 4096 traces it takes to serve HBM/HBM2, but the interposer technology is ready made for HBM, and that is not only for Apple systems. There is nothing to stop a desktop APU-on-an-interposer SKU from having HBM memory plus some PCB based memory via narrower standard RAM (DIMM) channels to secondary RAM (that is what the server APU based SKUs are going to have, with up to 32GB of HBM and some secondary standard channels to more standard DIMM based RAM)!
It depends on how many desktop PC users need more than 32GB of HBM, and there is nothing in the JEDEC standard preventing APU/SOC systems on an interposer from having more than 4 HBM die stacks; it will just be a matter of designing the memory controller for more HBM stacks! The interposer has plenty of ability to host tens of thousands of traces, with the interposer itself being made of silicon, much more than any PCB will be able to accommodate!
So for laptops, an APU on an interposer, and for PC systems the same, with maybe some extra standard DIMM channels to secondary RAM for those that need even more than 32GB of HBM! Where the interposer will really shine for APUs is having thousands of traces from those Zen cores directly to the big fat Polaris/Vega GPU, and a hell of a lot more CPU to GPU bandwidth than ANY PCIe x16 or wider connection currently used on gaming systems! More bandwidth at much lower, power saving clock speeds for the memory subsystems with HBM!
Needs more exclamation points.
We are going to see memory modules with stacked die, but it will unfortunately probably be limited to the DDR4 bus for connection to the APU. We could use a new standard that uses narrow, high speed links, but I don’t know if we will see that. I suspect most consumer systems will be stuck with a limited amount of memory and you will have to pay a lot extra to get an expandable or upgradeable system.
Put an interposer based APU with its HBM and GPU on the main board and keep the channels to the DIMMs for those that need even more memory! Most laptop users would be happy with probably 16GB of HBM, with the top end laptops offering 32GB of HBM; let the PC based systems have one or two standard narrower memory channels to secondary DIMM RAM for whatever amount of extra secondary RAM the user needs in addition to the up to 32GB of HBM on an interposer based APU/SOC. HBM3 is sure to be around the corner with even more memory capacity! Hell, AMD could even add some FPGA compute to the HBM stacks themselves, with the latest DX## or Vulkan updates programmed into the FPGA; AMD has a patent filing for FPGAs added to the HBM stacks for in-memory distributed compute!
I suspect the 32 GB APUs will be exclusively for the HPC market and will be priced too high for the consumer markets. My guess would be only single or maybe dual stack HBM for the mobile market with HBM2. Also, I don’t know when, or if, we will get a socketed HBM based APU for the PC market. Intel tried to keep their Crystal Well chips (which include a 128 or 64 MB DRAM chip on package) mobile only. This is probably because they are significantly more expensive to make, so the higher prices commanded for mobile chips made more sense. They would have much lower margins if sold in the socketed PC market. It is unclear what the economics of HBM based APUs will be like. Just don’t get your hopes up for a cheap, high memory capacity, HBM based APU next year. If these are out by then, they could command the price of a high end graphics card plus a high end CPU, plus a large margin, which could mean something like $2000 or more.
HBM and interposers are not that expensive; if AMD can use them on a $500 GPU then they will not cost that much for a consumer APU-on-an-interposer SKU. Samsung making HBM for AMD and Nvidia will mean the price of HBM will be coming down as the number of HBM equipped processors grows and economy of scale allows for downward HBM pricing! The HPC market will allow for even more economy of scale, and both AMD’s HBM partner and Samsung/others will be ramping up production to supply a market that will be growing rapidly over the next few years! Maybe not much 32GB HBM for the consumer market, but probably more 16GB and smaller for PC and laptop based APUs on an interposer!
If Apple decided to go with an APU on an interposer for its MacBooks it could secure enough HBM for its needs, even if that means Apple paying for some extra HBM capacity to be brought online!
If you put a high-end GPU and a high-end CPU in a single package, it will be expensive. The margins on a high end CPU are a lot higher than on a GPU. GPUs are going to be segmented based on how much 64-bit vs. 32-bit hardware they have. We may not even get such an APU with a consumer level GPU. The GPUs with large amounts of 64-bit hardware can go for thousands of dollars. Intel has, I think, 3 different Haswell Xeon die: an 18-core, a 12-core, and an 8-core. For the 12-core and above, it gets expensive fast, with prices ranging from around $2000 all the way up to close to $7000. These are around 400, 500, and 700 square mm die sizes, which is about the size of a high end GPU. The AMD Fiji GPU is 596 square mm, and it sells for $650 for the whole card.
It is desirable to segment the market to increase profit margins, so they have to hold some features back. Maximum memory size has always been a feature used to create market segmentation. I would guess 32 GB products will be high-end HPC devices only. We may see 16 GB high-end GPUs, but that 16 GB will be paired with a GPU with minimal 64-bit capabilities. I don’t know if we will see 16 GB APUs anytime soon. It may be a while before we see HBM based APUs at all; maybe late 2017. They also keep things like ECC for non-consumer devices, which I find really annoying. IMO, system memory should have gone all ECC protected a long time ago.
Well, for the gaming SKUs the GPU can get away with fewer of the 64-bit GPU execution resources, and it’s the interposer’s ability to more directly wire up the CPU to the GPU via many thousands of interposer traces that will make a gaming APU on an interposer popular! I would imagine that a CPU and GPU wired up together on an interposer could probably transfer entire cache lines back and forth over some very wide channels, with the CPU and GPU cache subsystems acting as one unit for even better communication, and with the CPU/GPU cache controllers directly transferring data/kernels in the background in a completely coherent fashion to keep the workloads going with a minimum of latency.
AMD will have fewer yield issues for these interposer based APUs, with the CPUs and GPUs coming from separate die fabrication lines; under the old monolithic-die approach, totally bad GPU units meant that good CPU cores had to be sacrificed, or good GPUs were sacrificed because of bad CPU cores, on parts that could not even be binned.
AMD’s APUs are gradually merging the CPU’s functionality with the GPU’s functionality, and over time the CPU cores on APU systems may be able to directly dispatch floating point and other instructions to the GPU with a minimal amount of latency. No current CPU to GPU PCIe connection will be able to compete with a more direct interposer based connection between CPU and GPU on latency, as latency inducing PCIe protocols have to be employed to transfer any data over PCIe, while an interposer based solution with thousands of traces directly connecting CPU to GPU will not need any extra layers of latency inducing protocol encoding/decoding to get the data back and forth between CPU and GPU!
For a lot of games an APU on an interposer will be all that is needed, but there can still be systems with extra PCIe/other based slots for extra GPU resources.
A better solution may just be putting an interposer based gaming APU on a PCIe card and letting that become the gaming platform, with the motherboard CPU mostly out of the gaming loop except for a support role and hosting the main OS. An interposer based gaming APU on a PCIe card would be able to host its own streamlined gaming OS, the game, and the game engine, and run the games itself. One benefit is that users with multiple PCIe slots could add more than one gaming card and get more CPU cores to go along with the extra GPU power! This would essentially be a gaming cluster, with each card a stand-alone system able to cooperate to run a game across more than one computing/gaming card.
Using DIMM may not have been the most accurate shorthand way to express what I was saying, but it seems to have worked in that you all understood what I was implying.
External stacks of HBM might make sense, but as you point out, the interface is a hell of a problem unless you can find a way to make the HBMx modules on the interposer removable and upgradable. That would not be a trivial effort considering the size and number of connections on an interposer; a tiny speck of dust would knock all, or at least some, of it out of commission.
Maybe having external memory, as in the server solution, would mitigate the problem of a fixed amount of RAM; then again, think back to the GTX 970: some might be screaming about how some RAM is slower than the rest.
Since we didn't get much in the way of details, making prognostications and knocking down theories will have to do for now.
“some might be screaming about how some RAM is slower than the rest.”
That’s why the HBM on the interposer based APU would be called primary HBM RAM, and the DIMM based RAM would be called secondary RAM, with the proper education from the mainboard manufacturer, and AMD, as to just which memory is faster! I do not see people complaining about Intel’s use of RAM on the chip/package being faster than DIMM based RAM! There is no doubt that AMD is moving to APUs on an interposer, even for gaming, in their future systems; all the government grant funding for the exascale project is going to fund a lot of R&D that will be used in the consumer market by AMD and others!
If you can have gigabytes of memory on package, then external memory, even if it is DRAM, may be accessed more like a flash swap drive. You wouldn’t want it to act like a hardware based cache, since some things may get evicted from the cache when they should not. This leads to software management, which is how the swap-to-disk mechanism functions. You would want to keep stuff needed for the GPU in the HBM, since external memory would have too little bandwidth. This leads to a situation where external memory doesn’t need to be that fast, since it is accessed infrequently. You would then probably have systems with either no off package memory, or with relatively slow off package memory.
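As a rough sketch of the kind of software-managed placement I mean (hypothetical names and policy, not any real allocator):

```python
# Illustrative two-tier placement policy: GPU-hot buffers are pinned in
# on-package HBM while space remains; everything else spills to slower
# off-package memory. Nothing is evicted implicitly - demotion would be
# an explicit, software-driven decision, like swap management.

HBM_CAPACITY_MB = 8 * 1024   # assumed 8 GB on-package stack

class TieredMemory:
    def __init__(self, hbm_capacity_mb=HBM_CAPACITY_MB):
        self.hbm_free = hbm_capacity_mb
        self.placement = {}   # buffer name -> "hbm" or "external"

    def allocate(self, name, size_mb, gpu_hot=False):
        """Place a buffer, preferring HBM for GPU-hot data."""
        if gpu_hot and size_mb <= self.hbm_free:
            self.hbm_free -= size_mb
            self.placement[name] = "hbm"
        else:
            self.placement[name] = "external"
        return self.placement[name]

mem = TieredMemory()
print(mem.allocate("framebuffer", 512, gpu_hot=True))   # -> "hbm"
print(mem.allocate("asset_archive", 6144))              # -> "external"
```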
Interesting idea, that would certainly be an interesting and effective compromise if they could pull it off.
AMD GPUs & APUs are why they spent 7 years designing this stuff, and it will be used on both.
That would be great. I can think of APUs with 4GB RAM and the option to expand that RAM using HBM2 DIMMs. Those APUs will be expensive of course, but the more products and options that come out with HBM2, the cheaper HBM2 memory will become. And the cheaper HBM2 becomes, the cheaper graphics cards with HBM2 memory will become. If in 3-5 years HBM2 is the standard memory instead of DDR5 or something, that would mean ultra cheap graphics cards, like those today that use DDR3.
But the HBM2 based DIMMs would have to communicate with the CPU over a much narrower PCB based channel, negating the reason for HBM in the first place! The only way HBM on a DIMM would make any sense is to put some compute on the DIMM and have it do its work there! Maybe if the traces to the DIMM could be optical, and they did away with the DIMM slot, then there might be enough bandwidth to reasonably use the HBM at its design specs. HBM needs that wide 1024-bit channel so it can achieve its power savings by being clocked much lower than standard DIMM based RAM!
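To put some rough numbers on that (simple peak-bandwidth arithmetic, nothing vendor-specific):

```python
# Per-pin transfer rate needed to match one HBM2 stack's 256 GB/s over a
# given bus width. The wide-and-slow HBM bus is exactly what lets it hit
# that figure at a low per-pin rate; a 64-bit DIMM interface would need
# absurd signalling speeds to do the same.

TARGET_GBS = 256.0

def per_pin_rate_gtps(bus_width_bits, target_gbs=TARGET_GBS):
    """Per-pin transfer rate (GT/s) required for the target bandwidth."""
    return target_gbs / (bus_width_bits / 8)

print(per_pin_rate_gtps(1024))   # 2.0  GT/s - HBM2's actual per-pin rate
print(per_pin_rate_gtps(64))     # 32.0 GT/s - far beyond any DDR4 DIMM
```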
HBM operates similarly to regular DRAM chips; it is just spread throughout a stack of die. A DRAM chip has multiple pages and banks, each of which accesses a very wide memory array. When a row is read, the entire row is read into buffers. Accesses within that row can be served immediately, but they have to be sent out over a narrow bus. HBM allows for a very wide bus when on an interposer, but there isn’t any reason not to use the stacked die on memory modules also.
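A toy model of that row-buffer behaviour, with made-up timings just to illustrate the hit/miss difference:

```python
# Minimal sketch of DRAM row-buffer behaviour (illustrative timings, not
# from any datasheet): an access to the currently open row is cheap,
# while switching rows forces a new row activation before data moves
# over the narrow external bus.

ROW_HIT_NS = 15    # assumed column access into the open row
ROW_MISS_NS = 45   # assumed precharge + activate + column access

def access_latency(addresses, col_bits=7):
    """Total latency (ns) for a sequence of word addresses on a single
    bank that keeps one row open at a time."""
    open_row = None
    total = 0
    for addr in addresses:
        row = addr >> col_bits        # which row the address falls in
        if row == open_row:
            total += ROW_HIT_NS       # served straight from the row buffer
        else:
            total += ROW_MISS_NS      # must open a new row first
            open_row = row
    return total

sequential = list(range(128))         # stays inside one row: mostly hits
strided = [i << 7 for i in range(128)]  # new row every access: all misses
print(access_latency(sequential))
print(access_latency(strided))
```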
Modules using stacked memory chips have some advantages. They will be lower power and use a much smaller number of packages for a given capacity. The 4 or 8 GB modules actually could be made up of single stacks. For multi-stack modules, the capacity can be made very large. Samsung talked about a 128 GB single module back in November of 2015. The demonstration module used 36 4 GB packages. They will be building 8 GB stacks with HBM2.
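A quick sanity check on that 128 GB figure, assuming the usual 72-bit ECC DIMM split (64 data bits + 8 ECC bits); the ECC split is my assumption, not something Samsung stated:

```python
# 36 packages of 4 GB is 144 GB raw, which lines up with a standard
# registered ECC module where 1/9 of the capacity is spent on ECC.
packages = 36
gb_per_package = 4
raw_gb = packages * gb_per_package   # 144 GB raw
data_gb = raw_gb * 64 // 72          # 128 GB visible to the host
ecc_gb = raw_gb - data_gb            # 16 GB assumed to hold ECC bits
print(raw_gb, data_gb, ecc_gb)
```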
This is the press release for the 128 GB module Samsung demonstrated in 2015 (if it will let me post links):
https://news.samsung.com/global/samsung-starts-mass-producing-industrys-first-128-gigabyte-ddr4-modules-for-enterprise-servers
I am not sure what is being talked about with HBM DIMMs. You cannot route the 1024-bit memory connections used by HBM on an interposer through a DIMM package. The interface is designed to work with the short, soldered traces (only a few millimeters) made available on a silicon interposer. They could use HBM stacks on DIMMs with an interface chip or a different bottom logic die to allow connection to a DDR4 bus. This isn’t much different from how current memory chips work. A current DDR4 chip accesses whole rows in parallel, which must then be sent over a narrow bus to the CPU.
HBM could be used in a similar manner. This may be more efficient than DDR4, but the modules would still be limited to DDR4 interface speeds. It would also get you super small modules since a single stack is 4 or 8 GB. It also can be used for very large memory modules with many stacks. An 8 stack module with 8 GB stacks would be a single 64 GB module. Using these stacks for DDR memory modules will also help ramp the volume up.
It may look strange to have a single chip module, since it is a lot of wasted space on the PCB. This is part of what makes me think we need a new standard. I would say something like an m.2 form factor, except for DRAM. It would need an HMC (Hybrid Memory Cube) style interface, which is somewhat similar to PCI-e electrically.
Samsung has already talked about 128GB DDR4 DIMMs using stacked memory. These are still going to be limited by the DDR4 bus though. They may be able to clock higher since it would be equivalent to running buffered memory. You are not going to get 256 GB/s out of a 64-bit DDR4 module even if it uses stacked memory internally.
No, not even Allyn could come close to that with his M.2 950s, but that's not to say we can't find a better connector. Maybe fabric interconnects could be integrated … just as a thought for what could come in the future.
I just wanted to make sure people aren’t expecting 256 GB/s out of such a DIMM. What we will be getting in the near term will be HBM DIMMs connected via the DDR4 bus, which will limit the speed significantly. I would like to see something like what is used for HMC. It is a high speed differential serial link, which would be electrically similar to the physical layer of PCI-e, QPI, or HyperTransport. This seems to be the go-to technology for anything passed through the PCB with current technology.
Intel doesn’t seem to plan on running HMC through removable connectors though. It may limit the speed compared to board mounted components? HMC has, I believe, up to 4 16-bit links per package that allows multiple packages to be chained. This would allow an m.2 style form factor, probably wider though. It would need to be an 8 or 16-bit link, while m.2 is only x4 PCI-e, so it may be more similar to an x16 PCI-e card, just smaller. Even 4 stacks could provide 32 GB of memory with 8 GB stacks.
Although, with chip stacking technology, the amount of memory that can be included on the package may actually outpace consumers’ need for memory, so we may never see such a device. Also, with memory on package, the system memory gets moved out one extra level of hierarchy. This means it might be accessed more like a flash swap drive is now, which also means it does not need to be anywhere near as fast. If you have even a single 4 or 8 GB stack to act as an L4 cache, then external memory could be much slower without impacting performance, and all of these high speed possibilities are irrelevant.
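A simple average-access-time sketch of that L4 argument, with assumed latencies purely for illustration:

```python
# If a multi-GB HBM "L4" catches the vast majority of accesses that miss
# the on-die caches, even a much slower external memory barely moves the
# average. Hit rate and latencies below are assumptions, not measurements.

def amat_ns(l4_hit_rate, l4_hit_ns, ext_mem_ns):
    """Average access time seen below the on-die caches, with HBM as L4."""
    return l4_hit_rate * l4_hit_ns + (1 - l4_hit_rate) * ext_mem_ns

print(amat_ns(0.99, 50, 200))   # 51.5 ns with fast-ish external DRAM
print(amat_ns(0.99, 50, 800))   # 57.5 ns even with 4x slower external memory
```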
I don’t see HBM2 making its way into desktops for a while, until the upgradeability is figured out. But notebooks, on the other hand, that’s where it’ll shine. Just give the package enough RAM from the start to avoid the need to add more later. A Zen+Polaris APU with 8-16GB (or 32GB?!) of HBM2 in a notebook would do well, I’d guess. It’d definitely interest me. That’d be enough memory to feed a mobile CPU/GPU setup. I think resuming from sleep would be practically instantaneous as well.
On the desktop front I think HBM memory would have to act as some kind of higher cache layer for the CPU, due to the bandwidth advantages, with regular DIMM slots for bulk memory: slower than the HBM but faster than SSD access. You wouldn’t be able to upgrade the HBM memory without a processor upgrade, but you would still be able to add more RAM to the system if necessary. Give the OS the ability to prioritize the HBM for system responsiveness, and let the user prioritize programs with the leftover space.
HBM will not turn up on DIMMs. The entire standard is designed around using an interposer as the transmission path; it’s not going to work over a PCB, let alone jump a card-edge connector!
HMC is the comparable stacked-DRAM standard for operation over a PCB. However, it too cannot jump a card-edge connector. Potentially another connector standard could be used (e.g. PGA or LGA) but even that is at the limits of signal integrity.
Instead, anything requiring expandable memory will likely stick with a DDR variant designed for that purpose.
(Possibly) HBM on a DIMM:
https://news.samsung.com/global/samsung-starts-mass-producing-industrys-first-128-gigabyte-ddr4-modules-for-enterprise-servers
I am not sure that the stacked memory used in this device is actually HBM2. HBM1 is only 1 GB per stack, and this uses 4 GB for each stack. You don’t get the speed of HBM on a silicon interposer, but using stacked memory has many other advantages. Regular DDR4 memory chips are really parallel internally, which must be converted to a narrower, faster-clocked interface for transfers through the PCB. It is a similar situation with HBM or stacked memory in general. They can either have the bottom logic die convert to a DDR4 interface, or have a bridge chip to go between the DDR4 interface and the interface supported by the bottom logic die. Even locally on the DIMM, you still can’t route a 1024-bit memory interface through a PCB, so they will need to use a narrow connection between the memory stacks and the bridge chip. The module linked above uses 36 4 GB stacks for a ridiculously large 128 GB capacity. It would be interesting to know how many bits are routed to each stack and what exactly the interface is. Perhaps they actually have two interfaces available on the bottom logic die to allow the same stacks to be used either on a silicon interposer or in a BGA package.
Memory modules using stacked memory will take a lot less power than conventional DRAM chips. With HBM2 (or similar stacked devices) they could technically make a 4 or even 8 GB module with a single stack. The module PCB can be made very small also, but I am not sure what this will look like. You are going to waste some PCB area on the module if you only mount one or two packages. They could easily make an SO-DIMM that would be very low power, yet have 16 or 32 GB of capacity. Perhaps they should switch over to using smaller form factor modules for consumer systems. I don’t know if they are making stacked memory with a different interface on the bottom logic die. If it uses a DDR4 interface directly on the bottom logic die, then it wouldn’t really be HBM, since HBM is defined by the interface.
Those are just regular stacked chips, not HBM, same as Samsung’s previous high-capacity modules. They’re RDIMMs, so the chips only need to interface as far as the buffer, and the buffer can handle stack interleaving. Speed is reduced to regular DDR4 rates (or even slower) with a hit to latency, so even if the stacks were replaced with HBM and some bizarre interposer-DIMM were created, the data rate to the host would still be DDR4.
Most of the posts here relate to this statement by Jeremy:
“They have already begun production on 4GB HBM2 DRAM and promise 8GB DIMMs by the end of this year. The modules will provide double the bandwidth of HBM1, up to 256GB/s, which is very impressive compared to the roughly 70GB/s that DDR4-3200 theoretically offers.”
Comparing HBM2 bandwidth on an interposer to what can be achieved with a DIMM is obviously nonsense. I have already stated in several posts here that there is no way they are going to get that level of bandwidth from a DIMM and that it will be limited to DDR4 speeds. This is from the original PR though:
“The new DRAM is being heralded as the ‘latest and greatest’, and will offer twice the bandwidth of the old HBM1 memory. Users will get data in and out at 256GBps, and there will be various capacities available. Though the memory modules have already gone into production, the Korean manufacturer isn’t saying when the industry-first 4GB DRAM modules will hit the shelves – but they have guaranteed an 8GB version before the end of 2016.”
This states 256 GBps will be available from a “memory module”.
“A further announcement from Samsung on the HBM DRAM modules is anticipated over the coming months. Until then, those seeking a computer memory upgrade will find a multitude of options at Data Memory Systems, from external hard drives to RAM modules for any device.”
This specifically says “HBM DRAM modules”, although I think they are using the word “module” where we would use “stack”. Since they quote the bandwidth of HBM on an interposer, they could just be calling an HBM stack a “module”. It is actually Jeremy who changed “module” to “DIMM”. The PR seems to be using “module” in the more general sense, that is, an independent, self-contained unit, and not specifically a DIMM.
As far as this article goes, I am leaning towards a massive semantic mess, to put it mildly. I think the PR is just talking about HBM2 4 GB stacks being produced now and 8 GB stacks before the end of the year. If they were actually talking about some kind of DDR4 DIMM implemented with HBM stacks, then they would hopefully not state the bandwidth available from HBM on a silicon interposer. I am not sure how the HBM business is set up, but since it is a JEDEC standard, they will probably just be selling known good stacks, with it left up to the buyer to integrate them onto a suitable interposer.
There is a possibility that they are actually going to be using HBM stacks on DIMMs though. It probably wouldn’t be that hard to design the bottom logic die to implement the HBM2 interface in addition to the interface necessary to connect to a DDR4 interface. One of the ideas behind HBM is that you can build the logic die on a process more suited to logic and the memory die on a process more suited to DRAM. With HBM2 more than doubling in die size compared to HBM1, I suspect they have plenty of space on the bottom logic die; the size may be completely determined by the DRAM die above, rather than how much space the logic die actually needs. The HBM interface doesn’t take much die space even though it is very wide since it is much simpler than DDR4 interfaces. Making the die stacks able to be used in either mode would be a good way to increase the volume ramp. With the way stacked memory is set-up, they still need a bottom logic die for an interface even if it is not implementing the HBM interface.
I was using a fairly dry and somewhat confusingly written PR to try to get some conversation going, by speculating about possible uses for HBMx over and above memory integrated onto the processor. We knew HBM2 was coming and what the expected bandwidth would be, so I thought I'd see if I couldn't make it more interesting for everyone.
Calling it a DIMM is not very accurate technically but it was a quick way to get people to understand what I was hypothesizing. Seems to have worked.