Earlier this week, Micron launched their QuantX branding for XPoint devices and gave us some good detail on the expected IOPS performance of solutions containing these new parts:
Thanks to the very low latency of XPoint, the QuantX solution sees very high IOPS performance at a very low queue depth, and the random performance very quickly scales to fully saturate PCIe 3.0 x4 with only four queued commands. Micron's own 9100 MAX SSD (reviewed here) requires QD=256 (a 64x increase) just to come close to this level of performance! At that same presentation, a PCIe 3.0 x8 QuantX device was able to double that throughput at QD=8. But what are these things going to look like?
The real answer is: just like modern-day SSDs. For the time being, though, we have the prototype unit pictured above. This is essentially an FPGA development board that Micron is using to prototype potential controller designs. Dedicated ASICs based on the final designs may be faster, but those take a while to ramp up to volume production.
So there it is, in the flesh, nicely packaged and installed on a complete SSD. Sure, it's a prototype, but Intel has promised we will see XPoint before the end of the year, and I'm excited to see this NAND-to-DRAM performance-gap-filling tech come to the masses!
Remember the days when memory and storage were one and the same? Ahhh. Once again developers will be lazy and it'll take ten years to take advantage of this…
> it’ll take ten years to take advantage of this
… not if the hardware designers allow for a
few simple changes (see my comment above) e.g.:
(a) increase the transmission clock to 16G
to “sync” with PCIe 4.0
(b) enable the 128b/130b jumbo frame on U.2 cables
to “sync” with PCIe 3.0.
These 2 changes will also accelerate
existing SAS subsystems, without requiring
10 years of R&D.
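The two proposed changes amount to raising the line rate and cutting encoding overhead. A quick sketch of what each buys per lane; the 16G + 128b/130b combination is this commenter's hypothetical, not a shipping SAS/U.2 spec:

```python
# Effective payload throughput after line-encoding overhead.
# Current SAS uses 8b/10b (20% overhead); PCIe 3.0+ uses 128b/130b
# (~1.5% overhead).

def effective_gbps(line_rate_gt, lanes, payload_bits, coded_bits):
    """Payload throughput in GB/s for a given line rate and encoding."""
    return line_rate_gt * lanes * (payload_bits / coded_bits) / 8

# A single 12G SAS lane with 8b/10b encoding (today):
sas_today = effective_gbps(12, 1, 8, 10)       # 1.20 GB/s per lane

# The proposed 16G lane with the 128b/130b "jumbo frame":
proposed = effective_gbps(16, 1, 128, 130)     # ~1.97 GB/s per lane

print(f"12G + 8b/10b:    {sas_today:.2f} GB/s per lane")
print(f"16G + 128b/130b: {proposed:.2f} GB/s per lane")
```

Together the two changes would yield roughly a 64% per-lane gain over a 12G 8b/10b link, under these assumptions.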
Thanks for the great FMS coverage, Allyn.
I couldn’t help but notice that:
x4 saturates at 900
x8 saturates at 1800
Will x16 saturate at ~3600, then?
Lane count does make a difference.
If future U.2 ports oscillate at 16G,
we can use HBAs with x16 edge connectors
and not have to worry about the DMI bottleneck.
x16 @ 16GHz / 8.125 bits per byte = 31.50 GB/second!!
How about an HBA Option ROM feature to select
pre-set clock speeds?
or, better yet, auto-detection?
At FMS, is anyone talking about Optane
installed in 2.5″ NVMe SSDs?
Fill’er up with some QuantX or Optane, Sir. Why yes, I’ll have some QuantX, and put it right on the HBM2 stacks connected up with some TSVs to the DRAM Dies! And be sure to pimp out the HBM2’s bottom logic die with an XPoint controller chip to pair with the HBM2’s DRAM controller to keep things going back and forth in the background, DRAM to XPoint, without reducing any system bandwidth available from the HBM2’s DRAM. Plenty of deep command buffers/queues too, so things can be kept efficiently utilized.
Not too good!
“Intel overhyping flash-killer XPoint? Shocked, we’re totally shocked”
My Comment at http://www.theregister.co.uk yesterday:
Let’s start with a very simple and basic block diagram:
CPU —–> chipset —–> storage subsystem (i.e., 3D XPoint).
Try to visualize the CPU as a radio frequency transmitter:
4 cores x 64 bits per register @ 4 GHz is a lot of binary data.
On the right is 3D XPoint.
As their measurements show,
Micron achieved “900” w/ PCIe 3.0 x4 lanes; and,
Micron achieved “1800” w/ PCIe 3.0 x8 lanes.
Read: almost perfect scaling.
And, the flat lines speak volumes:
in both cases, the storage subsystem
saturated the PCIe 3.0 bus.
Now, extrapolate to PCIe 3.0 x16 lanes:
wanna bet “3600”? My money says, “YES!”
Now, extrapolate to PCIe 4.0 x16 lanes:
my money says ~ “7200” — flat line
(maybe not perfect scaling,
but you get the idea 🙂
Conclusion: 3D XPoint is FAAAST, and
Micron’s measurements show that
the chipset is now the bottleneck —
all cynicism aside.
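The extrapolation above can be checked with simple linear scaling from the two measured points; the per-lane figure and the PCIe 4.0 doubling are back-of-envelope assumptions, not measurements:

```python
# Linear extrapolation of Micron's measured saturation points
# ("900" at x4, "1800" at x8) to wider links and to PCIe 4.0,
# which doubles the per-lane rate. Assumes perfect scaling, which
# real controllers won't quite achieve.

per_lane = 900 / 4  # 225 per lane, derived from the x4 data point

for gen, rate_mult in (("PCIe 3.0", 1), ("PCIe 4.0", 2)):
    for lanes in (4, 8, 16):
        print(f"{gen} x{lanes}: ~{per_lane * lanes * rate_mult:,.0f}")
```

The x8 row reproduces the measured "1800", and the PCIe 3.0 x16 and PCIe 4.0 x16 rows give the "3600" and "7200" guesses above.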
Good, but not 1000 times the speed of NAND at this point, or maybe ever(?). And what about durability? That is the more important concern if XPoint dies were to be added to HBM2 stacks, as there has to be some substantial durability advantage over NAND before any XPoint can be added to any HBM2 stacks. My question is: if the durability of XPoint is sufficient to last the useful life of any device using HBM2 with an XPoint/NVM die added, can it be done? And then, how much bandwidth could be had by using TSVs to wire the XPoint dies directly to the DRAM dies, maybe transferring entire blocks of data at a time between the XPoint dies and the DRAM dies? Ultimately, any XPoint NVM added to the HBM2 stacks is going to be an asset, what with XPoint being a much more densely packed medium than NAND or DRAM, provided the XPoint has a durability rating that exceeds 5-7+ years.
AMD appears to want to add some FPGA compute to the HBM stacks as well (patent filings). So do you see any exascale applications for having both FPGA compute and NVM (XPoint) storage in the HBM2 stacks, right there very close to where it is needed? For some exascale workloads, the offloading of some compute could happen right in the HBM2 stacks without as many power-wasting transfers of large data sets to and from distant storage pools. Even for GPU/gaming workloads, imagine having DRAM, FPGA compute, and NVM storage right there on the HBM2 die stacks for any pre/post processing, or for assisting the GPU with various workloads. That includes reprogramming the FPGAs with any new Vulkan/DX12 features that were not present in the GPU ASIC at the time of its release.
HBM2 and related technologies are going to be very important for reaching the exascale power-usage metrics, given the HBM technology's intrinsic ability to provide high effective bandwidth at low, power-saving clock rates. Add to that some NVM wired directly to the HBM2's DRAM dies with TSVs, and some FPGA compute for very localized in-memory assistance to the GPU, APU, or compute accelerator, and that solution may be very appealing as far as overall power usage is concerned for exascale compute, and probably even for consumer/gaming/graphics uses as well.
What “SSD”, lol? All I see in that second photo is some 3dFx VooDoo video card. Has 3dFx arisen from the ashes like a phoenix, via some black VooDoo? Them flashbacks just won’t stop that easily, I guess…
Is it wrong that I wish I had storage that required active cooling?
My old WD Raptors were close.
If you don’t happen to have a “fan club”, then
buy a “Squid”:
“What about heat? We ran our storage stress test (constant recycling Blackmagic Disk Speed Test for several minutes). The sensors of the four SM951 AHCIs reported high temperatures of 158F, 160F, 158F, and 165F.”
Is that a 6-pin PCIe connector on the top right? I know this is a dev board, but do you know what the other ports are for, Allyn, since communications should go over the PCIe bus?
… and 2 more connectors at the top edge?
A prototype board will often have a lot of extra connectors for test equipment. I don't know why they are showing off a prototype board with an FPGA, though; if they are going to have a consumer product this year, then they should have taped out an ASIC controller months ago. They may be able to operate the design programmed into an FPGA at near full speed, given how powerful modern FPGAs are, but it obviously will not be anywhere near as power efficient as an ASIC, so the final design may not need a fan. With the higher bandwidth, though, the power consumption may be quite a bit higher than for flash-based devices. For flash, and probably for XPoint devices too, most of the power is consumed by the controller and DRAM, not by the actual memory die.
I hope the new AMD Zen platform will provide enough PCIe lanes to benefit from this new storage tech. Intel's mainstream Core i7-6700 platform is kind of bottlenecked with only 16 active lanes.
You don’t need more PCIe lanes for storage,
if you have an available x16 PCIe 3.0 slot:
Intel tends to gimp their consumer SKUs to keep them from competing directly with their commercial SKUs, but maybe PCIe 4.0 will help by reducing the number of PCIe lanes needed.