IDF 2016 is up and running, and Intel will no doubt be announcing and presenting on a few items of interest. Of note for this Storage Editor are multiple announcements pertaining to upcoming Intel Optane technology products.
Optane is Intel’s branding for products built on 3D XPoint, the memory technology jointly developed with Micron. Intel launched this branding at last year's IDF, and while the base technology is up to 1000x faster than NAND flash memory, full solutions wrapped around an NVMe-capable controller have so far shown roughly a 10x improvement over NAND. That’s still nothing to sneeze at, and XPoint settles nicely into the performance gap between NAND and DRAM.
Since modern M.2 NVMe SSDs are encroaching on the point of diminishing returns for consumer products, Intel’s initial Optane push will be into the enterprise sector. There are plenty of use cases for a persistent storage tier faster than NAND, but most enterprise software is not currently equipped to take full advantage of the gains seen from such a disruptive technology.
XPoint die. 128Gbit of storage at a ~20nm process.
In an effort to accelerate the development and adoption of 3D XPoint optimized software, Intel will be offering enterprise customers access to an Optane Testbed. This will allow for performance testing and tuning of customers’ software and applications ahead of the shipment of Optane hardware.
I did note something interesting in Micron's FMS 2016 presentation. QD=1 random performance appears to start at ~320,000 IOPS, while the Intel demo from a year ago (first photo in this post) showed a prototype running at only 76,600 IOPS. Using that QD=1 example, it appears that as controller technology matures to handle the raw speed of XPoint, delivered performance climbs along with it. Given that a NAND-based SSD only turns in 10-20k IOPS at that same queue depth, we're seeing something more along the lines of 16-32x performance gains with the Micron prototype. Those with a realistic understanding of how queues work will realize that gains at such low queue depths will have a significant impact on the real-world performance of these products.
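To put those QD=1 figures side by side, here is a quick back-of-the-envelope comparison in Python (the 320K and 76.6K IOPS numbers are the ones quoted above; the 10-20K range is a typical ballpark for NAND at QD=1, and the variable names are just for illustration):

    # Rough QD=1 comparison using the figures quoted in the text (approximate values)
    intel_2015_demo_iops = 76_600            # Intel's year-old Optane prototype demo
    micron_fms2016_iops = 320_000            # Micron's FMS 2016 QD=1 figure
    nand_qd1_iops_range = (10_000, 20_000)   # typical NAND SSD QD=1 random performance

    print(f"Micron prototype vs. Intel 2015 demo: {micron_fms2016_iops / intel_2015_demo_iops:.1f}x")
    for nand_iops in nand_qd1_iops_range:
        print(f"Micron prototype vs. {nand_iops:,} IOPS NAND: {micron_fms2016_iops / nand_iops:.0f}x")

Running it gives roughly 4x over the year-old Intel demo and the 16-32x range over NAND mentioned above.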
The speed of 3D XPoint immediately shifts the bottleneck back to the controller, PCIe bus, and OS/software. True 1000x performance gains will not be realized until second generation XPoint DIMMs are directly linked to the CPU.
The raw die 1000x performance gains simply can't be fully realized when there is a storage stack in place (even an NVMe one). That's not to say XPoint will be slow, and based on what I've seen so far, I suspect XPoint haters will still end up burying their heads in the sand once we get a look at the performance results of production parts.
Leaked roadmap including upcoming Optane products
Intel is expected to show a demo of their own more recent Optane prototype, and we suspect similar performance gains there, as their controller tech has likely matured. We'll keep an eye out and fill you in once we've seen Intel's newer Optane goodness in action!
Really, those XPoint haters are responding to Intel’s marketing claims, and who does not hate/mistrust the marketing “profession”? So it’s really a response to Intel’s pie-in-the-sky claims about XPoint’s actually obtainable performance metrics in a non-vaporware form.
Most are hoping that XPoint will have enough durability to justify using it in DIMMs alongside DRAM, and enough of an improvement over NAND to justify XPoint’s added cost, but more testing is in order, hopefully on engineering samples, to see beyond any marketing hype!
“Intel’s Optane XPoint DIMMs pushed back – source”
http://www.theregister.co.uk/2016/08/16/intel_optane_xpoint_dimms_and_ssds_delayed/
Yeah, that article is kinda out to lunch. First gen XPoint was meant for storage-class memory, not to act as DRAM, which is what it will take to realize the raw 1000x performance gains over flash. XPoint basically pushes storage class devices into their other bottlenecks. The same thing would happen if someone tried to make an NVMe SSD full of DRAM.
Skylake Purley and Knights Hill, as well as (probably) Fujitsu’s ARMv8 CPU for their Post-K exascale architecture, will all likely be using XPoint directly addressed by the CPU.
That should show its true capabilities much better than PCI-E bottlenecked SSDs do.
My only concern with XPoint is some kind of artificially imposed endurance limitation.
> The same thing would happen if someone tried to make an NVMe SSD full of DRAM.
Allyn’s excellent point here is very easy to prove,
using some simple arithmetic:
Take DDR3-1600, just to illustrate
(yes, DDR4 is the “latest” but let’s go with
the larger installed base of DRAM):
1600 x 8 = 12,800 MB/second (stock speed / no overclock)
There is a very large installed base of DDR3-1600 DRAM
(e.g. in a myriad of laptops).
Now, serialize that data stream, assuming PCIe 3.0 specs:
12,800 MB/second x 8.125 bits per byte = 104.0 Gb/sec
One NVMe device uses 4 x PCIe 3.0 lanes at 8 GHz per lane
104 Gb/s / 4 NVMe PCIe lanes = 26 Gb/s per PCIe 3.0 lane
BUT, each PCIe 3.0 lane oscillates at 8 GHz presently.
With perfect scaling and zero controller overhead,
an NVMe RAID Controller with 3 members of a RAID-0 array
might come close:
3 @ 8G = 24 Gbps vs. 26 Gb/s (see above)
With realistic scaling and non-zero controller overheads,
a RAID-0 array with 4 members might come close.
Another good comparison would be a RAID-0 array
of 12G SAS devices: 12G SAS oscillates faster
than one NVMe lane, but SAS still uses the
legacy 8b/10b encoding: so, there’s a trade-off.
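For anyone who wants to sanity-check that serialization math, here is a minimal Python sketch of the same arithmetic (the 8.125 bits-per-byte factor comes from PCIe 3.0’s 128b/130b encoding; the variable names are mine):

    # DDR3-1600: 1600 MT/s x 8 bytes per transfer (stock speed, no overclock)
    ddr3_1600_mb_s = 1600 * 8                          # 12,800 MB/s

    # PCIe 3.0 uses 128b/130b encoding: 8 payload bits cost 8 * 130/128 = 8.125 line bits
    bits_per_byte_on_wire = 8 * 130 / 128              # 8.125

    # Raw PCIe 3.0 line rate needed to carry a DDR3-1600 stream
    required_gb_s = ddr3_1600_mb_s * bits_per_byte_on_wire / 1000    # 104.0 Gb/s

    print(f"Required line rate: {required_gb_s:.1f} Gb/s")
    print(f"Per lane if split across an x4 device: {required_gb_s / 4:.1f} Gb/s vs. 8 GT/s per PCIe 3.0 lane")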
OOPS! This is an error:
> With perfect scaling and zero controller overhead,
> an NVMe RAID Controller with 3 members of a RAID-0 array
> might come close:
> 3 @ 8G = 24 Gbps vs. 26 Gb/s (see above)
Correction: each array member uses 4 PCIe 3.0 lanes:
3 @ 32G = 96 Gbps raw vs. the 104 Gb/s required (see above)
Sorry about the typo.
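Picking that correction up in the same sketch, with each x4 member contributing 4 lanes x 8 GT/s = 32 Gb/s of raw line rate:

    # Corrected per-member figure: an x4 NVMe member carries 4 x 8 = 32 Gb/s of raw line rate
    required_gb_s = 104.0                  # line rate needed to carry DDR3-1600 (computed above)
    member_raw_gb_s = 4 * 8                # 32 Gb/s per x4 RAID-0 member

    for members in (3, 4):
        aggregate_gb_s = members * member_raw_gb_s
        print(f"{members} members: {aggregate_gb_s} Gb/s raw vs. {required_gb_s:.0f} Gb/s required"
              f" ({aggregate_gb_s / required_gb_s:.0%} of the requirement with perfect scaling)")

Three members fall just short even with perfect scaling; four members leave some headroom for controller overhead, which lines up with the “realistic scaling” estimate in the original comment.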
There’s an assumption that is often hidden in these predictions: that NVDIMMs will be deployed
IN PLACE OF DDR4+.
On the other hand, I can visualize
a triple-channel memory subsystem, which uses 2 of 3
“banks” for normal DRAM, and the third bank is populated
with NVDIMMs that spend most of their time on READs
e.g. launching programs.
Hosting an OS in non-volatile memory has a LOT to
recommend it e.g. almost INSTANT-ON restarts.
As such, durability is not a singular concept, but
should come with separate metrics for READs and WRITEs.
Now, for a future possibility:
IF (BIG IF here) …
IF NVMe 4.0 “syncs” with PCIe 4.0, THEN
re-calculate with 16G serial data channels:
assume a RAID-0 array with zero controller overhead and
4 member NVMe 4.0 SSDs with channels oscillating
at 16G + using 128b/130b jumbo frames:
THEN …
4 x NVMe SSDs @ 4 PCIe 4.0 lanes @ 16 GT/s per lane = 256 Gb/s
256 Gb/s / 8.125 bits per byte = 31.5 GB/second (roughly 2 GB/s per lane)
So, under those assumptions, such an NVMe storage subsystem
does exceed the raw bandwidth of DDR3-1600, theoretically.
Reviews of Highpoint’s RocketRAID 3840A (recently announced)
should tell us a LOT about how close the above calculations are.
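Re-running the same sketch under those PCIe 4.0 assumptions (16 GT/s per lane, four x4 SSDs, zero controller overhead, and assuming the 128b/130b encoding carries over):

    # Hypothetical: 4 x NVMe SSDs, each x4, on PCIe 4.0 at 16 GT/s per lane
    ssds, lanes_per_ssd, gt_per_lane = 4, 4, 16
    bits_per_byte_on_wire = 8 * 130 / 128                # 8.125, assuming 128b/130b carries over

    payload_mb_s = ssds * lanes_per_ssd * gt_per_lane * 1000 / bits_per_byte_on_wire
    print(f"Aggregate payload: {payload_mb_s / 1000:.2f} GB/s")            # ~31.5 GB/s
    print(f"Per lane: {payload_mb_s / (ssds * lanes_per_ssd):.0f} MB/s")   # roughly 2 GB/s per lane
    print("vs. DDR3-1600 raw bandwidth of 12.8 GB/s")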
I apologize for my math error above:
it was truly a day to forget
(I’ll spare you the political details).
Here’s a simplified version of
our bandwidth comparison with DDR3-1600:
Assume:
DDR3-1600 (parallel bus)
1,600 MT/s x 8 bytes per transfer = 12,800 MB/second
(i.e. exactly TWICE PC2-6400 = 800 x 8)
Now, serialize with PCIe 3.0
(8G transmission clock + 128b/130b jumbo frames):
1 x NVMe PCIe 3.0 lane
= 8 GHz / 8.125 bits per byte = 984.6 MB/second
4 x NVMe PCIe 3.0 lanes
= 4 x 984.6 MB/second = 3,938.4 MB/second
4 x 2.5″ NVMe SSDs in RAID-0 (zero controller overhead)
= 4 x 3,938.4 = 15,753.6 MB/second
Compute the allowable aggregate overhead (the break-even point):
1.0 – (12,800 / 15,753.6) = 18.7%
Highpoint calculated 15,760 (almost identical):
http://highpoint-tech.com/PDF/RR3800/RocketRAID_3840A_PR_16_08_04.pdf
Conclusion:
assuming aggregate controller overhead of 18.7%,
four 2.5″ NVMe SSDs in RAID-0
exactly equal the raw bandwidth
of DDR3-1600 DRAM.
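That simplified comparison boils down to just a few lines of the same Python sketch; the small difference between 18.7% and 18.75% is only the rounding of the 15,753.6 figure:

    # DDR3-1600 raw bandwidth
    ddr3_mb_s = 1600 * 8                                 # 12,800 MB/s

    # PCIe 3.0 per-lane payload: 8 GT/s with 128b/130b encoding (8.125 line bits per byte)
    lane_mb_s = 8000 / 8.125                             # ~984.6 MB/s
    x4_device_mb_s = 4 * lane_mb_s                       # ~3,938.5 MB/s
    raid0_of_4_mb_s = 4 * x4_device_mb_s                 # ~15,753.8 MB/s

    overhead_budget = 1.0 - ddr3_mb_s / raid0_of_4_mb_s
    print(f"4-drive RAID-0 ceiling: {raid0_of_4_mb_s:,.1f} MB/s")
    print(f"Allowable aggregate overhead to break even with DDR3-1600: {overhead_budget:.2%}")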
http://semiaccurate.com/2016/09/12/intels-xpoint-pretty-much-broken/