Just three weeks ago, we reported on 3D XPoint Technology: a 2-layer stack of non-volatile memory that couples the data retention of NAND flash with speeds much closer to those of DRAM.
The big question at the time was less about the tech and more about its practical applications. Ryan is out covering IDF, and he just saw the first publicly announced application by Intel:
Intel Optane Technology is Intel’s term for how they are going to incorporate XPoint memory dies into the devices we use today. They intend to start with datacenter storage and work their way down to ultrabooks, which means that XPoint must come in at a cost/GB closer to NAND than to DRAM. For those asking for specific performance figures after our earlier coverage, here are a couple of performance comparisons between an SSD DC P3700 and a prototype SSD using XPoint:
At QD=8, the XPoint-equipped prototype comes in at 5x the performance of the P3700. The bigger question is QD=1 performance, since XPoint is supposed to have far lower latency than NAND.
Yes, you read that correctly: that’s 76k IOPS at QD=1. That means issuing the SSD only one command at a time, waiting for the reply, and only then issuing another command. This is basically the worst case for SSD performance, as no commands are stacked up in the queue, so no parallelism can kick in to increase overall throughput. For comparison, SATA SSDs have a hard time maintaining that figure at their maximum queue depth of 32.
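For a sense of what that number implies, here is a quick back-of-the-envelope calculation using only the 76k IOPS figure quoted above (nothing else here comes from Intel):

```python
# At QD=1 each command must complete before the next is issued, so
# average latency = 1 / IOPS. The 76k figure is from the demo above.
qd1_iops = 76_000
avg_latency_us = 1e6 / qd1_iops  # convert seconds to microseconds
print(f"Implied average latency: {avg_latency_us:.1f} us per command")
```

That works out to roughly 13 microseconds per command, well below typical NAND read latencies.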
Exciting to see a follow-on announcement so quickly after the announcement of the technology itself, but remember that Intel did state ‘2016’ for these to start appearing, so don’t put off that SSD 750 purchase just yet.
More to follow as we continue our coverage of IDF 2015!
Hopefully they are going to give us some info on how it works, not just what they are going to use it for.
I promise I have been doing my best to get them to describe how this tech works. I'm sure they will release a description at some point, lest this be the only type of memory without a basic principles-of-operation description.
That’s like the Colonel’s special herbs and spices, or the recipe for Coke! Do not expect them to reveal those anytime soon. And very likely it’s a simple variant of the very same technology that a lot of others have nearly perfected as well, so they won’t chance losing their lead time to market.
P.S. The Hot Chips Symposium on High Performance Chips is Sunday-Tuesday, August 23-25, 2015, Intel’s there, and AMD is there talking about the next generation of GCN, among others.
I don’t think that anything at this level would be described as simple. Companies have been working on PCM and similar alternative memory tech for a long time without many real products. It is difficult for a new type of technology to take over from an established type, since the established type is a moving target. There has been a lot of R&D spent making flash better, so by the time the new tech is available, it is often outperformed by the old tech. In this case, the new tech may be orders of magnitude faster, so even if it is much more expensive, it is still viable. It may be based on a simple idea, but I suspect that the process tech required to achieve it is not simple.
“At QD=8, the XPooint equipped prototype comes”
I got your back
Thanks, Anonymous bro!
Hopefully they are going to give us some info on how it works, not just where they are going to use it.
I’m patiently waiting for a Plug-and-Play DDR3-1333 and DDR3-1600 compatible SODIMM in popular densities e.g. 4GB, 8GB and 16GB per SODIMM.
This patent pending device was designed with the eventual arrival of Non-Volatile SODIMMs in mind — to eliminate the need for a secondary input power supply:
Future permutations can also include DDR4 SODIMMS and fiber optical data channels with adjustable clock rates to “push” throughput:
Thanks again, Allyn.
These are exciting times in the storage field, and we look forward to reading more of your genius-level insights.
I don’t think DDR-3 has any provisions for supporting NVRAM. I thought this was already part of the DDR-4 spec though. I wouldn’t expect NVRAM based on DDR-3 since it may be difficult to implement in a transparent manner. If you have support for it in DDR-4, then there would probably be little interest in attempting to port the technology back to DDR-3 based systems.
FYI: Everspin does mention DDR3 support on their website:
There were no results today, searching Everspin for “DDR4”.
With the arrival of Intel’s 3D XPoint, Everspin may be confined to niche markets like the notorious “smart meters”:
Google site:crossbar-inc.com ddr3
Google site:crossbar-inc.com ddr4
also produced zero results.
Just as Flash was set to supplant HDDs for those hot and ready data requests on server systems, CrossBar memory comes along to rain on Flash’s parade. HDDs will still be around for backup and longer-term storage, but Flash is now going to be pushed down the server memory/storage pecking order, and even on consumer systems, with CrossBar available on DIMMs as well. Just imagine your DRAM DIMMs adding some CrossBar memory alongside the DRAM for system use, including write-through backups for faster suspend/sleep and power-loss recovery of RAM. One thing is for sure: CrossBar is orders of magnitude faster than NAND and orders of magnitude more durable, and on top of that offers 8 to 10 times the density for more NVM in less space. SLC Flash NAND is going to become a lot cheaper, and will be what the Flash makers have to use just to gain even a little extra read/write speed compared to CrossBar.
Let’s hope the extra competition forces the Flash and HDD folks to pool their resources and offer hybrid drives with loads of Flash paired with lots of HDD storage capacity, because Flash NAND is going to need a little spinning rust to help it over those longer periods of storage when NAND cells can sometimes lose their state. Look for more SLC NAND cache paired with HDDs in the future, because CrossBar is going to kick the NAND-only SSDs to the curb as CrossBar and similar technologies take over hosting the OS and paging files.
To: CrossPoint, or XPoint, or X-Point, or Optane!
And most certainly my spell checker will want to replace Optane with Propane, and people will wonder what that’s about.
Actually propane is the third choice for Optane, in LibreOffice 5.0!
“…, add to that 8 to 10 times the density for more NVM in less space. ”
Isn’t it advertised as more dense than DRAM, not flash? This is still going to be more expensive than flash for a given capacity, but for a fast boot/swap drive, this shouldn’t be that big of an issue.
Half the cost of DRAM is one of the figures I have seen online for XPoint(?). The 7x performance figure over flash NAND is from Intel’s demo for IOPS (Input/Output Operations per Second) at a queue depth of 1. As for density, DRAM is only a single transistor and a capacitor per cell, and flash is about twice as dense as DRAM(?). Flash is going to have to be SLC-only to make up for some of the difference in R/W speed relative to XPoint.
I’m looking forward to some benchmarks that can test repeated single-byte random reads and writes for XPoint, because it’s byte-addressable, compared to flash NAND having to read an entire bank just to write back a single byte along with the rest of the bank’s contents. No one really expects real-world single-byte reads or writes, but it would be nice to see, with a controller-monitoring software package, the amount of background work and total work a flash NAND based SSD’s controller would have to do for a single byte’s change, relative to an XPoint-based SSD controller’s work for a single byte. The amount of error correction will be interesting to see tested as well.
The current speeds of MLC(2) or MLC(3) flash NAND are going to be even slower than SLC-based NAND, and even SLC NAND, which is orders of magnitude slower than XPoint’s R/W speeds, is not going to be enough. And Intel is talking about consumer products in 2016, not just enterprise products.
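The read-modify-write penalty described above can be sketched numerically. The page size here is an assumption for illustration (real parts vary), not a figure for any actual Intel or Micron product:

```python
# Hypothetical illustration of write amplification for a 1-byte update.
NAND_PAGE_BYTES = 16 * 1024  # assumed page size, not a real part's spec

def nand_bytes_moved(payload_bytes: int) -> int:
    # NAND cannot rewrite in place: the controller reads the whole page,
    # modifies it in RAM, and programs a fresh page elsewhere.
    pages = -(-payload_bytes // NAND_PAGE_BYTES)  # ceiling division
    return pages * NAND_PAGE_BYTES * 2  # one page read + one page program

def byte_addressable_bytes_moved(payload_bytes: int) -> int:
    # Byte-addressable media can, in principle, touch only the payload.
    return payload_bytes

print(nand_bytes_moved(1))               # 32768 bytes shuffled for 1 byte
print(byte_addressable_bytes_moved(1))   # 1 byte
```

That four-orders-of-magnitude difference in data moved per tiny write is one reason the controller-workload comparison in the comment above would be interesting to measure.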
> No one really expects real world single byte reads or writes
That’s already being handled efficiently by operating systems which utilize associative buffers for all I/O e.g. if a single byte is changed in a 512-byte HDD sector, that entire sector is not immediately overwritten on that HDD.
The byte in question is changed in a buffered copy of that sector.
Those buffers are one of the reasons why a thumb drive must be “safely removed” after data have been written to it:
those buffers must be “flushed” first (read “physically written to the target device”).
This “flushing” happens automatically at routine Windows SHUTDOWN, for example.
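The buffering-and-flush behavior described above can be sketched in a few lines; `flush()` empties the userspace buffer and `os.fsync()` is the standard way to request the OS commit its buffers to the physical device:

```python
# Minimal sketch of buffered I/O: a write() lands in in-memory buffers
# first; flush() + fsync() force it onto the device, which is what
# "safely remove" and shutdown do for you behind the scenes.
import os

path = "example.bin"
with open(path, "wb") as f:
    f.write(b"\x42")        # data sits in Python's userspace buffer
    f.flush()               # hand it to the OS buffer cache
    os.fsync(f.fileno())    # ask the OS to commit it to physical media

with open(path, "rb") as f:
    assert f.read() == b"\x42"
os.remove(path)
```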
You don’t read a single byte from DRAM either. When you try to access a byte, the DRAM transfers an entire row into a buffer. It has to be written back when the row is closed, since reads are destructive (reading drains the capacitors). Most systems are just going to read an entire cache line at once anyway. Since this new NVRAM is supposed to be byte-addressable, I would expect it to be set up more like DRAM. It shouldn’t need the write-back though, since reads should not be destructive. It will be interesting to know the specifics.
DRAM is accessed through the memory controller via specialized channels, and those channels can be wider than even the CPU’s standard (32/64-bit) data bus. And yes, there are the DRAM refresh cycles, but the post is concerned with XPoint, and whatever an XPoint-based SSD’s controller would have to do in handling that theoretical one-byte R/W request, relative to the work a NAND-based SSD’s controller would have to do to get that byte written or read.
We are talking about a controller workload metric, and the power/overhead used to get the work done on NAND versus XPoint based devices. This usually happens in the background on the respective devices’ controllers, and it’s only theoretical, as there is buffering and queuing to help things along. Note that in Intel’s IOPS benchmark the queue depth was set to 1, and the XPoint-based device was still well ahead of the NAND-based device in the test results. So for data center usage those “single byte” workloads will be modeled, and even the controllers on any potential SSD or other device will be tested using different OSes in an effort to save energy; those bean counters are serious about power usage in the data center.
P.S. DRAM is usually handled by the virtual memory subsystem of the OS and the CPU’s hardware memory controller on the CPU/chipset, and whole pages are loaded to and from the DRAM and the page swap file. Individual physical memory accesses are done on physical DRAM addresses via the CPU’s addressing mechanisms, cache subsystems, etc. That’s in addition to whatever buffering the OS has for direct reads/writes to the I/O devices themselves, via DMA and other subsystems. DRAMs do require refresh cycles, but that’s done on the DIMM package in the background. XPoint-based DIMMs are also going to be available, requiring no refreshing, so maybe combo DIMMs will have both XPoint and DRAM stores to make things even faster.
“No one really expects real world single byte reads or writes”!!
Is what was said, but the rest of the post is about testing the amount of work that the NAND SSD’s controller would have to do, using specialized software made to issue single-byte workloads. Most likely the “single byte” is actually going to be a single word, and depending on the word size/data bus width of the CPU (32 or 64 bits), that is the smallest amount of data that will be read from or written to memory. So, using specialized benchmarking software, the post is about benching the NAND SSD controller’s workload relative to the XPoint-based device controller’s workload, for potential power usage and other metrics. This is a controller-versus-controller benchmark that can be done in addition to going through the OS software stack’s “associative buffers”, or whatever.
FYI: Crossbar and 3D XPoint (pronounced Cross-Point)
are 2 different things:
Compare http://www.crossbar-inc.com/ with Allyn’s report above.
Yes, it may look similar, but Intel/Micron have made it clear that this is *not* crossbar.
Yes, you are right; fix that in MY troll. My reliance on spell checkers is necessary, otherwise the words will come out backwards and mangled! So yes, it’s Cross-Point(TM), X-Point(TM). And most WP programs do not like trade terms and computer acronyms.
> hosting the OS and paging files
Yes YES!! including also a “Format RAM” option in the BIOS/UEFI just prior to running Windows Setup e.g. see these 2 Provisional Patent Applications (now expired):
Definitely getting one of these for my 2016 build.
Even if they had consumer products by then, which I doubt, the cost would likely far exceed that of a regular SSD.
If you really need the performance (which would mainly benefit video editing for consumers) we already have pretty fast PCIe SSD options.
Likely the cost will mean there’s a very niche usage case such as servers where the power savings for constant usage make the extra cost worthwhile.
How many NAND dice are in the P3700 and the Optane prototype, respectively?
Besides the latency of NAND itself, random read IOPS depends on two major factors: I/O queue depth and the number of NAND dice. I guess XPoint will have much larger per-die density than 2D NAND, so for the same disk capacity the Optane prototype will have fewer dice than the P3700. If this is true, you will see the random read IOPS gap shrink as the I/O queue depth increases. Maybe that’s why Intel only shows you QD=8.
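The commenter’s hypothesis can be sketched with an idealized model: IOPS saturates at roughly min(queue depth, die count) / per-die latency. Every number below is an illustrative assumption, not a measured figure for either drive:

```python
# Idealized model: in-flight requests spread perfectly across dice,
# so IOPS = min(queue_depth, die_count) / per-die latency.
def iops(queue_depth: int, die_count: int, die_latency_s: float) -> float:
    in_flight = min(queue_depth, die_count)  # ideal, no-conflict spread
    return in_flight / die_latency_s

nand_lat, xpoint_lat = 90e-6, 10e-6  # assumed per-die read latencies
for qd in (1, 8, 64):
    many_die_nand = iops(qd, 128, nand_lat)  # many small NAND dice
    few_die_xp = iops(qd, 16, xpoint_lat)    # fewer, denser XPoint dice
    print(qd, round(many_die_nand), round(few_die_xp))
```

Under these assumptions, at QD=1 the comparison is purely latency-bound and XPoint wins by the latency ratio, while at high queue depths the NAND drive’s extra dice start closing the gap, which is exactly the effect the comment suggests Intel’s QD=8 chart may be sidestepping.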