Write Pressure, Conclusion, Pricing, and Final Thoughts
Write Pressure
Before we wrap up, I wanted to reiterate and expand on a chart that Intel showed us at our briefing:
Since we had an additional comparison point and some increased testing flexibility on our end, here is our expanded version of that same test:
I've added in the Micron 9100 MAX, which can sustain a higher random write load (300,000 IOPS) while still servicing reads, but the penalty clearly adds up as its average read latency climbs past 1ms. The P3700 runs out of steam at 100,000 IOPS, though this was only an 800GB sample and could not reach the 750 MB/s seen in Intel's data above. All the while, check out that blue line down there. Intel only took their chart out to the ~200,000 random write IOPS point. I pushed the P4800X all the way past 500,000 random write IOPS and it was still servicing reads at just 36us, which, I should point out, is still quicker than the average read latencies of both competing products with no writes taking place at all (far left)!
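For readers who want to approximate this kind of write-pressure run on their own hardware, here is a rough sketch using fio on Linux. It is not the exact tooling or parameters used for the chart above; the device path, runtime, and write rate are placeholders, and it will overwrite whatever device you point it at, so only aim it at a disposable namespace.

```python
# Sketch: approximate a "write pressure" run with fio (not the exact tooling
# used for the chart above). Assumes fio is installed and /dev/nvme0n1 is a
# scratch device whose contents you are willing to destroy.
import shlex
import subprocess

DEVICE = "/dev/nvme0n1"   # placeholder -- point at a disposable namespace
WRITE_IOPS = 100_000      # placeholder background random write load

cmd = (
    f"fio --filename={DEVICE} --direct=1 --ioengine=libaio --bs=4k "
    f"--time_based --runtime=60 "
    # Job 1: rate-limited 4K random writes supplying the "pressure"
    f"--name=pressure --rw=randwrite --iodepth=64 --rate_iops={WRITE_IOPS} "
    # Job 2: low-depth 4K random reads whose latency we actually care about
    f"--name=probe --rw=randread --iodepth=1"
)

print(cmd)                                       # review before running
# subprocess.run(shlex.split(cmd), check=True)   # uncomment to execute
```

The idea is simply to hold a fixed random write rate in the background while watching what happens to the latency of a single outstanding read, which is what the chart above is plotting.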
Conclusion:
Pros:
- Outstanding random performance
- Outstanding QoS
- Outstanding endurance (based on rating)
- It's the fastest thing we've ever tested. Period.
Cons:
- Cost (see below)
Pricing:
- Micron 9100 MAX (highest performing competing NAND Flash)
- P4800X 375GB:
- $1520 ($4.05/GB)
- P4800X 320GB (usable capacity) w/ Intel Memory Drive Technology (augments DRAM):
- $1951 ($6.10/GB)
- Server-class ECC Registered DRAM:
- ~$9-$10/GB
Yes, this is expensive, but you definitely get what you pay for here, especially if your use case meshes nicely with where the P4800X shines. For your money, you are getting a product that eats random IOPS for breakfast, lunch, and dinner. And then it asks for dessert. Just don't waste your cash on this type of product unless you intend to use it for its designed purpose! Sure, pretty much any IO-heavy application will benefit greatly from the P4800X, but as with any storage system, you have to balance cost against performance. That second, more expensive tier of the P4800X includes a license for Intel MDT, which effectively adds the Optane capacity to the system memory pool, with the installed RAM acting as a cache layer in front of it. For many workloads, the MDT solution will offer similar performance at a substantial cost reduction over a pure RAM solution.
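To illustrate what "RAM acting as a cache layer in front of an Optane pool" means in practice, here is a toy sketch of that tiering idea. It is emphatically not Intel MDT's implementation (which operates transparently, below the OS); the class, page counts, and names are all made up purely to show the concept.

```python
# Toy illustration of the tiering idea behind an MDT-style setup: a small,
# fast "DRAM" cache sitting in front of a much larger, slower "Optane" pool.
# This is not Intel's implementation -- just the caching concept.
from collections import OrderedDict

class TieredMemory:
    def __init__(self, dram_pages: int):
        self.dram = OrderedDict()      # small, fast tier (acts as a cache)
        self.optane = {}               # large, slower tier (backing pool)
        self.dram_pages = dram_pages

    def read(self, page: int):
        if page in self.dram:                  # hit: served at DRAM speed
            self.dram.move_to_end(page)
            return self.dram[page]
        value = self.optane.get(page)          # miss: fetch from the pool
        self._install(page, value)
        return value

    def write(self, page: int, value):
        self.optane[page] = value              # pool holds the full data set
        self._install(page, value)

    def _install(self, page, value):
        self.dram[page] = value
        self.dram.move_to_end(page)
        if len(self.dram) > self.dram_pages:   # evict least-recently-used
            self.dram.popitem(last=False)

mem = TieredMemory(dram_pages=4)
for p in range(10):
    mem.write(p, f"data-{p}")
print(mem.read(9), len(mem.dram), len(mem.optane))   # hot page hits DRAM tier
```

As long as the hot working set fits in the fast tier, most accesses never touch the slower pool, which is why a DRAM-plus-Optane configuration can get close to pure-DRAM performance at a much lower cost per gigabyte.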
Warning (to non-IT pros):
If you have read this far and are not an enterprise customer, I know what you're thinking. You may want one of these for your video editing, workstation, or maybe even your gaming rig. That's fine, but there are a few things you need to consider. First, enterprise parts are tuned for random access across the entire drive, meaning a consumer SSD / firmware would likely perform better with consumer workloads as it is tuned for that purpose. Second, and more important in the case of Intel Datacenter parts, is the matter of 'assertion'. IT specialists don't like wasting time on intermittent faults and silent data corruption. If something is wrong in the slightest, an IT Pro just wants the thing to fail hard so they can replace it and get that portion of their network back up ASAP. As such, Intel programs their DC SSD firmware to enter an 'assert mode' at the slightest sign of trouble. An asserted Intel SSD is effectively a bricked SSD that won't do anything further as it is meant to be replaced. Even if most of the data was good, it will no longer be readable. That's not to say Intel's Datacenter SSDs are bricking left and right, but an SSD 750 (consumer version of the P3xxx) will push through many faults and attempt to continue operating while those same issues would instantly assert a P3520. Moral of the story – don't use an enterprise part for consumer purposes unless you are employing an enterprise-level redundancy / backup regime.
Final Thoughts:
The P4800X is a beast. I mean seriously. We're talking 1/10th the latency of the fastest competing NAND products and 10x the IOPS performance at lower Queue Depths. Endurance is multiple times higher than anything with NAND flash in it. It's just a monster. The only catch? Software needs to catch up a bit in order to realize the full potential here, but that is very much a solvable problem, and future iterations of XPoint will be packaged to speak to the CPU directly via DIMM slots, further removing the legacy bottlenecks associated with modern OS kernels. As it stands now, it looks like the P4800X has certainly met Intel's expectations for XPoint in PCIe NVMe form. Now we can all start waiting to see what XPoint DIMMs are capable of!
What? No Editor's Choice? I'm drawing a few hard lines here:
- I cannot award Editor's Choice to a product we did not test in-hand.
- I will not award Editor's Choice to a product we are unable to completely verify against the product specification that a customer would receive if they were purchasing it.
That said, it's a damn impressive showing of the first new memory technology to come out in over a decade. It's hard not to give it some sort of award despite the unusual circumstances surrounding such early testing of what is still a *very* protected product.
Thanks for the review (pre-consumer) of Optane, which I had been waiting on for a while now. First none and now two, one on another site that I respect. Big thanks for the latency graphing from 1 clock cycle to a floppy drive. Very informative, and something I was wondering about after getting the picture of Intel placing the idea that it could be a go-between for storage and DIMMs. You test at very high queue depths but seem to state that some testing for a web server is not the best idea. Isn't it true that a web server is the only place where high queue depths are to be seen? If so, and queue depths normally seen are much lower, where would one expect to see such high queue depths – or is it, as you seem to say, just a test for testing's sake?
Thanks for the article. I will have to wait for you to test again when you get one in your hands, and likely find that consumers are at the door of another exponential shift like the one where SSDs became boot drives once the price came down. We will more than likely start placing our OSes on Optane drives in our SSD systems to gain additional quickness.
When they become available, PCPer "must" see what it will take to boot a computer in a second with an Optane boot drive. Ten seconds is possible with an SSD. Nuff said.
Regarding high QD's, there are some rare instances, and it is possible that a web server could be hitting the flash so hard that it hits such a high QD, but if that happens I'd argue that the person specing out that machine did not give it nearly enough RAM cache (and go figure, this new tech can actually help there as well since it can supplement / increase effective RAM capacity if configured appropriately).
Regarding why I'm still testing NVMe parts to QD=256, it's mostly due to NVMe NAND part specs for some products stretching out that high. I have to at least match the workloads and depths that appear in product specs in order to confirm / verify performance to those levels.
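As an aside, for anyone who wants to run that kind of spec verification themselves, here is a minimal sketch using fio's JSON output on Linux. The device path and the spec figure are placeholders, and the JSON field names assume a reasonably recent fio build.

```python
# Sketch: run a QD=256 random read job and compare the measured IOPS against
# a spec-sheet figure. Assumes a recent fio (JSON output) and a scratch device.
import json
import subprocess

DEVICE = "/dev/nvme0n1"     # placeholder device
SPEC_IOPS = 550_000         # placeholder spec-sheet random read claim

result = subprocess.run(
    ["fio", "--name=spec_check", f"--filename={DEVICE}", "--direct=1",
     "--ioengine=libaio", "--rw=randread", "--bs=4k", "--iodepth=256",
     "--runtime=60", "--time_based", "--output-format=json"],
    capture_output=True, text=True, check=True,
)

job = json.loads(result.stdout)["jobs"][0]
iops = job["read"]["iops"]
mean_lat_us = job["read"]["lat_ns"]["mean"] / 1000   # field names: recent fio
print(f"measured {iops:,.0f} IOPS vs spec {SPEC_IOPS:,} "
      f"({iops / SPEC_IOPS:.0%}), mean latency {mean_lat_us:.1f} us")
```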
I'm glad you saw benefit in the 'bridging the gap' charts. Fortunately, my floppy drive still works and I was able to find a good disk! :) I had to go through three Zip disks before finding one without the 'click of death'!
Holy smokes!
Hey great work here A, as usual.
Ditto that, Allyn: you are THE BEST!
> In the future, a properly tuned driver could easily yield results matching our ‘poll’ figures but without the excessive CPU overhead incurred by our current method of constantly asking the device for an answer.
Allyn,
The question that arose for me from your statement above is this: with so many multi-core CPUs proliferating, would it help at all if a sysadmin could "lock" one or more cores to the task of processing the driver for this device? The OS would then effectively "quarantine" (i.e. isolate) that dedicated core from scheduling any other normally executing tasks. Each modern core also has large integrated caches, e.g. L2 cache. As such, it occurred to me that the driver for this device would migrate its way into the L2 cache of such a "dedicated" core and help reduce overall latency.
Is this worth consideration, or am I out to lunch here?
Again, G-R-E-A-T review.
Locking a core to storage purposes would sort of help, except you would then have to communicate across cores with each request, which may just be robbing Peter to pay Paul. The best solution is likely a hybrid between polling and IRQ, or polling that has waits pre-tuned to the device to minimize needlessly spinning the core. Server builders will likely not want to waste so many resources constantly polling the storage anyway, so the more efficient the better here.
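To make that polling trade-off concrete, here is a toy simulation of the difference between spinning the core for the entire request and the hybrid approach of sleeping through most of the expected service time before spinning. The latencies are wildly exaggerated (milliseconds rather than microseconds) purely so the simulation behaves on any OS; this is not driver code.

```python
# Toy model of the polling trade-off: pure polling burns the core for the
# whole request, while a "hybrid" poll sleeps through most of the expected
# completion time and only spins at the end. Simulated device only.
import time

EXPECTED_LATENCY_S = 10e-3          # pre-tuned estimate of device latency
                                    # (exaggerated to milliseconds for the demo)

def submit_io() -> float:
    """Pretend to issue an I/O; return the time at which it will complete."""
    return time.monotonic() + EXPECTED_LATENCY_S

def busy_poll(done_at: float) -> int:
    spins = 0
    while time.monotonic() < done_at:     # core is 100% busy the whole time
        spins += 1
    return spins

def hybrid_poll(done_at: float) -> int:
    time.sleep(EXPECTED_LATENCY_S * 0.5)  # sleep through ~half the wait
    spins = 0
    while time.monotonic() < done_at:     # short spin catches the completion
        spins += 1
    return spins

print("busy poll spins:  ", busy_poll(submit_io()))
print("hybrid poll spins:", hybrid_poll(submit_io()))
```

The point of the pre-tuned sleep is that the core is free for other work during the bulk of the wait, while the short spin at the end still catches the completion with near-polling latency.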
for example:
http://www.tech-recipes.com/rx/37272/set-a-programs-affinity-in-windows-7-for-better-performance/
“Whether you want to squeak out some extra Windows 7 performance on your multi-core processor or run older programs flawlessly, you can set programs to run on certain cores in your processor. In certain situations this process can dramatically speed up your computer’s performance.”
I did some experimentation with setting of affinity on the server, and I was able to get latency improvements similar to polling, but there were other consequences such as not being able to reach the same IOPS levels per thread (typical IO requests can be processed by the kernel faster if the various related processes are allowed to span multiple threads). Room for improvement here but not as simple as an affinity tweak is all.
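For anyone curious what an affinity experiment like that looks like in practice, here is a minimal sketch of pinning a process to a single core using Python's standard library. It uses the Linux-only sched_setaffinity call (Windows would need psutil or the Task Manager affinity dialog, as in the link above), and the core number is a placeholder.

```python
# Sketch: pin the current process to a single core before issuing I/O, so
# submissions and completions stay on that core's caches. Linux-only API;
# the core number is a placeholder.
import os

IO_CORE = 2                               # placeholder: core reserved for I/O

os.sched_setaffinity(0, {IO_CORE})        # 0 = the current process
print("now restricted to cores:", os.sched_getaffinity(0))

# ...launch the storage benchmark or driver work from here; everything this
# process does will now be scheduled on that one core only.
```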
PCPER.com announces the first ever CLONE AUCTION:
This auction will offer exact CLONES of Allyn Malventano,
complete with his entire computing experience intact.
Minimum starting bid is $1M USD. CASH ONLY.
Truly, Allyn, you are a treasure to the entire PC community.
THANKS!
I’m glad there was at least one comparison with the 960 pro, which is the most interesting graph in the article. I just wish there were more comparisons.
Your additional answers are coming soon!
Speaking of comparisons, I am now very curious to know if Intel plans to develop an M.2 Optane SSD that uses all x4 PCIe 3.0 lanes instead of x2 PCIe 3.0 lanes.
Also, we need to take out a life insurance policy on Allyn, because we want him around to do his expert comparisons when the 2.5″ U.2 Optane SSD becomes available.
If Intel ultimately commits to manufacturing Optane in all of the following form factors, we should expect it to be nothing short of disruptive (pricing aside, for now):
(a) AIC (add-in card)
(b) M.2 NVMe
(c) U.2 2.5″
(d) DIMM
I would love to know that a modern OS can be hosted by the P4800X and all future successors!
PCIe 4.0 here we go!
Hello, Allyn!
Could you tell me how you managed to tweak FIO to perform polling on the Optane P4800X under Windows?
I've only read how to do it under Linux.
Thanks a lot in advance!
Regards,
Nick