High Resolution Quality of Service (QoS) 4KB Random
*note* much of the writing on this page is a repeat of the 375GB review. Results for the P4800X are now from the 750GB model tested in-house, and we re-tested the P3700 as newer firmware had been released since the last review. Micron 9100 MAX results remain unchanged from the last review as there was insufficient time to get that drive re-tested for this piece (and we have no reason to believe those results have changed regardless).
Required reading (some additional context for those unfamiliar with our Percentile testing):
- Introduction of Latency Distribution / Latency Percentile (now called IO Percentile)
- Introduction of Latency Weighted Percentile (now called Latency Percentile)
- Intro to PACED workloads – 'It's not how fast you go, it's how well you go fast!'
I'd considered laying out the typical Latency Percentile and IO Percentile data before going into the QoS, but honestly, it's just a slaughter across the board, so I'll cut straight to the chase:
Quality of Service (QoS)
QoS is specified in percentages (99.9%, 99.99%, 99.999%), commonly spoken as ‘three nines’, ‘four nines’, and ‘five nines’. It corresponds to the latency at or below which that 99.x% of all recorded IOs in a run completed. Enterprise IT managers and system builders care about varying levels of 9's because those long-tail latencies lead to potential timeouts for time-sensitive operations, and increasing the 9's is how they quantify more stringent QoS requirements. Note that these comparative results are derived from IO Percentile data and *not* from Latency Percentile data.
If you have a hard time wrapping your head around the 9's thing, it may be easier to flip things around and think about it from the standpoint of the remaining longest-latency IOs that haven't been accounted for as the plot progresses. As an example, the 99.9% line near the center of the vertical axis represents the top 10% of the top 1% (0.1%) of all recorded IOs, where 'top' means those IOs with the longest latency.
These plots are tricky to make, as they are effectively an inverse log scale. Each major increment up from the zero axis accounts for 90% of the IOs remaining above it (90%, then 99%, then 99.9%, and so on), meaning it's an asymptotic scale which will never reach 100%. The plots below essentially take the top portion of the IO Percentile results and spread them out, exponentially zooming in on the results as they approach 100%.
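To make the 'nines' math concrete, here is a minimal Python sketch (synthetic latencies and a hypothetical helper name, not our actual test tooling) that reads a QoS point off a sorted list of per-IO latencies:

```python
import math

def qos_latency(latencies_us, nines):
    """Latency (us) at or below which `nines` fraction of the IOs completed."""
    s = sorted(latencies_us)
    # index of the last IO still inside the percentile
    idx = math.ceil(nines * len(s)) - 1
    return s[idx]

# 10 synthetic per-IO latencies in microseconds (illustrative only)
sample = [8, 9, 9, 10, 10, 11, 12, 15, 40, 120]
print(qos_latency(sample, 0.50))  # 10 -- median: half the IOs finish by 10us
print(qos_latency(sample, 0.90))  # 40 -- 'one nine': only the slowest 10% excluded
```

The asymptotic behavior described above falls out naturally: each added 9 just moves the index deeper into the slow tail of the sorted list, so the scale can approach but never reach 100%.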
Read
Note that we have shifted the scale here down to 1 microsecond, as the P4800X rides the 10us figure throughout these tests. We ideally want QoS to be a vertical line, and this is an extremely impressive result here. I didn't take these out to QD=256 as the P4800X saturates by QD=16 in all workloads; further plot lines simply shift further to the right.
Here is an easier numerical chart plotting out the exact points where the QoS chart crosses the various latencies. Note the 50% mark (upper left), where the P4800X comes in at a median (50%) latency of less than 10us!
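The numerical chart is just the inverse lookup of the plot: for a given latency bound, what fraction of IOs came in under it. A hypothetical sketch (same kind of synthetic data as before, not our real capture):

```python
import bisect

def fraction_within(latencies_us, bound_us):
    """Fraction of IOs that completed at or under `bound_us` microseconds."""
    s = sorted(latencies_us)
    return bisect.bisect_right(s, bound_us) / len(s)

# Illustrative per-IO latencies in microseconds
sample = [8, 9, 9, 10, 10, 11, 12, 15, 40, 120]
print(fraction_within(sample, 10))   # 0.5 -- half the IOs finished within 10us
print(fraction_within(sample, 100))  # 0.9 -- one IO in ten took longer than 100us
```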
Alright, I've taken QD=1, 2, and 4 for the P4800X (blue), P3700 (green), and 9100 MAX (gold). Remember this is a log scale, so the competing products coming in a full major increment to the right indicates that they are 10x slower.
70/30 mix
With reads being the majority of this mix, the P4800X results are nearly identical to 100% read, while the competing products take a right turn into even longer latencies due to the increased write demand. The P4800X doesn't seem to care in the least about the added writes and continues to dominate.
Write
Now we see some elbows in the plot. Latencies are still great overall, but clearly the controller is doing some extra work, likely to provide wear leveling, etc.
Figures are still well within spec, though average (typical) latency has crept just over 10us. 100% writes is not a 'typical' workload, so I consider Intel's "Typical: <10us" claim to fall more into the 70/30 bracket covered earlier.
Finally someone comes to the party! Well, sorta. The 9100 MAX was able to beat the P4800X when pushing into the higher consistency metrics, but take note of the legend – it is only doing it at less than half of the overall IOPS (because the majority of its IOs are at a much higher latency).
One more comparison before we move on. Intel showed us a nifty QoS comparison between the P4800X and the P3700:
This chart ramps up IOPS while showing how QoS responds along the way. Where have I seen that before??? It's like those IOs are PACED or something 🙂
I've kept Intel's colors but added the Micron 9100 MAX (gold). Micron can reach higher IOPS loading at 70/30 before it saturates, but the P4800X's maximum (99.999%) latency remains lower than the average of the 9100 and P3700, while the P4800X's average latency is a full order of magnitude (10x) lower. I've added a few labels on the average plot lines to denote the QD associated with each product at that level of load. The NAND products have to push into virtually unattainable queue depths to reach the performance levels that the P4800X simply breezes through.
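Those QD labels follow directly from Little's Law: the number of outstanding IOs is roughly IOPS times mean latency. A back-of-the-envelope sketch with illustrative numbers (not measured values from this review):

```python
# Little's Law sketch: queue depth ~= IOPS * mean latency.
# The figures below are made up to show the relationship, not test results.
def implied_queue_depth(iops, mean_latency_us):
    return iops * (mean_latency_us / 1_000_000)

# A 10us-latency drive sustains 500k IOPS at a shallow queue...
print(implied_queue_depth(500_000, 10))   # 5.0 outstanding IOs
# ...while a 100us-latency drive needs 10x the queue depth for the same IOPS.
print(implied_queue_depth(500_000, 100))  # 50.0 outstanding IOs
```

This is why the NAND products must be driven at extreme queue depths to match throughput the low-latency part reaches at single-digit QD.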
The endurance and performance are impressive, and those prices are impressively high too!
Is it possible to get Optane drives with slower speeds and the same endurance? I mean, it seems like it would be cheaper and I’d be ok with the SSD speeds we have now, it's just that the endurance is really nice. I would literally never replace the drive due to endurance.
Why would making it slower make it cheaper?
They make Optane drives that are significantly cheaper at a slightly reduced endurance. They are called 900P.
Those are significantly cheaper compared to the new Optane drives but are still WAY more expensive than SATA SSDs.
I think the idea is that if the Optane drive were much slower but still had really good endurance, the lower speed would mean even cheaper pricing.
Think about it: the faster devices are faster because the hardware needed to drive those speeds is more expensive.
Any real world testing ?
Like, is this worth using in compile servers and workstations?
If this saved me 10 minutes a day in compile time, I would buy it.
But IOPS numbers don’t say much…
It really is workload dependent, and as we've found in our other research on Optane, it varies wildly by application. No specific real-world test would give you your answer unless we just happened to test your exact application on your exact hardware configuration. That said, we did note significant performance increases in similar applications – they are documented in this white paper.
Further, you should be able to monitor storage activity for your particular workload on your particular platform. If access times are totaling 10+ minutes for what you are doing, there's a good chance Optane will bring that number down significantly.
Many thanks again, Allyn.
It’s very gratifying to see Optane graduate from questionable promises to production devices.
Guys, trust me. Intel is making leaps-and-bounds progress in making Optane win. I work for them. This is just the beginning. Prod spec will only get better from here. By the end of 2018 there will be an Optane memory product along with storage.