5100 MAX 960GB – Saturated (High-Res) QoS and PACED IOPS Sweep
Required reading (some additional context for those unfamiliar with our Percentile testing):
- Introduction of Latency Distribution / Latency Percentile (now called IO Percentile)
- Introduction of Latency Weighted Percentile (now called Latency Percentile – client use)
- PACED workload application to enterprise SSDs
Quality of Service (QoS)
QoS is specified in percentages (99.9%, 99.99%, 99.999%) and usually spoken in shorthand (‘three nines’, ‘four nines’, ‘five nines’). Each figure corresponds to the latency at or below which 99.x% of all recorded IOs in a run completed. Enterprise IT managers and system builders care about varying levels of 9's because those long latencies can lead to timeouts for time-sensitive operations, and adding 9's is how they quantify more stringent QoS requirements. Note that these comparative results are derived from IO Percentile data and *not* from Latency Percentile data.
If you have a hard time wrapping your head around the 9's thing, it may be easier to flip things around and think about it from the standpoint of the remaining longest-latency IOs that haven't been accounted for as the plot progresses. As an example, the 99.9% line near the center of the vertical axis marks off the top 10% of the top 1% (0.1%) of all recorded IOs, where 'top' means those IOs of the longest latency.
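To make the 'flip it around' view concrete, here is a minimal Python sketch of how a latency at a given number of 9's could be read out of a list of recorded IO latencies. The function name `latency_at_nines` and the nearest-rank method are assumptions made for illustration; this is not the tooling used to generate the charts in this review.

```python
def latency_at_nines(latencies_us, nines):
    """Return the latency at or below which all but 1 in 10**nines IOs completed.

    Hypothetical helper using the nearest-rank method:
    nines=3 -> 99.9%, nines=4 -> 99.99%, nines=5 -> 99.999%.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    if not latencies_us:
        raise ValueError("need at least one latency sample")

    ordered = sorted(latencies_us)
    n = len(ordered)

    # The 'flipped' view: how many of the longest-latency IOs are left over
    # above this line (e.g. 0.1% of all IOs for three nines).
    slower = n // 10**nines

    # Nearest-rank position of the slowest IO that is still inside the line.
    rank = n - slower
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic stand-in data (not measured results): 100,000 IOs with
    # latencies of 1, 2, ..., 100000 microseconds.
    samples = list(range(1, 100_001))
    for k in (3, 4, 5):
        print(k, "nines:", latency_at_nines(samples, k), "us")
```

Note that resolving five nines needs at least 100,000 recorded IOs; with fewer samples, `slower` is zero and the function simply returns the single slowest IO.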
These plots are tricky to make, as they are effectively an inverse log scale. The first major increment up from the zero axis marks 90% of all IOs, and each increment after that covers 90% *of whatever remained above the previous one* (99%, 99.9%, and so on), meaning it's an asymptotic scale which will never reach 100%. The plots below essentially take the top portion of the IO Percentile results and spread them out, exponentially zooming in on the results as they approach 100%.
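One way to think about that axis is as a count of 9's. The sketch below assumes the vertical position is the base-10 log of one over the remaining fraction of IOs, which reproduces the evenly spaced 90% / 99% / 99.9% increments described above. The function name `nines_axis` is made up for illustration, and the exact transform used for the charts may differ.

```python
import math


def nines_axis(percentile):
    """Map a percentile (0 <= p < 100) to its height on a 'nines' axis.

    Assumed transform: log10(1 / remaining fraction), so 90% lands at
    roughly 1, 99% at roughly 2, 99.9% at roughly 3, and so on.
    100% is excluded because it would sit infinitely far up the axis.
    """
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    remaining = 1 - percentile / 100
    return math.log10(1 / remaining)


if __name__ == "__main__":
    for p in (0, 90, 99, 99.9, 99.99, 99.999):
        print(f"{p:>7}% -> {nines_axis(p):.3f}")
```

Each step up the axis is the same height but covers one tenth as many IOs as the step before it, which is why the scale keeps zooming in and never arrives at 100%.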
First a refresher of the stated specs:
…and now the results (X'd as appropriate; note these are for QD=1, the leftmost line):
For those having a hard time picking out where the crossover points lie, here's a different way of looking at / spelling out things:
Note that the 99.999% specs in this pair of charts go off the scale high (*very* high in the case of reads), meaning the 5100 falls well within the spec.
PACED workloads – 'It's not how fast you go, it's how well you go fast!'
I'm expanding on the PACED workload application introduced in the Micron 9100 review. Instead of selecting just a few workloads, I've taken things a step further, sweeping the IOPS range and (with a lot of number crunching) plotting the resulting 9's at the various loads. In brief, what we want to see is how the tail latency looks at more realistic levels of loading, compared to how manufacturers typically rate their parts:
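For readers who want a feel for what such a sweep involves, here is a rough Python sketch. It assumes that a PACED workload means issuing IOs at a fixed target rate rather than as fast as the drive will take them, and that the sweep steps that target up to some maximum. The helper names `sweep_targets` and `paced_issue_times` are hypothetical, and no actual IO is performed; the code only builds the schedule.

```python
def sweep_targets(max_iops, steps):
    """Evenly spaced IOPS targets ending at max_iops (hypothetical helper)."""
    if max_iops <= 0 or steps <= 0:
        raise ValueError("max_iops and steps must be positive")
    return [max_iops * i // steps for i in range(1, steps + 1)]


def paced_issue_times(target_iops, duration_s):
    """Timestamps (in seconds) at which a paced run would issue each IO.

    Assumption: pacing means one IO every 1/target_iops seconds,
    regardless of how quickly the previous IO completed.
    Both arguments are expected to be whole numbers.
    """
    if target_iops <= 0 or duration_s <= 0:
        raise ValueError("target_iops and duration_s must be positive")
    total_ios = target_iops * duration_s
    # Derive each time from its index to avoid accumulating rounding error.
    return [i / target_iops for i in range(total_ios)]


if __name__ == "__main__":
    # Illustrative numbers only, not the drive's rated figures.
    for target in sweep_targets(max_iops=10_000, steps=4):
        schedule = paced_issue_times(target, duration_s=2)
        print(f"{target:>6} IOPS -> {len(schedule)} IOs, "
              f"one every {1_000_000 / target:.0f} us")
```

In a real run, the latency of every IO at each target would be recorded and the 9's extracted per load level, which is where the number crunching comes in.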
As you can see, reduced load generally lowers latency and enables greater consistency. The catch for the 5100 appears to be that it is already so consistent that even the slightest disturbances during the test run can cause spikes in the curve. Bear in mind that the 5100 is *well* below its rated max latency at even the highest points in these charts.