High Resolution Quality of Service (QoS) 4KB Random
Required reading (some additional context for those unfamiliar with our Percentile testing):
- Introduction of Latency Distribution / Latency Percentile (now called IO Percentile)
- Introduction of Latency Weighted Percentile (now called Latency Percentile)
- Intro to PACED workloads – 'It's not how fast you go, it's how well you go fast!'
I'd considered laying out the typical Latency Percentile and IO Percentile data before going into the QoS, but honestly, it's just a slaughter across the board, so I'll cut straight to the chase:
Quality of Service (QoS)
QoS is specified in percentages (99.9%, 99.99%, 99.999%), and uniquely spoken (‘three nines’, ‘four nines’, ‘five nines’). It corresponds to the latency seen at the top 99.x% of all recorded IOs in a run. Enterprise IT managers and system builders care about varying levels of 9's because those long latencies lead to potential timeouts for time-sensitive operations, and increasing the 9's is how they quantify more stringent QoS requirements. Note that these comparative results are derived from IO Percentile data and *not* from Latency Percentile data.
If you have a hard time wrapping your head around the 9's thing, it may be easier to flip things around and think about it from the standpoint of the remaining longest-latency IOs that haven't been accounted for as the plot progresses. As an example, the 99.9% line near the center of the vertical axis represents the top 10% of the top 1% (0.1%) of all recorded IOs, where 'top' means those IOs of the longest latency.
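For those who prefer to see the 9's as arithmetic, here is a minimal sketch of pulling QoS figures out of a list of recorded IO latencies. It assumes a simple nearest-rank percentile, which is not necessarily how our own tooling bins results, and the `sample_run` data is made up purely for illustration (it is not a measurement from any of these drives). The function names are hypothetical as well.

```python
import math
from fractions import Fraction


def qos_rank(total_ios, percent):
    """1-based nearest-rank position of the IO sitting at `percent` of the run."""
    # Fraction keeps values like 99.999 exact, so float round-off
    # cannot bump the rank up by one.
    exact = Fraction(str(percent)) / 100 * total_ios
    return max(math.ceil(exact), 1)


def qos_latency(latencies_us, percent):
    """Latency (in microseconds) that `percent` of all IOs completed within."""
    if not latencies_us:
        raise ValueError("no IOs recorded")
    ordered = sorted(latencies_us)
    return ordered[qos_rank(len(ordered), percent) - 1]


def ios_left_over(total_ios, percent):
    """How many of the longest-latency IOs are still unaccounted for."""
    return total_ios - qos_rank(total_ios, percent)


# Made-up run of 100,000 IOs: mostly 10 us, with a thin tail of slow ones.
sample_run = [10] * 99_900 + [50] * 90 + [200] * 9 + [1000]

for nines in (99.9, 99.99, 99.999):
    print(nines, qos_latency(sample_run, nines), ios_left_over(len(sample_run), nines))
```

With this fake run, three nines still lands on the 10 us bulk while leaving 100 IOs unaccounted for (the 0.1% mentioned above), and each added 9 digs further into the slow tail.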
These plots are tricky to make, as they are effectively an inverse log scale. The first major increment up from the zero axis covers 90% of all IOs, and the next increment covers 90% *of what remained after the previous one*, meaning it's an asymptotic scale which will never reach 100%. The plots below essentially take the top portion of the IO Percentile results and spread them out, exponentially zooming in on the results as they approach 100%.
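One way to picture that vertical axis is as a count of 9's: take the fraction of IOs still unaccounted for and plot the negative log10 of it. The sketch below is my shorthand for that idea under that assumption, not the actual code behind the charts, and both function names are hypothetical.

```python
import math


def nines_position(percent):
    """Height on the 'inverse log' axis: 90% -> 1, 99% -> 2, 99.9% -> 3, ..."""
    fraction_left = 1.0 - percent / 100.0
    if fraction_left <= 0.0:
        # 100% would sit infinitely far up the axis, so it can never be plotted.
        raise ValueError("the scale is asymptotic and never reaches 100%")
    return -math.log10(fraction_left)


def percent_at(position):
    """Inverse mapping: which percentile sits `position` major increments up."""
    return 100.0 * (1.0 - 10.0 ** (-position))


for pct in (0, 90, 99, 99.9, 99.99, 99.999):
    print(pct, round(nines_position(pct), 3))
```

Each major increment strips off 90% of whatever was left, which is why the spacing between 99.9% and 99.99% is the same as between 0% and 90%.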
Note that we have shifted the scale here to extend down to 1 microsecond, as the P4800X is riding the 10us figure throughout these tests. We want QoS to ideally be a vertical line, and this is an extremely impressive result. I didn't take these out to QD=256 as the P4800X saturates by QD=16 in all workloads. Additional plot lines would simply shift further to the right.
Here is an easier numerical chart plotting out the exact places where the QoS chart crosses the various latencies. Note the 50% mark (upper left), where the P4800X comes in at an average (50%) latency of less than 10us!
Alright, I've taken QD=1, 2, and 4 for the P4800X (blue), P3700 (green), and 9100 MAX (gold). Remember this is a log scale, so the competing products coming in a full major increment to the right indicates that they are 10x slower.
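If the log-scale reading is not second nature, this small sketch shows the conversion between horizontal distance on the chart and a slowdown factor. The helper names are hypothetical and the latencies are placeholder values, not figures read off the plots.

```python
import math


def decades_apart(fast_latency_us, slow_latency_us):
    """Horizontal distance between two latencies, in major (10x) increments."""
    return math.log10(slow_latency_us / fast_latency_us)


def slowdown_from_decades(decades):
    """Slowdown factor implied by a horizontal shift on a log10 axis."""
    return 10.0 ** decades


# Placeholder values: a 10 us result versus a 100 us result.
print(decades_apart(10, 100))      # one full major increment
print(slowdown_from_decades(1))    # which reads as 10x slower
```

Half an increment works out to roughly 3.2x, so even small horizontal gaps on these charts are meaningful.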
With reads being the majority of this mix, the P4800X results are nearly identical to 100% read, while the competing products take a right turn into even longer latencies due to the increase in write demand. The P4800X doesn't seem to care in the least about the added writes and continues to dominate.
Now we see some elbows in the plot. Latencies are still great overall, but clearly the controller is doing some extra work, likely to provide wear leveling, etc.
Figures are still well within spec, though average (typical) latency has crept just over 10us. 100% writes is not a 'typical' workload, so I consider Intel's "Typical: <10us" claim to fall more into the 70/30 bracket covered earlier.
Finally someone comes to the party! Well, sorta. The 9100 MAX was able to beat the P4800X when pushing into the higher consistency metrics, but take note of the legend – it is only doing it at less than half of the overall IOPS (because the majority of its IOs are at a much higher latency).
One more comparison before we move on. Intel showed us a nifty QoS comparison between the P4800X and the P3700:
This chart ramps up IOPS while showing how QoS responds along the way. Where have I seen that before??? It's like those IOs are PACED or something 🙂
I've kept Intel's colors but added the Micron 9100 MAX (gold). Micron can reach higher IOPS loading at 70/30 before it saturates, but the P4800X's maximum (99.999%) latency remains lower than the average of the 9100 and P3700, while the P4800X's average latency is a full order of magnitude (10x) lower. I've added a few labels on the average plot lines to denote the QD associated with each product at that level of load. The NAND products have to push into virtually unattainable queue depths to reach the performance levels that the P4800X simply breezes through.