Latency Percentile – Intro and Comparative Results


Our exclusive Latency Distribution / Latency Percentile testing was a long time in the making and was first introduced in my 950 Pro review (longer explanation at that link). To put it briefly, the thing that contributes most to the 'feel' of storage device speed is its latency. Simple average and maximum latencies don't come close to painting the full picture of the true performance of a given SSD. Stutters of only a few IO's out of the thousands delivered per second can simply be 'lost in the average'. This applies even if the average is plotted every second. The only true solution is to track the latency of each and every IO, which is no small feat when the fastest SSDs can deliver hundreds of thousands (or millions) of IO's per second.

Latency Distribution (V1)

Here the data has been converted into what is essentially a spectrum analyzer for IO latency. The more IO's taking place at lower latencies (towards the left of the 'spectrum') the better. While it is handy for seeing exactly where latencies fall for a given device, the results are generally hard to read and digest, so the data is further translated into a percentile:
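To make the 'spectrum analyzer' idea a bit more concrete, here is a minimal sketch of how per-IO latencies could be sorted into log-spaced buckets. This is not the tooling used for the charts in this review; the function name, the bucket spacing, and the sample latencies are all made up for illustration, and it assumes every latency is greater than zero.

```python
import math
from collections import Counter

def latency_histogram(latencies_us, buckets_per_decade=10):
    """Count IOs per log-spaced latency bucket (hypothetical helper).

    Returns a dict mapping each bucket's lower edge (in microseconds)
    to the number of IOs that fell into that bucket.
    Assumes every latency is > 0.
    """
    counts = Counter()
    for lat in latencies_us:
        # Position on a log10 scale; each decade is split into equal log steps.
        idx = math.floor(math.log10(lat) * buckets_per_decade)
        counts[idx] += 1
    return {10 ** (idx / buckets_per_decade): n for idx, n in sorted(counts.items())}

# Made-up per-IO latencies in microseconds, with one slow outlier.
sample = [80, 85, 90, 90, 95, 120, 150, 2000]
hist = latency_histogram(sample, buckets_per_decade=5)

# Crude text 'spectrum': lower latencies at the top, one '#' per IO.
for lower_edge, n in hist.items():
    print(f"{lower_edge:8.1f} us and up: {'#' * n}")
```

Most of the IOs pile up in the lowest bucket while the single slow IO sits alone far down the scale, which is exactly the sort of detail an average would hide.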

Latency Percentile – IO Weighted (V1)

For those unfamiliar with this plot, the ideal result is a vertical line as far to the left as possible. Real-world storage devices under load will tend to slant or slope, and some will 'turn' prior to hitting 100%, indicating that some of the IO's are taking longer (the point where the line curves back upwards indicates the latency of those remaining IO's).

This new testing has come a long way since it was first introduced. The most recent and significant change is to correct a glaring issue common to all IO percentile plots, caused by a bad assumption similar to that which comes with using averages. V1 Percentiles were calculated from the percentage of total IOs, which was in line with what the rest of the industry has settled on. You might have seen enterprise SSD ratings claiming 99.99th (or some other variation e.g. 99.9% / 99.999%) percentile latency figures. As an example, a 99.99 percentile rating of 6ms would mean that 99.99% of all IOs were <= 6ms.
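For reference, an IO-weighted percentile of this kind can be sketched in a few lines. This is a simple nearest-rank calculation over made-up data, not any vendor's actual rating procedure; the function name and the sample latencies are assumptions for illustration only.

```python
import math
from fractions import Fraction

def io_weighted_percentile(latencies, pct):
    """Nearest-rank percentile (hypothetical helper): the smallest latency L
    such that at least pct percent of all IOs completed in <= L."""
    ordered = sorted(latencies)
    # Fraction avoids floating-point rounding when computing the rank.
    rank = math.ceil(Fraction(str(pct)) / 100 * len(ordered))
    rank = max(1, rank)
    return ordered[rank - 1]

# Made-up run of 10,000 IOs (latencies in ms): almost all fast,
# one at 6 ms and one much slower straggler at 50 ms.
latencies_ms = [0.1] * 9_998 + [6.0, 50.0]

print(io_weighted_percentile(latencies_ms, 99.99))  # 9,999 of 10,000 IOs are <= this
print(io_weighted_percentile(latencies_ms, 100))    # the slowest IO
```

Note that every IO counts equally here regardless of how long it took, which is the assumption the rest of this page takes issue with.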

There is a flaw inherent in the above rating method. Using the 99.99% <= 6ms example above, imagine an SSD that completely stalled for one second in the middle of a 6-second run. For the other five seconds of the test, it performed at 200k IOPS. The resulting data would reflect one million total IO's and (assuming QD=1) a single IO taking a full second. The average IOPS would still be a decent 167k, but that nasty stutter was diluted – effectively 'lost in the average'. The same goes for 99.99% ("four nines") latency, which would miss that single IO. Despite hanging the entire system for 17% of the run, that single IO would not get caught unless you calculated out to 99.9999% ("six nines"), which nobody rates for.
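The arithmetic behind that hypothetical run is easy to check. The sketch below only restates the numbers from the example above (variable names are mine):

```python
# Hypothetical run from the example: 5 seconds at 200k IOPS plus a 1-second stall.
fast_iops = 200_000
fast_seconds = 5
stall_seconds = 1

fast_ios = fast_iops * fast_seconds            # IOs completed at full speed
total_ios = fast_ios + 1                       # plus the single stalled IO
total_seconds = fast_seconds + stall_seconds

average_iops = total_ios / total_seconds           # what the average reports
stall_time_share = stall_seconds / total_seconds   # share of the run spent hung
fast_io_share_pct = 100 * fast_ios / total_ios     # percentile where the stall first appears

print(f"total IOs:          {total_ios:,}")
print(f"average IOPS:       {average_iops:,.0f}")
print(f"time spent stalled: {stall_time_share:.1%}")
print(f"stall hidden below: {fast_io_share_pct:.5f}th percentile")
```

The stalled IO is one out of roughly a million, so an IO-weighted percentile does not reach it until about six nines, even though it accounts for a sixth of the run.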

The industry has settled on calculating this way mainly out of necessity and the limits of latency measurement. Most tools employ a coarse bucket scheme, meaning 99.99% values must be interpolated. Fortunately, our data gathering technique gives us far greater resolution into the data, meaning not only can we minimize interpolation, we can do something previously impossible. Getting away from IO-based percentages means we must correct our IO Percentile results by summing not just the IO's, but the time those IOs took to complete. When calculated this way, our hypothetical example above would show low latency only up to the 83% mark, where its result would ride that 83% line all the way to the one-second mark on the plot. With these percentiles now based on total time and not the unweighted sum of the IO's, we can more easily identify those intermittent stalls.
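Here is a minimal sketch of the difference between the two weightings, applied to the hypothetical stalled run. It is an illustration of the idea and not the actual code behind our charts; the function name and input format are assumptions, and it assumes QD=1 so that the sum of the IO latencies equals the length of the run.

```python
def percentile_curves(latency_counts):
    """latency_counts: iterable of (latency_seconds, io_count) pairs.

    Returns rows of (latency, io_pct, time_pct), sorted by latency, where
    io_pct is the percentage of IOs at or below that latency and time_pct
    is the percentage of total IO time spent on those IOs.
    """
    rows = sorted(latency_counts)
    total_ios = sum(n for _, n in rows)
    total_time = sum(lat * n for lat, n in rows)

    curve = []
    ios_so_far = 0
    time_so_far = 0.0
    for lat, n in rows:
        ios_so_far += n
        time_so_far += lat * n  # weight each IO by how long it took
        curve.append((lat,
                      100 * ios_so_far / total_ios,
                      100 * time_so_far / total_time))
    return curve

# Hypothetical run: one million 5-microsecond IOs (200k IOPS at QD=1)
# plus a single IO that took a full second.
example = [(5e-6, 1_000_000), (1.0, 1)]
curve = percentile_curves(example)

for lat, io_pct, time_pct in curve:
    print(f"<= {lat:>8.6f} s   IO-weighted: {io_pct:9.5f}%   time-weighted: {time_pct:6.2f}%")
```

By IO count the fast IOs already cover more than 99.9999% of the run, but by time they cover only about 83%, and the remaining 17% belongs entirely to the one-second stall.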

Latency Percentile – Time Weighted (V2)

I've created the above based on the new method but using the same source data as the earlier V1 plot. This data was based on reads, which typically don't suffer from the same inconsistent latencies seen in SSD writes. Even with more consistent results, we can see a difference in the plotted data. The RevoDrive 350 (red line) doesn't quite make it past 99% as quickly as it did in the V1 plot, and some of the faster SSDs taper off a bit earlier as well. The three HDDs also saw an impact, as longer seeks take up more of the total time of the run. If you're still not convinced as to the relevance or importance of this new presentation method, I'll just leave this worst-case example here, comparing the older IO-weighted results to the newer time-weighted translation of that same distribution data:

Yes, this was a real SSD and not a hypothetical example, even though it does mirror my example above rather remarkably.

Latency Percentile – Comparative Results

The workload chosen for these tests consists of completely filling an SSD with sequential data, then applying 4k random writes to an 8GB span. This is not the same as 'full span writes', which is more of an enterprise workload. Instead, we are emulating more of a consumer type of workload where only 8GB of the drive is randomly written (typically by system log files, registry, MFT, directory structure, etc). The following is a random read Queue Depth sweep (1-32) of that same area, which tests how quickly the OS would retrieve those previously written directory structures and registry files.


I didn't have a lot of SATA percentile data handy for this review, and it does take some time to properly steady-state drives and collate the results, but I did include SSDs that also fall into the budget category. 750GB is certainly an 'odd' capacity, so I chose 500GB for the competing SSDs. Future reviews will have additional SATA results populated here.

At QD=1 reads, we noted higher latencies on the MX300, contributing to a halving of the observed IOPS when compared to the 750 EVO.

As we ramp up the queue depth, the MX300 turns in decent IOPS figures and starts to look a bit better on the latency up to QD=16…

…but at QD=32, Samsung's products kick things into overdrive with respect to Latency Percentiles. All three of those competing products hit nearly 100k IOPS and do so with an extremely consistent latency profile. The MX300 fares well at nearly 70k IOPS, but it sees nearly 10% of its IO time spent on IOs taking longer than 1ms.


Now for the fun part. These are all caching SSDs, meaning that during the test run, some IOs go to SLC while others go to TLC. I am developing new (currently prototype) methods of applying these write workloads in a more paced manner, which will more closely emulate typical consumer intermittent IO workloads. For now, we have to go into these results with the understanding that the workload is a 100-second crop out of a steady-state sustained application of random 4k writes to an 8GB span. Lower queue depths will naturally see lower demand, which means a greater chance that writes will go to SLC vs. TLC areas.

Starting at QD=1, we see the MX300 perform decently, nearly matching the older 840 EVO.

At QD=2 things start to spread out a bit. It's a close race between the MX300 and the new 500GB 750 EVO here, but the MX300 wins out on total IOPS.

The MX300 holds the 750 EVO at bay all the way to QD=32, where it turns in a respectable 53k IOPS. While the numbers are certainly good, there is just no catching the 850 EVO, which is also a 3D NAND (VNAND) SSD, but more expensive than both the MX300 and 750 EVO.
