Mixed Burst R/W Throughput, Load Times, and Latency Percentile

In an attempt to better represent the true performance of hybrid (SLC+TLC) SSDs, and to include some general trace-style testing, I’m trying out a new test methodology. First, all tested SSDs are sequentially filled to 100%. Then the first 8GB span is pre-conditioned with a 4KB random workload, matching the condition called for in many of Intel’s client SSD testing guides. The idea is that most of the data on an SSD is sequential in nature (installed applications, MP3s, videos, etc.), while some portions of the SSD have been written in a random fashion (MFT, directory structure, log file updates, other randomly written files, etc.). The 8GB figure is reasonably practical, since full-span 4KB random writes are not a workload that client SSDs are optimized for (that is reserved for enterprise drives). We may try larger spans in the future, but for now we’re sticking with the 8GB random write area.
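For those curious how the two conditioning passes fit together, here is a minimal sketch driving them through fio from Python. The device path is hypothetical, a single random pass is shown for brevity, and a real run would of course destroy all data on the target:

```python
# Sketch of the preconditioning described above, assuming fio is installed.
import subprocess

DEVICE = "/dev/nvme0n1"  # hypothetical target device - will be overwritten!

def run_fio(name, rw, bs, size):
    subprocess.run([
        "fio", f"--name={name}", f"--filename={DEVICE}",
        f"--rw={rw}", f"--bs={bs}", f"--size={size}",
        "--ioengine=libaio", "--direct=1", "--iodepth=32",
    ], check=True)

# Step 1: sequentially fill the drive to 100%.
run_fio("seq-fill", "write", "128k", "100%")

# Step 2: precondition the first 8GB span with 4KB random writes.
run_fio("rand-precondition", "randwrite", "4k", "8g")
```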

Using that condition as a base, we next needed a workload to apply. I wanted to start with some background activity, so I captured a BitTorrent download:

This download was over a saturated 300 Mbit link. While the average download speed was reported as 30 MB/s, the application’s own internal caching meant the writes to disk were ‘bursty’ in nature. Since we want to adapt this workload into one that gives SLC+TLC (caching) SSDs some time to unload their cache between write bursts, I settled on a simple pattern: 40 MB written every 2 seconds. These accesses are more random than sequential, so we apply them to the designated 8GB span of our pre-conditioned SSD.
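A paced writer along these lines can be sketched in a few lines of Python. This is an illustration of the pattern rather than our actual test harness; the device path is hypothetical and O_DIRECT buffer alignment is omitted for brevity:

```python
# Paced 'download' writer sketch: 40 MB of 4KB random writes into the
# first 8GB span, with a new burst starting every 2 seconds.
import os, random, time

DEVICE = "/dev/nvme0n1"      # hypothetical target
SPAN = 8 * 1024**3           # 8GB random-write span
BURST = 40 * 1024**2         # 40 MB per burst
BLOCK = 4096                 # 4KB writes
PERIOD = 2.0                 # seconds between burst starts

fd = os.open(DEVICE, os.O_WRONLY)
payload = os.urandom(BLOCK)
blocks_in_span = SPAN // BLOCK

for _ in range(30):          # roughly 60 seconds of paced bursts
    start = time.monotonic()
    for _ in range(BURST // BLOCK):
        offset = random.randrange(blocks_in_span) * BLOCK
        os.pwrite(fd, payload, offset)
    # Sleep off whatever remains of the 2-second period.
    time.sleep(max(0.0, PERIOD - (time.monotonic() - start)))
```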

Now for the more important part. Since the above ‘download workload’ is a background task that would likely go unnoticed by the user, what we also need is a workload that the user *would* be sensitive to. The times when someone really notices their SSD’s speed are when they are waiting for it to complete a task, and the most common such tasks are application and game/level loads. I observed a range of different tasks and arrived at a 200MB figure for the typical amount of data requested when launching a modern application. Larger games can pull in as much as 2GB (or more), varying with the game and level, so we will repeat the 200MB request 10 times during the recorded portion of the run. We will assume 64KB sequential access for this portion of the workload.
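Sketched in the same style, one ‘application launch’ burst might look like the following. The small thread pool is a rough stand-in for queue depth, and again the device path is hypothetical:

```python
# One 200MB 'launch' read burst: 64KB sequential reads kept at ~QD4.
import os
from concurrent.futures import ThreadPoolExecutor

DEVICE = "/dev/nvme0n1"      # hypothetical target
BURST = 200 * 1024**2        # 200 MB per launch
BLOCK = 64 * 1024            # 64KB requests
QD = 4                       # outstanding requests

def launch_burst(start_offset=0):
    fd = os.open(DEVICE, os.O_RDONLY)
    offsets = range(start_offset, start_offset + BURST, BLOCK)
    with ThreadPoolExecutor(max_workers=QD) as pool:
        for _ in pool.map(lambda off: os.pread(fd, BLOCK, off), offsets):
            pass                      # discard the data; we only time it
    os.close(fd)

# The recorded run repeats this 10 times while the paced writer runs.
for i in range(10):
    launch_burst(i * BURST)
```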

Assuming a max Queue Depth of 4 (reasonable for typical desktop apps), we end up with something that looks like this when applied to a couple of SSDs:

In the above example, the OCZ Trion 150 (left) is able to keep up with the writes (dashed line) throughout the 60 seconds pictured, but note that the simultaneous read requests occasionally catch it off guard. Apparently, if some SSDs are busy with a relatively small stream of incoming writes, read performance can suffer, which is exactly the sort of thing we are looking for here.

When we applied the same workload to the 4TB 850 EVO (right), we saw an extremely consistent and speedy response to all IOs, regardless of whether they were reads or writes. The 200MB read bursts complete so quickly that each falls entirely within a single second, and none of them spill over due to delays caused by the simultaneous writes.

Here is our new workload applied to a batch of SSDs, including the 600p. I've grouped the NVMe / PCIe parts at the top of the list for easier comparison:

From our Latency Percentile data, we are able to derive the total service time for both reads and writes, and independently show the throughput seen for each. Remember that these workloads are being applied simultaneously, so as to simulate launching apps or games during a 30 MB/s download. The above figures are not simple averages; they represent only the speed *during* each burst, with idle time excluded.
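To illustrate how idle time gets excluded, here is a simplified sketch of that derivation. The trace format is an assumption for illustration, not the actual output of our capture tools:

```python
# Merge overlapping [start, end] service intervals so gaps between bursts
# are not counted, then divide bytes moved by busy time.
def busy_throughput(ios):
    """ios: list of (start_s, end_s, bytes) tuples for one IO direction."""
    intervals = sorted((s, e) for s, e, _ in ios)
    busy = 0.0
    cur_start, cur_end = intervals[0]
    for s, e in intervals[1:]:
        if s > cur_end:                 # gap => idle time, not counted
            busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    busy += cur_end - cur_start
    total_bytes = sum(b for _, _, b in ios)
    return total_bytes / busy           # bytes per second of busy time

# Example: two bursts separated by idle time still report burst speed.
trace = [(0.0, 0.2, 100e6), (0.2, 0.4, 100e6),   # burst 1
         (5.0, 5.2, 100e6), (5.2, 5.4, 100e6)]   # burst 2 after idle gap
print(busy_throughput(trace) / 1e6, "MB/s")      # ~500 MB/s, idle excluded
```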

The bottom half of the chart (starting with the 850 EVO) represents the SATA bunch tested here. I've removed the smallest (120/128GB) capacities as they are not comparable with the tested group (those results are included here if you need to look back at them). While the SATA results are all fairly consistent with each other, the PCIe parts are more of a mixed bag. Moving up to the subject of this review, we see the 600p turn in respectable read performance, approaching the SSD 750 in read throughput. We also see a surprising result in write speeds, as the 600p's SLC cache helped it beat out both the Samsung 950 Pro and the Kingston HyperX Predator! The Plextor M6e is historically known to turn in poor performance, and here we see it mixing in with the SATA parts.

Now we are going to focus only on reads and present some different data. I’ve added up the total service time seen during the 10x 200MB reads that take place during the recorded portion of the test. These figures represent how long you would be sitting there waiting for that 2GB of data to be read, but remember this is happening while a download (or another similar background task) is simultaneously writing to the SSD.
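As a quick sanity check on the scale of these totals, here is a back-of-envelope calculation using a hypothetical burst read speed:

```python
# 10 launches of 200MB each is 2GB read in total. At a hypothetical
# 1000 MB/s during bursts, the user waits roughly 2 seconds overall.
total_bytes = 10 * 200e6                    # 2GB total across all launches
burst_read_speed = 1000e6                   # hypothetical bytes/sec in bursts
print(total_bytes / burst_read_speed, "s")  # -> 2.0 seconds of waiting
```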

The 600p wasn't the fastest PCIe part here, but it came reasonably close to the SSD 750, and it was nearly twice as fast as the M6e and all of the SATA parts in this comparison.

Below are the Latency Percentile data that the above charts were derived from. Note how the 600p (light blue) comes very close to the performance of the SSD 750 (orange) and the 950 Pro (grey) in these results.

For a budget SSD, the 600p did very well here, in sharp contrast to how poorly it performs in saturated (legacy) benchmarks. Here are a few Latency Percentile comparisons at saturated (non-paced) levels:

Moral of the story: The 600p is a great drive so long as you don't hit it with sustained writes at >120 MB/s. Good thing we had these new tests on hand to show more realistic performance!
