PC Perspective Enterprise Test Suite – 4K Random
Taking a good hard look at the items we pointed out on the previous page, along with the current enterprise review landscape, I noted that the large amount of data that can be obtained from a typical run through an enterprise testing suite is extremely challenging to present to the typical reader. Since many organizations will sample products and perform their own in-house performance testing in their specific environment, a review such as this should serve more as a ‘rough test’ that will direct their initial testing decisions. With that in mind, our tests will aim to be as generic and non-specific as possible, as storage professionals need the ‘raw data’ with which to make their testing (and ultimately purchasing) decisions. They typically come into a review armed with some specifics about their intended usage, such as the type of workload, R/W percentage, server demand (IOPS), maximum acceptable latency, and other factors.
That means my task as a reviewer is to perform the following:
- Formulate test sequences that will yield steady state performance values.
- Test the devices in as controlled an environment as possible.
- Collect and analyze the resulting test data.
- Distill the results into the simplest and most direct format possible.
That last one is the tough one. When I looked at other enterprise SSD reviews from the standpoint of a system builder, despite the available charts and graphs, I was typically left with questions unanswered. Some reviews would report a given result with a few select queue depths, while others would select other variables to display at a single queue depth. I asked myself how we could answer these questions for our readers without expanding into an unwieldy number of charts and graphs. I made my first challenge with the data (and for this piece specifically) to distill enterprise SSD performance into just two charts for each given workload.
This first chart shows the achievable steady state IOPS (Y) at varying R/W percentages (X). Each plotted line corresponds to the performance at a given queue depth. An additional Y axis has been added with MB/sec throughput values that correspond to the IOPS at this workload (this is simply a proportional axis to help those looking for a specific throughput at that workload). This chart is useful as the starting point, and contains three dimensions of data on a 2D chart.
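The proportional MB/sec axis is just the IOPS axis multiplied by the transfer size. As a minimal sketch, assuming a 4K transfer means 4,096 bytes and that MB means decimal megabytes (1,000,000 bytes), the conversion looks like this. The function name is my own, chosen for illustration:

```python
def iops_to_mb_per_sec(iops: float, block_size_bytes: int = 4096) -> float:
    """Convert IOPS to throughput in decimal MB/s (1 MB = 1,000,000 bytes).

    Assumption: every IO in the workload is the same size (4 KiB by default).
    """
    return iops * block_size_bytes / 1_000_000


if __name__ == "__main__":
    # A few sample IOPS values and the throughput they correspond to at 4K.
    for iops in (100_000, 400_000, 850_000):
        print(f"{iops:>9,} IOPS -> {iops_to_mb_per_sec(iops):8.1f} MB/s")
```

Because the relationship is a fixed multiplier for a given transfer size, the second axis adds no new data. It only saves the reader from doing this multiplication by hand.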
Looking at the data, we can see how many QD levels (plotted lines) it takes to reach maximum IOPS and therefore maximum throughput to the host. The P3608 ramps up very quickly on 100% writes (left side of the chart), but requires higher queue depths to achieve its maximum 100% read IOPS (right side of the chart).
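One way to put a number on "how many QD levels it takes" is to find the lowest queue depth that gets within some fraction of the peak IOPS for a given R/W mix. The sketch below assumes a 95% threshold, which is an arbitrary choice on my part, and the sample values are hypothetical placeholders rather than measured P3608 results:

```python
from __future__ import annotations


def saturation_queue_depth(iops_by_qd: dict[int, float], fraction: float = 0.95) -> int:
    """Return the lowest queue depth whose IOPS is at least `fraction` of the peak."""
    if not iops_by_qd:
        raise ValueError("need at least one (queue depth, IOPS) pair")
    peak = max(iops_by_qd.values())
    for qd in sorted(iops_by_qd):
        if iops_by_qd[qd] >= fraction * peak:
            return qd
    raise AssertionError("unreachable: the peak itself always qualifies")


# Hypothetical data shaped like the behavior described above:
# writes saturate at a low queue depth, reads need a much higher one.
example_writes = {1: 150_000, 2: 195_000, 4: 200_000, 8: 200_000}
example_reads = {1: 12_000, 4: 45_000, 16: 170_000, 64: 520_000, 256: 850_000}

if __name__ == "__main__":
    print("writes saturate at QD", saturation_queue_depth(example_writes))
    print("reads saturate at QD", saturation_queue_depth(example_reads))
```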
This next chart was far trickier to implement, as it contains four dimensions of data on a 2D chart, but that also makes it a much more powerful tool when used properly. Three of the dimensions are a translation of the previous chart. IOPS and MB/s remain on the Y axis, but the R/W percentages are split into the plot lines, displacing queue depths, which are now labeled points along each plot line (and connected by the thinner lines for ease of use). This restructuring freed up the X axis for another dimension of data, and one of the most important pieces when dealing with enterprise SSDs – latency.
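Queue depth, IOPS, and average latency are tied together, which is part of why they can share one chart. The relation below is Little's Law from queueing theory, not something specific to our test suite, and it assumes the test tool keeps exactly the stated number of IOs outstanding at steady state. The inputs in the example are hypothetical:

```python
def mean_latency_us(queue_depth: int, iops: float) -> float:
    """Estimate average latency in microseconds from queue depth and IOPS.

    Little's Law rearranged: mean latency = outstanding IOs / completion rate.
    Assumption: the queue is held constantly full at `queue_depth`.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return queue_depth / iops * 1_000_000


if __name__ == "__main__":
    # Hypothetical point: QD=32 sustaining 400,000 IOPS.
    print(f"{mean_latency_us(32, 400_000):.1f} us average latency")
```

This only recovers the average. It says nothing about the distribution of individual IO latencies.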
In the above chart, the P3608 reaches full write IOPS (cyan line) very quickly and at a very low queue depth. That line also sits the furthest to the left of this chart, indicating that the P3608 is highly optimized to minimize latency during write operations. Read operations take longer simply because there are more steps that must be taken to look up and retrieve a piece of data from the flash. 4K random reads take more effort to ramp up to full speed, not exceeding the rating of 850,000 IOPS until QD=256, but just look at that throughput achieved (right side Y axis)! That's just under 3.5 GB/sec worth of 4KB *random* reads – 70% of its maximum sequential throughput!
The next chart is simply a zoomed-in version of the previous one, focusing on the lower queue depths:
To use this chart, find the line corresponding to your % Read workload and follow it up until it crosses your anticipated demand (IOPS). Where these intersect, note the approximate QD required to achieve this performance and finally trace down to the X axis for the corresponding (average) latency.
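For readers who would rather do that lookup in a script, here is a rough sketch of the same procedure. It assumes IOPS rises steadily with queue depth along a line and interpolates linearly between labeled points, which is only an approximation of reading the chart by eye. The `Point` type, the `lookup` function, and the data values are all hypothetical and are not measured P3608 results:

```python
from bisect import bisect_left
from typing import NamedTuple


class Point(NamedTuple):
    queue_depth: int
    iops: float
    latency_us: float


# Hypothetical labeled points along one "% read" line, ordered by rising IOPS.
LINE_70_PCT_READ = [
    Point(1, 20_000, 50.0),
    Point(2, 38_000, 52.6),
    Point(4, 70_000, 57.1),
    Point(8, 120_000, 66.7),
    Point(16, 180_000, 88.9),
]


def lookup(line, demand_iops):
    """Return (lower QD, upper QD, interpolated average latency in us)."""
    iops_values = [p.iops for p in line]
    if not iops_values[0] <= demand_iops <= iops_values[-1]:
        raise ValueError("demand is outside the plotted range for this line")
    i = bisect_left(iops_values, demand_iops)
    if iops_values[i] == demand_iops:
        # Demand lands exactly on a labeled queue depth point.
        p = line[i]
        return p.queue_depth, p.queue_depth, p.latency_us
    lo, hi = line[i - 1], line[i]
    frac = (demand_iops - lo.iops) / (hi.iops - lo.iops)
    latency = lo.latency_us + frac * (hi.latency_us - lo.latency_us)
    return lo.queue_depth, hi.queue_depth, latency


if __name__ == "__main__":
    qd_lo, qd_hi, latency = lookup(LINE_70_PCT_READ, 95_000)
    print(f"QD between {qd_lo} and {qd_hi}, about {latency:.1f} us average latency")
```

The result brackets the demand between two labeled queue depths rather than inventing a fractional one, which mirrors how the chart is read.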
We are still developing more detailed statistical analysis and advanced presentation methods for latency distribution at given workloads; however, we did not have sufficient sample data at the time of this article to go live with those results. The new method will be far superior to previous reviews, as we will be taking the latency of *every* IO into account.