PC Perspective Enterprise Test Suite – 4K Random
Taking a good hard look at the items we pointed out on the previous page, along with the current enterprise review landscape, I noted that the large amount of data obtained from a typical run through an enterprise testing suite is extremely challenging to present to the typical reader. Since many organizations will sample products and perform their own in-house performance testing in their specific environment, a review such as this should serve more as a ‘rough test’ that directs their initial testing decisions. With that in mind, our tests will aim to be as generic and non-specific as possible, as storage professionals need the ‘raw data’ with which to make their testing (and ultimately purchasing) decisions. They typically come into a review armed with some specifics about their intended usage, such as the type of workload, R/W percentage, server demand (IOPS), maximum acceptable latency, and other factors.
That means my task as a reviewer is to perform the following:
- Formulate test sequences that will yield steady state performance values.
- Test the devices in as controlled an environment as possible.
- Collect and analyze the resulting test data.
- Distill the results into the simplest and most direct format possible.
That last one is the tough one. When I looked at other enterprise SSD reviews from the standpoint of a system builder, despite the available charts and graphs, I was typically left with unanswered questions. Some reviews would report a given result at a few select queue depths, while others would display other variables at a single queue depth. I asked myself how we could answer these questions for our readers without expanding into an unwieldy number of charts and graphs. My first challenge with the data (and for this piece specifically) was to distill enterprise SSD performance into just two charts for each given workload.
This first chart shows the achievable steady state IOPS (Y) at varying R/W percentages (X). Each plotted line corresponds to the performance at a given queue depth. An additional Y axis has been added with MB/sec throughput values that correspond to the IOPS at this workload (this is simply a proportional axis to help those looking for a specific throughput at that workload). This chart is useful as the starting point, and contains three dimensions of data on a 2D chart.
Looking at the data, we can see how many QD levels (plotted lines) it takes to reach maximum IOPS and therefore maximum throughput to the host. The P3608 ramps up very quickly on 100% writes (left side of the chart), but requires higher queue depths to achieve its maximum 100% read IOPS (right side of the chart).
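Since the second Y axis is purely proportional to IOPS, converting between the two is simple arithmetic. A minimal sketch (assuming 4 KiB transfers and decimal megabytes, which lines up with the ~3.5 GB/sec figure quoted for 850,000 read IOPS later in the article):

```python
def iops_to_mb_s(iops: float, block_bytes: int = 4096) -> float:
    """Convert IOPS at a fixed block size to MB/sec (decimal megabytes)."""
    return iops * block_bytes / 1_000_000

# At the P3608's rated 850,000 4K random read IOPS:
# 850_000 * 4096 / 1e6 = 3481.6 MB/s, i.e. just under 3.5 GB/sec
print(iops_to_mb_s(850_000))
```

This is why the MB/sec axis needs no separate data series; it is just a rescaled copy of the IOPS axis for the given block size.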
This next chart was far trickier to implement, as it contains four dimensions of data on a 2D chart, but that also makes it a much more powerful tool when used properly. Three of the dimensions are a translation of the previous chart. IOPS and MB/s remain on the Y axis, but the R/W percentages are now split into the plot lines, displacing queue depths, which become labeled points along each plot line (connected by the thinner lines for ease of use). This restructuring freed up the X axis for another dimension of data, and one of the most important pieces when dealing with enterprise SSDs – latency.
In the above chart, the P3608 reaches full write IOPS (cyan line) very quickly and at a very low queue depth. That line also sits farthest to the left of this chart, indicating that the P3608 is highly optimized to minimize latency during write operations. Read operations take longer simply because there are more steps that must be taken to look up and retrieve a piece of data from the flash. 4K random reads take more effort to ramp up to full speed, not exceeding the rating of 850,000 IOPS until QD=256, but just look at that throughput achieved (right side y-axis)! That's just under 3.5 GB/sec worth of 4KB *random* reads – 70% of its maximum sequential throughput!
The next chart is simply a zoomed in version of the previous one, focusing on the lower queue depths:
To use this chart, find the line corresponding to your % Read workload and follow it up until it crosses your anticipated demand (IOPS). Where these intersect, note the approximate QD required to achieve this performance and finally trace down to the X axis for the corresponding (average) latency.
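That lookup can be expressed as a small helper. This is only a sketch with hypothetical placeholder samples (not measured P3608 data): each workload line is a list of (queue depth, IOPS, average latency) points in ascending QD order, and we find the first sampled queue depth that meets the IOPS demand.

```python
# Each plot line: (queue_depth, iops, avg_latency_ms) samples, ascending QD.
# The values below are hypothetical placeholders, not measured P3608 results.
WORKLOAD_LINES = {
    "100% read": [(1, 12_000, 0.08), (4, 45_000, 0.09), (16, 160_000, 0.10),
                  (64, 500_000, 0.13), (256, 860_000, 0.30)],
    "70% read":  [(1, 15_000, 0.07), (4, 55_000, 0.08), (16, 180_000, 0.09),
                  (64, 420_000, 0.15)],
}

def required_qd_and_latency(line, demand_iops):
    """Return (queue_depth, avg_latency_ms) at the first sampled QD that
    meets the IOPS demand, or None if the line never reaches it."""
    for qd, iops, latency in line:
        if iops >= demand_iops:
            return qd, latency
    return None

print(required_qd_and_latency(WORKLOAD_LINES["100% read"], 400_000))  # (64, 0.13)
```

On the real chart you would interpolate between labeled QD points rather than snapping to the next sample, but the reading order is the same: workload line, then demand, then QD, then latency.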
We are still developing more detailed statistical analysis and advanced presentation methods for latency distribution at given workloads, but we did not have sufficient sample data at the time of this article to go live with those results. The new method will be far superior to previous reviews, as we will be taking the latency of *every* IO into account.
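Taking every IO into account means reporting full latency percentiles rather than just an average. A minimal sketch of what that analysis might look like, using a nearest-rank percentile over synthetic per-IO latencies (purely illustrative, not our actual pipeline):

```python
def latency_percentiles(latencies_ms, points=(50, 99, 99.9)):
    """Compute latency percentiles (nearest-rank method) from per-IO samples."""
    data = sorted(latencies_ms)
    result = {}
    for p in points:
        # nearest-rank: ceil(p/100 * N), converted to a 0-based index and clamped
        rank = min(len(data) - 1, max(0, int(p / 100 * len(data) + 0.999999) - 1))
        result[p] = data[rank]
    return result

# Synthetic example: 1000 IOs with latencies 0.1 ms .. 100.0 ms in 0.1 ms steps
samples = [round(0.1 * i, 1) for i in range(1, 1001)]
print(latency_percentiles(samples))
```

The tail percentiles (99th, 99.9th) are what matter most in the enterprise space, since a handful of slow IOs can stall an otherwise fast array.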
@Allyn Any chance that Intel will release a 800 GB version of the P3608, in order to lower it to a more affordable price point for the enthusiast?
More than likely they will not, as the P3608 is meant to get higher densities into smaller spaces. It would also limit each 'half' to only 400GB, which would offer limited performance that would be close to that of the 800GB P3600 in the first place.
Regarding use by enthusiasts, I would highly recommend going the new 800GB SSD 750 route (or a pair of 400s in RST RAID). The 750 Series uses the same controller but has its enterprise temperature monitoring features disabled – those features interfered with many desktop class BIOSes and caused memory contention / address conflict issues. The firmware is also more optimized for desktop / consumer workloads.
I’m not even mad I can’t afford one.
I’m just sitting here admiring the nice pictures. Making neat graphs like these should be performance art with tours of live shows. You rock, Allyn!
Thanks for the kudos! We're working hard on how we present this data, and will continue to improve on these charts.
Interesting review, but not exactly a PC part. Giving it a gold award seems a bit pointless. No PC enthusiast should buy this part, or really anything in the DC P3xxx line. It is interesting to know what is going on in the enterprise market, since that tech will filter down to the PC market eventually, if it is something that is actually useful to the PC market. I don’t know if devices like this will have a place in the PC market before it is displaced by other technology though.
I realize that the site is called PC Perspective, but this is an enterprise review. A handful of sites cover both PC and enterprise storage devices. For the moment, we are doing it without spinning off another site or brand. With Intel's RST for Z170 NVMe devices and RSTe to bridge both halves of this device, you're correct that it may filter down to the PC market. Actually, the same RST tech can currently RAID SSD 750s (not RAIDed for that piece, but now it is possible).
OK- I really need your help. I have a 1.6TB P3608 installed on an X99 chipset motherboard. I have tried every version of RSTe I can find and I can't for the life of me get the P3608 to detect in RSTe. The P3608 shows up fine in Disk Manager and I can even set up a RAID from within Disk Manager (albeit at the expense of being able to TRIM the array).
Can you please explain which version of RSTe driver and UI you used?
Any word on pricing? Not that I could ever afford one; I'm pretty sure it's more expensive than the rest of my PC. (PS, I know it's for data centers and not for a regular enthusiast, but damn I want it so bad).
Nothing new when you look at the “ordinary” P3600. I was expecting a lame PLX chip, as it is a much cheaper way than actually making two SSDs work in tandem without a lane switcher on the same card. Sadly, no hardware RoC is available for NVMe ATM.
While the review is interesting from a raw performance standpoint, it is not relevant at all to the PC market, as the 3608 is purely server grade, industrial storage that will never reach the enthusiast market – at least not in this shape. More interested in what you hinted above about the seriously more expensive P3700.
Allyn, have you tested that setup in RAID1/10 (if you have 2)? Would be interested in how much of a hit NVMe takes on writes with this setup vs classic NAND AHCI. R0 is a pointless exercise from my point of view. Redundancy over performance any day of the week.
We reviewed the P3700 before. I've run the workload on the P3608 and both P3700's in a RAID-0. RSTe had no issue pegging all drives on sequentials (10 GB/sec reads), but you need to throw more cores at it for random IO as compared to addressing the drives individually. More detail on the level of overhead in the next piece covering RSTe, as there is a lot of data I need to compile for it. Might do RAID-10 data in that piece as well if the testbed is still assembled when I'm back at the office next week.
I see you recommend the Intel 750 800GB for the pro-sumers out there. Would you recommend it over the new Samsung 950 Pro 512GB coming out next month?
Those two SSDs are going to have their own use cases. The 950 PRO will only be available in M.2 and at 512GB max (initially), while the SSD 750 is available in 800GB and 1.2TB. The 950 PRO should be lower cost, but those without an M.2 slot will need an adapter. I think they will be close enough on performance that it will boil down more to fitment and cost.
Why would they use a PEX8718 chip? You don’t need 16x PCIe 3.0. 8X would suffice.
The chip has 16 PCIe lanes *total*, some of which need to connect to the controllers. This one is configured to send 8 lanes to the host and 4 lanes to each controller. 8+4+4 = 16.
Thanks for clarification. That makes more sense now.
Those graphs man, hard to wrap my head around some of them
Lots of data in a small space, but if you know what your specific workload is, I think they get the job done.
just Wow !
we’re putting together a unix rig to sit in a data center and just compute 24/7. as many cores as we can afford, dual gpu nvidia and 64 gigs ram.
programmer is deranged by 3608 for boot (and everything really) and i want to make sure lanes are sufficient.
any 2011v3 boards stand out for this use?