Performance Focus: 4x Optane Memory and 4x 960 PRO in VROC RAID-0

In the interest of speed, I'll be sticking with random and sequential reads only for the charts. Note that each array was fully written sequentially prior to testing: fresh-out-of-the-box SSDs that have never been written may 'cheat' and instantly return zeroes without ever touching the flash, so the only real-world results come from reading areas that have previously been written.
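The preconditioning step amounts to a single full-span sequential write before any reads are measured. A minimal sketch of the idea (against an ordinary file rather than a raw block device, and with a hypothetical helper name) might look like:

```python
import os

# Hypothetical sketch: precondition a test target by writing every byte
# sequentially before any read benchmarking, so reads hit media that has
# actually been written rather than "never-written" regions that a fresh
# SSD can service without touching the flash.
def precondition(path: str, size: int, block: int = 128 * 1024) -> int:
    """Sequentially write `size` bytes of non-zero data; returns bytes written."""
    buf = b"\xa5" * block
    written = 0
    with open(path, "wb") as f:
        while written < size:
            written += f.write(buf[: min(block, size - written)])
        f.flush()
        os.fsync(f.fileno())  # ensure the data actually reaches the device
    return written
```

In practice a tool like fio or dd would be pointed at the raw device for this; the sketch only illustrates the "write everything before you read anything" rule.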

A matched set of four Optane Memory 32GB modules was used to evaluate IOPS and latency, while a set of four Samsung 960 PRO 512GB SSDs was used to evaluate maximum throughput.

4KB Random Read:


Jumping right into these results, we have two groupings of IOPS curves. The bottom set shows the IOPS response of an increasing number of 960 PRO SSDs added to a RAID-0. The top set represents the same, but with varying numbers of Optane Memory modules in place of the 960 PROs.

Note how the IOPS performance of Optane is far superior to that of one of the fastest NAND SSDs we've tested to date. Four 960 PROs can beat a single 32GB Optane Memory module, but only at QD=32, and only because the Optane part had saturated by QD=8, giving Samsung time to catch up. With Optanes in a RAID, all bets are off, though there was a peculiarity at the lower queue depths, where any RAID configuration seemed to lose nearly half of its performance advantage over the NAND arrays. This becomes clearer if we break down the results a different way, focusing more closely on the lower queue depths:

Note how the far left dark blue (QD=1) bar starts off at nearly 100,000 IOPS, while the next three blue bars fall closer to 50,000 IOPS. More on that shortly.


Let's start by focusing on that lower left point. 10 microseconds is in line with the expected latency of Optane Memory (as observed in our prior detailed analysis of that part). Unfortunately, it appears that applying any form of VROC RAID adds 6 microseconds of latency. We've seen that number before in our triple M.2 RAID testing of the Z170 platform, but I was hoping for less of a penalty on this newer platform, especially since the VMD controller sits at the CPU / hardware level. Still, remember we are dealing with pre-release, well, everything here, so this is subject to optimizations and improvements.
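The connection between that added latency and the halved QD=1 IOPS is just Little's Law: with one I/O outstanding, IOPS is simply the inverse of the mean service time. A quick sanity check of the figures above:

```python
def qd1_iops(latency_us: float) -> float:
    """Little's Law at QD=1: one I/O outstanding, so IOPS = 1 / latency."""
    return 1_000_000 / latency_us

# Bare Optane Memory at ~10 microseconds per I/O:
bare = qd1_iops(10)        # 100,000 IOPS, matching the QD=1 bar
# With the ~6 microsecond VROC/VMD penalty added on:
raided = qd1_iops(10 + 6)  # 62,500 IOPS -- the extra latency alone
                           # erases over a third of the QD=1 figure
```

The chart's RAID bars land a bit below even that, but the bulk of the QD=1 drop falls straight out of the added 6 microseconds.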

One general note on the above chart before we move on: as you add SSDs, the latency profile rotates clockwise, effectively flattening and making it to higher QDs before curving upwards (latency begins to spike as controller/media loading increases).
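That clockwise rotation can be illustrated with a toy queueing model. All parameters here are hypothetical, picked only to mimic the shape of the chart: each member of a stripe sees roughly its share of the total queue depth, so adding drives pushes the latency knee out to higher total QDs.

```python
import math

def array_latency(qd: int, drives: int, base_us: float = 10.0,
                  sat_qd: int = 4, slope_us: float = 8.0) -> float:
    """Toy model (illustrative only): each drive handles ~QD/drives
    outstanding I/Os; latency stays flat until a drive's share exceeds
    its saturation depth, then climbs linearly with the overload."""
    per_drive = math.ceil(qd / drives)
    overload = max(0, per_drive - sat_qd)
    return base_us + slope_us * overload

# At a total QD of 16, a single drive is deep into its latency spike,
# while a four-drive stripe is still on the flat part of its curve.
single = array_latency(16, drives=1)
quad = array_latency(16, drives=4)
```

None of these constants are measured values; the point is only the shape: more members means each drive stays under its saturation depth to a higher total QD, which is exactly the flattening seen in the chart.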

The QD=1-4 bar chart makes the latency differences between Optane and 960 PRO painfully obvious.

128KB Sequential:

Note that we chose 128KB sequential transfers because the kernel will break requests larger than 128KB into multiple 128KB chunks issued in parallel (and at an effectively higher QD than desired).
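The splitting behavior described above can be sketched as follows (a simplified model of the kernel's request splitting, not actual kernel code):

```python
def split_request(offset: int, length: int, max_io: int = 128 * 1024):
    """Model of a large request being split into max_io-sized chunks,
    which are then issued in parallel at a higher effective QD."""
    chunks = []
    while length > 0:
        n = min(max_io, length)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

# A single 1MB read submitted at QD=1 actually reaches the SSD
# as eight 128KB I/Os:
chunks = split_request(0, 1024 * 1024)  # len(chunks) == 8
```

This is why testing with, say, 1MB transfers at QD=1 would really be measuring something closer to 128KB at QD=8, muddying the queue-depth axis.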

Now we get to the fun part. The bottom cluster (starting from the ~1GB/s point and spanning out) are the Optane parts. These only link at PCIe 3.0 x2 and are not meant to excel at sequential performance. Still, by QD=8 we see them spread out into an even stack of increasing throughputs nearing 6GB/s. A single 960 PRO, with its x4 link and a controller channel layout better optimized for sequential transfers, bisects the Optane results, falling between the two-module and three-module Optane configurations. The rest of the 960 PRO configurations handily beat the Optane parts in sequential performance.

QD=16 is about as high as we've seen in our trace recording of Windows bulk file copy operations, so I've ended the bar chart spread at that depth. QD=32 is a moot point here anyway, as all configurations reached saturation closer to QD=8.

And now the chart you all came here to see:

Here we are looking only at QD=32 for the Optane and 960 PRO spread, from a single SSD to a quad-SSD RAID-0. We would ideally expect linear scaling here, and that appears to be exactly what happened. Quad Optane Memory 32GB hit 5.6GB/s, while quad 960 PRO 512GB achieved over 13.2GB/s! We've certainly come a long way from the DMI-bottlenecked Z170/Z270 limit of 3.6GB/s.
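The linear scaling claim is easy to check: RAID-0 large-block reads are additive across members, since stripes land on every drive in parallel. The per-drive figures below are simply backed out of the measured quad results, not independent measurements:

```python
def raid0_read_throughput(per_drive_gbps: float, drives: int) -> float:
    """Idealized RAID-0 large-block read scaling: throughput is additive
    because consecutive stripes hit every member in parallel."""
    return per_drive_gbps * drives

# Per-drive figures implied by the quad results above:
optane_quad = raid0_read_throughput(1.4, 4)  # 5.6 GB/s for 4x Optane Memory
pro_quad = raid0_read_throughput(3.3, 4)     # 13.2 GB/s for 4x 960 PRO
```

That the quad results divide so cleanly back to plausible single-drive numbers (~1.4GB/s for an x2-linked Optane module, ~3.3GB/s for a 960 PRO) is the evidence that scaling really was linear here.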
