Introduction
IOPS and GB/s do *not* tell the whole story!
NVMe was a great thing to happen to SSDs. The per-IO reduction in latency and CPU overhead was more than welcome, as PCIe SSDs had previously been using the antiquated AHCI protocol, a carryover from the SATA HDD days. NVMe also brought new support requirements for operating systems and UEFI BIOS implementations. We did some crazy experiments with arrays of these new devices, but we were initially limited by the lack of native hardware-level RAID support to tie multiple PCIe devices together. The launch of the Z170 chipset remedied this by adding the ability to tie as many as three PCIe SSDs together behind a chipset-configured array. The recent C600 server chipset also saw the addition of RSTe capability, extending this functionality to enterprise devices like the Intel SSD P3608, which was actually a pair of SSDs on a single PCB.
Most Z170 motherboards have come with one or two M.2 slots, meaning that enthusiasts wanting to employ the 3x PCIe RAID made possible by this new chipset would have to get creative with the use of interposer / adapter boards (or use a combination of PCIe- and U.2-connected Intel SSD 750s). With the Samsung 950 Pro available, as well as the slew of other M.2 SSDs we saw at CES 2016, it’s safe to say that U.2 is going to push back into the enterprise sector, leaving M.2 as the choice for consumer motherboards moving forward. It was therefore only a matter of time before a triple-M.2 motherboard was launched, and that just recently happened – Behold the Gigabyte Z170X-SOC Force!
This new motherboard sits at the high end of Gigabyte’s lineup, with a water-capable VRM cooler and other premium features. We will be passing this board onto Morry for a full review, but this piece will be focusing on one section in particular:
I have to hand it to Gigabyte for this functional and elegant design choice. The space between the required four full length PCIe slots makes it look like it was chosen to fit M.2 SSDs in-between them. I should also note that it would be possible to use three U.2 adapters linked to three U.2-connected Intel SSD 750s, but native M.2 devices make for a significantly more compact and consumer-friendly package.
With the test system set up, let’s get right into it, shall we?





That’s nice. Real nice.
Brilliant analysis on page 4, I’ll be checking into this setup. Simply brilliant.
$400 for a motherboard?! That’s what I spend to build an entire system. I just don’t see the value in it.
Horses for courses. This motherboard was never meant to target the market sector that can do everything they need with a $400 system.
Many of us want or need significantly more power and are prepared to pay for it.
$400 for a motherboard?!
Try actually gaming on a high-end machine, or a machine that is running something that actually benefits from low-latency, high-IOP performance.
Yeah, that’s going to require you to spend more than $400 for a whole system.
You’re essentially trying to compare a skateboard to a supercar.
On that note I can assemble a baked potato for under 5 dollars. Doesn’t mean it’ll run Crysis on 7 VMs hooked up to 7 FreeSync monitors. It might be able to run Dota though ;}
Yeah those considering triple NVMe RAID are definitely in the serious power user category.
I’m not sure, since I can’t find proper Xeon boards with enough channels, I’ve only been able to connect one 512GB 950 Pro to my system, with 25 256GB 850 Pro drives on as many dedicated SAS lanes as I could manage (not nearly enough). That’s hosting the storage for my workstation, which is currently 44 Xeon cores with an additional 32 in transit now. The system has a little more than 800GB of RAM and uses dual 40Gb/sec InfiniBand as a host bus and a 10Gb/sec internet uplink.
Does this count as a power user? I didn’t add any GPUs since I didn’t have any need for them, but I’m considering adding a front-end device which hosts GPU as well.
I am using this configuration as a part-time data center for hosting labs for courses, but normally, I just use it for programming and compiling code and experimenting. I suspect I’ll be up to around 2TB RAM and 200+ Xeon cores before 2017. My goal is to do it in a single rack with absolute resiliency. I have 4U sucked up with 52 3.5″ hard drives though. They’re big and ugly, but 400TB of SSD is still too expensive.
Oh that's certainly power user, but a different type of power user. Depending on the IOPS capabilities of the RAID cards you are using, this triple M.2 setup might be able to beat 25 SATA SSDs in some performance metrics.
This board is a high-end model. It looks like the main feature is a PLX chip which converts the 16 PCIe lanes from the CPU out to 32 PCIe lanes. This allows all 4 x16 slots to operate at x8 with 4 video cards installed. You still only get x16 bandwidth to the CPU though.
How many GPUs can you use when all 3 m.2 slots are populated?
Well, PLX isn’t that great; it’s a stopgap until the Skylake-E parts are out, so I wouldn’t suggest it. Without it you would have 3 of these at 3.0 x4, so you would have x4 available for your video card. With the fast switching from the PLX you could do x16 and have x4 left over, but from every instance I have dealt with PLX (mostly Z87 and Z97 boards), it is not really even close to true 3.0 x16 performance. It only shows any returns when you have more than 4 cards installed. Personal opinion from experience: stay away from it. If you need the lanes, go Haswell-E.
This of course was for shiz and giggles, to see what they could push it to, but if this was a “real” build I would only use 2 950 Pro’s and the GPU at x8 with a non-PLX board (if they have any with 2 M.2 slots).
The difference between x8 and x16 is CURRENTLY negligible for gaming, might change with DX12.
What are the *practical* applications for this other than RAID1 for drive redundancy?
NVMe’s increase in bandwidth has little to no tangible benefit over SATA-based SSDs. The only real benefit is high IOPS for database interactions.
Can we get application load times (OS, productivity and games), processing time analysis on video rendering, file compression/decompression, file copy, etc.?
When can we expect real-world benchmarks?
RAID-1 is certainly doable for a pair of drives, and RAID-5 would be the more efficient choice for three (we talk about that part on page 3).
The effect of the reduced latency is a faster response with a lot going on on the system (heavy loading, multiple apps hitting the array simultaneously). There is no existing consistent test using actual simultaneous launching of applications, so the closest we can come is with the testing we are conducting here. The reader will have to decide, based on their particular demand on their storage and how high they will be filling the queue (this can be monitored in Windows), whether the reduction in latency is of benefit to them.
I do have an extension of this testing that will also evaluate as the SSD is filled and TRIMmed, but for now the setup was random access to an 8GB span of a full SSD / array.
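For anyone curious what "random access to an 8GB span" looks like as an access pattern, here is a minimal Python sketch. The file name is hypothetical, and note that real benchmark tools issue unbuffered/direct IO, which plain Python file reads do not:

```python
import random

SPAN  = 8 * 1024**3          # 8 GiB span, per the test description above
BLOCK = 4096                 # 4 KiB IOs
PATH  = "testfile.bin"       # hypothetical pre-created 8+ GiB test file

# Issue 4 KiB-aligned reads at random offsets within the 8 GiB span.
# (The OS page cache will serve repeats here; real tools bypass it.)
with open(PATH, "rb", buffering=0) as f:
    for _ in range(1000):
        f.seek(random.randrange(0, SPAN // BLOCK) * BLOCK)
        f.read(BLOCK)
```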
For someone building a computer that’ll mainly be used for gaming plus the usual everyday use scenarios, would the addition of a 950 pro provide a noticeably faster experience compared to a SATA ssd such as the 850 Evo or Pro?
Other than the increased performance in synthetic benchmarks, you won’t see any discernible, tangible differences. I bought the 512GB 950 Pro NVMe drive to replace my 500GB 850 EVO as my boot drive, and Windows 10 and all my applications load just as fast. Loading games such as BF4, SWBF, WoWS w/ tons of mods, and anything in my Steam library loads about 0.2~0.8 seconds faster with the 950 Pro.
Is it worth the extra cost? YMMV, but for me it was not.
I eventually put that 950 Pro to test against a 480GB Seagate 600 into my web and database server and found it to be worth it in there with the much lower latency on DB queries and the ability to have more concurrent connections.
I think what Allyn and Ryan need to say outright, and not assume that readers will just figure out, is that we are up against the law of diminishing returns. Meaning, as SSDs get faster and faster with different NAND types, controllers and protocols like NVMe, we, as consumers, will start seeing less and less benefit. So what if you can shave a few fractions of a second off loading your OS or an application? I am waiting to see what XPoint has to offer since it is orders of magnitude faster than current SSD technology. Perhaps it will usher in a new performance benchmark. Or be a victim of diminishing returns…
Hi,
If you compare the read/write figures for a SATA-connected SSD against an M.2 SSD (Samsung Pro/Evo in particular), either using an M.2 slot or a PCIe slot with an adapter, then you will find the PCIe NVMe SSDs are more than three times faster overall.
I have my OS on 1x Samsung Evo M.2 and another two Evos (2x M.2’s are on my motherboard) and another on a PCIe adapter card. ALL of the M.2’s, regardless of connectivity, perform at almost exactly the same level. I also have 1x SSD connected via a SATA port for further storage.
I play games on my computer and ALL of them are stored on my M.2’s. If I had known how much faster PCIe M.2’s were over SSDs, I would never have even bought an SSD in the first place.
If you want to speed up your current computer without spending a lot of money on a new CPU, mobo and RAM, then invest in a PCIe M.2 and connect it directly to the M.2 slot on the mobo, or if your current mobo hasn’t got an M.2 slot, buy an adapter for a few pounds and connect it via a spare PCIe slot. Load your OS onto the M.2 and it will feel like you have supercharged your PC; load times from off to Windows will take about five seconds. Use the spare capacity on the M.2 for games and apps (Office, for example). EVERYTHING you do on your computer is SO much faster.
I would never go back to physical hard drives or even SATA SSDs. Just make sure that any M.2 you buy is PCIe, because SATA M.2’s are no faster than SATA SSDs, although they are still 3-4 times faster than physical HDs.
A very easy way to differentiate between PCIe and SATA M.2’s is that PCIe M.2’s have only one notch at the end of the stick and SATA M.2’s have two. Maybe a long-winded answer, but M.2’s (PCIe) are much, much faster than SATA-connected SSDs.
Regards
Frank
To echo the other contributors, will this shave a few seconds off load times?
Perhaps a few fractions of a second.
Thx guys, nice work!
@ Allyn, can you explain the difference between these two cases:
1) 2x 256GB 950 Pro in RAID 0
2) 1x 512GB 950 Pro
Both cases have the same number of memory chips to distribute the load over, but in the RAID case you have two controllers; is this the advantage?
Reads will see the same type of boost as with 2x 512's. Writes will see the same effect / proportion scaling up from one to two 256's, but since the 256GB model has lower write performance to start with, two 256s will not beat two 512s.
Even with the slower write speed of the 256GB model, a pair of 256s will still beat a single 512 in all but low QD (1-2) latency. Everything else will be better – higher sequential writes (~1.5x) and reads (2x), higher random performance at moderate QD, etc.
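To put rough numbers on that scaling argument, here is a minimal sketch; the per-drive figures and DMI ceiling are assumptions for illustration, not our measurements:

```python
# Ballpark sketch of the scaling described above. The per-drive sequential
# figures (MB/s) are assumed example specs, and the DMI cap is approximate.
DMI3_CAP = 3500  # rough usable ceiling of the chipset DMI 3.0 link, MB/s

drives = {
    "256GB": {"read": 2200, "write": 900},    # assumed single-drive MB/s
    "512GB": {"read": 2500, "write": 1500},
}

def raid0_estimate(model, count):
    """Naive linear RAID-0 scaling, capped by the DMI link. Ignores low-QD
    latency, controller overhead, and sustained-write behavior."""
    return {k: min(v * count, DMI3_CAP) for k, v in drives[model].items()}

print("2x 256GB:", raid0_estimate("256GB", 2))
print("1x 512GB:", raid0_estimate("512GB", 1))
```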
Interesting, the real question is why though.
Isn’t an SSD controller similar to a RAID controller?
I was thinking a single 512 would be able to distribute the load similarly to how 2x256 would in RAID 0. I think that is true when only considering the memory chips. So where does the extra performance of the 2x256 come from?
The SSD controller will have a fixed number of channels. The 512 GB model just has twice the amount of memory attached to each channel. I believe Intel SSD controllers use 18 channels. I am not sure how many the Samsung controller uses. They wouldn’t want to set up the controller to use half the number of channels with the 256 GB model since it would be effectively half the performance. You are not distributing across individual flash die, you are distributing across the channels of the controller. Twice the amount of flash die doesn’t mean twice the performance. Double the number of channels can double the bandwidth though, if there is no bottleneck elsewhere.
There does seem to be an effect on write with more flash die, even with the same number of channels. I don’t know exactly how this works.
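Since the channel-versus-die distinction comes up a lot here, a toy model of the idea (all numbers are made up for illustration, not Samsung or Intel specs):

```python
# Toy model of why adding flash capacity alone doesn't add bandwidth: the
# ceiling is set by the controller's channel count (and the host link), while
# extra dies per channel mainly help writes by interleaving program time.

def seq_read_ceiling(channels, per_channel_mbps, host_link_mbps):
    return min(channels * per_channel_mbps, host_link_mbps)

def seq_write_ceiling(channels, dies_per_channel, per_die_write_mbps,
                      per_channel_mbps, host_link_mbps):
    per_channel = min(dies_per_channel * per_die_write_mbps, per_channel_mbps)
    return min(channels * per_channel, host_link_mbps)

# Same controller, double the dies per channel: reads unchanged, writes
# improve until each channel saturates.
print(seq_read_ceiling(8, 400, 3200))
print(seq_write_ceiling(8, 2, 80, 400, 3200),
      seq_write_ceiling(8, 4, 80, 400, 3200))
```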
why does one ssd appear to be upside-down?
That was our first sample unit, didn't have the full retail sticker on it.
The space between the required four full length PCIe slots makes it look like it was chosen to fir M.2 SSDs in-between them
Typo there, I think u mean “fit”
Fixed, thanks!
You messed up your graphs – you labeled the x-axis as nanoseconds, when it should be microseconds. Is the 6ns RAID overhead meant to actually be 6 microseconds?
Crap, you're right! Corrections incoming! Thanks for the catch!
Cool, thanks! Looks real good.
Clearly not enough PCIe and m.2 ports.
Joking aside, it seems kinda weird to be able to save physical space on a board that big which would most likely be put in a roomy case. Maybe because it is such an expensive MB meant for high end enthusiasts looking for options on builds and/or mods?
Otherwise, very interesting review, great write up!
One of the biggest benefits to M.2 in desktop computers IMO is that the mobo delivers the power. If you only use M.2 storage, that’s one less power cable you need, and less clutter.
yea, how do we hot swap those with the video card on top of them?
Perhaps this is a silly question, but are the log scales for the graphs on page 4 labeled accurately? The scales jump from nano-scale (1e-9) to milli-scale (1e-3), but shouldn’t the micro-scale (1e-6) be included in-between? If true, this would make the latency time reduction of running 3 drives in RAID only 1 order of magnitude instead of the 2-3 orders shown above.
Regardless, excellent analysis guys!
Yup, thanks for the catch, updating things now!
Why is this presented as new? The ASRock FATAL1TY Z170 PROFESSIONAL GAMING I7 also has 3 M.2 slots.
This was the first board we could get in capable of triple M.2.
would be curious to see how QD and latency compare when running in RAID1 or RAID5 configurations.
Roughly:
RAID-1 (2 SSDs): Reads are similar to RAID-0. Writes are similar to single SSD.
RAID-5 (3 SSDs): Reads are ~ 2-3 SSD RAID-0 figures, writes are ~ 1-2 SSD RAID-0 figures.
Be advised there is additional CPU overhead in RAID-5 due to parity calcs.
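If it helps, those rules of thumb can be written down as rough multipliers relative to a single SSD (ballpark only, not measurements):

```python
# Encoding the rough rules of thumb above as multipliers relative to a single
# SSD. RAID-5 writes also carry extra CPU cost for parity calculations.

def raid_rough_scaling(level, n):
    if level == 0:
        return {"read": n, "write": n}
    if level == 1 and n == 2:
        return {"read": 2.0, "write": 1.0}   # reads stripe, writes mirror
    if level == 5 and n == 3:
        return {"read": 2.5, "write": 1.5}   # "~2-3x reads, ~1-2x writes"
    raise ValueError("outside the cases discussed here")

print(raid_rough_scaling(5, 3))
```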
Re: RAID1 reads are similar to RAID0… I’m surprised that it’s not closer to single SSD… sounds like no integrity / validation among both disks?
A common misconception. RAID1 typically does not read data from both drives and then compare to see if they are the same, it will divide the reads across both drives and use the sector CRC’s to ensure data integrity and only then will it switch to reading the other drive for the bad sector(s).
RAID-1 typically reads back data in 'performance' mode, meaning it stripes across the drives as if they were in RAID-0. No error checking happens here, but you can tell RST to 'Verify' the array, which will scrub both drives front to back and compare data.
Those are some pretty impressive IOPS numbers. This seems an ideal setup for a high core count, write-intensive OLTP database system.
I’m wondering about the DMI bottleneck though. I understand why putting the SSDs behind the chipset allows for UEFI-level RAID configuration. However, say that you don’t want to use Intel RST, but instead rely on Linux’s MD-RAID or Solaris’ ZFS, then it would be better to have the M.2’s wired directly to the CPU, no? Then again, the question then becomes *where* you’re going to get data to and from at a sufficient pace to keep that SSD array busy enough on a consumer-level system like Z170.
Interesting article. Thank you very much for taking the time to document and share your findings.
Remember we were only writing randomly to an 8GB span of sequentially filled SSDs here. OLTP would randomly write to a much larger span of the SSD (if not all of it), so to get good sustained random write performance you will need enterprise SSDs which can better handle sustained workloads to 100% of the volume.
(The latency principles still apply though).
the simplest explanation would be comparing it to multithreading
Moar IOPS = Lower Latency. Simple math.
This totally does *not* apply when a queue is involved. For example, the OCZ R4 hit very high IOPS, but used SandForce SSD controllers in a RAID to get there, so individual IO latency was far higher than what we are seeing here.
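One way to see why the simple math breaks down: by Little's Law, average per-IO latency is roughly queue depth divided by IOPS, so the queue depth needed to reach a given IOPS figure matters a lot. A minimal sketch with illustrative numbers only:

```python
# Little's Law: average per-IO latency ~= queue depth / IOPS. Two setups can
# post the same IOPS yet feel very different if one needs a much deeper queue
# to get there. Figures below are illustrative, not measured.

def latency_us(iops, queue_depth):
    return queue_depth / iops * 1_000_000

print(latency_us(400_000, 4))    # ~10 us per IO at QD 4
print(latency_us(400_000, 128))  # ~320 us per IO at QD 128, same IOPS
```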
Allyn: You put in a ton of work on this. Thank you for sharing! My wife has been griping her computer is slow and I always get the “good” stuff for myself, which is absolutely true, lol. So I was looking at all the latest technology, and was really wondering about the RAID 5 aspect with the 3 each M.2 connectors. You answered the questions I had. I have been using RAID 5 exclusively for many years, and my wife’s old computer (~8 yrs) has a 1.5 TB raid 5 C: drive, which always has to rebuild if the system locks down, which can take a day or more. Raid 5 still works, of course, but slows down considerably when rebuilding. So, I am a little paranoid about using raid 5 for the C: drive. I use a single SSD C: drive and a 3TB raid 5 D: on my own computer. Your comments about loading the system using a GPT external USB drive are crucial. I obviously am rusty on the latest bios settings terminology, but I have built my own computers for the last 20 years, one every 5 years with the latest stuff, so there is always a learning curve since I do it so seldom and technology changes.
Your article helps a Lot. Thank You!
System builder.
I am about to build a triple M.2 RAID 0 system using three 512GB Samsung 950 Pros based on an ASRock Extreme 7+ motherboard.
Has anyone here done this already?
I am thinking of doing the same. How did that go for you – and what RAID level did you use?
Great write-up! It was hard to get through some of the technical details but Allyn promised the next page was going to be amazing. I was expecting a free computer offer or something. For real though amazing details. Really excited about my next build!
Great write up and interesting findings. But could future videos have higher depth of field? Only the background TV is in focus.
We were trying a new camera for the video and we might not have had all settings tweaked properly.
Wow, amazing storage review!
I’m trying to decide whether to go 850 or 950 mSATA, single 250GB. Three-way RAID 950s is a different universe of performance. Love the new latency visuals. Ryan, should let Allyn keep this setup (make that a Patreon threshold).
No Patreon needed on this one. After putting weeks of development work into creating this testing, I'll be using it on all storage reviews moving forward.
(that said, please consider contributing anyway!)
Perhaps slightly out of context for this article, but can anyone comment on how this config would affect an SLI installation? I believe that 3 M.2’s and multiple graphics cards will take up more PCIe lanes than are available in the Skylake architecture.
So basically your SLI cards would be forced to run slower? Which would have priority for the PCIe lanes, or is it all multiplexed somehow?
This uses PCIe from the chipset. You will lose all of the SATA ports off the chipset to do this. This will take PCIe 15 to 26 from the chipset. The lower PCIe links are still available for USB, network, other controllers, and probably the last PCIe slot. The graphics cards would be running off the CPU PCIe lanes connected to the x16 slots. I don’t know if this board supports 3-way CrossFire by using an x4 from the chipset. That would run into bandwidth limitations due to the link between the CPU and the chipset. It isn’t really relevant anyway. I am not sure what applications you would be running at home to really stress this set-up at all. What would you be running to stress this set-up and your graphics system at the same time?
Actually, it is only 20 PCIe lanes from the chipset. The HSIO lanes 1 to 6 are USB3 only, while some of the 20 PCIe lanes can be switched to SATA. Using 3 x4 m.2 takes 12 lanes, leaving 8 lanes for other controllers or slots.
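For reference, the lane math from that correction works out as follows (a trivial sketch, and the exact HSIO mapping varies by board):

```python
# Quick lane budget following the correction above (illustrative only).
chipset_pcie_lanes = 20   # Z170 chipset PCIe lanes (HSIO 1-6 are USB3-only)
m2_lanes = 3 * 4          # three M.2 slots at PCIe 3.0 x4
print(f"{m2_lanes} lanes for M.2, {chipset_pcie_lanes - m2_lanes} left for "
      "network, SATA, and other slots")
```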
It may speed up some more complex game loads, but where this would really shine would be the home user that has other disk-heavy processes taking place *while* gaming on that same system.
“The end result result of this is a RAID of SSDs gives you a much greater chance of IOs being serviced as rapidly as possible, which accounts for that ‘snappier’ feeling experienced by veterans of SSD RAID.”
You like writing “result” apparently.
Hah! I *do* like results! (fixed)