Latency Distribution and Latency Percentile
For a very long time now, I have hated the idea of plotting average latencies for SSDs. The reason is that you could have an SSD with a great average while a whole group of its IOs fell closer to a horrible maximum latency. You’d think the answer is to simply use the maximum latency figures instead, but that unfairly biases the results against an SSD that had *just one* IO run long during the test (which could simply be an unlucky context switch on the test host system itself). The only way to properly solve this problem is to start tracking every IO during the test.
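To make the problem concrete, here is a quick sketch with made-up latencies showing how the average can look great while a whole group of IOs runs slow, how a single outlier dominates the maximum, and how tracking every IO (here summarized as a percentile) tells the real story:

```python
import statistics

# Hypothetical capture of 10,000 IO latencies (in microseconds):
# most complete in 100 us, a group of 99 take 5 ms, and a single
# unlucky IO (say, a context switch on the host) takes 250 ms.
latencies_us = [100] * 9900 + [5000] * 99 + [250000]

print("average :", statistics.mean(latencies_us), "us")  # ~173 us - looks great
print("maximum :", max(latencies_us), "us")               # 250,000 us - one bad IO
print("99th pct:", sorted(latencies_us)[9900], "us")      # 5,000 us - the real story
```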
Iometer 1.1.0 default latency bins
Intel took a crack at this by adding ‘latency bins’ to Iometer. A new build was posted just before the March 2014 press event held at Folsom, where we were also briefed on these updates. Latency bins are ranges of latency into which each IO is ‘sorted’. Intel’s purpose for this Iometer change was to help demonstrate the performance consistency of its SSD 730 Series. This added some granularity to Iometer’s output (we were no longer stuck with just average and maximum latency), but the problem was that the bins were necessarily very coarse. They were just good enough to demonstrate that Intel's SSDs were not throwing excessively long IOs when pushed past saturation, but that was about all they were good for. Adding more buckets easily overloads Iometer’s bin sorting routine (*every* IO latency must be sorted as it comes in, and you can’t add a bunch of code to a loop that may be executed over 200,000 times per second). With each bin covering such a wide range of latencies, you could have two different SSDs with all of their IOs falling into the same bucket. Worse, the second SSD’s results might land just on the other side of a boundary, falling into the next bin and appearing far worse than the first. So we definitely need more bins, or some other way of doing things. If we can increase the resolution of the capture, the resulting data can be used to create a clean histogram, and we can then plot the latency-specific performance of a given storage device.
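To see why a handful of wide bins hides so much, here is a rough sketch of coarse bin sorting (the boundaries below are illustrative only, not the actual Iometer 1.1.0 values): two drives with noticeably different behavior can land in the same bucket, while a third that is barely slower straddles a boundary and looks far worse:

```python
import bisect

# Illustrative coarse bin edges in microseconds (NOT the exact Iometer values)
BIN_EDGES_US = [50, 100, 200, 500, 1000, 2000, 5000, 10000]

def coarse_bin(latency_us):
    """Return the index of the coarse bin a single IO latency falls into."""
    return bisect.bisect_left(BIN_EDGES_US, latency_us)

# Drive A completes most IOs around 110 us, Drive B around 190 us:
print(coarse_bin(110), coarse_bin(190))   # both land in the 100-200 us bucket
# Drive C at 210 us is barely slower than B, yet falls into the next bucket:
print(coarse_bin(210))                    # 200-500 us bucket - looks far worse
```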
Since the Folsom event, I’ve been working out a better way to get what I wanted. No tool out there could do it, so I would just have to roll my own. The only way around the coarse bin issue was to create a capture system that could give us effectively unlimited resolution of the IO latencies pouring in from the devices under test. Let’s start with an example of that output:
Latency Distribution
Latency Distribution on linear vertical scale (click to enlarge)
The X axis above represents latency. The scale is logarithmic, spreading latencies across six decades (every major mark is 10x greater than the previous), making the major marks 10µs, 100µs, 1ms, 10ms, 100ms, 1s, and 10s. This type of data would normally be presented as a histogram bar chart, but we have sufficient resolution that we can plot the data as an unsmoothed line. The resolution of 50 bins per decade was simply chosen to make the plotting job easier on Excel, but it is more than sufficient for our purposes here, and significantly higher and more evenly spread than the ~20 total bins provided in the new Iometer. The resolution chosen for the above chart represents more than 300 bins!
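As a rough sketch of how such a capture might bucket each IO (the real capture code may differ; 50 bins per decade and a 10µs left edge are assumed from the chart above), the bin index is a constant-time calculation, and summing one second’s worth of counts gives that second’s IOPS, i.e. the area under the curve:

```python
import math

BINS_PER_DECADE = 50       # resolution discussed above
MIN_LATENCY_US  = 10.0     # assumed left edge of the chart (10 us)
DECADES         = 6        # 10 us up to 10 s
NUM_BINS        = BINS_PER_DECADE * DECADES   # 300 bins total

def bin_index(latency_us):
    """Constant-time lookup of the log-spaced bin an IO latency falls into."""
    idx = int(BINS_PER_DECADE * math.log10(latency_us / MIN_LATENCY_US))
    return max(0, min(NUM_BINS - 1, idx))

# Build the Latency Distribution for one second of captured completions
# (latencies below are made up for illustration):
histogram = [0] * NUM_BINS
for lat_us in (95.0, 102.0, 110.0, 118.0, 8900.0):
    histogram[bin_index(lat_us)] += 1

# IOPS for that second is simply the area under the curve:
iops = sum(histogram)
```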
The vertical scale represents the number of IOs that fell into a particular latency bin (for a given second of the test). The IOPS of a storage device is equivalent to the area under its curve. Showing this axis linearly makes more sense, but sometimes we must shift to a log scale when including devices with relatively low IOPS alongside others with very high IOPS:
Latency Distribution on logarithmic vertical scale (click to enlarge)
With a log scale, the three HDD results that were previously stuck to the axis line can now be seen. The thing to keep in mind when reading heights on the log scale is that the higher parts of a peak are far more significant than the lower parts when judging where the majority of IOs fall. Don’t rule out the lower parts entirely though (why this is important will be seen below). We shouldn’t dwell on the Latency Distribution, as the real benefit of obtaining the above results is the much clearer picture you can derive from them:
Latency Percentile
Latency Percentile (click to enlarge)
If the Latency Distribution was overwhelming, this Latency Percentile should make things a bit clearer. Each plot line represents the cumulative area under the corresponding curve of the previous chart, normalized to 100%. Each line climbs from 0% up to 100% as it accounts for every IO and its respective latency. It makes the latency profiles of these devices painfully clear, but it is important to remember that the lines do not represent or indicate the IOPS of the device. I have included IOPS as part of the legend to help keep things in perspective.
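In code terms, each Latency Percentile line is just the running sum of a distribution, normalized so it ends at 100%. A minimal sketch (with made-up histogram counts):

```python
def latency_percentile(histogram):
    """Turn a per-second latency histogram into a cumulative percentile curve:
    each point is the percentage of all IOs completed at or below that bin."""
    total = sum(histogram)        # this total is the IOPS for that second, but
    running = 0                   # the percentile curve itself is normalized,
    curve = []                    # so it no longer conveys IOPS at all
    for count in histogram:
        running += count
        curve.append(100.0 * running / total)
    return curve                  # climbs from near 0% up to exactly 100%

# e.g. a tight, consistent drive vs. one with a longer tail (made-up counts):
print(latency_percentile([0, 90, 10, 0]))   # [0.0, 90.0, 100.0, 100.0]
print(latency_percentile([0, 60, 30, 10]))  # [0.0, 60.0, 90.0, 100.0]
```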
Here is the breakdown of the results, starting with the slowest:
- The three HDDs are obviously the slowest of the bunch here. Latencies range anywhere from .01s (10ms) to nearly a second. The Latency Percentile lets us see the clear distinction between the three disk speeds (5400 vs 7200 vs 10k RPM). Spinning the disk faster shifts the curve to the left.
- The trusty old SATA G.Skill FlashSSD is actually a rebranded first-gen Samsung SLC SSD. It saturates far earlier than QD32, and it is very slow, but as we can see by the near-vertical line that almost looks like one of the gridlines, man is that thing consistent. I’ve been using these as the OS drives in our storage testbeds for just this reason. Note that this SSD gives us ~30x the IOPS of the HDDs, but the faster SSDs here turn in 10-30x greater IOPS at QD=32.
- Next up is the RevoDrive 350. Why is this monster of a PCIe SSD in the list *behind* a pair of *SATA* SSDs? It’s not so much the VCA controller’s fault as it is the very long IO pipeline of the SandForce controllers it is pushing. Finally we are able to see just how much higher SandForce latencies are (even with a RAID of them, in this case) compared to other SATA SSDs.
- Next is the Kingston HyperX Predator, which is extremely close to the result of the Intel SSD 730 (see the zoomed version below to more easily see the spread).
- Next is the Intel SSD 730. To keep the scale in perspective here, we are now at 1/10th the latency of the FlashSSD and 1/100th to 1/1000th the latency of the HDDs! The 730 was great when it launched, and can still outperform the previous PCIe SSDs in this list, but it is now eclipsed by:
- The Samsung 850 PRO, which outmaneuvers the SSD 730 thanks to its faster controller and faster V-NAND flash.
- The two capacities of the 950 PRO turned in similar IOPS and very close average latencies, but with the help of our new data we are able to tell where the differences between them lie. We see that the first 55% of IOs track nearly identically, but then the smaller capacity starts to taper off. This is likely because the 256GB model has half the die count of the 512GB model. With 32 IOs stacked up in the queue, the model with fewer dies has a greater chance of some IOs piling up behind a given die, which means those IOs have to wait just a little longer to be serviced. This leads to a longer taper towards 100% (a toy simulation of this effect appears just after this list). If you go back and look at the first chart, you may now be able to pick out the difference there as well.
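As a sanity check on that die-count explanation, here is a toy simulation (die counts and per-read service time are assumed for illustration, not taken from Samsung’s specs): with 32 IOs landing randomly across fewer dies, the busiest die ends up with a deeper queue more often, so the last IOs of each burst wait longer.

```python
import random
from statistics import mean

def burst_completion_us(dies, qd=32, trials=20000, t_read_us=60):
    """Toy model: `qd` random reads land on `dies` independent dies, each die
    drains its own queue serially at `t_read_us` per read. Returns the average
    time for the *slowest* IO of the burst to complete."""
    worst = []
    for _ in range(trials):
        per_die = [0] * dies
        for _ in range(qd):
            per_die[random.randrange(dies)] += 1
        worst.append(max(per_die) * t_read_us)
    return mean(worst)

# Hypothetical die counts (the smaller model has half the dies of the larger):
print("fewer dies :", burst_completion_us(8), "us")    # longer tail
print("double dies:", burst_completion_us(16), "us")   # shorter tail
```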
Here is a final chart expanding out the faster SSDs:
Latency Percentile – Zoomed (click to enlarge)
Here are the devices tested, laid out in order of performance:
We have a lot more of this data to comb through (varying queue depths, percentages of drive fill, etc.), and in future reviews we will be shifting away from off-the-shelf benchmarks and more towards our fully custom solutions and results. The above results were from fully preconditioned and randomly written SSDs, but future consumer pieces will incorporate partially filled / fragmented SSDs. QD32 100% read was chosen as a workload representing heavy consumer-level random reads (the boot process of a fully loaded system, heavy game content loads, simultaneous app launching, etc.). Some of these SSDs can scale higher at greater queue depths, but that is unrealistic even on power-user machines. Feedback on the above is welcome in the comments and will be taken into consideration as I further develop this testing.
Got it installed yesterday, clean install from a thumb drive using Rufus and GPT with W10 Threshold 2… plus it activated no problem. This is on a Maximus VII Z97 MB with the latest BIOS installed… about 6 minutes for the install. Here are a couple of links to screenshots using Magician 4.9 (just came out a couple of days ago) and CrystalMark…
http://i822.photobucket.com/albums/zz143/fvbounty/cystal%201.jpg
http://i822.photobucket.com/albums/zz143/fvbounty/samsung%204.9%20first%20run.jpg
Here’s a link to a picture of temps while running CrystalMark…
http://i822.photobucket.com/albums/zz143/fvbounty/HD%20Sentinal3.jpg
After a fruitless week I am not able to load Windows 7 & boot from my Samsung 950 Pro M.2 NVMe PCIe 256GB SSD when fitted to my Asus Z170 Deluxe motherboard (latest BIOS v1302). Using both of the Samsung utilities for the 950 Pro M.2 NVMe PCIe SSD, I can see this device within the Windows 7 environment & know it works, but I just cannot load my W7 OS onto this card.
I have tried using the Windows 7 Rescue Disc, after cloning my W7 OS system onto the Pro 950 card, but this card just does not appear to exist in the DOS environment!
I am waiting for Asus to reply to my plea for help, but I am not hopeful.
I believe the answer is going to be a new BIOS update from American Megatrends; see link:
http://ami.com/news/press-releases/?PressReleaseID=338&/American%20Megatrends%20Announces%20Support%20for%20NVMe%E2%84%A2%20Host%20Interface%20in%20Aptio%C2%AE%20V%20UEFI%20Firmware/
Since Windows 10 includes a native NVMe driver, and can be installed and run for 30 days (without a product key) for free, why not try that OS and see if the Samsung 950 Pro NVMe SSD can succeed at booting where Win 7 was unable to do so. It may require certain UEFI BIOS settings (such as disabling CSM), as well as a complete wipe of any existing partitions, letting the Win10 installer create fresh ones.
I take it that you know that you have to install Windows on this SSD with a UEFI BIOS setup? Also, Samsung has released its own NVMe driver. You can find it here:
http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/support/downloads.html
Allyn,
Great Review.
I’ve got a question about using the 950 Pro M.2 on my new build. I’ve got an MSI Z97M Gaming motherboard; it has an M.2 slot (x2 speed) & also supports NVMe in the UEFI/BIOS. Would I benefit from using the 950 Pro M.2 over the 850 EVO M.2 drive… or would the 950 Pro M.2 be limited by the x2 M.2 slot? I’m looking at either the 256GB 950 Pro or the 500GB 850 EVO; if the Z97 M.2 slot is going to limit the speed of the 950 Pro to that of the 850 EVO, I’ll probably just go with the latter!? I’m still new to the way M.2 works, so thanks for the assistance.
Again, Thanks. Phil B.
Hi Phil B. This reply is probably a bit late for you. I have both a 500GB 850 EVO and a 950 Pro in an i7-6700 build (Asus Z170M MOBO). The 950 is blazingly fast on M.2 NVMe, with circa 1,500 MB/s writes and 2,400 MB/s reads. For single-threaded work on a desktop it’s great, e.g. copying 1GB files is almost sub-second. However, if you throw lots of work at it, e.g. a big W10 update, it grinds to 100% busy with latencies over 1,000ms. I even got a peak latency of 10,000ms running Performance Test 8. In these circumstances the 850 is faster overall.
I have a “GA-Z97X-SLI”, which has an M.2 SSD slot rated at 10 Gb/s. If I buy a Samsung 950 Pro M.2, will it work 100%, with maximum efficiency?
Sadly, no. I also have the same board, and from what I understand, it only supports the first-gen NVMe M.2 drives at full speed. At best, it’ll work at half-speed.
Mind you, that’s according to what the manual says. I e-mailed Gigabyte about that too, and they were only slightly better than completely unhelpful.
So, again, not having actually tried it, I would say yes, but don’t expect it to perform at full capacity, not with this board.