PC Perspective Custom SSD Test Suite Introduction
Back in late 2016, we implemented a radically new test methodology. I'd grown tired of making excuses for benchmarks not meshing well with some SSD controllers, and that problem was amplified significantly by recent SLC+TLC hybrid SSDs, which can be very picky about their workloads and how they are applied. The complexity of these caching methods has effectively flipped the SSD testing ecosystem on its head. The vast majority of benchmarking software and test methodologies out there were developed around non-hybrid SLC, MLC, or TLC SSDs. All of those types were very consistent once a given workload was applied to them for long enough to reach a steady state condition. Once an SSD was properly prepared for testing, it would give you the same results all day long. Not so for these new hybrids. The dynamic nature of the various caching mechanisms at play wreaks havoc on modern tests. Even trace playback testing such as PCMark falters, as the playback of traces is typically done with idle gaps truncated to a smaller figure in the interest of accelerating the test. Caching SSDs rely on those same idle gaps to flush their cache to the higher capacity areas of their NAND. This mismatch has resulted in products like the Intel SSD 600p, which bombed nearly all of the ‘legacy’ benchmarks yet did just fine once tested with a more realistic, spaced-out workload.
To solve this, I needed a way to issue IOs to the SSD the same way that real-world scenarios do, and to do so in a way that did not saturate the cache of hybrid SSDs. The answer, as it turned out, was staring me in the face.
Latency Percentile made its debut in October of 2015 (ironically, with the 950 PRO review), and those results have proven to be a gold mine that continues to yield nuggets as we mine the data even further. Weighting the results allowed us to better visualize and demonstrate stutter performance even when those stutters were small enough to be lost in more common tests that employ 1-second averages. Merged with a steady pacing of the IO stream, it can provide true Quality of Service comparisons between competing enterprise SSDs, as well as high-resolution, industry-standard QoS figures for saturated workloads. Sub-second IO burst throughput rates of simultaneous mixed workloads can be determined by additional number crunching. It is this last part that is the key to the new test methodology.
The primary goal of this new test suite is to get the most accurate sampling of real-world SSD performance possible. This meant evaluating across more dimensions than any modern benchmark is capable of. Several thousand sample points are obtained, spanning various read/write mixes, queue depths, and even varying amounts of additional data stored on the SSD. To better quantify real-world performance of SSDs employing an SLC cache, many of the samples are obtained with a new method of intermittently bursting IO requests. Each of those thousands of samples is accompanied by per-IO latency distribution data, and a Latency Percentile is calculated (for those counting, we’re up to millions of data points now). The Latency Percentiles are in turn used to derive the true instantaneous throughput and/or IOPS for each respective data point. The bursts are repeated multiple times per sample, but each completes in less than a second, so even the per-second logging employed by some of the finer review sites out there just won’t cut it.
Would you like some data with your data? Believe it or not, this is a portion of an intermediate calculation step – the Latency Percentile data has already been significantly reduced by this stage.
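If you're wondering what that extra number crunching actually involves, here is a minimal Python sketch of the concept: take the per-IO latencies recorded during a single sub-second burst, compute the percentile curve, and derive the instantaneous IOPS/throughput from the mean service time. The function names, the 4KB transfer size, and the simulated latencies are placeholders for illustration only; the suite's real processing is considerably more involved.

```python
# Rough sketch (not the suite's actual code): turn the per-IO latencies from
# one sub-second burst into a Latency Percentile curve, then back the
# instantaneous IOPS and throughput out of the mean service time.
import numpy as np

def latency_percentiles(latencies_us, points=(50, 90, 99, 99.9, 99.99)):
    """Return the requested percentiles (in microseconds) for one burst."""
    lat = np.asarray(latencies_us, dtype=float)
    return {p: float(np.percentile(lat, p)) for p in points}

def burst_iops(latencies_us, queue_depth=1):
    """Estimate instantaneous IOPS for a burst issued at a fixed queue depth.
    With a steadily paced stream, each in-flight slot turns over once per
    mean service time, so IOPS ~= QD / mean latency (Little's Law)."""
    mean_s = float(np.mean(latencies_us)) * 1e-6
    return queue_depth / mean_s

def burst_throughput_mbps(latencies_us, io_size_bytes=4096, queue_depth=1):
    """Convert the IOPS estimate into MB/s for a given transfer size."""
    return burst_iops(latencies_us, queue_depth) * io_size_bytes / 1e6

# Example with simulated data: 8,000 4KB random reads averaging ~100 us.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=4.0, scale=25.0, size=8000)  # latencies in microseconds
print(latency_percentiles(sample))
print(f"{burst_iops(sample):,.0f} IOPS, {burst_throughput_mbps(sample):.1f} MB/s")
```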
Each of the many additional dimensions of data obtained by the suite is tempered by a weighting system. Analyzing trace captures of live systems revealed *very* low Queue Depth (QD) under even the most demanding power-user scenarios, which means these more realistic values are not going to turn in the same high queue depth ‘max’ figures seen in saturation testing. I’ve looked all over, and nothing outside of benchmarks maxes out the queue. Ever. The vast majority of applications never exceed QD=1, and most are not even capable of multi-threaded disk IO. Games typically allocate a single thread for background level loads. For the vast majority of scenarios, the only way to exceed QD=1 is to have multiple applications hitting the disk at the same time, but even then it is unlikely that those processes will all be saturating a read or write thread simultaneously, meaning the SSD is *still* not exceeding QD=1 most of the time. I pushed a slower SATA SSD relatively hard, launching multiple apps simultaneously, trying downloads while launching large games, etc. IO trace captures performed during these operations revealed >98% of all disk IO falling within QD=4, with the majority at QD=1. Results from the new suite include a section presenting a simple set of figures that should very closely match the true real-world performance of the tested devices.
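To show how a trace capture boils down to numbers like that >98% figure, here's a simplified sketch of the reduction. The (submit time, complete time) pair format is an assumption for illustration; a real capture (from xperf, for example) would need to be parsed into that form first.

```python
# Simplified sketch: reduce an IO trace to a queue depth histogram.
# Input format (submit_time, complete_time) is assumed for illustration.
from collections import Counter

def qd_histogram(ios):
    """ios: list of (submit_time, complete_time) pairs in seconds.
    Returns {queue_depth: fraction of IOs issued while at that depth}."""
    events = []
    for submit, complete in ios:
        events.append((submit, +1))      # IO enters the queue
        events.append((complete, -1))    # IO leaves the queue
    events.sort()                        # ties: completions (-1) processed first
    depth, counts = 0, Counter()
    for _, delta in events:
        depth += delta
        if delta > 0:                    # record the depth seen as each IO is issued
            counts[depth] += 1
    total = sum(counts.values())
    return {qd: n / total for qd, n in sorted(counts.items())}

# Example: three overlapping IOs followed by one isolated IO.
trace = [(0.000, 0.002), (0.001, 0.003), (0.0015, 0.004), (0.010, 0.011)]
print(qd_histogram(trace))  # -> {1: 0.5, 2: 0.25, 3: 0.25}
```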
While the above pertains to random accesses, bulk file copies are a different story. To increase throughput, file copy routines typically employ some form of threaded buffering, but it’s not the type of buffering that you might think. I’ve observed copy operations running at QD=8 or in some cases QD=16 to a slower destination drive. The catch is that instead of running at a constant 8 or 16 simultaneous IOs as you would see with a saturation benchmark, the operations repeatedly fill and empty the queue: the queue is filled, allowed to empty, and only then filled again. This is not the same as a saturation benchmark, which constantly adds requests to maintain the maximum specified depth. The resulting speeds are therefore not what you would see at QD=8, but rather a mixture of all of the queue depths from one to eight.
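A toy model makes the distinction clearer. The sketch below assumes an intentionally simple drive that retires one IO per fixed service time; the point is only to show that a fill-then-drain copy spreads its time evenly across depths 8 down to 1, while a saturation test sits at QD=8 the entire time.

```python
# Toy model (assumed, not measured): time spent at each queue depth for a
# fill-then-drain copy pattern versus a saturation benchmark.
from collections import Counter

SERVICE_TIME = 1.0  # arbitrary time units per IO completion (assumption)

def fill_and_drain_depths(batch_size=8, batches=100):
    """Queue is filled to batch_size, allowed to drain to empty, then refilled.
    Returns the fraction of busy time spent at each queue depth."""
    time_at_depth = Counter()
    for _ in range(batches):
        for depth in range(batch_size, 0, -1):   # 8, 7, ..., 1 as the queue drains
            time_at_depth[depth] += SERVICE_TIME
    total = sum(time_at_depth.values())
    return {qd: t / total for qd, t in sorted(time_at_depth.items())}

print(fill_and_drain_depths())
# -> 12.5% of the time at each of QD 1 through 8 (average depth ~4.5).
#    A saturation benchmark, by contrast, tops the queue back up after every
#    completion and spends essentially 100% of its time at QD=8.
```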
Conditioning
Some manufacturers achieve unrealistic ‘max IOPS’ figures by running tests that place a small file on an otherwise empty drive, essentially testing in what is referred to as fresh out of box (FOB) condition. This is entirely unrealistic, as even the relatively small number of files placed during an OS install is enough to drop performance considerably from the high figures seen with a FOB test.
On the flip side, when it comes to 4KB random tests, I disagree with tests that apply a random workload across the full span of the SSD. This is an enterprise-only workload that will never be seen in any sort of realistic client scenario. Even the heaviest power users are not going to hit every square inch of an SSD with random writes, and if they are, they should be investing in a datacenter SSD that is purpose-built for such a workload.
Calculation step showing full sweep of data taken at multiple amounts of fill.
So what’s the fairest preconditioning and testing scenario? I’ve spent the past several months working on that, and the conclusion I came to ended up matching Intel’s recommended client SSD conditioning pass, which is to completely fill the SSD sequentially, with the exception of an 8GB portion of the SSD meant solely for random access conditioning and tests. I add a bit of realism here by leaving ~16GB of space unallocated (even those with a full SSD will have *some* free space, after all). The randomly conditioned section only ever sees random, and the sequential section only ever sees sequential. This parallels the majority of real-world access. Registry hives, file tables, and other such areas typically see small random writes and small random reads. It’s fair to say that a given OS install ends up with ~8GB of such data. There are corner cases where files were randomly written and later sequentially read. Bittorrent is one example, but since those files are only laid down randomly on their first pass, background garbage collection should clean those up so that read performance will gradually shift towards sequential over time. Further, those writes are not as random as the more difficult workloads selected for our testing. I don't just fill the whole thing up right away though – I pause a few times along the way and resample *everything*, as you can see above.
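For reference, here is roughly how that layout works out for a 1TB-class drive; the sizes and fill checkpoints in this sketch are illustrative assumptions rather than the suite's exact parameters.

```python
# Illustrative sketch of the conditioning layout (sizes/checkpoints assumed):
# an 8GB region reserved for random conditioning and tests, ~16GB left
# unallocated, and the remainder filled sequentially in stages so the full
# test matrix can be resampled at several fill levels along the way.
GB = 1000**3

def conditioning_plan(capacity_bytes, random_region=8 * GB, unallocated=16 * GB,
                      fill_checkpoints=(0.25, 0.50, 0.75, 1.0)):
    sequential_span = capacity_bytes - random_region - unallocated
    return {
        "random_region_gb": random_region / GB,
        "unallocated_gb": unallocated / GB,
        "sequential_span_gb": sequential_span / GB,
        # resample everything each time this much of the sequential span is filled
        "sequential_fill_stages_gb": [round(f * sequential_span / GB, 1)
                                      for f in fill_checkpoints],
    }

print(conditioning_plan(1000 * GB))
```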
Comparison of Saturated vs. Burst workloads applied to the Intel 600p. Note the write speeds match the rated speed of 560 MB/s when employing the Burst workload.
SSDs employing relatively slower TLC flash coupled with a faster SLC cache present problems for testing. Prolonged saturation tests that attempt to push the drive at full speeds for more than a few seconds will quickly fill the cache and result in some odd behavior depending on the cache implementation. Some SSDs pass all writes directly to the SLC even if that cache is full, resulting in a stuttery game of musical chairs as the controller scrambles, flushing SLC to TLC while still trying to accept additional writes from the host system. More refined implementations can put the cache on hold once full and simply shift incoming writes directly to the TLC. Some more complicated methods throw all of that away and dynamically change the modes of empty flash blocks or pages to whichever mode they deem appropriate. This method looks good on paper, but we’ve frequently seen it falter under heavier writes, where SLC areas must be cleared so those blocks can be flipped over to the higher capacity (yet slower) TLC mode. The new suite and Burst workloads give these SSDs adequate idle time to empty their cache, just as they would have in a typical system.
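Conceptually, a Burst write workload boils down to short, timed bursts separated by idle gaps long enough for the SLC cache to drain. Here's a bare-bones Python sketch of the pacing idea; the burst size, idle time, and block size are assumptions, and the actual suite's IO generation and timing are far more precise than this.

```python
# Bare-bones sketch of burst pacing (parameters are assumptions):
# write a short burst, time it, then idle long enough for the cache to flush.
import os, time

BLOCK = 128 * 1024           # 128KB writes (assumption)
BURST_BYTES = 256 * 1024**2  # 256MB per burst (assumption)
IDLE_SECONDS = 10            # idle gap for cache flushing (assumption)

def one_burst(path):
    """Write one burst and return its instantaneous throughput in MB/s."""
    buf = os.urandom(BLOCK)
    # O_BINARY only exists (and matters) on Windows
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0))
    start = time.perf_counter()
    written = 0
    while written < BURST_BYTES:
        written += os.write(fd, buf)
    os.fsync(fd)                         # include the flush in the timed window
    elapsed = time.perf_counter() - start
    os.close(fd)
    return (written / 1e6) / elapsed

def burst_series(path, bursts=5):
    rates = []
    for _ in range(bursts):
        rates.append(one_burst(path))
        time.sleep(IDLE_SECONDS)         # give the SLC cache time to drain to TLC
    return rates

# print(burst_series("testfile.bin"))
```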
Apologies for the wall of text. Now onto the show!
$.10 per GB or GTFO
It’s hard to take comments like yours seriously.
People considered $1.00/GB a breakthrough affordability price point right up until it was reached. Then people considered it offensively expensive and said $0.50/GB was the affordability point. We passed that point as well and much the same thing occurred.
Now here we are with people complaining that disks are not $0.10/GB. If this new disk -were- that cheap, I suspect that you’d be complaining that it’s not $0.05/GB. Please recognize that you’re unlikely to ever be satisfied at any price point. Setting realistic expectations goes a long way toward fulfillment.
In the OP's defense, $0.10/GB is a bit of a running joke on our podcast. Ryan wants it to happen yesterday, and I keep reminding him that we're just not there yet.
If not yesterday, how about tomorrow? 🙂
haha, glad someone got the reference.
I think this is not limited to just Ryan wanting it to be $.10/GB; I am pretty sure everyone wants that (by everyone, I mean customers).
So, 970 series is slightly better than the 960 series. That’s what I got from this review.
Nothing escapes you
/s
That's pretty much it.
LOL @ prices.
Some misinformation going on here. IEEE1667 finally made it to the Samsung 960 EVO and Pro with the latest firmware. Or actually, the two latest in terms of the Pro but the previous got pulled.
The sad thing is that Samsung keeps complaining about UEFI firmware issues with most motherboards making it impossible to get IEEE1667/Microsoft eDrive to work with the 960 EVO and Pro as boot drives, which is likely what they are being used as 99% of the time.
It's not working with either my Asus Maximus IX Apex or my Asus Maximus X Apex, both running the latest BIOS/UEFI firmware version, and when contacting Asus about the problem they claim they don't know anything about such an issue.
It works perfectly when the drive is being used as a secondary drive. So there seems to be something going on with the NVMe module in the UEFI firmware and how it loads in terms of Windows 10.
The big question is: how does this all work with these new drives? Do they magically work without a UEFI firmware fix like Samsung keeps claiming is needed for the 960 EVO and Pro, and if so, how is it that these new ones don't require the same fix from motherboard manufacturers?
Hopefully PC-Per and others can do some digging here.
Seems to be an issue with the BIOS vendors like American Megatrends, Phoenix, etc., and Samsung has stated they are working with them to resolve the issue.
I consider IEEE1667 broken for 960 until the community reports that it is working (especially after the firmware back and forths). Same goes for the 970. I'm taking Samsung at their word for this launch, but that will change if the community feedback is the same as it was for the 960. We have a limited sample size of systems that it may or may not work on, so this particular niche use case is better left to those more experienced in using it.
Would like to see a test measuring write speed using full disk encryption. The results would probably be similar to the saturated write on a full drive, but given that this is not an unusual setup these days, it might be interesting.
Modern SSDs encrypt to the disk regardless. Enabling encryption at the host level just changes the key.
Page “Performance Focus – Samsung 970 EVO 250GB, 500GB, 1TB”, under sequential 250GB graph reads “1TB shows a cached (burst) write speed of 1.5GB/s, with sustained (saturated) writes falling off to ~300MB/s.”.
Probably needs to be changed to “250GB shows a cached […]”.
Thanks for the catch. Fixed!
I'm a little disappointed you guys weren't able to do the same with the new WD Black. For friends, family, and myself I usually don't go higher than the 250GB and then always use a regular HDD for storage. I see that the 1TB is on here; I already purchased the WD 250 and installed it last week. Would have been cool to see the WD 250GB vs. the new EVO 250GB since their prices are the same. Like a budget-to-budget which-should-you-get kind of deal, since the better performance is usually seen on the high-capacity drives, which is what was tested.
Were you able to use WD Black 250 GB as a Windows 10 boot drive?
That was all we were sampled. I did ask for lower capacities…
What was the NVME driver version used? Tom’s claims to get lower perf on the newest 1.3 drivers.
We used Samsung 3.0 drivers.
$0.01 per GB or GT*O
/s
Any chance of an HP EX920 review? A few reviewers actually put it slightly faster than the 970 EVO at some tasks, especially in real-world testing and at low queue depths, all the while being much cheaper.
Allyn, I love how hyped you are on storage, it's ridiculous and awesome! I will not purchase a drive without it getting your “Editor's Choice” stamp (my own money; sometimes server budgets say otherwise). Keep it up man!
Yes, we will soon be crowd-funding a prototype that will clone multiple copies of Allyn Malventano, for exclusive competitions against AI robots falsely claiming comparable knowledge, experience and analytical capabilities — kinda like famous chess matches with Russian masters of times past. My money is on Allyn (and clones), every time!
Hi guys, really appreciate your work. It would be really interesting to compare the Samsung 970 with the Seagate Nytro 3730, a dual-port 12 Gb/s SAS 3D eMLC 400GB SSD. Maybe in a RAID combination too. Just a thought that tickles the senses. 🙂 Keep up the good work.
Allyn, money no object, someone is giving you an SSD for free. What do you choose, the Optane 900P or the 970 Pro?
Money no object, I'd probably do the 900P, but only in a >480GB capacity and if I had a spare slot to support it. M.2 is way more convenient for client storage.
Great review. Thank you for the context on scenarios wrt write speed after fast write. It shaped my purchasing decision. I always check w pcper Allyn before making a storage decision! I’m going to try one of these puppies on my trusty Z77 board which actually has an NVMe UEFI for M.2 boot. I’ll post results re: IEEE1667.
So, Allyn, is upgrading from a 950 Pro 256 to a 970 EVO 1TB much of an improvement besides the capacity? I have 3 950 Pros in a SOC Force mobo.