Performance Comparisons – Mixed Burst
These are the Mixed Burst results introduced in the Samsung 850 EVO 4TB Review. A few tweaks have been made since then: QD has been reduced to a more realistic value of 2, and read bursts have been increased to 400MB each (the methodology description below retains the original figures). 'Download' speed remains unchanged.
In an attempt to better represent the true performance of hybrid (SLC+TLC) SSDs, and to include some general trace-style testing, I’m trying out a new test methodology. First, all tested SSDs are sequentially filled to near maximum capacity. Then the first 8GB span is preconditioned with a 4KB random workload, resulting in the condition called for in many of Intel’s client SSD testing guides. The idea is that most of the data on an SSD is sequential in nature (installed applications, MP3s, video, etc.), while some portions of the SSD have been written to in a random fashion (MFT, directory structure, log file updates, other randomly written files, etc.). The 8GB figure is reasonably practical, since 4KB random writes across the whole drive are not a workload that client SSDs are optimized for (that is reserved for enterprise). We may try larger spans in the future, but for now we’re sticking with the 8GB random write area.
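For readers who want to replicate the general idea, here is a minimal Python sketch of that preconditioning sequence. It is illustrative only – our suite uses dedicated workload generation tools, and the target path and fill size below are placeholders (a real run would target the drive itself and fill it to near capacity):

```python
# Illustrative preconditioning sketch (not our actual tooling).
# TARGET would normally be the raw device under test; a plain file and a
# scaled-down fill size are used here so the sketch is safe to run.
import os, random

TARGET = "testfile.bin"      # placeholder for the device/volume under test
FILL_SIZE = 32 * 1024**3     # placeholder standing in for "filled to near capacity"
SPAN = 8 * 1024**3           # 8GB span that receives the random precondition
CHUNK = 128 * 1024           # 128KB sequential fill transfers
RAND_IO = 4 * 1024           # 4KB random writes

# Step 1: sequential fill
with open(TARGET, "wb") as f:
    buf = os.urandom(CHUNK)
    for _ in range(FILL_SIZE // CHUNK):
        f.write(buf)

# Step 2: 4KB random writes across the first 8GB span
with open(TARGET, "r+b") as f:
    blocks = SPAN // RAND_IO
    for _ in range(blocks):          # roughly one pass worth of random writes
        f.seek(random.randrange(blocks) * RAND_IO)
        f.write(os.urandom(RAND_IO))
```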
With that condition as a base, we next needed a workload. I wanted to start with some background activity, so I captured a BitTorrent download:
This download was over a saturated 300 Mbit link. While the average download speed was reported as 30 MB/s, the application’s own internal caching meant the writes to disk were more ‘bursty’ in nature. Since we’re trying to adapt this workload into one that gives SLC+TLC (caching) SSDs some time to unload their cache between write bursts, I settled on a simple pattern of 40 MB written every 2 seconds. These accesses are more random than sequential, so we apply this pattern to the designated 8GB span of our preconditioned SSD.
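As a rough sketch of that 'download' pattern, the loop below writes 40MB bursts every 2 seconds at random offsets within the 8GB span (again illustrative – the file name, transfer size, and lack of queue-depth control are all simplifications):

```python
# Sketch of the background 'download': 40MB every 2 seconds (~20 MB/s average),
# scattered within the preconditioned 8GB span. Runs until interrupted.
import os, random, time

TARGET = "testfile.bin"      # placeholder target
SPAN = 8 * 1024**3           # writes land within the 8GB precondition span
BURST = 40 * 1024**2         # 40MB per burst
IO_SIZE = 128 * 1024         # assumed per-write transfer size
PERIOD = 2.0                 # one burst every 2 seconds

with open(TARGET, "r+b") as f:
    while True:
        start = time.monotonic()
        for _ in range(BURST // IO_SIZE):
            f.seek(random.randrange(SPAN // IO_SIZE) * IO_SIZE)
            f.write(os.urandom(IO_SIZE))
        f.flush()
        os.fsync(f.fileno())
        # idle for the remainder of the 2-second period
        time.sleep(max(0.0, PERIOD - (time.monotonic() - start)))
```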
Now for the more important part. Since the above ‘download workload’ is a background task that would likely go unnoticed by the user, we also need a workload that the user *would* be sensitive to. The time someone really notices their SSD's speed is when they are waiting for it to complete a task, and the most common tasks are application and game/level loads. I observed a range of different tasks and arrived at a 200MB figure for the typical amount of data requested when launching a modern application. Larger games can pull in as much as 2GB (or more), varying with game and level, so we repeat the 200MB request 10 times during the recorded portion of the run. We assume 64KB sequential access for this portion of the workload.
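A correspondingly simple sketch of the foreground read bursts appears below (note that for this review the bursts were increased to 400MB, per the tweaks noted above). It issues 10 bursts of 200MB as 64KB sequential reads from random starting points, with a made-up think time between launches, and does not model queue depth:

```python
# Sketch of the 'app launch' read bursts: 10 x 200MB, 64KB sequential reads.
import random, time

TARGET = "testfile.bin"      # placeholder target (same file as the sketches above)
FILE_SIZE = 32 * 1024**3     # must match the fill size used earlier
BURST = 200 * 1024**2        # 200MB per simulated application launch
IO_SIZE = 64 * 1024          # 64KB sequential reads
BURSTS = 10

with open(TARGET, "rb") as f:
    for _ in range(BURSTS):
        f.seek(random.randrange((FILE_SIZE - BURST) // IO_SIZE) * IO_SIZE)
        t0 = time.monotonic()
        for _ in range(BURST // IO_SIZE):
            f.read(IO_SIZE)
        print(f"burst service time: {time.monotonic() - t0:.3f}s")
        time.sleep(5)        # assumed think time between launches
```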
Assuming a max Queue Depth of 4 (reasonable for typical desktop apps), we end up with something that looks like this when applied to a couple of SSDs:
The OCZ Trion 150 (left) is able to keep up with the writes (dashed line) throughout the 60 seconds pictured, but note that the read requests occasionally catch it off guard. Apparently, if some SSDs are busy with a relatively small stream of incoming writes, read performance can suffer, which is exactly the sort of thing we are looking for here.
When we apply the same workload to the 4TB 850 EVO (right), we see an extremely consistent and speedy response to all IOs, regardless of whether they are writes or reads. The 200MB read bursts complete so quickly that each falls within a single second, and none of them spill over due to delays caused by the simultaneous writes taking place.
Now for the results:
From our Latency Percentile data, we are able to derive the total service time for both reads and writes, and independently show the throughput seen for each. Remember that these workloads are applied simultaneously, so as to simulate launching apps or games during a 20 MB/s download. The above figures are not simple averages – they represent only the speed *during* each burst; idle time is not counted.
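Conceptually, the derivation works like this (the numbers below are made-up placeholders, not measured results): sum the per-IO service times to get the busy time, then divide the bytes moved by that busy time alone.

```python
# Burst-only throughput: bytes moved divided by the summed per-IO service
# times. Idle gaps between bursts contribute nothing to the denominator.
io_size = 64 * 1024                                    # bytes per read IO
service_times = [0.00012, 0.00009, 0.00015, 0.00011]   # seconds per IO (example values)

busy_time = sum(service_times)
bytes_moved = io_size * len(service_times)
burst_throughput = bytes_moved / busy_time             # bytes/sec during bursts only

print(f"{burst_throughput / 1e6:.1f} MB/s (idle time excluded)")
```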
The 660p does OK here, falling short of the higher-performing parts but holding pace with the WD Black and the Toshiba XG5.
Now we are going to focus only on reads and present some different data. I’ve added up the total service time seen during the 10x 400MB reads that take place during the recorded portion of the test. These figures represent how long you would sit waiting for 4GB of data to be read – but remember, this is happening while a download (or another similar background task) is simultaneously writing to the SSD. This metric should closely track the 'feel' of using each SSD under a moderate to heavy load, and the total read service times should help you grasp the actual time spent waiting for such a task to complete in the face of background writes.
Since the 660p has to work harder than TLC or MLC SSDs, that ends up being reflected in the read service times. Performance is better with a lighter workload (no background downloads), but this test was meant to be a bit more on the demanding side, so it's no surprise to see the 660p fall where it did in these results.
holy shitballs this is pretty impressive.
I can't wait to see this with Samsung controllers and NAND. Though, at 20c per gig, getting one of these on a sale will be an insane steal.
Totally with you on price. Intel has undercut the market a few times in the past and I’m happy to see them doing it again. I’d also like to see Samsung come down to this same price point.
Thought that too, when it came out. But now it’s down to around 10c per gig, at least in Germany, which it should have launched at. Now I’m definitely considering getting one, but I might wait for a sale since I’m stingy.
You forgot an edit – Toshiba:
PC Perspective Compensation: Neither PC Perspective nor any of its staff was paid or compensated in any way by Toshiba for this review.
Well, I’d be surprised if Toshiba paid for this review.
Fixed. Thanks for the catch guys!
PC Per has a long history of shilling for Intel and Nvidia at the cost of AMD. As far as I can tell they have no reason to change. Their motto is fake tech reviews and to hell with what anyone thinks.
Yeah, such a long history of that (PCPer.com was previously AMDmb.com / Athlonmb.com). Also funny how our results line up with other reviews. Must be some grand conspiracy against AMD. /sarcasm
This is why I wish Ryan would turn verified comments back on so asshats like the previous one don’t post. I don’t understand why it was turned off in the first place; it made the comment sections much more bearable and pleasant to read. Now, not so much.
Allyn, going way back to a conversation we had many months ago (years?): given the low price per GB, is there any performance to be gained by joining these QLC devices in a RAID-0 array? The main reason I ask is the “additive” effect of multiple SLC-mode caches that you get with a RAID-0 array. I’m using this concept presently with 4 x Samsung 750 EVO SSDs in RAID-0 (each cache = 256MB), and the “feel” is very snappy when C: is the primary NTFS partition on that RAID-0 array. How about a VROC test and/or trying these in the ASRock Ultra Quad M.2 AIC? Thanks, and keep up the good work!
Yeah, RAID will help, as it does with most SSDs. For SSDs with dynamic caches, that means more available cache for a given amount of data stored, and a better chance that the cache will be empty, since a given incoming write load is spread across more devices.
Many thanks for the confirmation. I don’t have any better “measurement” tools to use, other than the subjective “feel” of doing routine interaction with Windows. But, here’s something that fully supports your observation: the “feel” I am experiencing is snappier on a RAID-0 hosted by a RocketRAID 2720SGL in an aging PCIe 1.0 motherboard, as compared to the “feel” I am sensing on a RAID-0 hosted by the same controller in a newer PCIe 2.0 motherboard. The only significant difference is the presence of DRAM cache in all SSDs in the RAID-0 on the PCIe 1.0 motherboard, and the SSDs on the newer PCIe 2.0 motherboard have no DRAM caches. I would have expected a different result, because each PCIe lane in the newer chipset has twice the raw bandwidth of each PCIe lane in the older chipset. With 4 x SSDs in both RAID-0 arrays, the slower chipset tops out just under 1,000 MB/second, whereas the faster chipset tops out just under 2,000 MB/second.
p.s. Samsung 860 Pro SSDs are reported to have 512MB LPDDR4 cache in both the 256GB and 512GB versions:
https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_SSD_860_PRO_Data_Sheet_Rev1.pdf
As such, a RAID-0 array with 4 such members has a cumulative DRAM cache of 512 x 4 = 2,048MB (~2GB LPDDR4).
DRAM caches on SSDs very rarely cache any user data – it’s for the FTL.
Thanks, Allyn. FTL = Flash Translation Layer
https://www.youtube.com/watch?v=bu4saRek7QM
So the tests are done with practically a full drive, right? Written sequentially, except for the last 8GB, which is written to randomly. In a normal drive, even when My Computer says the drive is full, there is still a little bit of space left over, so you leave 18GB of space free. So is this test simulating what it’s like to have a full or close-to-full drive from the user’s perspective?
Anandtech’s tests made a big deal about performance changing from empty versus full. Anandtech didn’t pin down when that performance drop happens (whether it’s a cliff or a gradual decline), but it almost makes the reader feel like you should buy double the capacity you normally need just to be safe. It’s probably not that bad, but it feels that way emotionally.
Performance gains due to the drive being empty typically level out once you hit 10-20% full or so (lower if you’ve done a bunch of random activity like a Windows install, etc.). My suite does a full pass of all measurements at three capacity points and then applies a weighted average to reach the final result. The average weighs half-full and mostly-full more heavily than mostly-empty performance. The results you see in my reviews are in line with what you could expect from actual use of the drive.
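To illustrate the weighting idea only (the weights and figures below are invented, not the suite's actual values):

```python
# Toy example of weighting results from three capacity points.
results = {"mostly_empty": 520.0, "half_full": 495.0, "mostly_full": 470.0}  # MB/s, made up
weights = {"mostly_empty": 0.2, "half_full": 0.4, "mostly_full": 0.4}        # assumed weighting

weighted = sum(results[k] * weights[k] for k in results)
print(f"weighted result: {weighted:.1f} MB/s")
```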
“Heavy sustained workloads may saturate the cache and result in low QLC write speeds.”
Looks like it can drop to a third of a good HDD's level, right? Scary.
A third sequentially. Random on HDD is still utter crap. Also, it’s extremely hard to hit this state in actual use. I was trying.
Hey Allyn, is there a way to include these few tests? One that examines QLC sequential write performance once the SLC buffer fills up, and another similar to Anand’s sequential-fragmentation testing of sequential performance for both read and write.
The sustained write performance appears in two tests – saturated vs. burst (where I show it at variable QD – something nobody else does), and in the cache test, where you can see occasional dips to SLC->QLC folding speed. Aside from a few hiccups, it did very well and was able to maintain SLC speed during the majority of a bunch of saturated writes in a row. If you need more than that out of your SSD and the possibility of a slowdown is unacceptable, then QLC is not for you and you’ll need to step up to a faster part.
Oh, and FFS PLEASE PLEASE remove Google reCAPTCHA. It's a waste of time; it took me TEN minutes to solve just to make one post.
And you wasted it on that?
Google reCAPTCHA and street signs! All those damn street signs and no proper explanation of just what Google considers a street sign. If you get too good at solving the ReCrapAtYa, the AI thinks you are an automated bot!
Google’s ReCrapAtYa AI has gone over to the HAL9000 side and is evil to the power of 1 followed by 100 zeros! Just like Google’s search AI that forces you to accept its twisted judgment of just what it thinks you are looking for, which is not actually what you were looking for. Google’s search engine has become the greatest time thief in the history of research.
Google’s reCAPTCHA AI is the damn bot, and Google search now returns mostly useless spam results. Google is a threat to civilization!
Sorry. Without that we spend more time culling spam posts than we do writing articles.
Nice review, Allyn. The DRAM on the 660p is 256MB, not 1GB. http://www.nanya.com/en/Product/3969/NT5CC128M16IP-DI#
You can also confirm it with the other reviews of the 660p.
Why do you think Intel chose that size instead of the classic 1MB of DRAM per 1GB of NAND?
Do you think it hampered performance?
Dumb question time:
is it possible to make the entire drive work in SLC mode? With the size of the drives these days I could sacrifice the space for the speed and reliability.
So long as you only partition/use 1/4 of the available capacity, the majority of the media should remain in SLC-only mode.
I wonder if there is a way to force it at the firmware level. Might be a good selling feature. I am sure I am not the only overcautious nerd who would value a modern ‘SLC’ drive.
I didn’t see any mention of which NVMe drivers were used during this review. Not sure if the Windows drivers are much different than Intel’s own drivers.
@Allyn, you mentioned in the podcast that you weren’t able to saturate the writes with a copy. Rather than doing a copy, have you considered creating data in RAM and then writing that? For example, create a huge numpy array and write it as binary to disk. Or a simple C program that just writes random noise to disk in a while(1) loop. Maybe even just pipe /dev/urandom to a file in several different terminals at once.
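A minimal sketch of that approach (the path and sizes are placeholders; the point is that the source data lives in RAM, so nothing upstream bottlenecks the writes):

```python
# Generate incompressible data in RAM once, then write it in a tight loop.
import os

TARGET = "saturate.bin"             # placeholder output path
BUF = os.urandom(256 * 1024**2)     # 256MB buffer generated once in RAM
TOTAL = 64 * 1024**3                # placeholder: write 64GB total

with open(TARGET, "wb") as f:
    written = 0
    while written < TOTAL:
        f.write(BUF)                # reuse the in-RAM buffer each pass
        written += len(BUF)
    f.flush()
    os.fsync(f.fileno())
```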
Hello, Allyn!
Did you use IPEAK to create a custom trace-based test suite?
IPEAK and similar developer tools were used to capture traces, but our suite's playback workloads are based on analysis of those results, not directly playing back the streams. We do this so that we can properly condition and evaluate SSDs of varying capacities, etc.
May I ask when these 660p NVMe SSDs will be readily available in the marketplace? I see the 512GB model at Newegg.com, but neither that SKU nor any other SKU at Newegg.ca OR Amazon.ca OR anywhere… 🙁 I would like to buy the 1TB model personally.
Don’t buy from the evil non-tax-paying Intel corporation. Crucial has a new 1TB QLC NVMe SSD, 200TB write endurance, 1GB DRAM cache, at newegg.ca (CA$192, US$145):
https://www.newegg.ca/Product/Product.aspx?Item=N82E16820156199&Description=crucial%20p1%20ssd&cm_re=crucial_p1_ssd-_-20-156-199-_-Product
First of all, thanks for all of your ridiculously in-depth storage reviews. PC Perspective is my first, and usually only, stop when looking to purchase new storage.
Second, I believe there is a typo on the “Conclusion” page. You listed the 2TB endurance as “200TBW” instead of the “400TBW” that Intel specs on ARK.
Happy Veterans Day from a fellow vet. Thank you for your service!
All three capacities have 256MB of DRAM, not 1GB. This was already pointed out by a previous reader.
Also, the 660p uses a static SLC cache that is 6GB, 12GB, or 24GB, along with a dynamic SLC pool.
It’s possible this drive is using Host Memory Buffer or compressing the LBA map.