Performance Comparisons – Mixed Burst
These are the Mixed Burst results introduced in the Samsung 850 EVO 4TB Review. A few tweaks have been made since then: QD has been reduced to a more realistic value of 2, and read bursts have been increased to 400MB each. 'Download' speed remains unchanged.
In an attempt to better represent the true performance of hybrid (SLC+TLC) SSDs, and to include some general trace-style testing, I'm trying out a new test methodology. First, all tested SSDs are sequentially filled to 100%. Then the first 8GB span is preconditioned with a 4KB random workload, resulting in the condition called for in many of Intel's client SSD testing guides. The idea is that most of the data on an SSD is sequential in nature (installed applications, MP3s, video, etc.), while some portions of the SSD have been written to in a random fashion (MFT, directory structure, log file updates, other randomly written files, etc.). The 8GB figure is reasonably practical, since 4KB random writes across the whole drive are not a workload that client SSDs are optimized for (that is reserved for enterprise). We may try larger spans in the future, but for now we're sticking with the 8GB random write area.
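For the curious, the preconditioning step boils down to something like the following minimal Linux-only sketch. The device path is hypothetical, the drive is assumed to have already been sequentially filled, and real runs use a purpose-built tool against the raw device (this would destroy its contents):

```python
import mmap, os, random

DEV = "/dev/sdX"              # hypothetical device node -- destructive!
SPAN = 8 * 1024**3            # 8GB span at the start of the drive
BLOCK = 4096                  # 4KB random writes

buf = mmap.mmap(-1, BLOCK)    # page-aligned buffer, as O_DIRECT requires
buf.write(os.urandom(BLOCK))

fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
try:
    for _ in range(2 * (SPAN // BLOCK)):          # ~2 full passes over the span
        off = random.randrange(SPAN // BLOCK) * BLOCK
        os.pwrite(fd, buf, off)                   # 4KB write at a random offset
finally:
    os.close(fd)
```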
With that condition as a base, we next needed a workload to run on top of it. I wanted to start with some background activity, so I captured a BitTorrent download:
This download was over a saturated 300 Mbit link. While the average download speed was reported as 30 MB/s, the application's own internal caching meant the writes to disk were more 'bursty' in nature. Since we want a workload that gives SLC+TLC (caching) SSDs some time to unload their cache between write bursts, I settled on a simple pattern of 40MB written every 2 seconds. These accesses are more random than sequential, so we apply them to the designated 8GB span of our preconditioned SSD.
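Reduced to a rough Python sketch, the write pattern looks like this; the device path and the 64KB chunk size are assumptions for illustration, not the exact parameters of our harness:

```python
import mmap, os, random, time

DEV = "/dev/sdX"              # hypothetical device, preconditioned as above
SPAN = 8 * 1024**3            # writes land in the same 8GB random span
CHUNK = 64 * 1024             # assumed chunk size for each burst's writes
BURST = 40 * 1024**2          # 40MB per burst
PERIOD = 2.0                  # one burst every 2 seconds

buf = mmap.mmap(-1, CHUNK)    # page-aligned buffer for O_DIRECT
buf.write(os.urandom(CHUNK))
fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)

for _ in range(30):                                   # 60 seconds' worth
    t0 = time.monotonic()
    for _ in range(BURST // CHUNK):                   # 40MB at random offsets
        off = random.randrange(SPAN // CHUNK) * CHUNK
        os.pwrite(fd, buf, off)
    # idle out the rest of the period; this is the window in which an
    # SLC+TLC drive can flush its cache to TLC
    time.sleep(max(0.0, PERIOD - (time.monotonic() - t0)))
os.close(fd)
```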
Now for the more important part. Since the above 'download workload' is a background task that would likely go unnoticed by the user, what we also need is a workload that the user *would* be sensitive to. The times when someone really notices their SSD's speed are when they are waiting for it to complete a task, and the most common such tasks are application and game/level loads. I observed a range of different tasks and came to a 200MB figure for the typical amount of data requested when launching a modern application. Larger games can pull in as much as 2GB (or more), varying with game and level, so we will repeat the 200MB request 10 times during the recorded portion of the run. We will assume 64KB sequential access for this portion of the workload.
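A simplified sketch of those read bursts follows, again with a hypothetical device path and layout, and issuing one request at a time rather than the queued IO of the real test:

```python
import mmap, os

DEV = "/dev/sdX"              # hypothetical device node
BLOCK = 64 * 1024             # 64KB sequential accesses
BURST = 200 * 1024**2         # 200MB per application-launch burst
BURSTS = 10                   # repeated 10 times during the run

buf = mmap.mmap(-1, BLOCK)    # page-aligned, as O_DIRECT requires
fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
try:
    for n in range(BURSTS):
        base = n * BURST      # assumed: bursts read back-to-back regions
        for off in range(base, base + BURST, BLOCK):
            os.preadv(fd, [buf], off)   # one 64KB read at QD1; the real
                                        # test keeps several in flight
finally:
    os.close(fd)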
Assuming a max Queue Depth of 4 (reasonable for typical desktop apps), we end up with something that looks like this when applied to a couple of SSDs:
The OCZ Trion 150 (left) is able to keep up with the writes (dashed line) throughout the 60 seconds pictured, but note that the read requests occasionally catch it off guard. Apparently, when some SSDs are busy with even a relatively small stream of incoming writes, read performance can suffer, which is exactly the sort of thing we are looking for here.
Applying the same workload to the 4TB 850 EVO (right), we see an extremely consistent and speedy response to all IOs, regardless of whether they are writes or reads. The 200MB read bursts complete so quickly that each falls entirely within a single second, and none of them spill over due to delays caused by the simultaneous writes taking place.
Now that we have a reasonably practical workload, let’s see what happens when we run it on a small batch of SSDs:
From our Latency Percentile data, we are able to derive the total service time for both reads and writes, and independently show the throughput seen for each. Remember that these workloads are applied simultaneously, so as to simulate launching apps or games during a 30 MB/s download. The above figures are not simple averages: they represent only the speed *during* each burst, and idle time is not counted.
Now we are going to focus only on reads and present some different data. I've added up the total service time seen during the 10x 400MB reads that take place during the recorded portion of the test. These figures represent how long you would be sitting there waiting for 4GB of data to be read, but remember that this is happening while a download (or another similar background task) is simultaneously writing to the SSD.
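To make the derivation concrete, here is a rough sketch of how such figures fall out of per-IO data. The record format is a stand-in for illustration, not our actual Latency Percentile pipeline:

```python
# Hypothetical per-IO records: (bytes_moved, service_seconds), grouped by
# burst; idle gaps between bursts never enter these sums.
def burst_throughput(bursts):
    total_bytes = sum(b for ios in bursts for b, _ in ios)
    busy_time = sum(t for ios in bursts for _, t in ios)
    return total_bytes / busy_time          # speed *during* the bursts only

def total_read_wait(read_bursts):
    # total time spent waiting on the 10x 400MB read bursts
    return sum(t for ios in read_bursts for _, t in ios)

# Worked example: a drive averaging 500 MB/s during its read bursts would
# keep you waiting (10 * 400 MB) / (500 MB/s) = 8 seconds for the 4GB.
```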
Perhaps fanciful, but I agree it could be a killer app?
“Conclusion: we have now reached a new era in which mass storage is capable of performing at close to the same sequential performance as volatile DDR3 DRAM. Four such M.2 SSDs in RAID-0 mode == ~8TB (before formatting).”
My take on it would be a less ambitious two-drive RAID 0 of 512GB 960 SSDs. Best performing and cheaper.
PCIe Gen 3.0 allows 1GB/s per lane in each direction, so 2GB/s per lane theoretical max.
Or 8GB/s for the dual 4-lane M.2 ports on motherboards.
In theory that's sufficient to max out two 960s in RAID 0, but the 3500MB/s sequential reads (writes are 2100MB/s) are of course unidirectional.
So in theory it seems a RAID 0 pair of 960s yields 4000MB/s sustained, read or write.
I am pretty sure we will see 8 lanes available to M.2 motherboard sockets (even with bargain AMD Ryzen motherboards and CPUs (32 lanes, BTW)), allowing 7000/4400MB/s read/write in theory, without fancy controllers.
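A quick sanity check of those figures; the ~1GB/s per lane rule of thumb comes from 8 GT/s with 128b/130b encoding:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, per direction
lane_MBps = 8e9 * (128 / 130) / 8 / 1e6                 # ~984.6 MB/s
print(f"x4 link: {4 * lane_MBps:.0f} MB/s each way")    # ~3938, caps one 960
print(f"x8 of M.2: {8 * lane_MBps:.0f} MB/s each way")  # ~7877, room for 7000
```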
I don't know the numbers for RAM bandwidth. A lot better, I'm sure; not sure that's a deal breaker for my argument.
Point is, 7000/4400MB/s are numbers in a league of their own compared to anything before, even in the server world. It's a new paradigm for coders.
OK, using it for virtual memory isn't as fast as real memory, but shit, it's big. I don't know enough about architecture etc., but a TB of RAM may open many possibilities for completely new approaches to old coding problems.
The killer benefit of SSDs was fast random access. It transformed our PCs.
~150MB/s sequential was livable; access times were the killer for HDD performance.
As many have said re the 960, more of the same will be barely noticed by many.
Give a gamer 1TB of passable virtual memory, and apps which use it, and that could be revolutionary.
It bears repeating, BTW, that IOPS has shown even more stellar performance gains in the 960, and I imagine that's important for virtual memory. As we hear, many consider this the main reason to spend the extra for the 960 over the 950.
PS: upon reflection, a poor man's RAID 0 on 4 lanes is still attractive for swap/page files, even with little read speed gain. Write speed almost doubles, from a theoretical 2200MB/s to 4000MB/s.