Performance Focus – 960 PRO 2TB
I think the easiest way to get through these new results for the first time is to just get the charts out there and walk you through them, so here goes. This page is meant to focus on results specific to the subject of the review – in this case, the 960 PRO 2TB.
Before we dive in, a quick note: I’ve been analyzing the effect of how full an SSD is on its performance. I’ve found that most SSDs perform better when empty (fresh out of box, or FOB) than they do when half full or nearly filled to capacity, and most people actually put stuff on their SSD. To properly capture performance at various levels of fill, the entire suite is run multiple times at varying levels of drive fill, done in a way that emulates actual use of the SSD over time. Random and sequential performance is re-checked on the same files and areas as additional data is added, with those same locations checked throughout the test. Once all of this data is obtained, we again apply the weighting method above in order to bias the results toward the more realistic levels of fill. The results below all use this method.
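To make the weighting concrete, here's a minimal sketch of the idea in Python. The fill levels, throughput figures, and weights below are invented for illustration; they are not the actual values used in our suite.

```python
# Sketch of fill-weighted averaging: run the same workload at several
# drive-fill levels, then weight the results toward realistic fill states.
# All numbers here are hypothetical placeholders.

results_by_fill = {   # measured throughput (MB/s) at each fill fraction
    0.00: 3400,       # fresh out of box (FOB)
    0.25: 3350,
    0.50: 3280,
    0.75: 3150,
    0.90: 2900,
}

# Hypothetical weights, biased toward partially filled (realistic) states.
weights = {0.00: 0.05, 0.25: 0.20, 0.50: 0.30, 0.75: 0.30, 0.90: 0.15}

weighted = sum(results_by_fill[f] * weights[f] for f in results_by_fill)
print(f"fill-weighted throughput: {weighted:.0f} MB/s")
```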
I'll start you guys off easy. This is sequential performance. The 'Burst' nomenclature denotes the way the workload is applied. The 960 PRO is an MLC-based SSD without any hybrid caching at play, so for this drive, Burst results match Saturated results. I've standardized on Burst as it will better show true performance for the SLC-caching SSDs that we test down the line.
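For the curious, here's a rough sketch of how the burst-versus-saturated distinction plays out in a test harness. `io_func` stands in for whatever read or write call is being measured, and the timings are arbitrary; this is the shape of the method, not our actual test code.

```python
import time

def run_for(duration_s, io_func):
    """Apply io_func back-to-back for duration_s seconds; count completions."""
    done, deadline = 0, time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        io_func()
        done += 1
    return done

def burst_rate(io_func, burst_s=0.5, idle_s=2.0, bursts=4):
    """Burst: short measured spurts separated by idle gaps, so a drive with
    an SLC cache gets measured at its (recovered) cache speed."""
    rates = []
    for _ in range(bursts):
        rates.append(run_for(burst_s, io_func) / burst_s)
        time.sleep(idle_s)  # give the drive time to flush/fold its cache
    return sum(rates) / len(rates)

def saturated_rate(io_func, duration_s=60.0):
    """Saturated: hammer the drive continuously; an SLC cache eventually
    fills and the rate decays toward native flash speed. On straight MLC
    like the 960 PRO, this and burst_rate should land in the same place."""
    return run_for(duration_s, io_func) / duration_s
```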
Speeds do look good (3.4 GB/s reads!), though things get a bit wonky at crazy high Queue Depths, likely because we had to hack the 950 PRO driver to work with the 960 PRO (long story). Testing the 960 PRO with the Microsoft inbox NVMe driver yielded extremely poor and inconsistent write performance.
Now I'll ease you into random access. The blue and red lines are read and write, and I've thrown in a 70% R/W mix as an additional data point.
Something our readers might not be used to is the noticeably higher write performance at these lower queue depths. To better grasp the cause, think about what must happen while these transfers take place, and what constitutes a ‘complete IO’ from the perspective of the host system:
- Writes: Host sends data to SSD. SSD receives data and acknowledges the IO. SSD then passes that data onto the flash for writing. All necessary metadata / FTL table updates take place.
- Reads: Host requests data from SSD. SSD controller looks up data location in FTL, addresses and reads data from the appropriate flash dies, and finally replies to the host with the data, completing the IO.
The fundamental difference is when the IO is considered complete. While ‘max’ values for random reads are typically higher than for random writes (due to limits in flash write speeds), lower-QD writes can generally be serviced faster, resulting in higher IOPS. Random writes can also ‘ramp up’ faster, since writes don’t need a deep queue to achieve the parallelism that reads rely on for their high IOPS at high QD.
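A toy latency model makes that asymmetry concrete. The microsecond figures below are illustrative guesses, not measured 960 PRO numbers:

```python
WRITE_ACK_US = 20   # write: acknowledged once data lands in the controller buffer
READ_US = 90        # read: protocol overhead + time to actually sense the flash

def iops(latency_us, queue_depth):
    # With queue_depth IOs in flight and latency on the critical path,
    # completions per second ~= queue_depth / latency.
    return queue_depth * 1_000_000 / latency_us

for qd in (1, 2, 4):
    print(f"QD{qd}: reads ~{iops(READ_US, qd):,.0f} IOPS, "
          f"writes ~{iops(WRITE_ACK_US, qd):,.0f} IOPS")

# The model intentionally ignores what happens at high QD, where reads
# spread across many flash dies and keep scaling while writes hit the
# flash program-speed ceiling -- which is why the curves cross up there.
```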
Our new results have way more data to comb through, so I'm tossing in some bonus material. Below is a sampling of the added data we have to choose from, should you feel daring enough to dive into the spaghetti:
There will be additional data on this page of reviews moving forward, with cool bonuses like a Write Cache Test for those SLC+TLC SSDs. Here's a sample, using the 600p:
Cache size? 16GB. Wasn't that easy?
(man does that thing get stuttery when its cache is full)
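If you want to try something similar yourself, the core of a cache-size test is just streaming writes and watching for the knee in throughput. `TARGET` below is a hypothetical path, and blasting tens of GB of writes at a real drive costs free space and write cycles, so treat this as a sketch rather than a finished tool (POSIX-only as written, due to O_SYNC):

```python
import os, time

TARGET = "/mnt/testdrive/cachetest.bin"   # hypothetical file on the SSD under test
BUF = os.urandom(4 * 1024 * 1024)         # 4MB of incompressible data
GB = 1024 * 1024 * 1024

fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_SYNC)
try:
    for gb in range(64):                  # probe up to 64GB written
        t0 = time.perf_counter()
        for _ in range(GB // len(BUF)):
            os.write(fd, BUF)
        rate = 1024 / (time.perf_counter() - t0)
        print(f"GB {gb + 1}: {rate:,.0f} MB/s")  # the cliff marks the cache size
finally:
    os.close(fd)
    os.remove(TARGET)
```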
Let's move on to the comparisons.

Perhaps fanciful, but I agree it could be a killer app.
“Conclusion: we have now reached a new era in which mass storage is capable of performing at close to the same sequential performance as volatile DDR3 DRAM. Four such M.2 SSDs in RAID-0 mode == ~8TB (before formatting).”
My take on it would be a less ambitious two-drive RAID 0 of 512GB 960 SSDs: best performing and cheaper.
PCIe Gen 3.0 allows ~1 GB/s per lane in each direction, so 2 GB/s per lane theoretical max if you count both directions.
Or 8 GB/s across the dual 4-lane M.2 ports on motherboards.
In theory that's sufficient to max out two 960s in RAID 0, but the 3500 MB/s sequential reads (writes are 2100 MB/s) are of course unidirectional.
So in theory a RAID 0 pair of 960s sharing 4 lanes yields ~4000 MB/s sustained, read or write.
I am pretty sure we will see 8 lanes available to M.2 motherboard sockets (even with bargain AMD Ryzen motherboards and CPUs (32 lanes, BTW)), allowing 7000/4400 MB/s read/write in theory, without fancy controllers.
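To sanity-check that math, here it is as a quick Python back-of-the-envelope. I'm assuming ~985 MB/s usable per PCIe 3.0 lane per direction; the drive figures are the 960 PRO's rated sequential speeds.

```python
GB_PER_LANE = 0.985      # PCIe 3.0 usable bandwidth per lane, per direction (GB/s)
READ, WRITE = 3.5, 2.1   # 960 PRO rated sequential read/write (GB/s)

def raid0(drives, lanes_each):
    link = lanes_each * GB_PER_LANE        # per-drive link ceiling
    return drives * min(READ, link), drives * min(WRITE, link)

print("pair at x2 each (x4 total):", raid0(2, 2))   # ~(3.9, 3.9) GB/s
print("pair at x4 each (x8 total):", raid0(2, 4))   # ~(7.0, 4.2) GB/s
```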
I don't know the numbers for RAM bandwidth; a lot better, I'm sure. Not sure that's a deal breaker for my argument.
Point is, 7000/4400 MB/s are numbers in a league of their own compared to anything before, even in the server world. It's a new paradigm for coders.
OK, using it for virtual memory isn't as fast as real memory, but damn, it's big. I don't know enough about architecture, etc., but a TB of "RAM" may open many possibilities for completely new approaches to old coding problems.
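One down-to-earth version of that idea already works today: memory-map a huge file on the SSD and let the OS page it in on demand. A minimal sketch, with a hypothetical path on the drive (the file is sparse, so it only consumes space as pages are touched):

```python
import mmap, os

PATH = "/mnt/nvme/bigarray.bin"   # hypothetical file on the 960 PRO
SIZE = 64 * 1024**3               # a 64GB "array", far beyond typical RAM

fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, SIZE)            # sparse file: blocks allocate on first touch
buf = mmap.mmap(fd, SIZE)

buf[0] = 1                        # touching a page faults it in; the cost of a
buf[SIZE - 1] = 2                 # cold access ~= the drive's random latency
print(buf[0], buf[SIZE - 1])

buf.close()
os.close(fd)
```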
The killer benefit of SSDs was fast random access; it transformed our PCs.
~150 MB/s sequential was livable; access times were the real killer of HDD performance.
As many have said re the 960, more of the same will be barely noticed by many.
Give a gamer 1 TB of passable virtual memory, and apps which use it, and that could be revolutionary.
It bears repeating, BTW, that IOPS shows even more stellar gains in the 960, and I imagine that's important for virtual memory. As we hear, many consider this the main reason to spend the extra on the 960 over the 950.
PS: upon reflection, a poor man's RAID 0 on 4 lanes is still attractive for swap/page files, even with little read speed gain. Write speed almost doubles, from a theoretical 2100 MB/s to ~4000 MB/s.