Investigating the issue
** Edit ** (24 Sep)
We have updated this story with temperature effects on the read speed of old data. Additional info on page 3.
** End edit **
** Edit 2 ** (26 Sep)
New quote from Samsung:
"We acknowledge the recent issue associated with the Samsung 840 EVO SSDs and are qualifying a firmware update to address the issue. While this issue only affects a small subset of all 840 EVO users, we regret any inconvenience experienced by our customers. A firmware update that resolves the issue will be available on the Samsung SSD website soon. We appreciate our customer’s support and patience as we work diligently to resolve this issue."
** End edit 2 **
** Edit 3 **
The firmware update and performance restoration tool has been tested. Results are found here.
** End edit 3 **
Over the past week or two, there have been growing rumblings from owners of Samsung 840 and 840 EVO SSDs. A few reports scattered across internet forums gradually snowballed into lengthy threads as more and more people took a longer look at their own TLC-based Samsung SSD's performance. I've spent the past week following these threads, and the past few days evaluating this issue on the 840 and 840 EVO samples we have here at PC Perspective. This post is meant to inform you of our current 'best guess' as to just what is happening with these drives, and just what you should do about it.
The issue at hand is an apparent slow down in the reading of 'stale' data on TLC-based Samsung SSDs. Allow me to demonstrate:
You might have seen what looks like similar issues before, but after much research and testing, I can say with some confidence that this is a completely different and unique issue. The old X25-M bug was the result of random writes to the drive over time, but the above result is from a drive that only ever saw a single large file written to a clean drive. The above drive was the very same 500GB 840 EVO sample used in our prior review. It did just fine in that review, and afterwards I needed a quick temporary place to put a HDD image file and just happened to grab that EVO. The file was written to the drive in December of 2013, and if it wasn't already apparent from the above HDTach pass, it was 442GB in size. This raises some questions:
- If random writes (i.e. flash fragmentation) are not causing the slow down, then what is?
- How long does it take for this slow down to manifest after a file is written?
Just to double check myself, and to try and disturb our 'stale data' sample as little as possible, I added a small 10GB test file and repeated the test:
Two important things here:
- An additional HDTach read pass did not impact the slow read speeds in any way. This means that if there is some sort of error process occurring, nothing is being done to correct it from pass to pass.
- The 10GB file appears at the end of the drive. For those curious, the saturation speed (the nice flat line at the max SATA speed) is simply how Samsung controllers answer requests for unallocated (i.e. TRIMmed) data. Since the SSD has zero work to do for those requests, it can instantly return zeroed-out data.
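The HDTach-style pass above can be approximated with a short script that times sequential reads chunk by chunk and records throughput per offset. This is a minimal sketch (the `read_profile` name is my own, not from HDTach); pointing it at a raw device requires administrator rights, so a large file works for a quick check:

```python
import time

def read_profile(path, chunk_size=8 * 1024 * 1024):
    """Read `path` front to back, recording (offset, MB/s) per chunk.

    A 'stale data' region shows up as a run of low-MB/s chunks; on an
    affected EVO the December-2013 file read far below spec while the
    TRIMmed tail of the drive saturated SATA.
    """
    profile = []
    offset = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered: time real reads
        while True:
            start = time.perf_counter()
            chunk = f.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break
            mb = len(chunk) / (1024 * 1024)
            profile.append((offset, mb / max(elapsed, 1e-9)))
            offset += len(chunk)
    return profile
```

Plotting the second column against the first reproduces the shape of the HDTach graph.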
Now to verify actual file reads within Windows. Simplest way shown here:
New 10GB file:
'Old' image file:
The above copies (especially the large older file) show nearly identical speed profiles to what was seen in HDTach. It's important to do this double check when using HDTach as a test, since it uses a QD=1 access pattern that doesn't play well with some SSD controllers. That's not the case here: despite the slowdowns, the EVO's controller itself is snappy, but appears to be dealing with something else that slows the process of data retrieval.
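The Explorer copy double check can be scripted as well. A minimal sketch (the `timed_copy` name is illustrative) that performs a single-stream copy with one outstanding request at a time, roughly matching the QD=1 pattern discussed above:

```python
import time

def timed_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst one chunk at a time; return average MB/s.

    One read outstanding at a time, similar to what a simple file-copy
    exercises, so a stale source file drags the whole average down.
    """
    total = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)
            total += len(buf)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Comparing this number for a freshly written file versus a months-old one makes the slowdown obvious without any benchmark software.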
Let's dig further, with some help from the community:
What’s a good alternative to a Samsung EVO ?
The OCZ Vertex 460 is a fast alternative that can be found on sale at a good price…
Agreed, I bought a Vertex 460 120GB for about $75 on Amazon the other day :). So far so good!
OCZ drives are of very poor quality. They use low-quality components and push them too hard to achieve the high speeds and low prices, and the result is a drive with a high failure rate.
With a new EVO coming from Samsung, as well as the M600 replacing the older Micron M550, you're going to see good deals on all of the soon-to-be prior gen stuff (840 EVO included – just as soon as they issue a firmware update for it).
Is it just the 840 series that is affected by this bug? Would the 830 or 850 potentially be susceptible as well?
Just the 840 and the 840 EVO. The 840 Pro is not affected. The 830 is not affected. The 850 Pro is also not affected. It is only the TLC-based models, apparently.
I don’t remember there being an 840 version; the only ones I ever heard of were the 840 Pro and 840 EVO. I never heard of a third model in there.
Yeah, they were soon obsoleted by the EVO. It was Samsung's first attempt at TLC.
I have a couple of 840 EVOs in my office machines. I am eagerly waiting for an update from Samsung.
My 1TB EVO is just 4 weeks old and has clear signs of this problem.
My latest tests also seem to indicate that higher temperatures (but still in normal range < 60 celsius) further reduce the read speed.
No issues here in RAID 0
You don’t understand. It’s old data where the bug shows, not new. Most benchmark programs like Crystal, AS-SSD, … write new files for testing, and then the speed is fine.
Has anyone tried using Spinrite at level 2 or better level 4 to see if this fixes the issue?
I was thinking of SpinRite, as well, when reading this article.
My suggestion would be SpinRite level 3 (only one write and two reads per sector).
I have done level 3 to a SSD in the past to bring a drive back to life after older firmware had created some issues – haven’t had a problem since!
Any Spinrite mode that writes data back will restore full speed. I would use the lowest mode since higher modes may unnecessarily wear the flash.
You don't even have to do that. A simple wipe/reinstall, or in my case imaging, gets rid of the problem. It doesn't last long, though. I've seen some cases where it's happening again within weeks.
I have this speed degradation problem on my 840 SSD 120GB (Basic, not Pro) sitting in a travel laptop I rarely use. These last 9 months it has been powered on only on Patch Tuesdays, to update the system. I mostly use portable versions of programs on it, all located nicely in a Portable Programs folder, so they can be copied elsewhere and then copied back easily as instructed here in this fine article.

My question, however, is: since this laptop gets very little use, does my 840 SSD consider the whole OS as “old data” as well? If so, how on earth would I go about rewriting those files? There’s a snowball’s chance in Hell I am reinstalling Windows or cloning this drive; too much hassle for me because Samsung sold me a defective product. Will the myDefrag solution move/refresh the whole OS? And the fact that I have to DEFRAG my SSD to temporarily fix this means that if/when Samsung finally decides to try and correct this with a firmware update, it’s too late for me. They have lost a customer. I am going to demand a replacement. There. I got to vent a little bit.

Very frustrating when you spend a few days unscrewing a gazillion screws to get the SSD placed into a Fort Knox certified laptop, install the OS and drivers, update and tweak the system and then put it away for later use. Then fire it up a month later only to see that the performance has degraded to worse than IDE speed, while sitting in the closet. This is so bad it’s hilarious. Maybe it’s time we stop shopping for deals; data is important. Let’s all just buy the most expensive crap we can find and shut up. Let’s be Apple people.
Best temporary fix at the moment is to use MyDefrag with the ‘data disk monthly’ script.
This works because it essentially moves/rewrites every file (and not because there is bad fragmentation).
Ok, thanks. Still absurd having to shorten the drive's lifespan by defragging/rewriting every month or so. Kinda like having to sandblast your face every morning before applying makeup =) Samsung will pay for this dearly; word of mouth is more powerful than they think. My trusty old Intel 520 in my main computer still performs like a champ. I tested that one too in the wake of this Samsung fiasco, and from this day forward Intel will get my money.
From what's seen in the story, you don't need to do it monthly. Maybe more like every 6 months going by the graph: it takes 30+ weeks before it starts to really become a problem.
I thought it was a no-no to defrag an SSD.
It’s true, defragging is not needed on SSDs because there are zero benefits and the rewrites only shorten the drive life.
But this is not a normal situation. The only reason the others are suggesting a defrag run is to “freshen” the written data so that it now falls before the date threshold when the drive slows down.
Nobody is suggesting the same use of defrag as on a spinning drive. Nobody is saying “turn on continuous or frequent defrag.” What they are saying is, sacrificing a relatively small number of rewrites every few weeks may be worth restoring the full performance of the drive. A very light defrag just happens to fulfill those criteria. But so does a quick backup and restore.
And if Samsung gets a firmware fix out soon, the number of rewrites sacrificed may be pretty small.
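The "freshen the data" effect those defrag runs rely on can be had more directly: read every file and write the same bytes back, which forces the controller to program fresh flash pages for those LBAs. A minimal sketch under that assumption (names are mine; this is not crash-safe, since a power loss mid-rewrite can corrupt a file, so a safer variant writes to a temp file and `os.replace()`s it):

```python
import os

def refresh_file(path):
    """Read a file fully, then rewrite the same bytes in place.

    Reads the whole file into memory, so it suits modest files;
    large files would need a chunked read/seek-back/write loop.
    """
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "r+b") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the rewrite to the drive
    return len(data)

def refresh_tree(root):
    """Refresh every file under root; return total bytes rewritten."""
    refreshed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            refreshed += refresh_file(os.path.join(dirpath, name))
    return refreshed
```

This is the same trade-off discussed above: a full pass costs one drive's worth of writes, sacrificed to restore read performance until a firmware fix lands.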
My drive and many other drives are showing ~1MB/s in benchmarks and explorer transfers. The minimum speed is not ~50MB/s as stated in the article.
You have to differentiate between smaller (< 100KB) and larger files. Smaller files are inherently slower because the SSD can't read the data in parallel from different internal flash blocks. The bug was explicitly tested with larger files (>= 500KB), which should always be read at something like 300-500MB/s. That was because the tests wanted to eliminate the chance that the slow speeds come from a lot of small files rather than from the degradation.
It's likely that smaller files are affected as well and might read much lower than the normal small-file speed of 20-35MB/s. It has just not been examined closely so far.
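A quick way to separate small-file overhead from the stale-data slowdown is to bucket files around the ~500KB cutoff mentioned above and time each group separately. A rough sketch (my own helper names; note the OS page cache will inflate results for recently read or written files, so a real test would drop caches or use unbuffered/direct I/O):

```python
import os
import time

def file_read_speed(path, chunk_size=1024 * 1024):
    """Time one full sequential read of a file; return MB/s."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(chunk_size):
            pass
    elapsed = time.perf_counter() - start
    return (size / (1024 * 1024)) / max(elapsed, 1e-9)

def speeds_by_size(paths, threshold=500 * 1024):
    """Average read speed for files below vs. at/above the cutoff."""
    small, large = [], []
    for p in paths:
        bucket = small if os.path.getsize(p) < threshold else large
        bucket.append(file_read_speed(p))
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(small), avg(large)
```

If large old files come back well below 300-500MB/s while equally old small files sit near their usual 20-35MB/s, the slowdown isn't just small-file overhead.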
The benchmark I used does not differentiate between file sizes, but when I try to copy an 8-month-old 1.5GB file, I see under ~1MB/s in Explorer.
Wow. That would be the worst case by far. Maybe you could run FileBench, the tool I wrote to test this issue:
That's about how bad my drive is performing: http://i.imgur.com/8EXZz9c.jpg
I work with Samsung as a vendor for my company. I can tell you without a doubt they will be on top of this. Their track record for providing software/firmware fixes is top notch.
Here’s my completely uneducated hypothesis:
The cells are slowly drifting over time. They start to level out as the energy in the cell reaches a point where... eh... less voltage = less leakage? Not so much that it flips a bit, but just enough that the error correction has a big job to do on every cell it comes across. And it doesn't correct errors until the flash needs to be overwritten.
And that’s what’s wrong. Samsung, I await your employment offer.
On hearing this news, the Samsung CEO should have had the responsible executives in the office, with apologies to their families, working on the problem, including the necessary engineering personnel, software and hardware. In order to even be able to afford holidays with the family, there need to be sales, and sales require customer satisfaction. Better to have a little disappointment around holiday time than big layoffs later; the competition is not waiting for the holidays to be over.
This is why I won’t buy Samsung products. Their hard drives and optical drives had such horrible reliability (is it any wonder that they sold hard drives for $10-15 cheaper than every other brand?) that MDG pulled their contract years ago. I know too many people that have been burned by their TV reliability too. Samsung couldn’t handle the support problems, so they sold off the hard drive division of the company. And yes, OCZ suffered a similar fate when they couldn’t handle the warranty exchanges on their failing SSDs.
Now it’s Samsung’s SSD’s.
I am also hearing that these Samsung SSDs are not so good in RAID 0 configurations with uneven numbers of SSDs, per user posts on the AnandTech article about this issue. I know that the Samsung system software sucks: my Series 3 laptop cannot keep the WiFi off at bootup. Half of the time it boots up with the WiFi on, and it will also auto-connect to any available WiFi router, which is not so good for security or for air travel, where there may be a need to keep the WiFi off. Even disabling the WiFi in Windows will not keep it off. It appears that Samsung’s QC in the software/firmware department is bad all around.
I haven’t had any problems with ANY Crucial SSDs (knock on wood), and I’ve sold probably over fifty of them in my shop in the last year. I’ve had exactly 5 OCZ SSDs (a mix of models) and all of them were dead within a year, so I stopped carrying them. That was before the company selloff. I’ve had the odd Kingston SSD fail too, but nothing as bad as OCZ. I’ve seen way too many Samsung hard drives fail in various machines to ever recommend or use them, though.
I’ve got an 830 and an 840 PRO and I have had no issues for 2 years. And the EVO appears to have workarounds and fixes. That said, they DO appear to have a shorter life than the PROs, but you will probably go through a good chunk of a petabyte before you get there.
Regardless, I have to be a hard-drive snob and go for the high-warranty stuff, and the 850 PRO has a 10-year warranty. Overall I’ve been pleased with a lot coming out of Samsung and Korea in general.
Neither of my two OCZ SSDs has failed or slowed down!
I have used an OCZ Vertex 2 since December 2010 and it still performs without error in factory stock condition. I never flashed newer firmware on the Vertex 2.
I have used an OCZ Vertex 3 SSD since November 2011 on its stock factory firmware too. It has never been flashed with newer firmware either. Performs perfectly.
Are you sure you bought legitimate OCZ SSDs?
I think you are a complete liar.
Your sample size is two. For a more informed decision, try reading Amazon reviews on those drives. Plenty of them died. I personally killed two in normal usage.
I had an 840 500GB SSD. Roughly 1 month ago, transfer speeds had dropped to around 11MB/s regardless of file size. Also, I began receiving this error while transferring files from the 840 SSD: “Error 0x8007045D: the request could not be performed because of an I/O device error”. All this began occurring after less than 1 year of use and 2.5TB written to the drive (according to Samsung Magician). My Samsung 128GB 830, however, has had 4TB written to it and is still running strong. I think the above situation speaks for itself.
- Excuse me, but don’t ALL SSDs (and NAND flash storage) have a wear-leveling algorithm that is supposed to move around (shuffle) the data internally? Does that old data never get shuffled at all?
- What about data retention? What will happen to the data in a drive that has been unplugged for months/years?
- And how about the earlier 840 (non-EVO, non-Pro), which uses 21nm TLC NAND (versus 19nm TLC NAND in the 840 EVO)? Does it have the same issue too? I helped a relative upgrade his ProBook 6555b to a 120GB 840. 🙁
I thought the wear leveling was only concerned with new data and with old released/freed space not being used over and over again: it keeps track of the number of write cycles for each block and tries to utilize the least-used free/freed space when it becomes available. The current problem appears to revolve around old files that are never accessed or moved for longer periods of time losing their state, and needing error correction applied in order to retrieve the data in an error-free form; this error correction takes time, and that is what is slowing things down. Of course, moving the file by reading it to another location and back to the SSD, by various methods, refreshes the TLC in the short term, but does not alleviate the underlying problem. Likewise, the short-term fix imposes more reading and writing on the SSD, and increases the number of cycles on the SSD that would not have been needed had this problem not arisen in the first place.
There may be something to this happening on a smaller process node, with the 21nm TLC NAND able to hold a stable state for a longer period of time than the 19nm TLC NAND in the 840 EVO, and further node shrinks will see this problem magnified. If this problem is intrinsic to the smaller process nodes, then some form of re-engineering of the process chemistry/NAND geometry may have to be instated on the smaller process nodes in the future; or maybe die stacking at a slightly larger node can alleviate the problem in the shorter term, while more study is done into the causes in the long run. The firmware solution may include reading and rewriting in place, but that won't help the wear-leveling tradeoff, as it has to do more read/write cycles to refresh the TLC.
Excellent point, and that's something we are looking into now. Wear leveling *should* spread writes across flash – even flash that contained the old data – meaning that writes taking place in regular use would act to 'freshen up' the stale data, even though those LBAs were not explicitly rewritten by the host OS. This does not appear to be happening with our samples.
Wear leveling usually takes place at the start of a write: the SSD buffers a tiny bit of data into its memory, and then does some processing to figure out the best locations to place the data in order to make sure no cell is getting overused.
The problem with this is that spare capacity is also used for this, so when your SSD begins the process of reallocating sectors, the drive is effectively on its last legs, since it means that all of the cells are close to death and the slightly weaker ones are dying first.
When the reallocation process starts, there is a chance for ECC to encounter an unrecoverable error during a write, and thus cause you to lose data. E.g., in the Tech Report SSD endurance test, the EVO experienced some data corruption at around 100TB of writes (likely a few flash cells were leaky and could not retain data for very long).
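The dynamic wear leveling described here can be sketched in a few lines. Note that it only ever considers *free* blocks, which would be consistent with stale, never-rewritten data sitting untouched; static wear leveling would additionally migrate cold blocks. (Illustrative names only, not any real FTL's API:)

```python
def pick_block(free_blocks, erase_counts):
    """Dynamic wear leveling: route the next write to the free
    block with the fewest program/erase cycles so far."""
    return min(free_blocks, key=lambda b: erase_counts[b])

def write_page(erase_counts, free_blocks):
    """Simulate one write: pick a block, 'program' it, bump its
    cycle count. Blocks holding cold data are never candidates."""
    b = pick_block(free_blocks, erase_counts)
    erase_counts[b] += 1
    return b
```

Because cold blocks never appear in `free_blocks`, their cells are left to drift for months, which matches the behavior observed on the stale EVO data.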
New Volume(E) = Samsung 840 evo 512GB?
I am seeing this on my RAID 0 setup. I really only use my PC for gaming, so most of the data is old. I have two 250s in RAID 0 on a Z87 motherboard. I only have about 10% of the array filled with data. The first 10% of the array runs at 15-20 MB/s. Once I get past 10%, the array goes to 1000 MB/s.
I tested out my EVO, and sure enough saw the exact same issue. I ran a program called Diskfresh (free for home users) http://www.puransoftware.com/DiskFresh.html . It reads and writes all of the drive. After doing that the drive returned to decent speeds again.
Short term fix until the firmware comes out.
It is one of the many major issues of TLC flash: with more bits per cell at smaller process sizes, you begin to rely more heavily on error correction (error rates increase over time), which slows performance. While not recommended, you can significantly lower the error rate on an SSD by running SpinRite on it at level 4, though it will eat up a ton of writes. (I tried it on a 120GB EVO that a friend bought for a system that was not used much, with barely any writes.)
To see the error rate, run SpinRite at level 2 (read-only test), then check out the error rates.
The smaller the process size, the fewer electrons can be stored, and thus the loss of even a small number of electrons can significantly increase the error rate.
MLC SSDs do not have this issue due to their low error rates.
If the issue is truly one of retention of electrons in the cell, then any fix they come up with will either be more efficient error correction to minimize the speed loss, or something that causes the drive to periodically rewrite cells while the drive is idle.
Any TLC 3D V-NAND will have to be looked at, and tested more for this issue, especially on smaller process nodes.
As far as more efficient/faster error correction goes, maybe adding more processing cores to the controller, plus some internal background idle-time read testing, with rewriting/refreshing of stagnant data if the error rates test too high, maybe on a per-file basis.
It would be nice if the SSD could be managed by software included with the SSD that could move the old data onto a hard drive, after informing the user and giving them the option, as even a regular hard drive would be faster if the SSD error rate is too high on old SSD-based data/files.
What are the data retention rates on hard drives compared to SSDs, as far as being slowed down by read errors and error-correction-induced delays in read speeds?
Should we be expecting Samsung to issue a targeted announcement to 840 EVO purchasers? I don’t recall if my purchase (Oct 2013, Amazon) included registration with Samsung. I’ll be checking PC Perspective for updates. This is disturbing news.
So, I should probably hold off on reinstalling my win7 machine on a 840 Evo I have, as I planned to do this weekend. Guess I’ll use the mx100 I have instead. Thanks for the heads up guys.