Samsung’s fix for the EVO tested!

** Edit **

The tool is now available for download from Samsung here. Another note is that they intend to release an ISO / DOS version of the tool at the end of the month (for Lunix and Mac users). We assume this would be a file system agnostic version of the tool, which would either update all flash or wipe the drive. We suspect it would be the former.

** End edit **

As some of you may have been tracking, there was an issue with Samsung 840 EVO SSDs where ‘stale’ data (data which had not been touched for some period of time after writing it) saw slower read speeds as time since written extended beyond a period of weeks or months. The rough effect was that the read speed of old data would begin to slow roughly one month after written, and after a few more months would eventually reach a speed of ~50-100 MB/sec, varying slightly with room temperature. Speeds would plateau at this low figure, and more importantly, even at this slow speed, no users reported lost data while this effect was taking place.

An example of file read speeds slowing relative to file age.

Since we first published on this, we have been coordinating with Samsung to learn the root causes of this issue, how they will be fixed, and we have most recently been testing a pre-release version of the fix for this issue. First let's look at the newest statement from Samsung:

Because of an error in the flash management software algorithm in the 840 EVO, a drop in performance occurs on data stored for a long period of time AND has been written only once. SSDs usually calibrate changes in the statuses of cells over time via the flash management software algorithm. Due to the error in the software algorithm, the 840 EVO performed read-retry processes aggressively, resulting in a drop in overall read performance. This only occurs if the data was kept in its initial cell without changing, and there are no symptoms of reduced read performance if the data was subsequently migrated from those cells or overwritten. In other words, as the SSD is used more and more over time, the performance decrease disappears naturally.  For those who want to solve the issue quickly, this software restores the read performance by rewriting the old data. The time taken to complete the procedure depends on the amount of data stored.

This partially confirms my initial theory in that the slow down was related to cell voltage drift over time. Here's what that looks like:

As you can see above, cell voltages will shift to the left over time. The above example is for MLC. TLC in the EVO will have not 4 but 8 divisions, meaning even smaller voltage shifts might cause the apparent flipping of bits when a read is attempted. An important point here is that all flash does this – the key is to correct for it, and that correction is what was not happening with the EVO. The correction is quite simple really. If the controller sees errors during reading, it follows a procedure that in part adapts to and adjusts for cell drift by adjusting the voltage thresholds for how the bits are interpreted. With the thresholds adapted properly, the SSD can then read at full speed and without the need for error correction. This process was broken in the EVO, and that adaptation was not taking place, forcing the controller to perform error correction on *all* data once those voltages had drifted near their default thresholds. This slowed the read speed tremendously. Below is a worst case example:

We are happy to say that there is a fix, and while it won't be public until some time tomorrow now, we have been green lighted by Samsung to publish our findings.

« PreviousNext »