An unfortunate tale about Samsung's SSD 840 read performance degradation

An avalanche of reports emerged last September, when owners of the usually speedy Samsung SSD 840 and SSD 840 EVO detected the drives were no longer performing as they used to.

The issue has to do with older blocks of data: reading old files is consistently slower than normal - as slow as ~30MB/s - whereas newly written files, like the ones used in benchmarks, perform as fast as new - around 500MB/s for the well regarded SSD 840 EVO. The reason no one had noticed (we reviewed the drive back in September 2013) is that data has to be several weeks old to show the problem. Samsung promptly admitted the issue and proposed a fix.

Update (July 20, 2016): Samsung has released version 4.9.7 of its SSD Magician software. Even though it's not mentioned on Samsung's website, the firmware update (DXT0AB0Q) is also available for the Samsung SSD 840, predecessor of the more popular SSD 840 EVO. We have published some follow-up tests here for your reference.

The First Firmware Update

About a month later, on October 15th, Samsung released an updated firmware for the 840 EVO that covered both 2.5" and mSATA models (EXT0CB6Q and EXT42B6Q respectively). The update consisted of a two-stage process:

1) A new firmware with an updated algorithm for handling the inherent voltage drift that occurs in all NAND-based storage devices as they age, and that becomes harder to manage the more bits each NAND cell stores:

  • In SLC NAND only one bit is stored per cell. This is great because one bit is very easy to read: it can only be 0 or 1.
  • In MLC NAND two bits are stored per cell, so it gets harder to read, but there's a cost advantage: you get twice as much storage space from the same amount of NAND.
  • In TLC NAND three bits are stored per cell, so the complexity increases again, but the advantage is that you can store 50% more information than with MLC, further reducing costs.

Image courtesy of Anandtech
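
To put the list above into numbers, here is a minimal Python sketch. It relies on the general fact that a cell storing n bits must distinguish 2^n charge levels. The "share of the window" figure assumes, purely for illustration, that all levels split one fixed, normalized voltage window evenly; real NAND does not space its levels this neatly.

```python
def voltage_levels(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must hold to store this many bits."""
    return 2 ** bits_per_cell


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    levels = voltage_levels(bits)
    # Illustrative assumption: levels share one normalized voltage window evenly.
    share = 1 / levels
    print(f"{name}: {bits} bit(s) per cell, {levels} levels, {share:.1%} of the window per level")
```

The more levels there are, the less room each one has, which is why the same amount of voltage drift is far more likely to cause a misread on TLC than on SLC.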

According to Samsung, the algorithm that adjusts the voltages used to read the NAND as it ages had a problem, which meant that data written once and never rewritten became harder and harder to read. The speed at which such a file could be read could plummet from 500MB/s to below 50MB/s, a 10x reduction in performance!

But this was a difficult problem to detect, because most benchmarking programs write new data that they then read back, which sidesteps the problem as it only occurs on old data. However, most of the data people actually use is old: your Windows installation folder, installed apps, your documents, game files, etc.

2) A second stage that required all data on the disk to be rewritten in order to restore performance on older data. Since it took around 8 weeks for the issue to become visible in the 840 EVO, we could not fully know whether Samsung's firmware worked until some weeks later.

A Second Firmware Update: Reading Between The Lines

We couldn't know for sure if the firmware was a successful solution in the long term, and in fact the problem did come back. Samsung started to work on newer firmware (EXT0DB6Q), but this time with a different approach: instead of simply adjusting the algorithm for reading old data, the disk would also continuously rewrite old data in the background.
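
Samsung has not published how the background rewrite works, so the following Python sketch is only a toy model of the general idea: track how old each block's data is and rewrite anything that crosses a threshold. The `Block` class, the function names and the 8-week threshold (borrowed from the degradation time observed on the 840 EVO) are all assumptions made for illustration, not details of the actual firmware.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    age_weeks: float  # time since the data in this block was last written


def blocks_to_refresh(blocks, max_age_weeks=8.0):
    """Return the blocks whose data is old enough to be rewritten."""
    return [b for b in blocks if b.age_weeks >= max_age_weeks]


def refresh(blocks, max_age_weeks=8.0):
    """Rewrite stale blocks and return how many were rewritten."""
    stale = blocks_to_refresh(blocks, max_age_weeks)
    for block in stale:
        # Rewriting makes the data "new" again, at the cost of one extra write.
        block.age_weeks = 0.0
    return len(stale)


# Hypothetical drive state: one fresh block and two stale ones.
blocks = [Block(0, 1.0), Block(1, 9.5), Block(2, 40.0)]
rewritten = refresh(blocks)
print(f"Rewrote {rewritten} of {len(blocks)} blocks")
```

In this model no data is ever allowed to get old enough to degrade, which is the point of the approach, but every refresh consumes a write that the user never asked for.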

It's not an elegant fix, and it's also a fix that will shorten the lifetime of the NAND, since the total number of writes it's meant to withstand is limited. But as we have witnessed in Tech Report's extensive durability test, there is a ton of headroom in how NAND is rated, so in my opinion this is not a problem. Heck, the Samsung 840 even outlasted two MLC drives.
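
A rough back-of-the-envelope estimate in Python shows why the extra wear looks small. It assumes the firmware rewrites the entire drive once every 8 weeks (the EVO's observed degradation time), ignores write amplification and normal user writes, and uses a rating of 1,000 program/erase cycles that is a placeholder chosen for illustration rather than a Samsung specification.

```python
WEEKS_PER_YEAR = 52


def extra_cycles_per_year(refresh_interval_weeks: float) -> float:
    """Full-drive rewrites per year caused by the background refresh alone."""
    return WEEKS_PER_YEAR / refresh_interval_weeks


def years_to_exhaust(rated_cycles: float, refresh_interval_weeks: float) -> float:
    """Years until refresh writes alone would use up the rated cycles."""
    return rated_cycles / extra_cycles_per_year(refresh_interval_weeks)


ASSUMED_RATED_CYCLES = 1000  # placeholder rating, not a Samsung figure
ASSUMED_REFRESH_WEEKS = 8    # taken from the EVO's observed degradation time

cycles = extra_cycles_per_year(ASSUMED_REFRESH_WEEKS)
years = years_to_exhaust(ASSUMED_RATED_CYCLES, ASSUMED_REFRESH_WEEKS)
print(f"{cycles:.1f} extra cycles per year, {years:.1f} years to exhaust the rating")
```

Under these assumptions the refresh costs 6.5 cycles per year, so on its own it would take well over a century to use up the assumed rating.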

As of writing, the new firmware has only been released for the 2.5" model of the SSD 840 EVO, so users of the 840 EVO mSATA model still have to be patient. It should also be noted that the new firmware does not seem to work well with the TRIM implementation in Linux: this user reported file system corruption when discard is enabled.

The route Samsung has taken with this latest fix is significant: the original problem was not in the firmware of the drives. Rather, Samsung's TLC NAND drifts in such a way that it's not possible to write a generalized algorithm that accounts for it. Thus, by Samsung's implicit admission, we now know this is a fault inherent to the NAND used in the Samsung 840 EVO.

How About Other TLC SSDs?

Samsung claims the read performance degradation issue only exists in the NAND used by the popular SSD 840 EVO. However, there are OEM versions of the drive that use the exact same NAND, such as the Samsung SSD PM851 usually seen in Dell products. Case in point, here are users posting on support forums showing the exact same problem.

Then there's the "vanilla" SSD 840 which was the first drive to use TLC NAND. As things stand today no updated firmware has been released for this drive. Samsung Germany admitted the problem exists on the 840, but in Samsung's subsequent communications they have always claimed that the issue does not exist on it. Here's an extract from a recent Samsung Q&A posted at PC Per:

PC Per: Will there be a firmware update for the other Samsung TLC-based SSD models that have also demonstrated this read performance issue? If so, which models and how soon will that firmware be made available?

Samsung: This issue had been reported for the 840 EVO SSD only.

Well, here is a Samsung SSD 840 "vanilla," or whatever you want to call it:

According to data we've gathered from user forums:

  • The 840 EVO uses 19nm TLC NAND and takes about 8 weeks to degrade.
  • The regular "840" pictured above uses 21nm TLC NAND and takes about 40 weeks to degrade.

The test above was performed on a Lenovo ThinkPad T530 running Windows 7, using a Plextor M5M mSATA drive as primary storage and the Samsung SSD 840 connected as a secondary drive. A 64KB block size was used in HD Tune, which limits the peak performance of the drive to ~375MB/s, visible where the graph flat-lines. This flat section also corresponds to the 40GB of free space on the drive; it contains no data and thus is not affected by the degradation.

The first part of the disk has really poor performance, and the reason is very simple: the drive was cloned from a regular HDD in a Compaq laptop, and the first partition is a 13GB recovery partition (really bad design for a mechanical HDD as that's where they perform best, but I digress).

That recovery partition has 2GB free space, which corresponds with the lone spike in performance in the first 13GB batch of the test. Obviously the data in the recovery partition never changes and thus sees the worst performance.

Delving deeper using the SSD Read Speed Tester tool developed by forum user Techie007 for the sole purpose of testing and visualizing the issue in the Samsung 840 drives, we get a graph of the performance of files in relation to how many weeks old they are. The graph includes all partitions because I mounted the recovery partition as a volume mount point under the main partition:

Looking at the graph, it becomes increasingly clear that the older the file is, as shown on the x-axis, the worse its read performance becomes.
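
For readers who want to try the same kind of measurement, here is a minimal Python sketch that groups files by age and reports the average read speed per group. It is not how Techie007's tool works internally; that is unknown to us. The function name and its parameters are our own, and the file's modification time is only a proxy for when the data was last written to flash (cloning or defragmenting a drive rewrites data without changing that timestamp).

```python
import os
import time
from collections import defaultdict

SECONDS_PER_WEEK = 7 * 24 * 3600


def read_speed_by_age(root, chunk_size=1024 * 1024):
    """Read every file under root and return {age_in_weeks: average MB/s}."""
    now = time.time()
    totals = defaultdict(lambda: [0, 0.0])  # age in weeks -> [bytes, seconds]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Modification time stands in for the age of the data on flash.
                age_weeks = max(0, int((now - os.path.getmtime(path)) // SECONDS_PER_WEEK))
                start = time.perf_counter()
                nbytes = 0
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        nbytes += len(chunk)
                elapsed = time.perf_counter() - start
            except OSError:
                continue  # skip files that cannot be read
            totals[age_weeks][0] += nbytes
            totals[age_weeks][1] += elapsed
    return {age: (b / 1e6) / s for age, (b, s) in sorted(totals.items()) if s > 0}
```

Calling `read_speed_by_age("D:\\")` (the path is just an example) returns one speed per week of age. The results are only meaningful if the files are not already in the operating system's cache, so run it after a fresh boot and on more data than fits in RAM.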

Because I was testing with data that is several years old and the tester app limited the visualization to 99 weeks, I took the raw data and put it into Excel to plot the full range:

The x-axis shows how old the data is in days, and the y-axis the read speed in MB/s. A healthy drive would have shown a flatter horizontal line hovering around the 500MB/s mark, but instead we get this mess!

For the sake of comparison, here's what a Samsung SSD 840 Pro looks like in SSD Read Speed Tester. Because the SSD 840 Pro uses MLC NAND, it doesn't suffer from the same degradation issues:

Temperature Driving SSD Performance?

My benchmark results saw wild fluctuations and I could not understand why. As it turns out the drive is also sensitive to heat, and not in the way you might expect: the drive actually works better the hotter it is!

This is not so strange, because temperature differences affect the voltage drift in the NAND. Only now do we begin to realize just how difficult it must be to get that algorithm right... it's just a pity that it took Samsung just as long to realize it as well.

The data for the graph above was gathered using SSD Read Speed Tester when the drive reported that it was at 40°C. The graph below alternates between the previous graph and the same test performed with the drive cooled to 15°C. You can see how the worst-case performance is right down at ~50MB/s, a far cry from the optimal 500MB/s for this drive.

It's entirely possible that while the NAND in my drive performs better with increased temperatures, a different drive might behave in the opposite way. PC Perspective has also shown that the drive's controller will throttle if it becomes too hot, so in theory I would want to cool down the controller but heat the NAND chips to get optimal performance.

Just like the SSD 840 EVO, there are OEM variants of the standard SSD 840. The Samsung SSD PM841 uses the same 21nm NAND, as does the SSD PM843. Samsung claims these drives do not have the speed degradation issue either, but the data above speaks for itself. We have tested a second SSD 840 in-house that shows the same degradation patterns, which also match the numerous reports that can be found online.

Dell is not the only OEM using the drives either. The Razer Blade Pro laptop we reviewed recently sports one of these Samsung OEM drives, and honestly "like new" performance is really good. Microsoft also uses them in the Surface Pro 3, and Samsung went as far as releasing a firmware update to fix read degradation, only to pull it later.

Yet another potentially affected product is the Samsung SSD 845DC EVO, an enterprise drive for server use that is marketed as (and I hope you enjoy irony as much as I do) "suitable for read-intensive applications". The 845DC EVO uses NAND with the exact same part number as the 840 EVO and PM851: K90KGY8S7M-CCK0. Samsung may have binned the best of the TLC NAND for use in this drive, so the problem is likely to take a bit longer to show up on it.

Quoting Samsung below, these were the kinds of claims the company made when it promoted the use of TLC NAND. In all fairness, we've recommended the drives in our reviews, as have many others, and we happen to be using a handful of them in our systems as well.

"To top it all off, you can rest assured knowing that your SSD will continue to offer excellent performance throughout its useful lifespan. With its simple upgrade solution and sustained, industry-leading performance, the Samsung SSD 840 is the single best upgrade you can make to your PC."
Source

"The 840 Series represents the first consumer SSD to implement 3-bit/cell MLC (also called TLC) technology (...) This is nothing a good firmware algorithm can't handle, however. Samsung's 3-bit/cell MLC-based SSD 840 Series, equipped with mandatory OP, will still far outlast the useful life of the hardware it powers."
Source

Reliability, as in data loss, has not been put into question. So here's my open request to Samsung: admit the problem exists in all the affected drives, as evidenced in this article and in the countless reports found in this lengthy thread on the Overclock.net forums and elsewhere online.

As of writing, this single discussion has gathered over 2,770 replies and 345,000 views. Thus far Samsung has decided to ignore the SSD 840 and all the aforementioned variants even though the drives carry 3-year warranties. Samsung, the ball is in your court now...