Enterprise SSD flaw bricks drives and renders data unrecoverable after 40,000 hours

Cal Jeffrey · Mar 25, 2020

PSA: If you run any Hewlett Packard Enterprise servers or storage solutions, you'd be wise to update their firmware before October. Some SAS SSDs in these products will "catastrophically" fail after 40,000 hours of uptime. If you use enterprise SSDs from other OEMs, you might want to make sure their firmware is current as well, since the flaw is not unique to HPE products.

Hewlett Packard Enterprise (HPE) has issued a critical warning for some of the solid-state drives it uses in a number of its enterprise server and storage products. The “flaw” causes the SSDs to brick at exactly 40,000 hours (4 years, 206 days, 16 hours). HPE warns that this is a catastrophic failure that will render all stored data unrecoverable.
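The parenthetical conversion is easy to verify. A minimal Python sketch, assuming 365-day years (leap days ignored) and a helper name, `breakdown`, chosen purely for illustration:

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def breakdown(total_hours):
    """Split a count of hours into (years, days, hours)."""
    days, hours = divmod(total_hours, HOURS_PER_DAY)
    years, days = divmod(days, DAYS_PER_YEAR)
    return years, days, hours


print(breakdown(40_000))  # (4, 206, 16)
```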

Equipment installed with firmware prior to HPD7 is subject to this issue. So far, these drives should be in working order as most shipped less than five years ago. The company predicts that SAS SSDs that have not been updated should start experiencing failure no earlier than October 2020.

Four specific products have been identified as susceptible to this flaw: HPE model numbers EK0800JVYPN, EO1600JVYPP, MK0800JVYPQ, and MO1600JVYPR. These are 800GB and 1.6TB drives.

The defect is apparently not isolated to HPE equipment and could affect other OEMs as well. Hewlett Packard says it was notified of the flaw by an unnamed SSD manufacturer, which some have speculated is SanDisk.

It is also not the first glitch of this kind. In January, HPE issued a similar warning for SAS SSDs that would fail after 32,768 hours. That problem had a much broader scope affecting 20 different SKUs.
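The 32,768-hour figure from the January warning is exactly 2^15, one more than the largest value a signed 16-bit integer can hold. The article does not say how either firmware stores its hour counter, so the sketch below only illustrates why that number looks like an overflow boundary. It is not a description of the actual bug, and the 16-bit field is an assumption:

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32,767


def next_hour_int16(hours):
    """Increment an hour counter kept in a signed 16-bit field (hypothetical layout)."""
    # ctypes truncates to 16 bits without raising, mimicking wraparound.
    return ctypes.c_int16(hours + 1).value


print(next_hour_int16(INT16_MAX))  # -32768
```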

Administrators should update firmware immediately and contact HPE support if they run into any issues. The firmware version to look for is, as previously mentioned, HPD7. The company has fixes available for VMware, Windows, and Linux on its website. It also has documentation and tools for determining the total uptime of affected products.

Masthead credit: Sergiy Palamarchuk via Shutterstock

Permalink to story.

https://www.techspot.com/news/84549-enterprise-ssd-flaw-bricks-drives-renders-data-unrecoverable.html

psycros · Mar 25, 2020

"Flaw". "Glitch". The maker of the SSD's chipset has some explaining to do.

VitalyT · Mar 25, 2020

One of the key tests for any PC hardware, is by changing internal clock far backwards and forward, to see that it does not affect the product's functionality.

But who am I to lecture HP, they know better, they just don't bother.

umbala · Mar 25, 2020

VitalyT said:
One of the key tests for any PC hardware, is by changing internal clock far backwards and forward, to see that it does not affect the product's functionality.

But who am I to lecture HP, they know better, they just don't bother.

That's not how this issue works. Think of it like an odometer. The flaw is based on how many hours the hardware has been running. Changing the clock so that it's 4 years ahead would make no difference. Example: your SSD has been running for 1000 hours and you change your computer clock ahead by 4 years, it would still show as 1000 hours and wouldn't simply add another 4 years to it.
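To make the odometer analogy concrete, here is a toy model, assuming the drive keeps its own power-on-hours counter that only advances while it is running. The class and method names are invented for illustration and do not reflect any real firmware:

```python
from datetime import datetime, timedelta


class ToyDrive:
    """Toy model: power-on hours accumulate with use, not with the calendar."""

    def __init__(self):
        self.power_on_hours = 0

    def run(self, hours):
        # Only actual running time advances the counter.
        self.power_on_hours += hours


drive = ToyDrive()
drive.run(1000)

# Moving the host's clock four years ahead never touches the drive's counter.
host_clock = datetime(2020, 3, 25) + timedelta(days=4 * 365)

print(drive.power_on_hours)  # 1000
```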

Irata · Mar 25, 2020

Isn't that what HP's inkjet printers used to do?

Curious who the manufacturer is. Don't usually associate Sandisk with server class ssd but I may be wrong here.

texasrattler · Mar 25, 2020

Sandisk is owned by Western Digital for the past few yrs. WD is known for server related equipment. Obviously HP had these put in before SD was bought by WD. I guess now we know why you dont use products that arent known for that kind of use. HP didnt get that memo or simply, they cared about the price. Well hopefully this will be a lesson for anyone, you get what you pay for.

I use a SD ssd 1tb. Not a single issue ive seen. Had it over a yr. Granted I am not using it in a server or anything like that. Just for games.

DelJo63 · Mar 25, 2020

See another article at www.techspot.com:

SSDs have a built-in “time of death”

The downside of the SSD? The downside of SSDs with the NAND Flash-based chips is that they have a limited lifespan by default. While normal HDDs can – in theory – last forever (in reality about 10 years max.), SSDs have a built-in “time of death”. To keep it simple: An electric effect results...

brucek · Mar 25, 2020

What exactly happens at 40,000 hours? Does the firmware attempt to apply some sort of maintenance that had a huge bug in it? Curious minds want to know...

amoeba00 · Mar 25, 2020

I'd be interested to know if there were a utility that one could run to see if their SSD drive is affected. Would certainly be helpful to those who want to know if they are affected in case their particular OEM re-brand decides not to make a patch available.

sac39507 · Mar 25, 2020

Enterprise environments have the adequate backups in place so they don't have to worry (at least they are supposed to and if not, fire their IT people)

Scshadow · Mar 25, 2020

sac39507 said:
Enterprise environments have the adequate backups in place so they don't have to worry (at least they are supposed to and if not, fire their IT people)

And? I fail to see a point provided here. First off, you should have offline backups and offsite backups. But I still wouldn't want to see an storage array go down. What if several of the same drives were installed at the same time? You may not have enough parity data to recover if the failures are really close together. I wouldn't want to have to recover from an offline or offsite backup to restore data. Thats time consuming. It would be better to just fix the firmware.

Evernessince · Mar 25, 2020

sac39507 said:
Enterprise environments have the adequate backups in place so they don't have to worry (at least they are supposed to and if not, fire their IT people)

This is less about loss of data and more about having drives fail when they shouldn't.

Uncle Al · Mar 26, 2020

Sounds like somebody found an update to the old Y2K bug! LOL

trparky · Mar 26, 2020

brucek said:
What exactly happens at 40,000 hours? Does the firmware attempt to apply some sort of maintenance that had a huge bug in it? Curious minds want to know...

It probably has something to do with some kind of internal timer where if the incremented number exceeds the capacity of its storage location it bricks the firmware because it doesn't know how to handle it. Think of it as a fatal exception or BSOD for the firmware.

sac39507 · Mar 26, 2020

Scshadow said:
And? I fail to see a point provided here. First off, you should have offline backups and offsite backups. But I still wouldn't want to see an storage array go down. What if several of the same drives were installed at the same time? You may not have enough parity data to recover if the failures are really close together. I wouldn't want to have to recover from an offline or offsite backup to restore data. Thats time consuming. It would be better to just fix the firmware.

The point is that it's not a total data loss situation if proper backups are in place. Of course they should implement the fix ASAP to avoid all the headaches of recovering from backup. I guess my wording of "they shouldn't worry" painted the wrong picture. They should worry because of all the labor and problems it can cause.

Why are you even mentioning data parity and recovery? I'm not even talking about rebuilding a bad array but rather a full recovery from backup to freshly implemented healthy array. I get your point but don't understand why you didn't get my clear and obvious one.

sac39507 · Mar 26, 2020

Evernessince said:
This is less about loss of data and more about having drives fail when they shouldn't.

Except this part from the article:

"HPE warns that this is a catastrophic failure that will render all stored data unrecoverable. "

brucek · Mar 26, 2020

trparky said:
It probably has something to do with some kind of internal timer where if the incremented number exceeds the capacity of its storage location it bricks the firmware because it doesn't know how to handle it. Think of it as a fatal exception or BSOD for the firmware.

Maybe, except 40,000 hours sounds suspiciously like a human-defined threshold, not a computer one (it's not a power of two, nor is it if you multiply by 60 to get seconds.)
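The power-of-two observation can be checked directly. Note that multiplying hours by 60 gives minutes; seconds need a factor of 3,600. Neither product is a power of two, whereas the 32,768 hours from HPE's January warning is. A quick sketch:

```python
def is_power_of_two(n):
    """True if n is a positive integer with exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


for label, value in [
    ("40,000 hours", 40_000),
    ("in minutes", 40_000 * 60),
    ("in seconds", 40_000 * 3600),
    ("32,768 hours", 32_768),
]:
    print(label, is_power_of_two(value))
```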

trparky · Mar 26, 2020

brucek said:
Maybe, except 40,000 hours sounds suspiciously like a human-defined threshold, not a computer one (it's not a power of two, nor is it if you multiply by 60 to get seconds.)

Perhaps planned obsolescence?

Darth Shiv · Mar 27, 2020

brucek said:
What exactly happens at 40,000 hours? Does the firmware attempt to apply some sort of maintenance that had a huge bug in it? Curious minds want to know...

Numeric overflow... stops the firmware from being able to operate I presume. Meaning the data should still be intact on the drive just some intervention required to unbrick the drive.

Darth Shiv · Mar 27, 2020

trparky said:
Perhaps planned obsolescence?

That PR move would have gone down like a lead balloon. Nobody is that stupid in the info age. Every company that tries that pays a pretty hefty reputational price.

hk2000 · Mar 27, 2020

sac39507 said:
Enterprise environments have the adequate backups in place so they don't have to worry (at least they are supposed to and if not, fire their IT people)

I think most of 'em rely on RAID installations, and if the disks were deployed at the same time, then they'll probably fail simultaneously.
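Why simultaneous deployment matters can be sketched with a toy check. The assumptions are simplifications, not anything HPE has stated: every unpatched drive fails at exactly 40,000 power-on hours, and the array survives only if the number of failed drives does not exceed its parity count (1 for RAID 5, 2 for RAID 6). Rebuild time is ignored.

```python
THRESHOLD_HOURS = 40_000  # figure from HPE's warning


def array_survives(power_on_hours, parity_drives, hours_ahead):
    """Check whether failures after `hours_ahead` more hours stay within parity."""
    failed = sum(1 for h in power_on_hours if h + hours_ahead >= THRESHOLD_HOURS)
    return failed <= parity_drives


# Hypothetical readings: six drives deployed together vs. a staggered mix.
same_batch = [39_990] * 6
staggered = [39_990, 30_000, 25_000, 20_000, 15_000, 10_000]

print(array_survives(same_batch, parity_drives=2, hours_ahead=10))  # False
print(array_survives(staggered, parity_drives=2, hours_ahead=10))   # True
```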

Markoni35 · Mar 27, 2020

The term is "planned obsolescence". They make it fail, so you have to buy another one. Or they have really bad programmers. Again.

Ben Myers · Mar 27, 2020

I like umbala's analogy with a car's odometer. Here is how the article needed to explain:

HPE SSD's, like all modern SSDs and hard drives, have SMART (Self-Monitoring, Analysis, and Reporting Technology) built into the drive firmware. Part of SMART is a counter that measures the number of hours when a drive has been powered on. For whatever reason, when this counter hits 40,000 hours of metered use, the drive bricks.

QUICK! If you have HPE drives, go to the HP website and see if there is a firmware update. Between which dates were these drives placed in service?
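For anyone trying to answer that question for their own hardware, the remaining margin follows from the power-on-hours reading. A rough sketch, assuming the drive stays powered around the clock from now on and that its power-on hours have already been read with HPE's tools; the function name and the sample reading are made up:

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD_HOURS = 40_000  # figure from HPE's warning


def projected_failure(power_on_hours, now):
    """Estimate when a drive running 24/7 from `now` reaches the threshold."""
    remaining = max(FAILURE_THRESHOLD_HOURS - power_on_hours, 0)
    return now + timedelta(hours=remaining)


# Hypothetical reading of 35,000 hours taken on the article's publication date.
print(projected_failure(35_000, datetime(2020, 3, 25)))
```

A drive that is powered down part of the time would reach the threshold later than this estimate.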

Ben Myers · Mar 27, 2020

Markoni35 said:
The term is "planned obsolescence". They make it fail, so you have to buy another one. Or they have really bad programmers. Again.

Not planned obsolescence. Idiocy of whomever programmed the drive firmware.
