5 Signs Your Storage Drive is About to Fail

Holy *****s,

You have scared the **** out of users, especially me, into thinking my HDD might be failing, and you haven't mentioned any tools to check the HDD's status.

You should have mentioned some of these tools on this page, or linked to a page that discusses them.

You think you are being naughty, but it is a horrible joke.
--
Rawat
 

I always have the best tools on hand for failing drives.

The best tools are spare SSDs with Windows pre-installed and ready to go at a moment's notice.
 
I was wondering how I might use the drive manufacturer's utility, a USB flash drive and a copy of Chromium OS to assess impending failure.
 
@Wizwill I do appreciate the effort on the history lesson, and thank you for it, but I was already aware of all the basic technical, historical and market facts about SCSI and SCSI drives. I just never worked with them directly, since by the time I started working with servers in IT, Serial Attached SCSI (SAS) drives had already become standard, and these are very similar to SATA drives. So I'm unaware of finer details such as which SCSI drives and SCSI generations supported low-level formatting (which you didn't answer :confused: ).

To my knowledge, the original Parallel SCSI came in several different flavors (which you termed SCSI-1, SCSI-2) defined by their throughput. It has been a long time, but I recall a SCSI-40, a SCSI-80, I believe a SCSI-160, and possibly a SCSI-320. All had the same low-level format (LLF) capabilities. The LLF was originally intended to "marry" a particular SCSI hard drive to the SCSI card on the mainboard. All the other tricks are gravy.
I still have some original Adaptec SCSI card literature packs if you are interested.
 
I'm case 5. Blue screen.
But I didn't realize it was my HDD until I discovered "Disk" errors in Event Viewer. You should have mentioned "Disk" errors as a way to identify a faulty HDD.
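For anyone who would rather not click through Event Viewer, here is a minimal Python sketch that asks Windows' built-in wevtutil tool for the most recent System-log events from the "disk" source. It assumes the event provider is named "disk" (as the Source column in Event Viewer shows) and only runs the query on Windows; the function names are hypothetical, not part of any existing tool.

```python
import platform
import subprocess

# Assumption: storage-driver errors are logged in the System log under the
# provider name "disk" (shown as Source "disk" in Event Viewer).
DISK_QUERY = "*[System[Provider[@Name='disk']]]"


def build_disk_event_command(count: int = 20) -> list[str]:
    """Return the wevtutil arguments that list the newest `count` disk events."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{DISK_QUERY}",          # XPath filter on the event provider
        f"/c:{count}",               # maximum number of events to return
        "/rd:true",                  # reverse direction: newest first
        "/f:text",                   # human-readable text output
    ]


def recent_disk_events(count: int = 20) -> str:
    """Run the query on Windows; return an empty string on other systems."""
    if platform.system() != "Windows":
        return ""
    result = subprocess.run(
        build_disk_event_command(count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(recent_disk_events(10) or "No output (not running on Windows).")
```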
 
Another way to check is in Task Manager: the drive might pass the SMART test but run super slow. I've seen this a few times on older HDDs.

In the Performance tab, look at the drive's "Average response time". If it's really high while you aren't doing anything, the drive is dying.
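The same check can be scripted. This is an approximation of what Task Manager shows, not its exact formula: milliseconds spent on I/O divided by the number of I/Os completed between two snapshots. The DiskCounters class and the 100 ms threshold are hypothetical; the field names match those returned by psutil's disk_io_counters(), so two psutil snapshots taken a few seconds apart could be passed in directly.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Cumulative I/O counters for one drive (hypothetical container).

    Field names mirror psutil.disk_io_counters(perdisk=True), where
    read_time/write_time are cumulative milliseconds spent on I/O.
    """
    read_count: int
    write_count: int
    read_time: int   # ms
    write_time: int  # ms


def average_response_ms(before: DiskCounters, after: DiskCounters) -> float:
    """Average milliseconds per completed I/O between two snapshots."""
    ops = (after.read_count - before.read_count) + (after.write_count - before.write_count)
    busy = (after.read_time - before.read_time) + (after.write_time - before.write_time)
    if ops <= 0:
        return 0.0  # no I/O in the interval, so nothing to average
    return busy / ops


# Hypothetical threshold for an otherwise idle drive; tune for your hardware.
SLOW_MS = 100.0


def looks_slow(before: DiskCounters, after: DiskCounters,
               threshold_ms: float = SLOW_MS) -> bool:
    """True if the average response time exceeds the threshold."""
    return average_response_ms(before, after) > threshold_ms
```

Because only the attribute names matter, psutil's own result objects work in place of DiskCounters without conversion.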
 
I have seen micro-oxidation of the connectors cause weird issues. Reseating or replacing the cable fixes such things, and it should be the FIRST thing to check when a drive is acting flaky.
 
Had a WD Blue M.2 up and die within the first week of use. The only things I lost, though, were the Windows installation and recent AppData files, as I had already moved my documents, music, pictures and other important folders to other drives, plus I back up the system every week. Thank you, Macrium Reflect.

Had to restore from a backup once, but at least I could. A bad Windows Update borked the USB drivers, so I couldn't do anything with the keyboard/mouse. Always keep a PS/2 keyboard on hand, and a port for it, for those times USB gets fragged.
 
I'm so paranoid... I have mirrors of my backups of photos I've taken over the decades. One is in the cloud, one is on an external drive and another is on an NVMe drive.
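If the mirrors live in folders you can reach from one machine (the cloud copy through a synced folder, for example), a quick way to confirm they still match is to compare SHA-256 checksums. A minimal Python sketch; the function names are hypothetical, and a copy that is only reachable through a web interface would need to be downloaded first.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_mirrors(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Check every file under `primary` against the same relative path in `mirror`."""
    report: dict[str, list[str]] = {"missing": [], "different": []}
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue  # directories are walked by rglob; only files are compared
        rel = src.relative_to(primary)
        dst = mirror / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            report["different"].append(rel.as_posix())
    return report
```

Files listed under "different" are the ones to restore from whichever copy is known to be good.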

"He who does not back up, is doomed to crash" -- Confucius
 
No reference was made to SMART data simply because, for an individual user with one or two drives, the information isn't particularly useful. This is because such data, like all statistical information, is only robust when the sample size is large. Cloud service vendors such as Backblaze do monitor the data and use it to predict when a drive may fail, but they do so across nearly 150 thousand drives at a time. Back in 2016, they reported that of the 70k drives they were using in their data center, 23% of the drives that failed had reported no SMART indicators above zero.

Some storage vendors, such as Crucial/Micron, recommend using only their own tools for analysing SMART data because "there is no industry-wide standard to tell you which numbered SMART attribute describes a specific physical property of a drive. The attribute descriptors will vary between SSD and HDD, and even between the various SSD vendors."

So rather than making the article more complex or confusing than it needed to be, I chose to focus on the signs any user is more likely to notice, rather than suggest monitoring data that has a reasonable chance of giving no indication of an impending failure, or of producing an incorrect prediction through misinterpretation.

WHAT???? The SMART information isn't particularly useful for an individual user with one or two drives??? I will be extremely polite and ask you to cite your source for that statement. Backblaze compiles data about its own thousands of server drives, which have absolutely no relation to the single drive installed in your system or mine.

The SMART data for one drive is useful because it gives very precise information about the health of the drive media, i.e. the platters, of THAT SPECIFIC DRIVE. SMART tells you whether the drive has reallocated (failed) sectors, sectors pending reallocation, etc. SMART data also tells you how many hours the drive has been in use. Actually, refer to my first post, in which I explain which SMART data elements are important to me when diagnosing a client's computer system.
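To pull out the attributes mentioned above, here is a minimal Python sketch that parses the attribute table printed by smartmontools' `smartctl -A` for an ATA/SATA drive. It assumes the usual ten-column layout ending in RAW_VALUE; attribute 198 (Offline_Uncorrectable) is included as an assumption, and since vendors may label or scale attributes differently, the output is a prompt to look closer rather than a verdict.

```python
# Attribute IDs of interest: reallocated and pending sectors, power-on hours,
# plus 198 (Offline_Uncorrectable), which is an assumption added here.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    9: "Power_On_Hours",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE from `smartctl -A` text."""
    attrs: dict[int, int] = {}
    for line in text.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE (10th column) may contain spaces
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers, banners and other non-table lines
        raw_token = parts[9].split()[0]  # e.g. "10040 (12 3 0)" -> "10040"
        if raw_token.isdigit():
            attrs[int(parts[0])] = int(raw_token)
    return attrs


def media_warnings(attrs: dict[int, int]) -> list[str]:
    """Report any non-zero reallocated, pending or uncorrectable sector count."""
    return [
        f"{WATCHED[i]} = {attrs[i]}"
        for i in (5, 197, 198)
        if attrs.get(i, 0) > 0
    ]
```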

SMART started out as a standard agreed upon by Compaq, IBM and a lot of other companies. Yes, it is not perfect. It cannot detect failing mechanical components in spinning hard drives. But SMART remains an accurate indicator of failure of the recording media of spinning disk drives.

SMART went off the rails with SSDs, as some manufacturers hide all of the SMART data while others make only part of it available. A lot of this is for competitive reasons. Perhaps the best metric of impending SSD failure would be the number of remaining provisioned spare sectors as a ratio of the original number provisioned. My gut feeling is that if fewer than 10% of the spares are available, it is time to replace the SSD before it runs out of spares and suffers a hard failure from which data cannot be recovered. SSD wear-leveling algorithms help, but they are no panacea.
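A minimal Python sketch of that rule of thumb. The 10% cut-off is the poster's gut feeling, not a vendor specification. For NVMe drives, smartctl prints an "Available Spare" percentage, which the sketch treats (as an assumption) as remaining spares over originally provisioned spares; SATA SSDs expose spare counts, if at all, under vendor-specific attributes, so those numbers would have to be supplied by hand.

```python
import re
from typing import Optional

# Poster's rule of thumb: replace once fewer than 10% of the spares remain.
REPLACE_BELOW = 0.10


def spare_ratio(remaining_spares: float, original_spares: float) -> float:
    """Fraction of the originally provisioned spares still available."""
    if original_spares <= 0:
        raise ValueError("original_spares must be positive")
    return remaining_spares / original_spares


def should_replace(remaining_spares: float, original_spares: float) -> bool:
    """True once the remaining spares drop below the 10% rule of thumb."""
    return spare_ratio(remaining_spares, original_spares) < REPLACE_BELOW


def nvme_available_spare(smartctl_text: str) -> Optional[int]:
    """Read the 'Available Spare: NN%' line smartctl prints for NVMe drives."""
    # Anchored so that the 'Available Spare Threshold:' line is not matched.
    m = re.search(r"^Available Spare:\s+(\d+)%", smartctl_text, re.MULTILINE)
    return int(m.group(1)) if m else None
```

For an NVMe drive, `should_replace(pct, 100)` applies the rule to the parsed percentage; a `None` result means the line was not found and the check cannot be made.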

The reason Crucial/Micron don't want to divulge SMART information is the same as for all the other SSD manufacturers: competitive reasons. They do not want to tell you how many sectors have been provisioned as spares relative to the usable drive capacity. Provision too few and the SSD fails sooner.
 
I was hoping to read at least these as the last few points:
- use drive monitoring software to alert you to any issues
- check operating system logs
Otherwise the article is so vague that it's no wonder people have no idea how computers work.
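On Linux, the second point can be sketched in a few lines of Python that count block-layer I/O errors per device in kernel log output (for example, lines from `journalctl -k` or `dmesg`). The message pattern is an assumption based on how Linux kernels commonly word these errors; other drivers and other operating systems phrase them differently, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: failed block I/O is logged as "I/O error, dev sda, sector 123",
# sometimes prefixed with "blk_update_request:" on older kernels.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector \d+")


def count_io_errors(log_lines) -> Counter:
    """Count I/O error messages per block device in kernel log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        m = IO_ERROR.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

A device that keeps showing up in this count is a good candidate for the SMART checks discussed earlier in the thread.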
 
Another "crucial" note on data recovery, whether it's corrupted firmware, a broken PCB or something like that: such a repair should only be considered a way to get your data back. You can't rely on a drive that has already crashed once. Its overall failure rate has increased by 50%.

Most HDDs that went faulty were Seagates. I just can't understand how they allow for such a high tolerance in HDDs that were dying for no apparent reason. I've done HDD recovery as a service for over 2 years; most of the drives were Seagates, and WDs came second.
 