Tutorial: Fault diagnosis - the basics

Hatrick

My credentials? None, unless you accept old age and a lifelong passion for finding solutions with the minimum of outside help. (See my signature.) So, if it's instant, labour-free, guaranteed answers you are looking for, you are in the wrong place. That said, let's start.

If a car fails to start, the acronym FALSE should spring to mind as the first step in diagnosis: fuel, air, lubrication, spark, exhaust. With radio or TV the signal is traced to the point where it fails to move on to the next stage. But with a computer there is no easy acronym: the only input is a keyboard and the only output a monitor, so the diagnosis is an exercise in probability. No test is a guarantee that a component is fault free, but a pass improves the probability that it is.
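The point that a pass improves the odds without ever proving a component sound can be sketched with Bayes' rule. This is only an illustration: the 30% starting suspicion and the 80% detection rate are invented figures, and the sketch assumes each test run is independent of the last.

```python
def prob_faulty_after_pass(prior_faulty, detection_rate, false_alarm_rate=0.0):
    """Probability that a component is still faulty after it PASSES a test.

    prior_faulty:     suspicion that the component is faulty before the test
    detection_rate:   chance the test fails when the component is faulty
    false_alarm_rate: chance the test fails when the component is fine
    """
    pass_if_faulty = 1.0 - detection_rate
    pass_if_good = 1.0 - false_alarm_rate
    p_pass = prior_faulty * pass_if_faulty + (1.0 - prior_faulty) * pass_if_good
    if p_pass == 0.0:
        raise ValueError("a pass is impossible with these figures")
    return prior_faulty * pass_if_faulty / p_pass


# Hypothetical figures: 30% initial suspicion, test catches 80% of faults.
suspicion = 0.30
for run in range(1, 4):
    suspicion = prob_faulty_after_pass(suspicion, detection_rate=0.80)
    print(f"suspicion after pass {run}: {suspicion:.3f}")
```

Each pass shrinks the suspicion, but it only reaches zero if the test is assumed to catch every fault.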

For a short spate of BSODs, the help of the TS minidump gurus is an obvious first resort, although I would recommend taking a shot at it yourself - there is plenty of online information to, at least, give you an inkling of what it's all about. But, for the patient among you, or for a series of crashes spread over a period, the following procedure, based on Sherlock Holmes' dictum that "When you have eliminated the impossible, whatever remains, however improbable, must be the truth", may be helpful.

(1) Physical - with the power off:-
Check all external connections.
Remove the side panel, ground yourself and gently, but firmly, press the edge of each card to ensure that it is securely seated. With a Phillips screwdriver, make sure that all screws 'bite'. Clean out every speck of dust, particularly from airholes in the casing, the fans and between the fins of heatsinks. And keep your eyes open for 'fried' components.

(2) Power on:-
Win+pause/break brings up System Properties. Advanced tab - Startup and recovery - settings - uncheck 'automatically restart'. This stops Windows immediately rebooting after a blue screen, giving you the opportunity to note details of the event and, perhaps, identify an offending driver.

(3) Power supply:-
Download 'Speedfan' (free) from almico.com and run it. This gives full details of temperatures, voltages and fan speeds with, if your system supports it, the capability to vary fan speeds to influence temperatures and noise levels. The help file can be found by clicking the question mark on the S.M.A.R.T tab.
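Once you have the voltage readings, a quick sanity check is to compare each rail with its nominal value. The sketch below assumes the usual ATX allowance of plus or minus 5% on the +3.3V, +5V and +12V rails; the sample readings are invented. Motherboard sensors are often inaccurate, so treat an out-of-range figure as a reason to check with a multimeter, not as proof of a bad supply.

```python
# Nominal rail voltages; the 5% tolerance is an assumption based on the ATX allowance.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings, tolerance=TOLERANCE):
    """Return {rail: True/False} for each known rail present in readings."""
    report = {}
    for rail, nominal in NOMINAL_RAILS.items():
        if rail not in readings:
            continue  # rail not reported by the sensor chip
        low = nominal * (1.0 - tolerance)
        high = nominal * (1.0 + tolerance)
        report[rail] = low <= readings[rail] <= high
    return report


# Hypothetical readings copied by hand from the monitoring program.
sample = {"+3.3V": 3.31, "+5V": 4.97, "+12V": 11.2}
for rail, ok in check_rails(sample).items():
    print(rail, "within tolerance" if ok else "OUT OF RANGE - investigate")
```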

(4) Hard disk:-
The S.M.A.R.T tab in Speedfan will tell you everything you ever wished to know about the state of your hard disk(s) if you click on the down arrow at the top. (You need administrative privileges to see HDD details.) It also offers full, immediate, online analysis (free), giving even more information. Again, use the help file.
The information displayed on the tab is taken from a reserved part of the HDD where every significant disk event, from the date of installation onwards, is recorded.
If you are a belt and braces type of guy, you might also wish to run chkdsk /f and/or the manufacturer's disk diagnostic programme.

(5) CPU:-
If you moonlight as a computer geek at Caltech and have access to their laboratories, you might be able to do an analysis of the CPU. Otherwise, forget it.

(6) Memory:-
Give it a thorough workout with Memtest or Microsoft's Windiag. Run for several hours.

(7) System:-
Defragment, then run sfc /scannow from the command line. This checks every protected system file and replaces any that are incorrect. Follow this with full system scans for viruses, trojans, worms, malware etc.

(8) Drivers:-
(A) I don't understand the relationship between the BIOS and drivers, but assume there is one, and take a cautious attitude. If the BIOS is old and hasn't been, or can't be updated, then I advocate care when updating drivers, taking them one at a time and testing thoroughly before updating another. If the BIOS is recent, it might be safe to accelerate the process.
(B) Go online and Google {allintitle: verifier driver} and start reading! When you understand what verifier does, how to use it and how to stop it, you will find it in your system32 directory.
Before letting it run for the first time, check the list of unverified drivers for any which remain from uninstalled programs. Reboot and delete them, then run verifier again with the standard settings. The reason for this is that verifier slows down everything, and you don't want it wasting time on drivers which are never used.
I suggest listing the unverified drivers being examined and, over (say) a week, bringing them into play, individually or in groups, alongside your normal activities.

(9) Minidumps:-
These are merely snapshots of what happens in the last few milliseconds before a crash. The probability of a correct diagnosis based on one dump is near zero but, given half a dozen, it improves dramatically while, I suspect, never getting really close to certainty. What the TS gurus search for, I believe, (and I'm sure they'll tell me if I'm wrong) is consistency, which, coupled with their experience, gives them strong pointers to probable causes.
If, when trying it yourself, you constantly get error codes indicating system files, hard disk, memory or power supplies as the probable fault you should, if you have followed the above steps without problems, feel reasonably confident in rejecting them and, consequently, have narrowed the area of search quite considerably.
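The search for consistency across several dumps can be sketched as a simple tally. The records below are invented for illustration: the module names are placeholders, and in practice you would type in whatever your own dump analysis reports.

```python
from collections import Counter


def most_consistent(findings, min_share=0.5):
    """Return (item, share) pairs that appear in at least min_share of the dumps."""
    if not findings:
        return []
    counts = Counter(findings)
    total = len(findings)
    return [(item, n / total)
            for item, n in counts.most_common()
            if n / total >= min_share]


# Hypothetical (stop code, blamed module) pairs from six minidumps.
dumps = [
    ("0x0000000A", "example_net.sys"),
    ("0x000000D1", "example_net.sys"),
    ("0x00000050", "example_gfx.sys"),
    ("0x000000D1", "example_net.sys"),
    ("0x0000000A", "example_net.sys"),
    ("0x000000D1", "example_snd.sys"),
]

suspects = most_consistent([module for _code, module in dumps])
for module, share in suspects:
    print(f"{module} blamed in {share:.0%} of dumps")
```

One dump blaming a module means little; the same module turning up in most of them is the kind of pointer worth following.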

(10) Some faults of a different type:-
If you google this - {Repairing Windows XP in Eight Commands} - you will discover what appears to be a very useful document. I cannot vouch for it, not having had the particular problem it deals with, but both content and author seem sensible so it would certainly be tried if the need arose.

Final thoughts:-
(A) The number of times a disk is started and stopped is recorded in that area reserved for disk data because it is a major factor in determining the life of the disk. It is, therefore, reasonable to infer that the fewer times this event occurs, the longer the potential disk life.
(B) Components which are heated and cooled expand and contract, are stressed and more likely to fail than when run at a constant temperature. Good examples are light bulbs and cooker rings which almost always fail at switch on, but rarely in use.
The inference is obvious, if your sole concern is preserving a computer for the longest possible time. Of course, there are environmental and cost issues to take into account, like whether the cost of electricity used will be less than the cost of a new pc, and whether the environmental damage done in extracting, processing, manufacturing, transporting and marketing a new computer is likely to be more or less than your contribution. And whether you can stand those damned fans whirring away, 24/7. Personally, I can't.
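The electricity side of that trade-off is easy to rough out. Every figure in this sketch is an assumption chosen for illustration (a 100 watt average draw, a tariff of 0.15 per kWh, 8 hours of daily use as the alternative); substitute your own.

```python
def annual_energy_cost(watts, hours_per_day, price_per_kwh):
    """Yearly electricity cost for a machine drawing `watts` on average."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh


# Hypothetical figures: 100 W average draw, 0.15 per kWh.
always_on = annual_energy_cost(100, 24, 0.15)
daytime_only = annual_energy_cost(100, 8, 0.15)
extra_per_year = always_on - daytime_only

print(f"running 24/7:       {always_on:.2f} per year")
print(f"running 8 h a day:  {daytime_only:.2f} per year")
print(f"extra for 24/7:     {extra_per_year:.2f} per year")
```

Multiplying the extra yearly cost by the number of years you hope to gain gives a figure to set against the price of a replacement machine.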

I hope this helps, but please feel free to add comments and suggestions for improvement; I'm always happy to learn.
 
Thank you.

Your systemization is bound to reduce a lot of haphazard trial & error.

I look forward to additional suggestions from the expert panel at TECHSPOT.

I, for one, know all too well, the state of helplessness and frustration when the local support guy shoots arrows in the dark, pretending all along that he is a Mr. know-all.

And a very happy New Year to you.
 
Some info is good, but you will need to reread your manual about the recording of the number of times a disc is started and stopped.
I am old and wise, as well, and that is a pretty good credential, but it doesn't replace good research.
 
3rd item on the Speedfan analysis. Mine is E5C, i.e. 3676.
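For anyone wanting to check the arithmetic, the hexadecimal raw value converts to decimal in one line. The helper name is made up for this sketch.

```python
def raw_to_decimal(raw_hex):
    """Convert a hexadecimal raw value, as displayed, to a decimal integer."""
    return int(raw_hex, 16)


print(raw_to_decimal("E5C"))  # 3676
```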

By the way, I did not claim to be wise, merely old.
 
Do you have any idea what to do with a message that pops up on my screen saying: "upnt.exe has encountered a problem and needs to close." I run Windows XP Home. I got this message for the first time today.
 
Hi NuBro and welcome to TechSpot.

upnt.exe is a spyware file and needs to be removed.

Please navigate to our Security and the Web forum and click the new thread button. Give your new thread a descriptive title, and then describe your question/problem regarding upnt.exe in the message body.

Regards :)

btw nice guide Hatrick. :)
 
Hi Hatrick,
I have a problem with BSOD and my puter will not allow me to enter safe mode or reformat any of the 3 HDs I have tried in it. It always hangs. I tried reformatting and installing XP on my other puter and put it back into the other one but only got as far as the Windows logo with the thin bar running... and running... etc. I have changed RAM, graphics cards, hard drives and different registered XP copies, all to no avail!
 
Hard Drives???

What type of security do you have installed? What firewall? Comodo, Kerio, and Zone Alarm cause this problem on some installs.
 
I have put a post in under "bsod", as I only just realised this was the wrong place to ask these questions. But this is on attempting to reinstall Windows on formatted disks, before any firewall is present.
 