BSOD: Hardware or software, and what to do?

Status
Not open for further replies.

adiamond

I'm running Win2k Pro, sp4+patches. I have an Epox 8kha+ mbd, Athlon XP.
My computer BSODs about once a day (though sometimes it lasts more than 2 days) without any direct cause (almost always when I'm not at my computer). Most of the time it lists the NT kernel (ntoskrnl or whatever), though I've also seen tcpip.sys. Frankly, given the computer's usage pattern, I think those drivers just represent a random sampling of what's running behind my back (i.e. mainly idling with some web browser auto-updating of pages and perhaps local network stuff).
Up until about 2-3 months ago this computer was rock solid. I don't recall adding any hardware or device drivers immediately before the crashes started.
I do have god knows how many interesting services that run at startup (NAV, a Fax, Matrox Powerdesk, Proxomitron, etc.) though I don't seem to recall installing any immediately preceding these symptoms.

I ran memtest86 for hours and no errors.
I know that the Epox 8kha+ boards suffered from the bad capacitor batch issue. So, I looked at the mobo and I can see that one of the bazillion capacitors has a minor issue, but frankly:
1) It's minor. It's a little brown on the top and it's not caked with crap; most of the top is still shiny (though this doesn't bode well for the future).
2) My previous experience with bad mobos is that they're easy to spot. They crash through the OS straight to reboot, and they crash often.

So, hardware isn't a nutty guess, but it's not a great bet either.

All in all if someone put a gun to my head, I'd say device driver or service but, as I've said above, that's just because it's the less silly choice.

Even as I write, I am conducting a test. I have always done backups by backing up my entire hard drive on spare hard drives. As it turns out I have a hard drive back up that predates the whole crashing issue. I have booted using this old hard drive and will run this computer for at least 3 days. If it doesn't reboot after that time I will conclude that I have some kind of service/device driver issue but, given the blue screens and the long up time between them, I can't imagine how I'm going to ferret that out.
If I go driver by driver, service by service I will be dead 100 years before I find the problem.
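One way around the one-at-a-time problem is to halve the suspect list each round: disable half the services, wait out the test window, and keep whichever half still crashes. The sketch below only illustrates the counting. It assumes exactly one culprit and a crash that shows up reliably within each test window, neither of which is established here. The `crashes_with` callback and the choice of culprit in the usage note are hypothetical stand-ins for "run the machine with only these enabled and see if it blue-screens".

```python
import math


def rounds_needed(n_suspects):
    """Number of halving rounds needed to isolate one culprit among n suspects."""
    return math.ceil(math.log2(n_suspects)) if n_suspects > 1 else 0


def bisect_culprit(suspects, crashes_with):
    """Narrow a list of suspects down to one by repeated halving.

    crashes_with(subset) must return True if the machine crashes with only
    that subset enabled. Assumes exactly one culprit and a reliable test.
    """
    candidates = list(suspects)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if crashes_with(first_half):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]
```

For example, `bisect_culprit(["NAV", "Fax", "Matrox Powerdesk", "Proxomitron"], lambda subset: "Proxomitron" in subset)` picks out the (made-up) culprit in two rounds, and `rounds_needed(64)` is 6, so even at a couple of days per round the search is measured in weeks rather than lifetimes.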


The fact is, I have spent a long time getting this machine the way I want it. So many apps - some probably no longer obtainable but necessary. I don't want to reinstall fresh. So, I have two thoughts for a hail-mary strategy
1) A new mobo. Besides the "update/repair" install that comes with a new mobo replacing any number of drivers, the new mobo, unlike my current one, wouldn't need my add-in extra HD IDE card (for fast copies in Ghost DOS mode), the sound card (a long story), or the network card. That's a lot of extra drivers gone.
2) Install an XP Pro upgrade (over my Win2k Pro). Same accidental-fix-with-new-drivers theory (well, sort of; at least a new kernel).

Anyway, it's just awful, the problems and my solutions, so I'm up for suggestions.

Thanks


www.envisionsystemsllc.com
EnVision Systems LLC
Providing custom solutions for industrial pattern recognition and image processing problems.

Andrew Diamond
4170 Pilon Point
San Diego, CA 92130
Phone: 858-509-3115
Fax: 858-509-3116
 
epox makes a lackluster product to begin with, and you said yours has capacitor problems. get a new board, as the odds are overwhelmingly in favor of the board being bad. the very nature of capacitors could explain why the computer runs for about a day before crashing.
 
Re: The board

Thanks for your response,

1) But if the capacitor was the issue, i.e. a mobo problem, wouldn't that tend to manifest as a total crash down to reboot rather than getting caught and resulting in a BSOD?
Can you think of any way of testing your hypothesis? In particular, someway such that if it is the capacitor, I could get the machine to BSOD in an hour rather than a day (if I'm lucky)?
2) Right now I'm on the second day of my test (as mentioned in my previous post) of running off a pre-crash backup HD on the exact same computer. So, since I'm quite sure the problem isn't the original HD, I'm now more inclined to believe software (say > 80%).
I don't think I've seen this computer go 3 full days without a BSOD in the last 2 months so I figure I'll let this test go to Sat. Morning (3.5 days) and if it's still running I guess I have to conclude that it's some software issue (i.e. driver/service/security patch thing).
I simply can't imagine how I'm going to resolve that if it takes me two days of not crashing to be just about 90-95% confident that the problem has been fixed. (8 billion drivers and services X 2 days apiece. Jesus).
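As a rough sanity check on those confidence figures: if the crashes arrive at random at about one per day (a Poisson assumption, not something measured here), the chance of a still-broken machine surviving t days by luck is exp(-t). A minimal sketch, with the one-per-day rate taken from my own estimate above:

```python
import math


def prob_no_crash(days, crashes_per_day=1.0):
    """Chance of zero crashes in `days`, assuming random (Poisson) crashes."""
    return math.exp(-crashes_per_day * days)


for days in (1, 2, 3.5):
    lucky = prob_no_crash(days)
    # Confidence that the problem is gone = 1 - chance of a lucky quiet run.
    print(f"{days} days crash-free: {lucky:.1%} by luck, {1 - lucky:.1%} confidence")
```

Under that assumption two quiet days give roughly 86% confidence and 3.5 days roughly 97%, so the 3.5-day test is the more convincing one. If the real crash rate is lower than one per day, every test has to run longer to mean the same thing.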


As far as Epox goes, I've really liked this board. I've had troubles with 2 ECS, 1 Gigabyte, 1 FIC, and 1 MSI mobo. (Nothing but good experiences with Asus and Shuttle.) This Epox has been a rock. It had lots of overclocking options and was cheap. Obviously, this board is old and will need to be replaced, but I figured that I'd wait until 64-bit Longhorn and then get an AMD 64 that can use it (and has been tested with the final OS too!)

Thanks
 
Looks more like hardware

I know it seems goofy, silly, and kind of crazy, like a person talking to himself, to bother writing this post to "my" own thread. It's just for search engine possibility (i.e. posterity).

After running the test I mentioned in my previous post (booting off an old HD, from before the crash issue, on the same computer), everything was running fine. So, I stressed the system by doing a complete HD file search with content (I looked for the word "fred"). This caused a BSOD in under 15 minutes, I think. This time the BSOD driver, as I recall, was ntfs.sys.
Though no previous BSOD named ntfs.sys as the cause, I have to believe that this is a random driver, as it makes sense that ntfs.sys was the driver that just happened to be executing when the crash occurred.
I guess there are other possibilities (cpu, addon IDE card, hd) but that seems really unlikely.
I put the bad mobo probability > 50%.

So, I ordered a new mobo.

I decided to do the easy/cheap thing and order a Shuttle AN35N mobo for $65, so I need no other new parts (I'm keeping my Athlon XP 2400 w/ 266 FSB). Good enough for now.
FYI, I chose that board because I like Shuttle, I have that exact same board in another computer and it works like a charm, has good features, and is cheap (and it doesn't have one of those loudish death-prone southbridge? fans).
I have another real old Shuttle board in a real old computer from 95 or so that runs a Pentium MMX. It's run pretty much 24/7 since then.
I almost went to an Athlon 64, but all the mobo manufacturers I trust that were using nForce chipsets (I've had enough of Via) demanded PCI Express, and I'm not inclined to replace my dual DVI Matrox AGP card right now. I figure I'll wait until Longhorn is stable before going 64.
 
to stress out the board best, run a downloadable program called prime95. run a 'torture test', selecting the 'in place fft' option. this uses the most power and produces the most heat, making it ideal to test your faulty epox mobo. from the sounds of everything you've said, it still looks like the epox board is the offending part.
 
When Windows crashes with blue screen, it writes a system event 1001 and a minidump to the folder \windows\minidump
Check system event 1001; it has the content of the blue screen.

Control Panel -> Administrative Tools -> Event Viewer -> System -> Event 1001. Copy the content and paste it back here.

Zip 5 to 6 minidumps and attach the zip files here. I will study the dump and find out the culprit.
 
No recent minidumps

Though my Windows minidump folder, c:\winnt\Minidump, does contain 29 dmp's, none is more recent than 9/5/2003. Considering that this machine has BSOD'd so many times in the last two months, that doesn't appear to make sense.
There are also no 1001 events displayed in my Event Viewer, which seems to display events going back 8 months. This also doesn't make sense.

Anyway, there doesn't appear to be any dumps for me to send.


Thanks anyway,
Andy
 
"Anyway, there doesn't appear to be any dumps for me to send."
windows attempts to write a minidump file. if the system fails in such a way that the hardware is not able to execute the code in the windows kernel, you don't get anything.
 
Murphy's law

Besides that (no minidump when I need one): since I had the BSOD in my previously mentioned pre-crash backup HD test (where I used a pre-crash HD on the same computer), I'm back to running my current HD. I think I'm on my 3rd day with no BSOD, which would be a record for the last 2 or so months. Hilarious. The classic Murphy's law "it works when you want to see it fail" repair job.
I haven't tried the prime95 stress test, but that's because I've already ordered a new mobo and I'm going to install it regardless. Actually, come to think of it, I probably should run prime95 now anyway and see if I can get this machine to consistently fail quickly, so I have an experimental control against which to compare what happens with prime95 on the new mobo.
Thanks
 
I'm responding to my own post for posterity.
I have installed a new mobo and the BSODs seem to have gone away. The system now works great.

I don't know whether that's because of all the drivers that were replaced as part of the win2k "repair" that replaces the HAL with a new mobo install (and taking into account the two fewer add-on cards in my new system, which don't need drivers) or because my old mobo really had a problem.
Anyway, it was super easy to do as per the MS Knowledge Base doc on upgrading mobos (though I had eased my path by making a win2k sp4 slipstream CD as per some article I found on the web a long time ago. Worked like a charm).

BTW, I ended up with an ASUS A7N8X-E Deluxe mobo after trying the Shuttle AN35N because, I believe, I accidentally fried the Shuttle mobo during install, and Fry's Electronics had the ASUS but no longer had the AN35N. AND, the good part: if you kill a new mobo from Fry's, you can return it for a full refund or credit. A lovely idea for mobos.
Remember, Fry's will match prices but you need to bring the Ad and they have to verify by calling the store so the store has to be open at the hour that you go to Fry's. I saved about $20 that way.

Personally, I like Shuttle better than Asus because:
1) Asus charges more
2) Asus seems to put in "clever" extra features that introduce incompatibilities (MBM doesn't seem to play well with this board)
Still, Asus is one of my favorite MBs (along with Epox and Shuttle)
So my mobo has two built-in LANs (one gigabit) and firewire. I have no need for either.
Lots of USB ports, that's nice.
 