Ghost in the machine - BSOD

Status
Not open for further replies.

tpfagan

All,

I've been researching the multiple BSODs I've been having for the past week and found this forum to have the most comprehensive information concerning the dreaded BSOD. This is why I am posting here for the first time, in the hope that someone can provide me with some new insight into my problem.

Background:

About a week ago, my wife's computer started freezing up and blue-screening. I started looking into it and found nothing obvious in the logs, and the minidumps referenced atapi.sys and ntoskrnl.exe. My wife had indicated that she did not download anything or load anything new - but who really knows.

I started looking for spyware and viruses (this machine had no issues for almost 2 years). Nothing jumped out. The computer locked up twice while I was searching, but no minidumps were generated. I started to suspect the power supply. I ran a chkdsk and -- the disk was a problem. After a reboot from the chkdsk, I could not boot. The hard drive was FUBAR and was showing ARES C64K in the BIOS, which with a little google-fu meant I was pretty much screwed.

So great, found the problem; replaced the hard drive (and picked up an extra power supply just in case) and restored a backup from 2 months ago. Everything was great. Then I got a blue screen and another and another. Put in the new power supply. Still getting blue screens. The only common part of the error message from the blue screen was the IRQL = 0xFF.

Frustrated by the lack of progress and the complaints from my wife, I virtualized her hard drive and ran her system in Virtual PC 2004 from another machine. I felt that if the OS was screwed, it would crash in the VM. My wife can now RDP into her machine and resume her life, and I can take extra time with her PC.

I ran Memtest86 v3.2 for 6 hours, then ran memtest86+ 1.65 for 4 hours, then ran Windows memory diagnostics for 3 hours. All reported no errors from the memory. hhhhhhfffftttt

I reloaded her machine with a clean OS and reloaded the drivers. I downloaded critical updates and otherwise left the machine bare. I got a few blue screens from what was identified as a video card driver issue. I uninstalled the old video driver and installed a new one.

Her VM has been running solid for 5 days now... no problems at all... but her computer is still having problems.

At this point, I'm thinking motherboard or CPU. I've gotten 4 blue screens since then and I'm attaching them for review, as I'm not very proficient at understanding the information provided in these logs.

Hopefully some experienced eyes can provide me with some direction.


The Specs:

Mini-Q 765 micro itx (Jetway)
Athlon XP 3200
1G (512x2) DDR2700 pqi ram
80 seagate IDE HD (new)
DVD-rw - Liteon
On-board Vid (GF4 mx)
On board LAN (GF MCP)
On-Board Sound (CM audio??) not sure
430w Thermaltake PS
2 printers hooked via USB
1 CF card reader hooked via USB
 

Attachments

  • mini110906-04.txt
    13.6 KB
If you have WinDbg, follow the sample I posted below. Use the bold com... as they are shown, then post back the results.


kd> .bugcheck [Lists bugcheck data.]
Bugcheck code 0000000a
Arguments 00000000 0000001c 00000000 00000000

kd> kb [Lists the stack trace.]
ChildEBP RetAddr Args to Child
8013ed5c 801263ba 00000000 00000000 e12ab000 NT!_DbgBreakPoint
8013eecc 801389ee 0000000a 00000000 0000001c NT!_KeBugCheckEx+0x194
8013eecc 00000000 0000000a 00000000 0000001c NT!_KiTrap0E+0x256
8013ed5c 801263ba 00000000 00000000 e12ab000
8013ef64 00000246 fe551aa1 ff690268 00000002 NT!_KeBugCheckEx+0x194

kd> kv [Lists the trap frames.]
ChildEBP RetAddr Args to Child
8013ed5c 801263ba 00000000 00000000 e12ab000 NT!_DbgBreakPoint (FPO: [0,0,0])
8013eecc 801389ee 0000000a 00000000 0000001c NT!_KeBugCheckEx+0x194
8013eecc 00000000 0000000a 00000000 0000001c NT!_KiTrap0E+0x256 (FPO: [0,0] TrapFrame @ 8013eee8)
8013ed5c 801263ba 00000000 00000000 e12ab000
8013ef64 00000246 fe551aa1 ff690268 00000002 NT!_KeBugCheckEx+0x194

kd> .trap 8013eee8 [Gets the registers for the trap frame at the time of the fault.]
eax=dec80201 ebx=ffdff420 ecx=8013c71c edx=000003f8 esi=00000000 edi=87038e10
eip=00000000 esp=8013ef5c ebp=8013ef64 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
ErrCode = 00000000
00000000 ??????????????? [The current instruction pointer is NULL.]

kd> kb [Gives the stack trace before the fault.]
ChildEBP RetAddr Args to Child
8013ef68 fe551aa1 ff690268 00000002 fe5620d2 NT!_DbgBreakPoint
8013ef74 fe5620d2 fe5620da ff690268 80404690 NDIS!_EthFilterIndicateReceiveComplete+0x31
8013ef64 00000246 fe551aa1 ff690268 00000002 elnkii!_ElnkiiRcvInterruptDpc+0x1d0
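
The frames in the kb output above follow a fixed column layout, so they can be pulled apart mechanically when there are several dumps to go through. Here is a minimal Python sketch, assuming each frame line has ChildEBP, RetAddr and three argument columns followed by an optional module!function+0xoffset symbol. The parse_frame name and the dictionary keys are hypothetical and are not part of WinDbg.

```python
import re

# One frame of WinDbg `kb` output: ChildEBP, RetAddr, three args,
# then an optional module!function+0xoffset symbol.
FRAME_RE = re.compile(
    r"^(?P<child_ebp>[0-9a-fA-F]{8})\s+(?P<ret_addr>[0-9a-fA-F]{8})\s+"
    r"(?P<args>(?:[0-9a-fA-F]{8}\s*){3})"
    r"(?:(?P<module>[\w.]+)!(?P<function>\w+)(?:\+0x(?P<offset>[0-9a-fA-F]+))?)?"
)


def parse_frame(line):
    """Return a dict for one stack frame line, or None for headers and other text."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    offset = match.group("offset")
    return {
        "child_ebp": int(match.group("child_ebp"), 16),
        "ret_addr": int(match.group("ret_addr"), 16),
        "args": [int(arg, 16) for arg in match.group("args").split()],
        "module": match.group("module"),      # None when no symbol is shown
        "function": match.group("function"),  # None when no symbol is shown
        "offset": int(offset, 16) if offset else None,
    }


if __name__ == "__main__":
    frame = parse_frame(
        "8013eecc 801389ee 0000000a 00000000 0000001c NT!_KeBugCheckEx+0x194"
    )
    print(frame["module"], frame["function"], hex(frame["offset"]))
```

Collecting the module field from the top frames of each dump gives a quick list of which drivers keep showing up.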
 
Thanks for the quick response. I've captured those logs. It appears that the trap is at the same location in all of the logs. Hopefully that means it points to a single problem.

Attached for your review.
 
At the bottom I have noted what the last stack was before it crashed, which looks like your CPU. Try this: open the PC and check for dust on the CPU and fan; if there is any, clean it out with a can of compressed air. Also get a CPU benchmark tool to test your CPU.


kd> .trap 80550140
ErrCode = 00000000
eax=07ee6924 ebx=ffdffc70 ecx=ffdffc70 edx=0000000a esi=ffdffc50 edi=8496e130
eip=f75691b2 esp=805501b4 ebp=805501d0 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0002 es=0000 fs=f000 gs=0483 efl=00000246
amdk7!AcpiC1Idle+0x12: <----- CPU
f75691b2 6a00 push 0

kd> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
805501d0 804dbb37 00000000 0000000e 00000000 amdk7!AcpiC1Idle+0x12 <----CPU
805501d4 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x10
 
Ok, the machine is clean. I cleaned it when I removed the hard drive, but checked again just to make sure.

I can run a CPU benchmark test. Can you let me know what I'm looking for? Are we hoping for a crash or looking for degradation?

I'd like to know what I should post back?

Thanks again for your help.
 
https://www.techspot.com/downloads/433-everest-ultimate-edition.html

You could use this tool: post a picture of the main page and run a test on your CPU. Look for signs of overheating and find out what temperature your CPU can stand; my CPU can take 55C.

Your CPU and chipset temps shouldn’t be getting anywhere near the 65-70c mark when at full load. If your temp is at these levels when idle, you are running too hot.
Your GPU temp will depend on the card, but they usually run a lot hotter than CPUs so don’t be alarmed if it is showing a reading of 70c.

If you find that your system is in fact overheating, take a look at the fans. Are they caked in dust? Also check your case fans. Are they spinning? If all fans appear to be working, you may need to re-seat your CPU. Here is a well written article on how to do it: http://compreviews.about.com/od/tutorials/ss/DIYCPU.htm
 
I have never heard of BSODs from overheating! The PC would simply power off automatically. But... is this machine home built?
 
It's a barebone. I dropped in the CPU, memory and HD. The CPU has thermal protection and the MB supports it. It's supposed to shut off instantly with a hi-lo-hi-lo tone if that ever happens. I've never heard it on this box. I'm d/l'ing the software and will post what I find.
 
ok, I ran the CPU stress test for about 40 min at 100% and it pretty much stayed between 50-55C with the average being around 52. I've attached the picture. Fans are working and clean.
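
For anyone logging temperatures during a stress run like this, the readings can be summarized in a few lines. This is a minimal Python sketch, assuming the readings were noted by hand in degrees C. The sample values are made up for illustration, the summarize_temps name is hypothetical, and the 65C default limit is taken from the rule of thumb quoted earlier in the thread, not from any AMD specification.

```python
def summarize_temps(samples_c, limit_c=65.0):
    """Summarize CPU temperature readings (degrees C) from a stress run.

    limit_c is an assumed rule-of-thumb ceiling, not a vendor specification.
    """
    if not samples_c:
        raise ValueError("no temperature samples")
    return {
        "min": min(samples_c),
        "max": max(samples_c),
        "mean": round(sum(samples_c) / len(samples_c), 1),
        "over_limit": max(samples_c) >= limit_c,
    }


if __name__ == "__main__":
    # Hypothetical readings taken every few minutes at 100% load.
    readings = [50, 52, 55, 51, 52]
    print(summarize_temps(readings))
```

A run that stays in the low 50s never trips the over_limit flag, which matches the "looks fine" reading of the screenshot.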

I had another blue screen that was identified as a video driver issue. I checked the video DLL version on another machine like mine to make sure I was running the same version of the video DLLs. I wasn't, but am now.
 
OK, well, from the picture it looks fine. Hmm... you said you just updated your video driver, right? Well, if it crashes again, post the minidump file. Correct me if I am wrong, but this is what has been done to the PC:

ram test.
cpu test.
new PSU.
clean from the inside.
hdd test.
new hdd.
update all drivers.

Have you looked at the motherboard to see if there are any capacitors that look like they are going to burst or are leaking? Also let us know if the update you did for the video driver fixes your problem.
 
That's pretty accurate.

I did inspect the caps for bulges and leaks but forgot to mention it. I checked that out while I was replacing the power supply.

*When* I get my next crash, I'll post the whole dmp this time unless you prefer I process it. I'm also going to try to get more time with that second system to check more driver versions. That second machine has been stable for 3+ years.
 
Uninstall the CPU driver and see if it makes a difference. Windows will install its own driver when you reboot.
 
I had a lockup with no minidump and a blue screen with a minidump. I'm attaching the file. It looks basically the same: same trap location, same IRQL, etc.
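
One way to check that several dumps really do show the same crash is to compare their bugcheck data side by side. This is a minimal Python sketch, assuming each dump has been run through WinDbg and the text saved in the same layout as the .bugcheck sample earlier in the thread (a "Bugcheck code" line followed by an "Arguments" line). The function names are hypothetical.

```python
import re
from collections import Counter

# Patterns follow the `.bugcheck` sample output quoted earlier in the thread.
CODE_RE = re.compile(r"Bugcheck code[ \t]+([0-9a-fA-F]+)")
ARGS_RE = re.compile(r"Arguments[ \t]+((?:[0-9a-fA-F]+[ \t]*)+)")


def bugcheck_signature(text):
    """Return (code, (arg1, arg2, ...)) parsed from one saved log, or None."""
    code = CODE_RE.search(text)
    args = ARGS_RE.search(text)
    if code is None or args is None:
        return None
    arg_values = tuple(int(arg, 16) for arg in args.group(1).split())
    return int(code.group(1), 16), arg_values


def count_signatures(logs):
    """Count how many logs share each bugcheck signature."""
    return Counter(bugcheck_signature(text) for text in logs)


if __name__ == "__main__":
    sample = (
        "Bugcheck code 0000000a\n"
        "Arguments 00000000 0000001c 00000000 00000000\n"
    )
    for signature, count in count_signatures([sample, sample]).items():
        print(signature, count)
```

If every log yields the same signature, the counter ends up with a single key, which is a rough confirmation that the crashes share one cause.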

As far as uninstalling the CPU driver and rebooting, I'm not sure how to do that. I believe my CPU driver is amdk7.sys - but I'm not sure how to force it to reload...

I'm reconciling driver versions with another machine like mine - but it turns out I have a different built-in audio chip, so I'll post if any of those drivers are incorrect and if it *seems* to make a difference.
 
OK, try this: you said you have two 512 MB sticks of RAM. Try running with just one to see if it crashes; if it does, try the other. The reason I am asking is that although the tests you used on your RAM are good, there have been cases where they pass bad RAM.
 
I went through all of the devices in Device Manager and compared them to another machine just like mine. There were a number of devices that were loaded with *different* drivers. Now, I can't say that because they were different they were wrong - but the other system is running fine. While I was going through the exercise of comparing devices from machine 1 to machine 2, my machine blue-screened 2-3 times. One of the times it still complained it was the video driver - which I know is OK, as it is the same version as on the other machine. What I did find that was interesting (mildly) was that one of my system devices, which was identified as PCI standard PCI-to-PCI bridge, should have been NForce2 AGP host to PCI Bridge. I updated the driver and we shall see.
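
The device-by-device comparison described above can also be done mechanically once both inventories are written down. This is a minimal Python sketch, assuming the device names and driver descriptions were copied by hand from Device Manager on each machine. The diff_drivers name and the dictionary keys are hypothetical, and only the bridge entry comes from this thread.

```python
def diff_drivers(suspect, reference):
    """Return {device: (suspect_driver, reference_driver)} for every mismatch.

    Both arguments map a device name to the driver loaded for it.
    A device missing from one machine shows up with None on that side.
    """
    diffs = {}
    for device in sorted(set(suspect) | set(reference)):
        suspect_driver = suspect.get(device)
        reference_driver = reference.get(device)
        if suspect_driver != reference_driver:
            diffs[device] = (suspect_driver, reference_driver)
    return diffs


if __name__ == "__main__":
    # Hand-typed inventories; the keys are illustrative labels only.
    crashing_pc = {
        "AGP bridge": "PCI standard PCI-to-PCI bridge",
        "Processor": "amdk7.sys",
    }
    stable_pc = {
        "AGP bridge": "NForce2 AGP host to PCI Bridge",
        "Processor": "amdk7.sys",
    }
    for device, (mine, theirs) in diff_drivers(crashing_pc, stable_pc).items():
        print(f"{device}: {mine!r} vs {theirs!r}")
```

A mismatch only says the two machines differ, not which one is wrong, so each difference still needs a judgment call.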

I also uninstalled the processor device driver from Device Manager and rebooted. Nothing dramatic to report as of yet. I'm gonna hold out before I change anything else to see if these changes have ANY effect. I've already changed *too much* in a short period of time to know, if everything turns out OK, which item was the silver bullet.

I'll attach more minidumps if I have any more from this point forward. Then I'll start swapping parts in (memory and processor) from another known good machine.
 
Well,

Not all stories have a happy ending. After the last blue screen, I found my machine with a black screen; the computer was on, but quieter than normal. Only the chipset fan was running. Alas, the motherboard has choked and gasped for the last time. It will no longer POST.

I did the humane thing and harvested its organs for the next machine, and have already transplanted the CPU into a better, more deserving machine.

Thanks for everyone's help (i.e. xxdanielxx for the most part).
 