Minidumps, diagnostic tests, & diagnosis help

Status
Not open for further replies.
Greetings all. This is my first post, and thanks in advance for any feedback.

System Info--
Motherboard: Micro Star K7T Turbo2 MS-6330 ATX
CPU: AMD Athlon 1.1 GHz
HDD: Seagate ST320423a 20gb
RAM: Crucial 256mb PC 100 CL2 and Kingston 512mb PC 133 CL3
Video card: nVidia RIVA TNT2 Model 64 32mb
OS: Win XP Professional

Between June 25 and July 2, my computer crashed repeatedly and created 34 minidumps: 27 before a repair install of XP Pro and 7 during and after the repair install.

In each of the 34 cases, Windows Debugger gives the reading:

Unable to load image ntoskrnl.exe, Win32 error 0n2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe

Here is a summary of the BugCheck diagnosis:

Probably caused by:

csrss.exe (23 times)
ntoskrnl.exe (4 times)
win32k.sys (3 times)
atapi.sys (2 times)
ftdisk.sys (1 time)
memory_corruption (1 time)
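For anyone wanting to reproduce a tally like the one above, here is a minimal Python sketch that counts the "Probably caused by" lines in saved debugger output. The sample text and the function name `tally_culprits` are made up for illustration; only the counts in `REPORTED_COUNTS` come from this post.

```python
import re
from collections import Counter

# Counts as reported in the post above (34 minidumps in total).
REPORTED_COUNTS = {
    "csrss.exe": 23,
    "ntoskrnl.exe": 4,
    "win32k.sys": 3,
    "atapi.sys": 2,
    "ftdisk.sys": 1,
    "memory_corruption": 1,
}

# Hypothetical excerpt of saved debugger output, one line per minidump.
SAMPLE_OUTPUT = """\
Probably caused by : csrss.exe
Probably caused by : win32k.sys ( win32k!TimersProc+ad )
Probably caused by : csrss.exe
"""

# Capture the first token after the colon (the module or image name).
PATTERN = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def tally_culprits(debugger_text: str) -> Counter:
    """Count how often each name appears after 'Probably caused by'."""
    return Counter(PATTERN.findall(debugger_text))


if __name__ == "__main__":
    for name, count in tally_culprits(SAMPLE_OUTPUT).most_common():
        print(f"{name} ({count} times)")
    # Sanity check: the reported counts should add up to the 34 minidumps.
    print("reported total:", sum(REPORTED_COUNTS.values()))
```

The tally only shows which module the debugger blamed most often. It does not by itself say which piece of hardware is at fault.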

From what I read on the web, I guessed that my problem was my HDD, RAM, or video card.

After my repair install of XP, I began running diagnostics.

I ran “Seatools” diagnostic on my Seagate HDD. It found two bad sectors and successfully repaired them.

I then cleaned my video card thoroughly and reinstalled it. My computer hasn’t crashed since: it’s been about 24 hours.

I then ran Microsoft “Windows Memory Diagnostic” on my RAM cards, with curious results. I have 2 RAM cards (512mb and 256mb) and 3 RAM slots on my motherboard.

First test (512 RAM in slot 1, 256 RAM not installed): Long test found 986 errors in Test 7 of 11 (MATS+ cache disabled) then stalled.

Second test (512 RAM in slot 3, 256 RAM not installed): Long test found no errors.

Third test (256 RAM in slot 3, 512 RAM not installed): Long test found no errors.

Fourth test same as first test: Long test found no errors on two passes.

Fifth test (512 RAM in slot 1, 256 RAM in slot 3): Long test found 60,000+ errors during Test 6 (WINVC) and I aborted test.

Reinstalled 256 RAM in slot 3.

Sixth test same as fifth test: Long test found no errors.

I plan to leave the RAM cards right where they are until further errors occur.

My final test was chkdsk /r in Recovery Console. Chkdsk found and fixed one or more errors on the volume.

So my question is, how likely is it that the problem was/is:

a) operating system
b) video card
c) RAM cards
d) RAM slots very finicky about placement of RAM card in slot
e) other

And what should I do at this point?
 
memtest results

Successful start-ups have been random today: probably 1 success for every 4 failures, which end in either a black screen (no BIOS or system info, etc.), an automatic restart, or a blue screen. Safe mode is no more successful than a normal start-up at booting the OS.

I ran memtest86+ (v1.70) with the following results (what memtest calls pass 0, I’m calling pass 1):

512 RAM in slot 1: over 1 million errors found in test 5, pass 1, then memtest froze.

512 RAM in slot 2: 20 errors found in test 6, pass 1; over 5 million errors found in test 5, pass 2, then memtest experienced an “unexpected interrupt” and halted.

512 RAM in slot 3: no errors found but memtest experienced an “unexpected interrupt” during test 8, pass 1, and halted.

256 RAM in slot 1: over 1 million errors in test 3, pass 1, then memtest froze.

256 RAM in slot 2: over 2 million errors in test 3, pass 1, and I aborted out of impatience.

256 RAM in slot 2: 47 errors found in test 1, pass 1, over 350,000 errors in test 2, pass 1, no errors found in remainder of pass 1; no errors found in pass 2 or pass 3.

256 RAM in slot 3: 10 errors found in test 3, pass 1, then memtest experienced an “unexpected interrupt” and halted.

256 RAM in slot 3: 25600 errors found in test 4, pass 1, then memtest froze.

I don’t know what to make of these results.

As the memtest website states: “Please be aware that not all errors reported by Memtest86 are due to bad memory. The test implicitly tests the CPU, L1 and L2 caches as well as the motherboard. It is impossible for the test to determine what causes the failure to occur. However, most failures will be due to a problem with memory module. When it is not, the only option is to replace parts until the failure is corrected.”

It seems unlikely that both RAM cards would become corrupt simultaneously, unless a short in the motherboard somehow corrupted both of them.
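To make that reasoning concrete, here is a rough Python sketch that tabulates the eight memtest runs listed above by module and slot. The `Run` record and the helper names are hypothetical, invented for this illustration; the data is transcribed from the runs in this post.

```python
from typing import NamedTuple


class Run(NamedTuple):
    module: str         # which RAM card was installed
    slot: int           # motherboard slot used
    errors_found: bool  # did memtest report any errors?
    finished: bool      # did the run get through its passes without freezing/halting/abort?


# Transcribed from the memtest86+ results above, in the order listed.
RUNS = [
    Run("512", 1, True, False),   # froze
    Run("512", 2, True, False),   # unexpected interrupt
    Run("512", 3, False, False),  # no errors, but unexpected interrupt
    Run("256", 1, True, False),   # froze
    Run("256", 2, True, False),   # aborted by me
    Run("256", 2, True, True),    # errors in pass 1, then clean passes 2 and 3
    Run("256", 3, True, False),   # unexpected interrupt
    Run("256", 3, True, False),   # froze
]


def failing_modules(runs):
    """Modules that showed errors in at least one run."""
    return {r.module for r in runs if r.errors_found}


def failing_slots(runs):
    """Slots that showed errors in at least one run."""
    return {r.slot for r in runs if r.errors_found}


def clean_runs(runs):
    """Runs that finished with no errors at all."""
    return [r for r in runs if not r.errors_found and r.finished]


if __name__ == "__main__":
    print("modules with errors:", sorted(failing_modules(RUNS)))
    print("slots with errors:", sorted(failing_slots(RUNS)))
    print("fully clean runs:", len(clean_runs(RUNS)))
```

Both cards and all three slots show errors in at least one run, and no run finished completely clean. The failures do not follow one card or one slot, which is why a component shared by all of them seems worth suspecting.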

Is it possible that the memory corruption is coming from the video card? I think one of the circuits on my video card is exposed: I can see half a millimeter of exposed metal on one of the circuits, where the green plastic covering has been scratched off or something. I have attached two png images of the circuit in question.

I have also attached two zip folders with 5 minidumps each.

What do people think is causing the problem: motherboard, CPU, RAM, videocard...?
 

Attachments

  • videocard1.PNG (63.1 KB)
More info

Start-ups have continued to be random and largely unsuccessful.

I went into BIOS and disabled CPU L1 and L2 caching as well as the Video BIOS shadowing, just to see what would happen.

I then ran memtest on the 256 RAM: 5 passes with no errors found. But immediately afterward and on subsequent occasions I was unable to start the computer (black screen or automatic restart) with the 256 RAM and the above BIOS settings.

I then ran memtest on the 512 RAM with the BIOS settings described above. Memtest found no errors but experienced an "unexpected interrupt" during test 7, pass 1.

Attached is a jpg photo of the interrupt info provided by memtest, with stack info, etc.
 
Most of your dumps show crashes caused by a hard drive error. The ones blamed on csrss.exe are hard drive related, and the others are most likely RAM related. Since both the disk and the RAM are failing, maybe it's the motherboard that's the problem.

BugCheck F4, {3, 837be020, 837be194, 805fa160}
Probably caused by : csrss.exe
Inpage operation failed at 75b8ffb9, due to I/O error c000000e
EXCEPTION_CODE: (NTSTATUS) 0xc000000e - A device which does not exist was specified.

BugCheck 7A, {e1a06410, c000000e, bf8f4ea2, a10c860}
Probably caused by : win32k.sys ( win32k!TimersProc+ad )
ERROR_CODE: (NTSTATUS) 0xc000000e - A device which does not exist was specified.
DISK_HARDWARE_ERROR: There was error with disk hardware

KERNEL_DATA_INPAGE_ERROR (7a)
The requested page of kernel data could not be read in. Typically caused by a bad block in the paging file or disk controller error. If the error status is 0xC000000E, it means the disk subsystem has experienced a failure.
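As a rough illustration of how the lines above can be read, here is a small Python sketch that parses a "BugCheck" summary line and looks for a known NTSTATUS value among its parameters. The lookup tables cover only the two stop codes and the one status value quoted in this thread, and the function names are made up for the example.

```python
import re

# Only the stop codes that appear in this thread.
BUGCHECK_NAMES = {
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
    0xF4: "CRITICAL_OBJECT_TERMINATION",
}

# Only the status value that appears in this thread.
NTSTATUS_NAMES = {
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
}

# Matches e.g. "BugCheck 7A, {e1a06410, c000000e, bf8f4ea2, a10c860}"
LINE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line: str):
    """Return (stop_code, [param1..param4]) as integers."""
    match = LINE.search(line)
    if match is None:
        raise ValueError(f"not a BugCheck line: {line!r}")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(line: str) -> str:
    """Name the stop code and any recognised NTSTATUS among the parameters."""
    code, params = parse_bugcheck(line)
    name = BUGCHECK_NAMES.get(code, "unknown stop code")
    statuses = [NTSTATUS_NAMES[p] for p in params if p in NTSTATUS_NAMES]
    text = f"0x{code:02X} {name}"
    if statuses:
        text += "; status " + ", ".join(statuses)
    return text


if __name__ == "__main__":
    print(describe("BugCheck 7A, {e1a06410, c000000e, bf8f4ea2, a10c860}"))
    print(describe("BugCheck F4, {3, 837be020, 837be194, 805fa160}"))
```

For the 7A dump the status code is the second parameter. For the F4 dumps the c000000e value is not in the parameter list and shows up only in the separate "Inpage operation failed" line, so this sketch will not find it there.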
 
Thanks peterdiva.

I agree. I think it's the motherboard. Or maybe the CPU? Something central. It's a major system crash. Successful start-ups are about 1 out of 50.

I think I'm simply going to replace this old machine with a new one rather than attempt to pin down the problem.
 