Very odd behavior with BSOD & Memtest86+

Status
Not open for further replies.

thephatp

Hi everyone,

I apologize for the long post, but I'd like to give as much relevant info as possible up front. And I desperately need help, so please hang with me.

I just built a new system, and I'm having some very frustrating and debilitating problems. I'll try to be as concise as possible, but there's a lot to it.

Background: I've run both Windows XP Pro SP2 32-bit and Vista Business 64-bit with similar results. Currently, I'm running XP.

I started experiencing random BSODs about 6 weeks ago. I have since tried to decipher a large number of minidump files (unfortunately, I didn't save any from Vista), and I'm not sure what to think. About half of them show "IMAGE_NAME: memory_corruption", but not all of them. From reading through some of the dumps, I thought it was my sound card (an old SB Audigy 2 Platinum), because of the many `kmixer.sys` problems in the minidumps. So I finally replaced it, thinking that would solve the problem, but to no avail.

The trend is very strange. I shut down (read, completely turn off) my PC. Give it a few minutes. Turn it back on. Now I get about 3 days to 1 week with no problems, no BSOD. Finally, it crashes somewhere in that timeframe. So, I reboot (read, do NOT shut down or turn off), and then come back up and check out the dump. Now I get about another day or so (1-2 days) before another BSOD. Again, reboot, keep working, and then BSOD in some number of hours (say 6-12 hrs). Again, reboot, keep working, and then BSOD in about 1 hour. The time between crashes continues to shrink until I completely shut down the PC. If I shut it all off, give it a few minutes, then power it back on, I "reset" my time horizon and can get another 3 days to 1 week, but the process just repeats.

Now, a couple of weeks ago (when I was switching back and forth between Vista and XP Ghost images), I was getting to a point where I'd try to load Ghost or the Vista repair CD, but both would give an error saying "Kernel is corrupt, or missing". It took quite a while, but I finally got around this. FYI, all of my problems started happening after I downloaded and installed a number of CS3 trial apps (Photoshop, Illustrator, Acrobat, Dreamweaver). None of the trials worked; each gave an error. Mind you, this is on a brand new hard drive, and I started from scratch with Windows installs (then made Ghost images afterward), so there was never any Adobe software installed before. I've been really tempted to blame my problems on Adobe, but I'm trying to be rational. ;)

Ok, back to memory testing...When I was having crashes in Vista and experiencing the "kernel" errors above...Once I got past the kernel error such that I could run the Windows Memory Diagnostic tool, it would report "hardware issues" within about the first 30 seconds of testing. This was VERY consistent with this tool throughout about 2 weeks of crashing and testing. At the same time, Memtest86+ never reported a single error.

More recently...Last night I experienced my first crash after replacing my sound card. I rebooted and immediately ran Memtest86+ and lo and behold, it reported a large number of errors. I didn't know how to read the display and unfortunately, I didn't pay much attention to the errors except that I saw a lot of "ffffffff" passed but "0fffffff" (or something other than 'f' in the first place) failed. I thought, well, I'll just remove the 4 sticks (4x1GB) and test each one individually. I let each one go through at least the first pass (in the first DIMM slot) and no errors. It was really late, so I put them all back in and ran the test overnight, and 5 hours later, no problems whatsoever.

So, here's my list of questions:

  • Has anyone seen the consistent process of BSODs progressively occurring more often as I explained above?
  • Does restarting completely flush the RAM, or is power still supplied such that only a hibernate or shutdown would flush it? (I ask b/c perhaps this is why it gets progressively faster until I kill power.)
  • Could the answer to the question above relate to why I saw errors in memtest86+ immediately after I rebooted, but never after a shutdown (which is obviously required for removing RAM...ie., memory is flushed, so my time horizon starts over)?
  • Can I determine (from the output) which RAM module is faulty if memtest86+ shows an error? Or do I then have to start testing each individual stick?
  • Is there any other stress test I can use to try to test the RAM? (I'm tired of waiting days in between crashes to troubleshoot...this is getting very drawn out.)
  • What else (what other hardware) could be causing this problem? And how would I test such hardware?
  • Would posting any of my numerous minidump files be of any benefit?
  • Any other suggestions on how to go about troubleshooting this issue?
  • Could Adobe really have anything to do with this? (I've thought about zeroizing my HD, but that would be a huge pain as well, and I have no real evidence to support trying that right now.)
  • Anything that you can think of that I didn't ask! ;)


Also, another FYI...I updated my video card drivers (posted Jan 8, 2008), as well as my sound card drivers (beta version posted yesterday, Feb 13, 2008). I'll list my setup below.

Any help/suggestions/comments are GREATLY APPRECIATED!!! (And sorry for the long post.)

Thanks,

Chad


Core 2 Duo E6750
4GB (4x1GB) Crucial Ballistix DDR-800 RAM
Gigabyte P35-DS3R v2.0
PowerColor Radeon x1950pro 512MB
Creative X-Fi Fatal1ty Platinum Sound Blaster
Seagate Barracuda 250GB SATA 7200 RPM
PC Power & Cooling Silencer 610 EPS12V 610W Continuous
Antec Solo case w/ 3 fans + Zalman CNPS9500AT
 
My guess is the video card or the drivers for it are bad.
No, you cannot tell which module, but you can remove the extra modules and run MemTest86 on one module at a time... takes longer, but you will know for sure.
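
On reading the error lines themselves: the first post recalls a pattern of "ffffffff" written and something like "0fffffff" read back. XOR-ing the expected and actual words shows which bits flipped. Here is a minimal Python sketch of that arithmetic. The function name `failing_bits` is made up for illustration, and the two values are only the ones recalled in the post (the poster was not sure of them), so treat the result as an example rather than a diagnosis.

```python
def failing_bits(expected: int, actual: int, width: int = 32) -> list[int]:
    """Return the bit positions (0 = least significant) where the two words differ."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Values as recalled in the first post; the real screen may have shown others.
expected = 0xFFFFFFFF
actual = 0x0FFFFFFF

print(f"error mask: {expected ^ actual:08x}")
print("failing bits:", failing_bits(expected, actual))
```

If the same bits keep showing up across many errors, that at least suggests a consistent fault rather than random noise, but by itself it still does not say which stick is responsible.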
 
so just curious, why do you think it's the video card?

When I installed everything the first time, the video card gave me problems, but that was because I installed the .NET Framework 3.0, which made it crash as soon as Windows would load (after login) and the Catalyst program would start. However, not updating with .NET 3.0 solved that problem.

Of course, I know it could be any of my hardware. I'm just curious to know which symptoms point to that. (FYI, I'm really new to troubleshooting hardware b/c I've never had a problem with any of the PCs I've built. I guess I should consider myself lucky that this is the first real issue. ;) )

Thanks!
 
Supplying your minidumps would be a big help. If you have errors in MemTest you have corrupted memory. The only thing you can do is replace said memory.
 
Ok, here are the dumps. Let me know if there is anything telling in them (because I've tried, but I don't really know how to read them).

Thanks!


FYI, I can't remember which dump file it is, but one of the dumps occurred in safe mode when I was trying to do the !analyze command in debugger tools.
 
I went through 10 of your 12 minidumps, and any time I see a slew of different Windows drivers and executables listed, the first thing that comes to mind is memory corruption. I speak from personal experience: I had error codes and probable causes all over the place, but it came down to one bad stick out of 4.

Sure enough 4 out of 10 of your dumps specifically pointed to memory corruption.

As I said earlier, if you have errors in MemTest then you have bad RAM that needs to be replaced. It cannot be fixed.

I suggest running MemTest on each stick to see if one or all are bad. Let us know how it works out.
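
For anyone who wants to tally a stack of dumps the same way, here is a rough Python sketch. It assumes the debugger's analysis output for each dump has been saved as text, and it only counts the two field names quoted in this thread (IMAGE_NAME and DEFAULT_BUCKET_ID). The function name `tally_fields` and the sample strings are made up for illustration; they are not taken from the actual dumps.

```python
import re
from collections import Counter

# Matches lines such as "IMAGE_NAME:  memory_corruption" in saved analysis text.
FIELD = re.compile(r"^(IMAGE_NAME|DEFAULT_BUCKET_ID):\s*(\S+)", re.MULTILINE)


def tally_fields(reports: list[str]) -> dict[str, Counter]:
    """Count how often each value appears for each field across all reports."""
    counts = {"IMAGE_NAME": Counter(), "DEFAULT_BUCKET_ID": Counter()}
    for text in reports:
        for name, value in FIELD.findall(text):
            counts[name][value] += 1
    return counts


# Illustrative sample only, built from strings mentioned in the thread.
sample = [
    "IMAGE_NAME:  memory_corruption\n",
    "DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT\nBAD_PAGES_DETECTED: 6\n",
    "IMAGE_NAME:  kmixer.sys\n",
]

for field, counter in tally_fields(sample).items():
    print(field, dict(counter))
```

A wide spread of different image names, with memory_corruption showing up repeatedly, is the kind of pattern described above.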
 
Thanks Route44!

What would be the best way to test them? I've run memtest86+ on each stick but not for many hours at a time. I'm currently running S&M per the recommendation of someone on another board. This runs in Windows as opposed to a bootable.

I was also considering running Prime95 which I found from another post on this forum, I believe.

However, I have no idea which tool would be best. Any suggestions are welcome!!

I'll post results later on the test I'm running now.

Thanks!
 
If you have run MemTest and have errors, you don't need another test, and you don't need to run it long if errors came up early. My one bad stick I told you about was having errors right from the beginning; therefore there's no need to run it more than you have to.

MemTest is an excellent tool. Where it can fail is when bad memory passes MemTest, but not the other way around.
 
Yeah, that's my problem. The only time I've seen it with errors was once, and that was when all 4 sticks were in. When I turned everything off to pull all sticks but one and start trading them out, I never saw an error again. I let each stick go through at least 1 pass, then put all 4 back in and ran for about 5 hours or so, but nothing showed an error. In fact, I've run memtest about 10 times in the past 6 weeks and only once have I seen an error. :(
 
I'm going to throw out the suggestion that it may be a bad motherboard. I think the increasing frequency of the BSODs with relation to how long the computer has been on indicates something other than bad RAM.
 
"Only with all 4 sticks." Some motherboards won't run 4 sticks even though they say they will; it puts too much load on the RAM bus drivers. Some will run them if you relax the RAM timings a step.
 
SNGX1275 said:
I'm going to throw out the suggestion that it may be a bad motherboard. I think the increasing frequency of the BSODs with relation to how long the computer has been on indicates something other than bad RAM.

So, how would I go about testing the motherboard? I really have no clue where to start, there.

Tarkus said:
"Only with all 4 sticks." Some motherboards won't run 4 sticks even though they say they will; it puts too much load on the RAM bus drivers. Some will run them if you relax the RAM timings a step.

I'll definitely try that too. It's bittersweet, really--I'm frustrated that it won't crash more often so that I can try out more things in a shorter timespan. However, it's nice that I can continue to function on this PC in the meantime. Otherwise, it'd be really bad if I couldn't figure out what was wrong AND couldn't use my PC. ;)
 
Unfortunately, there isn't any direct way to test the motherboard; you have to eliminate other possibilities. Is there any consistency in the Event Viewer just before these BSODs? The minidumps probably contain more useful information than the Event Viewer, but you don't need special knowledge/tools to look at the Event Viewer.
 
More frustration

Ok, so I've done some pretty thorough memory tests on each module with memtest86+, and I haven't found anything yet. I'm going to keep trying, though.

In the meantime, I had a problem where, for some reason, the c: drive was no longer readable. I don't know what happened...came home from work one day and couldn't boot to the c: drive. I tried chkdsk, but it failed multiple times at about 75%. I hooked it up via an external USB enclosure, and the old reliable PC couldn't read the drive (though it could read the second partition--the d: drive--and recover files from there). I was so worried about not being able to recover my critical files (which I did, off the d: drive, thank goodness), that I have now set up RAID0 & RAID1. OS is speed, storage is redundant. :)

Amidst the problems I was having above, it really seemed like HDD failure. I was hoping that replacing the drive would fix the problem. But, as you can guess, since I'm posting here...it didn't.

I've gotten my first dump from my new fresh Vista Business x64 install, and I have a new error that I haven't seen before:

CRITICAL_STRUCTURE_CORRUPTION(109)

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BAD_PAGES_DETECTED: 6


Also, I have a kernel dump for this one, but I don't know how to read it. I followed the same process listed on this forum for the minidumps, but it gave the exact same information. The kernel dump is half a gig, so I'm guessing there is more info there, but I can't find info on how to read it.

Oh, and I guess I should mention that this happened last night while I was asleep. Nothing was going on--the PC was just sitting there, doing nothing but crashing. :(

I've tried uploading the minidump file, but it says the file "exceeds the forum's limit of 100.0 KB for this filetype." Is there any other way I can upload it? If not, you can find it here:

http://www.chadmorris.net/minidump/Mini030808-01.dmp


Thanks in advance for the help!

Chad
 