Random BSODs and other issues

Petch

Hello All,

I have been having issues with this machine since I built it in April of last year. The problems never seem to be consistent, or maybe I keep missing the underlying problem and fixing only its symptoms.

Back in April of last year, I built my new rig which consisted of:

Windows 7 Ultimate (64bit)
Qmicra Qv2E mATX Case
XFX HD-597A-CNF9 ATI Radeon HD 5970 2.0GB
ASUS Rampage II GENE X58 mATX Motherboard
Intel Core i7-930 Bloomfield 2.8GHz
6GB Mushkin Redline DDR3 2000
2x Intel X25-M Mainstream 160GB SSDs (RAID 0)
Seagate Barracuda 7200.12 1TB
Pioneer Blu-ray Writer BDR-205BKS
Creative SB X-Fi Titanium
PC Power & Cooling 750w Silencer
Thermalright Ultra-120 Extreme (True Black)

After using the system for a few months, I found that I had random lock-ups and reboots that weren't resulting in BSOD errors, so I started troubleshooting the gear. Over a week of troubleshooting showed no consistent pattern, but it was still happening (sometimes during gaming sessions, sometimes during web browsing or just watching a movie). At this point I thought it might be my power supply, since I had reused it from my old system and it might be going bad. I replaced the power supply with a new Seasonic 750W modular PSU. The problems persisted. I then opened a case with XFX to check out my video card because of screen artifacts and random shutoffs. After a solid TWO MONTHS of troubleshooting with them, they gave me an RMA number to send my card in for testing. The card FAILED their testing and they replaced it with a refurb. Now, at this point I thought my problems were licked and I could be a happy gamer... I had no such luck.

After I got the replacement card I was still having problems (now with BSODs, not just random shutoffs). I then opened a case with ASUS to check out my motherboard. I explained my issues to the tech over the phone and he gave me an RMA on the spot. ASUS tested my board and said it was also faulty. They replaced it with an ASUS Rampage III Gene (USB 3.0, SATA 6Gb/s). With the new board in hand, I reinstalled my system to get a fresh start. Everything seemed awesome for a few days, then I got my next BSOD (I have minidumps from this point forward). During this time I also noticed some problems while playing games (L4D2, BF:BC2, TF2, Dirt2, etc.). Specifically with BF:BC2, I noticed that when the Bloom effect kicks in my screen starts going black. Also, before I spawn in (during weapon selection or at the start of the map) there is extreme lag, as if my processor/video card can't keep up.

Before I started RMAing parts, I ran Prime95 and Memtest86+ for 48 hours with no problems, and I ran them again after I swapped the motherboard, again with no issues. I have also checked all my memory timings/voltages and opened a thread to discuss everything over on the Mushkin forums (I had included the link, but I couldn't post this thread with the link in it). I have also noticed that my sound is glitching and hanging a lot, but I don't know if that is related to these issues. My latest BSOD was 0x00000101; before that I believe it was 0x0000009c.

My system specs as of today:

Windows 7 Ultimate (64bit)
Qmicra Qv2E mATX Case
XFX HD-597A-CNF9 ATI Radeon HD 5970 2.0GB
ASUS Rampage III GENE mATX
Intel Core i7-930 Bloomfield 2.8GHz
6GB Mushkin Redline DDR3 2000
2x Intel X25-M Mainstream 160GB SSDs (RAID 0) - OS/Game Drives
Seagate Barracuda 7200.12 1TB
Pioneer Blu-ray Writer BDR-205BKS
Creative SB X-Fi Titanium PCI-e 1x
SeaSonic X750 Gold 750W Modular PSU
Thermalright Ultra-120 Extreme (True Black)
4x COOLER MASTER Excalibur R4-EXBB-20PK-R0 120mm Case Fans (1 CPU, 1 Rear, 2 Front)

Minidumps attached, and thanks for the input in advance!
 

Attachments

  • Minidumps.zip
    85.9 KB
0x9C: MACHINE_CHECK_EXCEPTION
This is a hardware issue: an unrecoverable hardware error has occurred. The parameters have different meanings depending on what type of CPU you have but, while diagnostic, rarely lead to a clear solution. Most commonly it results from overheating, from failed hardware (RAM, CPU, hardware bus, power supply, etc.), or from pushing hardware beyond its capabilities (e.g., overclocking a CPU).

Are you overclocking any of your hardware, specifically your CPU? In light of your 0x9C error, note that three of the errors in your minidumps were all the same: 0x101: CLOCK_WATCHDOG_TIMEOUT

This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval. The specified processor is not processing interrupts. Typically, this occurs when the processor is nonresponsive or is deadlocked.

*** If your CPU is overclocked, set it back to default. Do you gain stability?


The other error is 0x3B, which is usually caused by video card drivers. The dump specifically cited the video driver atikmdag.sys.
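
For anyone following along, here is a minimal Python sketch of a lookup for the stop codes mentioned in this thread. The table name `BUG_CHECKS` and the helper `describe_bug_check` are made up for illustration. The three code-to-name pairs follow Microsoft's bug check naming, and the table is nowhere near complete.

```python
# Stop codes discussed in this thread, mapped to their symbolic names.
# This is a tiny illustrative subset, not a full bug check table.
BUG_CHECKS = {
    0x9C: "MACHINE_CHECK_EXCEPTION",
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
}


def describe_bug_check(code_text: str) -> str:
    """Turn a stop code such as '0x00000101' into 'CODE: NAME'."""
    code = int(code_text, 16)  # accepts an optional '0x' prefix
    name = BUG_CHECKS.get(code, "not in this small table")
    return f"0x{code:X}: {name}"


print(describe_bug_check("0x00000101"))
print(describe_bug_check("0x0000009c"))
```

The padded form shown on the blue screen (0x00000101) and the short form (0x101) are the same number, which is why the helper normalizes both to the short form.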
 
Thanks for the quick response. No, I haven't OC'd this machine at all yet. I wanted to get the system stable before doing any OCing, but I could never get it stable enough to be comfortable doing so.
 
For the video card drivers I suggest doing the following...

1. Download the free version of Driver Sweeper to your desktop and install it.

2. Download the latest driver(s) for your video card but don't install them.

3. Uninstall your video card drivers and reboot your PC into Safe Mode. Run Driver Sweeper but ONLY for the video card drivers. I had someone use it on their chipset drivers! If it doesn't find any video card drivers that is quite okay; just leave all other drivers alone.

4. Reboot and install new video card drivers.


** I am wondering if your cpu is overheating.
 
Sorry for the delayed response.

Last night I removed my aftermarket heatsink/fan for my CPU, cleaned up the thermal paste, and replaced it with the stock heatsink/fan. After booting into Win7 I opened Steam, Chrome, and Pidgin. My machine was running for MAYBE 2-3 minutes and I got the BSOD 0x101 again. The system was no more than 50-60 C at that time, so I don't believe this to be an issue with overheating. In addition, I have been in 6-hour gaming sessions of BF:BC2 and (aside from the aforementioned issues) I've had no problems with overheating. The aftermarket heatsink/fan combo I was using was one of the best air cooling solutions out there for the LGA 1366 socket.

I haven't cleaned out the video card drivers yet (again), but I'll give that another shot. In the past I have used Driver Sweeper to clean all my drivers, followed it up with CCleaner to wipe all the registry entries for ATI, and reinstalled the drivers, and I still had the same issue.
 
No need to be sorry Petch. You had PC work to do. :)

Go ahead and attach those minidumps that occurred after your work on the heatsink, etc.
 
I'll attach them as soon as I get home from work.

Also, last night I used Real Temp 3.6 and Prime95 to check out how hot my stock cooler gets. My max temp leveled off at 89 C at full load for around 15ish minutes. This is much higher than when I tested before with the aftermarket cooler.


EDIT: I didn't get any BSODs while running the stress test. Right after I replaced the cooler and rebooted I got the BSOD. After the BSOD, I tweaked my memory timings and then ran the stress test.
 
Well, this is odd... I don't see a minidump from the BSOD (0x00000101) that I had last night.

Maybe I restarted my machine too quickly after it took a dive?
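
One quick way to confirm whether Windows wrote a dump at all is to list the minidump folder, newest first. This is a rough Python sketch that assumes the default small-dump location `C:\Windows\Minidump` (the system can be configured to write elsewhere). The helper name `list_minidumps` is hypothetical, and reading that folder may need an elevated prompt.

```python
from datetime import datetime
from pathlib import Path


def list_minidumps(dump_dir):
    """Return (filename, modified time) pairs for *.dmp files, newest first."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []  # folder missing: no dumps have been written there
    dumps = sorted(folder.glob("*.dmp"),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in dumps]


if __name__ == "__main__":
    # Assumed default location; adjust if dumps are written elsewhere.
    for name, when in list_minidumps(r"C:\Windows\Minidump"):
        print(f"{when:%Y-%m-%d %H:%M}  {name}")
```

If the newest file is older than the crash in question, no dump was saved for that crash.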
 
89 C = 192 F. From the little research I have done, 80 C should be the max for i7 chips, but I only have scant info.

You may want to look into this further. Heat will shut a system down fast.
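
For reference, the conversion is F = C * 9/5 + 32. Here is a small Python sketch applied to the two temperatures above; the function name `c_to_f` is just illustrative.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


for c in (80, 89):
    print(f"{c} C = {c_to_f(c):.1f} F")
```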
 
My other heatsink (Thermalright Ultra 120) keeps the processor significantly cooler, but I am waiting for a new mounting setup to arrive in the mail. Last night (while gaming) my CPU reached a max temp of 62 C, so I am not running at 89 C on the regular.

Honestly I have had a feeling for some time that my CPU was faulty, so I am now going through the RMA process with Intel. I am still open for other things to try in the meantime though.
 
I'm still using the stock cooler, as I don't expect the new mounting kit for my Thermalright to get here until Monday.

I forgot that I had the BIOS controlling my fan speeds. I got the new 4-pin fans, and was allowing my BIOS to alter my fan speed based on temp. Well, I guess it doesn't work too well, because at full load the CPU is now only getting up to 74 C. Again, I don't believe that this is the cause of the problems I was seeing, but this will give me time to test before my RMA CPU gets here.

I still don't like the fact that I'm hitting 74 C at full load, but I know the temps were much lower when I was using my Thermalright.
 