Fun Stuff: Part II (BSoDs, Mobos, and HDs)

Status
Not open for further replies.
I decided to continue this thread over here in "CPUS, Chipsets and Mobos" since it is becoming apparent to me that Windows XP is not the source of my troubles, at least not this time. A brief overview of the previous topic follows:

When I have two WD 74GB Raptors plugged in and running, I get BSoDs every few hours telling me that Windows is shutting down to prevent system damage. Upon rebooting I get an "NTLDR is missing" error. To fix this error I simply reset the BIOS with the jumper on the motherboard. After a few days of this I decided to experiment and unplug my second Raptor. After getting a BSoD early in the morning and rebooting to the "NTLDR is missing" error, I powered down, unplugged the Raptor and booted up into Windows without any problems.

Someone in the previous thread suggested that I try a new SATA cable on my second Raptor, but after putting the new SATA cable in and running my computer for about an hour I received a BSoD that mentioned "DRIVER_IRQL_NOT_LESS_OR_EQUAL" and something about an "NVATA." With that I unplugged the second Raptor and left it that way. Things seemed to go well from that point forward, although some strange things happened, such as me leaving the computer for a few hours and coming back to find that it had rebooted itself.

The first BSoD that I have witnessed since removing my second Raptor occurred just recently as I was sitting down surfing the Net. The BSoD only stayed a few seconds before rebooting, but it mentioned "KERNEL_STACK_INPAGE_ERROR" or some such. Upon rebooting I was greeted not by the usual NTLDR error but by "DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER." I decided to reboot and go into the BIOS, and noticed that my first Raptor was now not being detected. I powered down and then back up and everything was back to normal. After a file system consistency check on C:, Windows rebooted and is currently running normally... for now.

I am beginning to think that perhaps my motherboard is at fault and not my hard drives, but I cannot really tell with any certainty. If anyone has any ideas what is going on, I would really love to hear them, thanks.

Relevant Information:
OS: Windows XP Home Edition
Mobo: ASUS A8N-SLI Premium
Proc: AMD FX-57
HDs: 74GB WD SATA Raptor x2 (No Raid)
Vid: eVGA 7800GTX x2
RAM: Corsair PC3200 1Gig Stick x2
 
Without knowing anything more than what you have reported I would suspect one of the two below to be the cause of your troubles:

1. What kind of PSU are you using? A weak or poor quality PSU will cause system problems that manifest themselves in strange ways.

2. Have you loaded the latest drivers for the mobo? Try the following:
Remove all the Nvidia drivers (including display) via the Windows Add/Remove Programs utility, restart in safe mode and run DrivercleanerPRO, then restart and install the latest Nvidia driver set. Also, if you do this, do not load the Nvidia Armor Firewall. See if that stabilizes things.

Good luck mate and great flash movies BTW.
 
The power supply I bought for the computer is a Thermaltake Silent Pure Power W0049RUC 680W Power Supply.

I did as you suggested and deleted all my Nvidia drivers and then purged the rest using DrivercleanerPRO in safe mode. I currently have everything reinstalled and detected now, so I will plug in my second Raptor and see how things go from here. Thanks for the compliment and the assistance.
 
If possible I'd let it run for a day on a single Raptor. If everything is stable then plug the second in. What you want to do is determine exactly where the trouble is originating. Possibly you've solved it already. :grinthumb
 
I think it's memory issues. Make a Memtest86 CD, boot it and run it through some passes (an hour or so). If you can raise Vdimm in BIOS it's worth a try.

The PSU may be undervolting, causing Vcore or Vdimm to go too low.

Raising Vcore will give stability. I don't know your cooling, but with the stock HSF I'd use 1.5V or less.
 
I downloaded the Memtest86 ISO, burned it to a CD and ran it for a few passes, but it did not detect any errors. Unfortunately it seems I am still having issues, although the computer was running fine on two Raptors for quite a few hours. I was surfing the Internet and reading over some things when the computer screen went black and the computer rebooted. There was no BSoD this time, but when the computer booted up I got the "NTLDR is missing" error again. I unplugged the second Raptor and as usual the computer booted up fine after a consistency check.

I guess I'll let the computer run for a day or so on just one Raptor as you suggested and then see what happens.
 
Common reasons for this problem:

1) Wrong drivers for the controller. Use the latest NF4 drivers (for the A8N* series).
2) PSU problems on the 5V line. Monitor it ... 5% deviation is allowed. Be sure to connect each drive on its own line.
3) Drive Cable (SATA) is faulty. Try another one - buy a decent new one.
4) SATA connector has a dry connection - look at the scrape marks on the copper to determine. Get a new cable.
5) The controller / mobo might be faulty - test the drive on another PC.
6) The drive is going south - due to overheating. Get a new HDD or get some cooling for it - although once damaged ...
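
For point 2, here is a rough sketch of what "5% deviation is allowed" works out to in actual volts. The 5% figure is the one quoted above for the 5V line; applying the same tolerance to the 3.3V and 12V rails is an assumption on my part, and the function names are made up for illustration.

```python
# Nominal rail voltages. The 5% tolerance comes from the list above (5V line);
# using it for the 3.3V and 12V rails as well is an assumption.
NOMINAL_RAILS = {"3.3V": 3.3, "5V": 5.0, "12V": 12.0}
TOLERANCE = 0.05


def allowed_range(rail, tolerance=TOLERANCE):
    """Return the (low, high) voltage window for a rail."""
    nominal = NOMINAL_RAILS[rail]
    return (nominal * (1 - tolerance), nominal * (1 + tolerance))


def rail_in_spec(rail, measured, tolerance=TOLERANCE):
    """Return True if a measured voltage is within tolerance of nominal."""
    nominal = NOMINAL_RAILS[rail]
    return abs(measured - nominal) <= nominal * tolerance


if __name__ == "__main__":
    for rail in NOMINAL_RAILS:
        low, high = allowed_range(rail)
        print(f"{rail}: acceptable between {low:.2f} V and {high:.2f} V")
```

So a 5V line that sags much below 4.75 V under load would be outside that window and worth suspecting.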
 
Lots of great suggestions above, and all are worthwhile as they help eliminate what the problem isn't, which is the only way to troubleshoot sometimes, unfortunately. Replacing the SATA cables is an easy, cheap and worthwhile fix. I've had 3 or 4 die on me in the past. The red ones that Asus provides have proved to be the most reliable I've had, so use them. Also, try another SATA port if the cables don't help, as you may have a bad one.

Memtest86 is great and I run it whenever I play with RAM timings or replace a DIMM. It doesn't, however, rule out your RAM as being the culprit. It shows that the modules themselves are sound, but your timings may be causing instability while still passing Memtest. What timings are you using on the RAM side and what voltages? Are the DIMMs matched? How many are you running (2 sticks or 4)?

Are you overclocking the FX-57 at all? What voltages etc.?

Do you have RAID enabled in BIOS? If you are plugged into the Nvidia SATA ports then make sure that the Nvidia RAID controller is disabled in BIOS. The Nvidia ports are enabled all the time and the only control you have is over the RAID controller. Just make sure it is disabled in BIOS.
 
It seems the problem may have been somewhat solved. I removed my second Corsair 1Gig stick from the motherboard and then reconnected my second Raptor and have kept the computer on for about 48 hours now with no BSoDs or random reboots.

I seem to remember now that when I had my two sticks of RAM installed into the motherboard in the incorrect positions (Slot 1 and Slot 2) I had no issues except for reduced Video and CPU speeds. When I installed them into the correct positions as described in my motherboard's manual (Slot 1 and Slot 3) the problems began to occur even though my Video and CPU speeds increased.

The RAM timings I am currently using are as follows:
Frequency: 201.0 MHz
FSB:DRAM: CPU/14
CAS# Latency: 3.0 clocks
RAS# to CAS# Delay: 3 clocks
RAS# Precharge: 3 clocks
Cycle Time (Tras): 8 clocks
Bank Cycle Time (Trc): 11 clocks
DRAM Idle Timer: 16 clocks
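
For anyone comparing these against other sticks, timings in clocks can be converted to nanoseconds at the reported DRAM frequency. This is only a small sketch of that arithmetic; the helper name is made up, and the values are just the ones listed above.

```python
def clocks_to_ns(clocks, dram_mhz):
    """Convert a timing given in DRAM clock cycles to nanoseconds."""
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return clocks * 1000.0 / dram_mhz


# Reported DRAM frequency and a few of the timings from the list above.
DRAM_MHZ = 201.0
TIMINGS = {
    "CAS# Latency": 3,
    "RAS# to CAS# Delay": 3,
    "Cycle Time (Tras)": 8,
    "Bank Cycle Time (Trc)": 11,
}

for name, clocks in TIMINGS.items():
    print(f"{name}: {clocks} clocks = {clocks_to_ns(clocks, DRAM_MHZ):.1f} ns")
```

The same number of clocks means a longer delay at a lower frequency, which is why timings are only comparable between sticks at the same speed.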

I had them set to around 2/2/2/5 for a while but stopped changing the timings from their defaults (above) because I was resetting the BIOS so often to get rid of the "NTLDR is missing" error every time it popped up. My FX-57 isn't overclocked, everything is set to the default rates and I am not using RAID.

Since my computer seems to be finally running stable with only the one stick of Corsair memory in it, is it most likely that the problem is that stick that I removed or a problem with the motherboard?
 
Well, it seems I spoke too soon. After running beautifully with no errors for a few days, the computer decided to rebel against me again and present me with a nice BSoD. The error this time on the BSoD was "KERNEL_STACK_INPAGE_ERROR", which I received a week or so ago as well. After rebooting I went through the entire process of getting the "NTLDR is missing" error and then unplugging my second Raptor and letting Windows run a consistency check on the first one before successfully booting up into Windows.
 
ssfso-
I realize that you have plenty of juice coming out of that PSU, but it seems like power problems of one kind or another MAY be causing your issues. The motherboard MAY be bad too, or you MAY have to re-install Windows entirely. A lot of MAYs there, I know. Try the following before you RMA, and check out the barebones setup I posted at the bottom:

1. Hook a multimeter up to your main power plug and watch the voltages as you stress the system. With a unit of that size you should see minimal fluctuations. I tested mine by inserting the multimeter prongs directly into the backs of the wire sockets on the main 24 pin power plug. Remember black is ground, orange is 3.3V, red is 5V and yellow is 12V. This is admittedly dangerous as you are fooling with a live board with full voltages running through it. Alternatively you can plug the prongs right into an unused 4 pin molex and get a fairly reasonable idea of your PSU's output on the 12V and 5V rails (this is the safest route). Play a game with a browser open and Word running etc. Really stress things out. This is a fairly easy test and a good skill to know. PM me if you need more instruction. If you don't have a multimeter you can get one for under $20 at Radio Shack or its equivalent. These are good tools to have regardless and you don't need a really good one. Just a basic Radio Shack portable is enough.

2. If voltages are rock solid then it is time to do a barebones setup. I'll cut and paste a little guide I made up at the bottom. This is more time consuming but is also a great skill to develop. It must be done in order, and each step done fully, to make sure you look at all possible trouble spots.

3. If there is still no joy, then slick a HDD and do a full reinstall of Windows, including the RAID drivers (using the floppy/F6 step), and test again.

4. If the system is still buggy then it may be time to RMA the mobo.

Try the above, though, as RMA'ing a good mobo is a hassle, will leave the problem unresolved and will leave you without your PC for a couple of weeks. Let us know how it works out.
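
If you jot down a handful of multimeter readings while the system is under load (step 1), something like the sketch below summarizes how far a rail is swinging. The wire colors are the ones given in step 1; the sample readings are invented purely for illustration, and the function name is hypothetical.

```python
# Wire color to nominal voltage, as listed in step 1 (black is ground).
WIRE_COLOR_TO_NOMINAL = {"orange": 3.3, "red": 5.0, "yellow": 12.0}


def summarize(color, readings):
    """Summarize a series of multimeter readings taken on one wire color."""
    nominal = WIRE_COLOR_TO_NOMINAL[color]
    low, high = min(readings), max(readings)
    worst = max(abs(low - nominal), abs(high - nominal))
    return {
        "nominal": nominal,
        "min": low,
        "max": high,
        "swing": high - low,  # how much the rail moved during the test
        "worst_deviation_pct": 100.0 * worst / nominal,
    }


if __name__ == "__main__":
    # Made-up example readings on a red (5V) wire: idle, browsing, gaming.
    print(summarize("red", [5.02, 4.95, 4.88]))
```

A small swing that stays close to nominal is what "rock solid" in step 2 should look like.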

BAREBONES SETUP:
You are going to have to do a barebones setup and test each component. This will read a lot harder than it actually is. The initial procedure takes only around 10-15 minutes. The follow on troubleshooting may take a lot longer though. Also, please do not skip steps. Do everything in order and as listed or your troubleshooting will be flawed.

Caution: Please remember that turning a PC off does not mean there is no power going through it. Modern systems maintain a trickle of power to keep the standby functions running. You either have to turn off the switch on the Power Supply Unit (PSU) itself or unplug the system from the wall. Unplugging is best. If you have an LED on the mobo that is lighted all the time, make sure it is out before proceeding. Also, be aware of static. Make sure you wear an ESD strap or discharge yourself on a steel part of the case before touching anything inside.

First, unplug the PC from the wall and then open it up. Disconnect all the drives (floppy, CDROM, DVD etc.) from the motherboard (mobo) and also disconnect your hard drive(s) from the mobo. Do not leave the hard drives connected. The system will boot into BIOS just fine with no hard drive attached. Unplug the power from all those drives you disconnected from the mobo. Remember to disconnect the front panel FireWire and/or USB ports.

Next, remove all the RAM, except for one stick, from the mobo. Some mobos are very picky about where the RAM needs to be placed so make sure the one stick of RAM is in the correct slot as per your manual.

Now you are stripped down to a barebones system: the PSU, the mobo itself, 1 stick of RAM, the CPU/HSF and the video card. Reset your CMOS/BIOS while the system is stripped down, unplugged and open. You do this by removing the battery and then moving a jumper near the battery around. Usually there is a set of three pins with two covered by a jumper. You move the jumper from pins 1&2 to pins 2&3 and let it sit for a few minutes, then reset the jumper to pins 1&2 and replace the battery. CMOS and BIOS will be back at default settings after doing this.

Now check that everything is seated correctly and that both the 4 pin and the 20 or 24 pin power plugs are connected and secure. If so, plug the PC back into the wall and make sure that any LEDs that should be lighted on the mobo are lighted. If all is still well then turn it on. Hopefully she boots right back into BIOS.

If you get back into BIOS you can start troubleshooting by turning the PC off, unplugging it and reconnecting peripherals one at a time. The idea here is to connect and reboot until something hangs your system up. This presumably is the bad piece of gear.

If you can't get into BIOS and have the same problem as before, then you know it is either the PSU, the RAM, the CPU, the mobo itself or the video card. Change out each of these until you get into BIOS. I would start at the PSU as it is usually the guilty party in a situation like this and is also easy to change in and out (you are down to just 2 plugs now, remember). Next up would be the video card and/or RAM, and if still no luck then things get hard as you now have to consider either the CPU or the mobo.
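
The reconnect-one-at-a-time procedure is basically a loop, and it can be sketched like this. The `boots_ok` callback is a hypothetical stand-in for "plug it in, power up and see if it still gets into BIOS", and the component names are only examples.

```python
def find_first_failing(components, boots_ok):
    """Reconnect components one at a time, in order.

    boots_ok(connected) is a hypothetical check that returns True if the
    system still boots with the given list of components attached.
    Returns the component whose addition first breaks the boot, or None.
    """
    connected = []
    for part in components:
        connected.append(part)
        if not boots_ok(connected):
            return part
    return None


if __name__ == "__main__":
    parts = ["hard drive 1", "hard drive 2", "DVD drive", "second RAM stick"]
    # Simulated fault for illustration only: pretend the second drive is bad.
    suspect = find_first_failing(parts, lambda c: "hard drive 2" not in c)
    print("First component that broke the boot:", suspect)
```

Note that the part returned is only the prime suspect; it could also be the combination of parts attached at that point, which is why each step needs a full reboot and test.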

Good luck and happy hunting.
 
If you have an old hard drive, you might try Windows out on it. A bad hard drive can cause freakish problems.
 