How to troubleshoot a defective motherboard

By Jskid
Mar 6, 2015
  1. Where I last worked, we received a new server. I set it up and installed Linux, but each time the OS became corrupted. I searched the file system for damaged files, burned a new installation disk, ran memtest, and contacted the provider to confirm that the server was designed to run that particular version of Linux. Eventually my supervisor told me, "When all other troubleshooting fails, it's safe to say it's the motherboard." So the motherboard arrived defective.

    Any suggestions on how to test for a faulty motherboard?

    I went to a job interview recently and was retelling the story and someone at the interview said "to take it a step lower, it was probably a heat problem where the controller was sending an invalid signal". Is this true and what does it mean?
  2. Tmagic650 TS Ambassador Posts: 17,244   +234

    "it was probably a heat problem where the controller was sending an invalid signal"... This may mean that the motherboard's heat sensor was sending an invalid "overheat" signal to the CPU. That is a very rare problem, though. Since it's a new server, have the motherboard replaced.
  3. jobeard TS Ambassador Posts: 9,315   +618

    Hmm; "motherboard" failures are rare. A chip may fail (e.g. the onboard graphics controller, the USB controllers, or even the BIOS), but it is hard to believe that such a failure would induce "OS corruption" (a rather non-specific statement, isn't it?).

    Without the specific failure information we can only lapse into gross generalities. As examples I cite PAGE_FAULT_IN_NONPAGED_AREA, IRQL_NOT_LESS_OR_EQUAL, and a rash of other bizarre symptoms. These make you believe that the code has become 'corrupt', and certainly the code really did fail. But the root cause of the code failing is one of two unseen errors (most likely and most frequent first):
    1. bad memory
    2. hard disk write error
    Obviously, software does not tire out, get brittle, or rust. If it ran once, it will run 2^32 epochs into the future.
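
As a rough illustration of checking for the second cause, the sketch below writes random data to a scratch directory, reads it back, and compares SHA-256 digests. The function name `write_and_verify` and its parameters are hypothetical, chosen only for this example. It is a crude smoke test and not a replacement for the drive's SMART self-test or a vendor diagnostic: the read-back may be served from the OS page cache, so a clean result does not prove the disk is healthy.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=1024 * 1024, passes=3):
    """Write random data, read it back, and compare SHA-256 digests.

    Returns the number of passes whose read-back digest did not match.
    (Hypothetical helper for illustration only.)
    """
    mismatches = 0
    for _ in range(passes):
        data = os.urandom(size_bytes)
        expected = hashlib.sha256(data).hexdigest()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                # Ask the OS to push the data out to the storage device.
                os.fsync(f.fileno())
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            if actual != expected:
                mismatches += 1
        finally:
            os.remove(path)
    return mismatches


if __name__ == "__main__":
    # Assumption: a temporary directory on the disk under suspicion.
    with tempfile.TemporaryDirectory() as scratch:
        print("mismatched passes:", write_and_verify(scratch))
```

Any nonzero count points at the storage path or at memory, which is why running memtest alongside it (as the original poster did) still matters.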

    Because our Windows client systems are run by millions of naive and untrained people of all ages, Microsoft created tools to make software failures easier to diagnose and analyze. Linux crash dumps are no cakewalk: they take lots of experience even to read, let alone analyze, and tools are almost non-existent.

    As to HEAT being a root cause: absolutely possible, and it occurs frequently. Poorly regulated power is another root cause. BTW: both of these also affect our client systems, and no tools will ever identify them as the root cause.
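
Software cannot prove that heat was the root cause, but logging temperatures around the time of a failure can at least show whether the two coincide. The sketch below assumes a Linux machine that exposes sensors through the sysfs thermal interface (`/sys/class/thermal/thermal_zone*/temp`, reported in millidegrees Celsius); not every board does. The function names and the 80 °C limit are assumptions made for this example, not values from the thread.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for each readable thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or not reporting a number
        readings[zone.name] = millidegrees / 1000.0
    return readings


def zones_over(readings, limit_c=80.0):
    """Return the names of zones hotter than limit_c (an arbitrary limit)."""
    return [name for name, temp in readings.items() if temp > limit_c]


if __name__ == "__main__":
    temps = read_thermal_zones()
    for name, temp in temps.items():
        print(f"{name}: {temp:.1f} C")
    print("over limit:", zones_over(temps))
```

Run periodically and written to a log, readings like these can be lined up against the timestamps of crashes or corruption.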
