Games Crashing Frequently - Occasional BSOD - I'm out of ideas

My computer has been having issues for months. The gist of it is the "Display driver nvlddmkm stopped responding and has successfully recovered" error, with the occasional VIDEO_SCHEDULER_INTERNAL_ERROR blue screen.

I've tried two different GPUs. Both give the same problem, and neither has problems in my friend's PC. I've even shipped my GPU back to GIGABYTE under an RMA, and they stated nothing is wrong with it.

I have tried different motherboards (same model). I have tried different CPUs (same model). I have tried each of my RAM sticks, using only one at a time (in addition to running memtest86 for several hours). I have reinstalled Windows multiple times (both from a boot drive created with Rufus and via Reset This PC). I have used DDU to clean my drivers and reinstalled them. I have tried various BIOS versions and various GPU drivers. I have swapped the stock PSU cables for aftermarket ones. I have not tried a different PSU yet. I have also lowered the GPU core clock and memory clock significantly, in increments, to no avail. Setting Power Management in the Nvidia Control Panel to "Prefer maximum performance" did not resolve the issue.

I have absolutely no idea what to do at this point, short of trying a new PSU, but I wanted to check here and get your thoughts first, as I don't currently have the funds to give that a shot. Please give me any advice you can and I'll happily provide any information you need.


Speccy: https://pastebin.com/4KeYu6Gp

Minidump: https://ufile.io/mokds5id

Sysnative: https://ufile.io/dsrgoq6h
 
This doesn't sound like your first build. It sounds like you are an experienced builder?

What version of Windows 10 have you got there? 20H2?

Did you search for the AMD X570 chipset driver and install it from AMD's site? Do that if you didn't.

Running DDU from Safe Mode is a good next step; then download the latest Nvidia driver for your video card.

Did you run "Check for updates" until it had installed them all? Do that.

Try the SFC /scannow command from an elevated (administrator) command prompt.
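If you'd rather script that check, here is a minimal Python sketch that runs sfc and classifies its summary line. It assumes an English-language Windows install: the matched strings are the standard sfc summary messages, and other locales print translated text. The helper names are hypothetical, and sfc still has to run from an elevated prompt.

```python
import subprocess
import sys

# Standard sfc summary messages (English locale only; lowercased for matching).
NO_VIOLATIONS = "did not find any integrity violations"
REPAIRED = "found corrupt files and successfully repaired them"
UNREPAIRED = "found corrupt files but was unable to fix some of them"


def classify_sfc_output(text: str) -> str:
    """Map sfc /scannow console text to a short verdict (hypothetical helper)."""
    # When piped, sfc output can arrive as UTF-16 with NUL padding; strip NULs defensively.
    cleaned = text.replace("\x00", "").lower()
    if UNREPAIRED in cleaned:
        return "unrepaired"
    if REPAIRED in cleaned:
        return "repaired"
    if NO_VIOLATIONS in cleaned:
        return "clean"
    return "unknown"


def run_sfc() -> str:
    """Run sfc /scannow (Windows only, elevated prompt required) and classify the result."""
    if sys.platform != "win32":
        raise OSError("sfc is only available on Windows")
    result = subprocess.run(["sfc", "/scannow"], capture_output=True)
    raw = result.stdout
    # Heuristic: NUL bytes suggest UTF-16LE output; otherwise decode leniently as UTF-8.
    if b"\x00" in raw:
        text = raw.decode("utf-16-le", errors="ignore")
    else:
        text = raw.decode(errors="ignore")
    return classify_sfc_output(text)
```

A result of "unknown" just means none of the expected summary lines were found, so read the raw output (or CBS.log) yourself in that case.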

Disconnect all nonessential USB devices and remove as much hardware and software as you can, because you are running a stability test. You need this rig as basic as you can get it to be able to test.

Are you using a video card riser cable? Remove it if you are.

Are you running the monitor at a refresh rate higher than 59.9 Hz or 60 Hz? If you are, crank it back down for the test.

Is the RAM you are running on the QVL of the Gigabyte board? Is the board rated to run that amount of RAM (64GB at 3200MHz)? It might not be! Does the G.Skill site say it will work? You could check that. You could also try running the RAM at slower speeds, like 2133MHz (just for test purposes!). Try running the rig on one or two sticks. I know you tried one, but I wasn't sure whether it will BSOD on a single stick.

Grab HWiNFO to monitor temps while you test.
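HWiNFO can log its sensor readings to a CSV file, which makes it easy to check peak temperatures after a crash. Here is a minimal Python sketch that pulls the peak value of each temperature column out of such a log. It assumes the first row is a header and that temperature columns have "°C" in their names. That is an assumption about the log layout, so check your own file's headers. The function name is hypothetical.

```python
import csv
from typing import Dict, Iterable


def max_temperatures(lines: Iterable[str]) -> Dict[str, float]:
    """Return the peak value of every temperature column in a sensor CSV log.

    Assumes row 1 is the header and temperature columns contain "°C" in their
    names (an assumption; adjust the filter to match your log's headers).
    """
    reader = csv.reader(lines)
    header = next(reader)
    temp_cols = [i for i, name in enumerate(header) if "°C" in name]
    peaks: Dict[str, float] = {}
    for row in reader:
        for i in temp_cols:
            if i >= len(row):
                continue  # short or truncated row
            try:
                value = float(row[i])
            except ValueError:
                # Skip non-numeric cells, e.g. a repeated header row at the end of the log.
                continue
            name = header[i]
            peaks[name] = max(value, peaks.get(name, value))
    return peaks
```

To use it, open the log with newline="" and an encoding that matches your file, for example open("log.csv", encoding="latin-1", newline="") with a hypothetical file name, and pass the file object in.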

PSU? Sure, it could be that.
 

Yep! I'd like to think I'm a pretty experienced builder and IT technician.

So in order:

Windows version: I'm not sure, for reasons I'll get to in a moment!

Had installed the AMD chipset drivers.

Had done DDU for sure.

Had installed all updates (and, at the behest of another thread, enrolled in the Insider Program and installed those builds).

Did SFC /scannow with no negative results.

No video card risers.

I didn't know that about the refresh rate, but that's good info to have.

Temps don't seem to be a problem.



I had checked the health of all my drives, and the SSDs were in good health. I don't specifically recall seeing any issues with the HDD, but I can't confirm whether it was good or bad.

I have actually somewhat resolved the problem. I took out all my drives, reinstalled Windows on both SSDs, and used each one individually for a while to verify there were no crashes. So far I've had no issues, though I still only have one of my SSDs in. So I figure it was either a Windows update, an issue with the combination of drives, or the HDD itself.


Someone else on another forum mentioned this:

1. All the failures are in dxgmms2.sys, and it seems very unlikely for arbitrary HDD corruption to cause that.

2. Disk drives have error correction. If the data they read is corrupted, they won't actually return it as the result of the I/O; they will retry the operation. If they can't read the data without the ECC triggering, they will eventually fail the read operation, at which point, I expect, there would be a BSOD involving the virtual memory manager.

Given your new information, I think your assessment regarding the power supply could be correct. Removing drives would reduce the load on the power supply, and graphics cards are the biggest power consumers in many modern systems. It could be that the PSU just can't handle the system under load, and the graphics card is the component that has trouble as a result.
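For a rough sense of the load argument, a back-of-the-envelope power budget sums each component's estimated draw and compares the total to the PSU rating. Here is a minimal Python sketch. The wattages are hypothetical placeholders, not measurements from this system. A steady-state sum like this also can't show how the PSU actually behaves under real load, which still needs proper test equipment.

```python
from typing import Dict


def psu_headroom(draws_w: Dict[str, float], psu_rated_w: float) -> Dict[str, float]:
    """Sum estimated component draws (watts) and compare them to the PSU rating."""
    if psu_rated_w <= 0:
        raise ValueError("PSU rating must be positive")
    total = sum(draws_w.values())
    return {
        "total_w": total,
        "load_fraction": total / psu_rated_w,
        "headroom_w": psu_rated_w - total,
    }


# Hypothetical steady-state draws in watts; substitute figures for your own parts.
example = {"gpu": 320, "cpu": 140, "drives": 20, "fans_board_etc": 50}
print(psu_headroom(example, 1000))
```

Dropping an entry (for example "drives") and rerunning shows how much a given change reduces the estimated steady-state load.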

It's an EVGA 1000W GQ that I bought in Dec. 2020, so I never really thought load might be the problem. It's possible the power supply is just defective, but I assumed (apparently incorrectly) that I'd be seeing more severe issues if the power supply were the problem.


Do you have any thoughts?
 
Did you skip the part about RAM? Try that, because it's cheaper than buying a PSU. You could check the PSU voltages in the BIOS, but there is no load there, and if the PSU only acts up under load, that would be impossible to determine without specialized equipment.

Psst: I have had lots of problems with EVGA PSUs, so I stopped reselling them and switched to Seasonic/Corsair many years ago. In fairness, my problems were with the 500W versions of their bronze PSUs. Your higher-end 1000W unit might be perfectly fine, but it is one variable you have to try swapping out.
 
First, make sure the motherboard firmware (BIOS) is up to date. Do not use beta versions unless the manufacturer recommends them. For AMD processors, the AGESA version makes a big difference.

Next, make sure nothing beyond the bare minimum needed to install Windows is installed in the machine.
Be sure memory is not using XMP settings.

Then I'd completely wipe the boot drive or use a new one, and use Microsoft's Windows 10 Media Creation Tool to get a completely up-to-date, fresh copy of Windows 10.
Be sure the boot drive is the only drive connected.
No other USB devices connected besides the mouse, keyboard, and Windows USB stick.

As soon as Windows is installed, install the latest Nvidia drivers for your card, the motherboard drivers, etc.
I then use Driver Booster to get drivers for all other devices.
I next use Patch My PC to install the latest versions of all the Visual Basic runtimes.

Finally, once everything looks good, install ONE game you're having issues with and play it. If it's fine, install another, and keep going until the issues return. Once the trouble comes back, the last game installed may be your culprit.
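The one-game-at-a-time approach is a plain linear search: add one candidate, test, and stop at the first failure. Here is a minimal Python sketch of that loop. Here is_stable is a hypothetical callback standing in for "play it for a while and watch for crashes", and the function name is also hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def find_first_unstable(
    games: Sequence[str], is_stable: Callable[[List[str]], bool]
) -> Optional[str]:
    """Add games one at a time; return the first one after which the system is unstable.

    is_stable is a hypothetical callback that receives the list of games
    installed so far and reports whether the system stayed stable.
    """
    installed: List[str] = []
    for game in games:
        installed.append(game)
        # Pass a copy so the callback can't mutate our running list.
        if not is_stable(list(installed)):
            # The most recently added game is the prime suspect, not proof.
            return game
    return None  # every game ran cleanly; next step is re-enabling XMP/PBO
```

Because each step only adds one thing to a known-good baseline, the first failure narrows the suspect list to that last addition.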

If none of the games show any issues, try turning XMP back on, plus PBO etc. if you were using it.
If there are still no issues, then great!
 