Crashes caused by suspected hardware problems

zzz

I am running Windows 10 Pro on an Intel i5-4670k CPU with an ASRock OC Formula Z87 motherboard and 16 GB of Corsair memory. I built this system exactly four years ago. Even though I overclocked the CPU to 4.7 GHz, for the first three years the system ran flawlessly, with not a single crash. CPU temps rarely exceeded 50° C, and were usually significantly lower.

A little over a year ago, I started getting occasional crashes, so I reduced the CPU speed from 4.7 to 4.6 GHz. The crashes immediately disappeared. But a few months ago, I started getting crashes again. I reduced the CPU speed from 4.6 to 4.5 GHz, with no effect. Finally, I reduced the CPU speed from 4.5 GHz to 4.0 GHz, with no effect. In fact, the crashes seem to be increasing in frequency.

There are two types of crashes that happen. The first is that the computer crashes while waking from sleep. It appears to wake up, but it wakes up into a reboot.

The second crash is actually a freeze. I/O appears to be frozen, and the system clock does not advance. The mouse pointer does not move. However, background processes appear to continue, since after I reboot my PC, I find that tasks that were scheduled to run without user interaction during the freeze did run successfully.

I am highly suspicious of my motherboard at this point. A number of USB ports on the motherboard have died over the years. I have also recently had a problem with my keyboard; see the thread "Computer periodically overwhelmed by phantom keypresses."

When I looked in the event log to try to ascertain why my computer was crashing, there were no warning messages before the crash. After the crash, there was a message to the effect that the computer had crashed because it had lost power unexpectedly. As a result, I thought there might be a problem with the power supply. Fortunately, I had a brand new power supply that I was able to swap in, but it made no difference at all. Again, this leaves me suspecting the motherboard.

I ran thorough tests on my memory, and they reported no errors.

I have made no major software changes to my PC, and I have not added any new hardware (aside from the power supply) or drivers since before this problem started. This really looks like a hardware problem to me, but I suppose it could be software. How do I find out? Any help will be greatly appreciated. Thanks!
 
Hello, have you experienced any BSODs? If yes, please attach minidump files.

Have you tested the HDD, and have you tried swapping in different RAM rather than only testing it? Tests are never 100% accurate.
 
With multiple, apparently unrelated potential hardware issues, I always suspect power too. Since you replaced your PSU, that rules the PSU out, but not necessarily power. I always recommend every computer be run off a "good" UPS with AVR. This is because a surge and spike protector is little more than a fancy and expensive extension cord. They do absolutely nothing for low voltage power anomalies like dips (opposite of spikes) and sags (opposite of surges) or brownouts (long duration sags). While these events are not necessarily damaging like excessive surges and spikes can be, they can and do result in unstable operation - freezes, sudden shutdowns and reboots.

Also, "dirty" power can come from an improperly wired or grounded wall outlet. So every home and every computer user should have access to an AC outlet tester to ensure the outlets are properly wired and grounded. I recommend one with a GFCI (ground fault circuit interrupter) indicator, as it can also be used to test bathroom and kitchen outlets (outlets near water). These testers can be found for your type and voltage of outlet, foreign or domestic (like this one for the UK), at most home improvement stores, or even in the electrical department at Wal-Mart. Use it to test all the outlets in the home, and if a fault is shown, have it fixed by a qualified electrician.

I agree with softwaremaniac about BSODs. You might also check in Event Viewer to see if there are any errors logged just seconds before the crash occurs. No logged errors does not mean there were no errors - it can mean the crash was so sudden that there was no time for the OS to log the error.
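If the System log is exported from Event Viewer, the "anything just before the crash?" check can be scripted. The following is only a rough sketch: the record layout (`time`, `level`, `source`) and the sample entries are made up for illustration and are not taken from anyone's actual log.

```python
from datetime import datetime, timedelta


def events_before_crash(events, crash_time, window_seconds=60):
    """Return Warning/Error/Critical events logged in the window before crash_time.

    `events` is a list of dicts with hypothetical keys "time", "level", "source".
    Events at or after crash_time are ignored, since those are written
    after the reboot and only describe the crash in hindsight.
    """
    start = crash_time - timedelta(seconds=window_seconds)
    suspicious = {"Warning", "Error", "Critical"}
    return [
        e for e in events
        if start <= e["time"] < crash_time and e["level"] in suspicious
    ]


# Made-up sample records, as if exported from the System log.
events = [
    {"time": datetime(2018, 3, 4, 10, 0, 0), "level": "Information", "source": "Service Control Manager"},
    {"time": datetime(2018, 3, 4, 10, 14, 30), "level": "Warning", "source": "disk"},
    {"time": datetime(2018, 3, 4, 10, 15, 5), "level": "Critical", "source": "Kernel-Power"},
]
crash = datetime(2018, 3, 4, 10, 15, 0)

hits = events_before_crash(events, crash)
for e in hits:
    print(e["time"], e["level"], e["source"])
```

An empty result for every crash would match the "crash too sudden to log anything" case described above, which points away from a software fault that Windows had time to notice.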

You should reset all your clocks back to the defaults while troubleshooting.

Also, there are several heat-sensitive components besides the CPU that can affect system stability too. So make sure the interior is clean of heat trapping dust and all case fans are spinning properly for a good front-to-back flow of cool air through the case. Consider blasting a desk fan into the open case if you suspect there may still be a heat issue.

You should probably test your RAM. I recommend MemTest86. Allow the diagnostics to run for several passes or even overnight. You should have no reported errors – not even one.

Note, however, that software-based RAM diagnostic tools are good, but none are conclusive. If they report any errors, even one, the RAM is bad. But it is not totally uncommon for them to report no problems, yet the RAM still fails in use, and/or when paired with other RAM. So, you might try running with just a single RAM stick to see if it fails. Repeat the process with the remaining modules, hopefully identifying the bad stick through a process of elimination. Just be sure to unplug the computer from the wall and touch bare metal of the case interior BEFORE reaching for the RAM to discharge any destructive static in your body.
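The elimination logic can be written down as a small decision helper. This is just an illustrative sketch: the function name, the stick labels, and the wording of the conclusions are my own assumptions, and the conclusions are hints about where to look next, not a diagnosis.

```python
def interpret_single_stick_runs(crashed_by_stick):
    """Summarize single-stick test runs.

    `crashed_by_stick` maps a stick label (hypothetical, e.g. "A", "B") to
    True if the system crashed while only that stick was installed.
    """
    if not crashed_by_stick:
        raise ValueError("need at least one single-stick run")

    bad = sorted(s for s, crashed in crashed_by_stick.items() if crashed)

    if not bad:
        # Every stick was stable alone: the pairing or the unused slot is suspect.
        return "no crashes with any single stick: suspect the pairing or the unused slot"
    if len(bad) == len(crashed_by_stick):
        # Every stick crashed alone: all sticks being bad at once is unlikely.
        return "crashes with every stick: RAM is unlikely to be the only cause"
    return "suspect stick(s): " + ", ".join(bad)


print(interpret_single_stick_runs({"A": True, "B": False}))
print(interpret_single_stick_runs({"A": True, "B": True}))
```

Each run needs to last long enough to cover the usual gap between crashes, otherwise a "no crash" result says very little.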

At any rate, these are all things I suggest you check before spending any money on a new motherboard.

When you get a chance, please list your system specs, or fill out your system specs here.
 
Thanks for your help, guys! I appreciate it very much.
Hello, have you experienced any BSODs? If yes, please attach minidump files.

I have not experienced any BSODs - just the crashes and the freezes.
Have you tested the HDD, and have you tried swapping in different RAM rather than only testing it? Tests are never 100% accurate.

I haven't done any specific tests on the disks. My main drives are SSDs, and I have some auxiliary HDDs for large storage. Would you recommend anything other than CHKDSK on these?

I don't have any extra RAM, but I have two 8 GB sticks installed, and I can get by quite fine temporarily using only one stick. So I have removed one stick and I am currently running on 8 GB; I'll see how it goes, and swap sticks if I still get errors.
With multiple, apparently unrelated potential hardware issues, I always suspect power too. Since you replaced your PSU, that rules the PSU out, but not necessarily power. I always recommend every computer be run off a "good" UPS with AVR.

I have been using a fairly new CyberPower CP1500AVRLCD, which has AVR.
Also, "dirty" power can come from an improperly wired or grounded wall outlet. So every home and every computer user should have access to an AC outlet tester to ensure the outlets are properly wired and grounded.

I've been using the same outlet to run all my computers for the past 15 years, and this is the first time I've had any problems of the sort that I've described. However, I know that new problems can arise with existing wiring, and that outlet tester costs almost nothing, so I'll order it and check things out. [I have now ordered it.]
I agree with softwaremaniac about BSODs. You might also check in Event Viewer to see if there are any errors logged just seconds before the crash occurs. No logged errors does not mean there were no errors - it can mean the crash was so sudden that there was no time for the OS to log the error.

I did this, and there are never any errors or even suspicious activity before the crashes. Everything seems to be going fine, and then the reboot process starts.
Also, there are several heat-sensitive components besides the CPU that can affect system stability too. So make sure the interior is clean of heat trapping dust and all case fans are spinning properly for a good front-to-back flow of cool air through the case. Consider blasting a desk fan into the open case if you suspect there may still be a heat issue.

When the problems started arising, I did a thorough cleaning of my case using compressed air. I also cleaned all the air filters. All fans were spinning properly before and after the cleaning. Temps in the case, CPU, and GPU have always been fine. I have a large case with lots of fans (Noctuas, so they're very quiet), so the inside case temperature is currently a cool 71° F. There is good front-to-back flow of cool air through the case.
You should probably test your RAM. I recommend MemTest86. Allow the diagnostics to run for several passes or even overnight. You should have no reported errors – not even one.

For my memory testing, I ran several passes of MemTest86 with no errors.
Note, however, that software-based RAM diagnostic tools are good, but none are conclusive. If they report any errors, even one, the RAM is bad. But it is not totally uncommon for them to report no problems, yet the RAM still fails in use, and/or when paired with other RAM. So, you might try running with just a single RAM stick to see if it fails. Repeat the process with the remaining modules, hopefully identifying the bad stick through a process of elimination. Just be sure to unplug the computer from the wall and touch bare metal of the case interior BEFORE reaching for the RAM to discharge any destructive static in your body.

This is what I am doing now.
When you get a chance, please list your system specs, or fill out your system specs here.

I have filled out my system specs.

Thanks again for your help!
 
Examine the electrolytic caps on the mobo. Look for obvious signs of distortion or heating. Check their tops for convex crowning by sight and by touch.
 
Please check HDD smart status with a tool like GSmartControl and SSDLife for checking SSDs (or SSD manufacturer tool).

I ran CrystalDiskInfo, which gives detailed SMART status on all my drives, including my two SSDs in RAID. Everything checked out perfectly. The highest temp was 32° C, with most temps clustering in the upper twenties.

Removing one stick of RAM did not affect the crashes, which are continuing to get more frequent. I take the increased frequency as another sign that this is a hardware rather than software problem, especially since no new problems are arising. I have switched my RAM sticks now. If I still have problems, it seems to me it must be the motherboard, or possibly the CPU. Any disagreements?
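To put a number on "more frequent", the crash times from the event log can be compared as average gaps in the earlier and later halves of the list. A minimal sketch follows; the timestamps are invented for illustration and are not my actual crash dates.

```python
from datetime import datetime


def mean_gap_days(times):
    """Average number of days between consecutive timestamps (already sorted)."""
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def frequency_trend(times):
    """Return (early_mean_gap, late_mean_gap) in days.

    The list is split at its midpoint; the midpoint crash is shared by both
    halves so that no gap is dropped. A smaller late gap means crashes are
    getting more frequent.
    """
    times = sorted(times)
    if len(times) < 4:
        raise ValueError("need at least four crash timestamps")
    mid = len(times) // 2
    return mean_gap_days(times[:mid + 1]), mean_gap_days(times[mid:])


# Invented crash dates, for illustration only.
crashes = [
    datetime(2018, 1, 1), datetime(2018, 1, 15), datetime(2018, 1, 25),
    datetime(2018, 2, 1), datetime(2018, 2, 4), datetime(2018, 2, 5),
]
early, late = frequency_trend(crashes)
print(f"early gap: {early:.1f} days, late gap: {late:.1f} days")
```

A steadily shrinking gap with no software changes in between would be consistent with a component that is degrading, though it does not say which one.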
 
