Hi,
We have multiple dump files and two consistent bug checks:
CLOCK_WATCHDOG_TIMEOUT (101)
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
(Unfortunately, we need a Kernel dump to analyze *101 bug checks, because a Minidump does not save enough information at the time of the crash for *101 analysis.)
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. This fatal error displays data from the Windows Hardware Error Architecture (WHEA).
If we run !errrec on the 2nd parameter of the bugcheck (the address of the WHEA_ERROR_RECORD structure), we get the following:
Code:
===============================================================================
Section 0 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa8009fad0a8
Section @ fffffa8009fad180
Offset : 344
Length : 192
Flags : 0x00000001 Primary
Severity : Fatal
Proc. Type : x86/x64
Instr. Set : x64
Error Type : Cache error
Operation : Generic
Flags : 0x00
Level : 2
CPU Version : 0x00000000000206a7
Processor ID : 0x0000000000000000
Code:
===============================================================================
Section 2 : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor @ fffffa8009fad138
Section @ fffffa8009fad2c0
Offset : 664
Length : 264
Flags : 0x00000000
Severity : Fatal
Error : GCACHEL2_ERR_ERR (Proc 0 Bank 5)
Status : 0xbe2000000005110a
Address : 0x0000000099b8f800
Misc. : 0x0000019082018086
Cache error + GCACHEL2_ERR_ERR - This looks like faulty L2 cache on Processor 0 (the first core), reported in machine-check bank 5. In the 2nd attached *124, it's on Processor 3, again in bank 5. A *101 mixed with a *124 is a strong indication of a faulty CPU.
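If you're curious where the GCACHEL2_ERR_ERR name comes from, it can be read straight out of the Status value above. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout and the "cache hierarchy" error-code encoding from the Intel SDM. The function and table names are my own for illustration (they are not part of any debugger), and it only handles cache hierarchy codes:

```python
# Sketch: decode an IA32_MCi_STATUS value as printed by !errrec.
# Assumes the architectural bit layout; only "cache hierarchy" error
# codes (pattern 000F 0001 RRRR TTLL) are turned into a mnemonic.

TRANSACTION_TYPE = {0b00: "I", 0b01: "D", 0b10: "G"}             # TT field
CACHE_LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}   # LL field
REQUEST = {                                                      # RRRR field
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}

# Architectural flag bits in the upper part of the status register.
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits, raw error codes and (if known) the mnemonic."""
    result = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    mca_code = status & 0xFFFF                      # bits 15:0
    result["mca_error_code"] = mca_code
    result["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16

    # Ignore the F bit (bit 12) when matching the cache hierarchy pattern.
    if (mca_code & 0xEF00) == 0x0100:
        tt = TRANSACTION_TYPE.get((mca_code >> 2) & 0b11, "?")
        ll = CACHE_LEVEL[mca_code & 0b11]
        rrrr = REQUEST.get((mca_code >> 4) & 0xF, "?")
        result["mnemonic"] = f"{tt}CACHE{ll}_{rrrr}_ERR"
    else:
        result["mnemonic"] = None                   # not handled in this sketch
    return result


if __name__ == "__main__":
    info = decode_mci_status(0xBE2000000005110A)
    print(info["mnemonic"])          # GCACHEL2_ERR_ERR
    print(info["UC"], info["PCC"])   # True True
```

For this dump, UC (uncorrected) and PCC (processor context corrupt) are both set, which lines up with the Fatal severity shown in the record.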
MODULE_NAME: hardware
IMAGE_NAME: hardware
FAILURE_BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE
BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE
^^ This implies that the crash was NOT caused by any sort of software complication (drivers, etc.) but by hardware. PROCESSOR_CACHE also supports our diagnosis that the CPU itself (L2 cache) may be faulty.
There is only so much you can do with a bugcheck like this before it comes down to a faulty processor that needs to be replaced. Start from 1 and work downward:
1. Ensure your temperatures are within normal limits and nothing is overheating. You can use a program such as Speccy if you'd like to monitor temps -
Speccy - System Information - Free Download
2. Clear your CMOS (or load optimized BIOS defaults) to ensure there's no improper BIOS setting -
How To Clear CMOS (Reset BIOS)
3. Ensure your BIOS is up to date.
4. The only software conflict that can usually cause *124 bugchecks is an OS-to-BIOS utility from a manufacturer, such as Asus' AI Suite. If you have something like this installed, remove it ASAP.
5. If all of the above fail, the only thing left to do is replace your processor, as it is faulty.
-- In the meantime, can you also set it to Kernel dump, so that the next crash gives us a bit more info to analyze?
Windows key + Pause key. This should bring up System. Click Advanced System Settings on the left > Advanced > Startup and Recovery > Settings > System Failure > Change from Small Memory Dump to Kernel Memory Dump.
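If you'd like to double-check that the change took effect, here is a small Python sketch that reads the setting back. It assumes the dump type is stored in the CrashDumpEnabled value under the CrashControl registry key (0 = none, 1 = complete, 2 = kernel, 3 = small, 7 = automatic). The function names are just illustrative, and it only reads the registry, it does not change anything:

```python
# Sketch: report which crash dump type Windows is configured to write.
# Read-only; assumes the CrashDumpEnabled value under CrashControl.
import sys

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def describe_dump_setting(value: int) -> str:
    """Map a CrashDumpEnabled value to a human-readable name."""
    return DUMP_TYPES.get(value, f"Unknown ({value})")


def read_dump_setting() -> int:
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported here so the file still loads on other systems
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe_dump_setting(read_dump_setting()))
    else:
        print("This check only applies on Windows.")
```

After following the steps above, it should report "Kernel memory dump".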
Regards,
Patrick