Thanks very much!
As I hoped, it's the 0x101 bug check I discussed above. Let's get into it!
CLOCK_WATCHDOG_TIMEOUT (101)
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
BugCheck 101, {61, 0, 803d1120, 1}
^^ The 1st parameter, 61, is the timeout interval in clock ticks (WinDbg prints it in hex, so that's 97 decimal).
The 3rd parameter, 803d1120, is the PRCB address of the hung processor; let's keep this address in mind.
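If it helps to see the four arguments laid out, here is a minimal sketch that decodes them. The function and field names are my own labels for illustration, not anything WinDbg exposes; the only assumption is that the debugger prints the arguments in hexadecimal.

```python
# Hypothetical helper: the field names are illustrative labels, not WinDbg identifiers.
def decode_bugcheck_101(params):
    """Interpret the four hexadecimal arguments of a 0x101 bug check."""
    timeout_ticks, _reserved, prcb_address, processor_index = (
        int(p, 16) for p in params
    )
    return {
        "timeout_ticks": timeout_ticks,          # timeout interval, in clock ticks
        "prcb_address": f"{prcb_address:08x}",   # PRCB of the unresponsive processor
        "processor_index": processor_index,      # index of the hung processor
    }


decoded = decode_bugcheck_101(["61", "0", "803d1120", "1"])
print(decoded)
```

The point is simply that the "61" in the dump is 0x61, and that the 3rd and 4th arguments identify the same processor in two different ways.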
Code:
0: kd> !prcb 1
PRCB for Processor 1 at 803d1120:
Current IRQL -- 0
Threads-- Current 803d51e0 Next 00000000 Idle 803d51e0
Number 1 SetMember 2
Interrupt Count -- c08b4267
Times -- Dpc 00000b48 Interrupt 00000aad
Kernel 0007e984 User 0000edbf
Here's our problematic processor (#1).
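As a side note on the "Number 1 SetMember 2" line above: as far as I understand it, SetMember is just the processor's bit in an affinity-style bitmask, i.e. 1 shifted left by the processor number. A small sketch of that relationship (the function names are mine, purely for illustration):

```python
# Illustrative only: assumes SetMember is the single-bit mask for the processor number.
def set_member(processor_number):
    """Return the bitmask with only this processor's bit set."""
    return 1 << processor_number


def processors_in_mask(mask):
    """List the processor numbers whose bits are set in a mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(set_member(1))            # mask for processor #1
print(processors_in_mask(0b11)) # both cores of a dual-core box
```

So processor 0 would show SetMember 1, and processor 1 shows SetMember 2, which is what the !prcb output reports.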
Hint: At times, the 4th parameter of the bug check will show you the responsible processor. For example, in your *101 here, it was correct as the 4th parameter was 1.
In most cases, you can also tell the number of cores on the box by checking the BUGCHECK_STR within WinDbg. Yours, however, is very different - BUGCHECK_STR: 0x101_AMD_SP1. As we discussed above, this likely indicates that your processor itself had an issue and called the bug check, as opposed to Windows seeing there's a problem, telling the processor, and calling the bug check.
As the PRCB address matches the 3rd parameter of the bug check, processor #1 is the responsible processor. With the information we have thus far, we know that processor #1 reached the timeout (parameter 1, 61) without responding, and therefore the system crashed. Before we go further, what is a clock tick? A clock tick is one firing of the clock interrupt, a periodic interrupt that keeps timekeeping on the processors in sync. The clock interrupt is handed out to all processors and each one must report in; when one doesn't report in within the allowed interval, you crash.
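To make the "report in or crash" idea concrete, here is a toy model of a watchdog. This is a sketch of the concept only, not how the Windows kernel actually implements it; the function name and the shape of the input data are my own assumptions.

```python
# Toy model of the watchdog idea; NOT the kernel's real implementation.
def find_hung_processor(processors, acks_per_tick, timeout_ticks):
    """Return (processor, tick) for the first processor that misses
    `timeout_ticks` consecutive clock ticks, or None if all keep up."""
    missed = {cpu: 0 for cpu in processors}
    for tick, acked in enumerate(acks_per_tick, start=1):
        for cpu in processors:
            # Reset the counter on a response, otherwise count the miss.
            missed[cpu] = 0 if cpu in acked else missed[cpu] + 1
            if missed[cpu] >= timeout_ticks:
                return cpu, tick
    return None


# Processor 0 answers every tick; processor 1 stops answering after tick 3.
history = [{0, 1}] * 3 + [{0}] * 0x61
result = find_hung_processor([0, 1], history, timeout_ticks=0x61)
print(result)
```

In this made-up history, processor 1 is flagged once it has stayed silent for 0x61 ticks in a row, which mirrors what the bug check parameters are telling us.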
Let's now look at the stacks of the different processors to see what the threads were involved in:
We can use knL and go through a grueling method of obtaining the trap frame, but we don't like having to put in more work, so let's use kv instead on Processor 0. Thankfully you only have a dual-core system (thank you... really...):
Code:
0: kd> kv
ChildEBP RetAddr Args to Child
82d0cc30 82cc27ad 00000101 00000061 00000000 nt!KeBugCheckEx+0x1e
82d0cc64 82cc454d ffffff02 000000d1 82d0ccf8 nt!KeUpdateRunTime+0xd5
82d0cc64 88717a9c ffffff02 000000d1 82d0ccf8 nt!KeUpdateSystemTime+0xed (FPO: [0,2] TrapFrame @ 82d0cc74)
82d0cce4 88718c0b 82cc7c56 86a9a9f0 82d11790 amdk8!C1Halt+0x4 (FPO: [0,0,0])
82d0cce8 82cc7c56 86a9a9f0 82d11790 82d0cd50 amdk8!C1Idle+0x5 (FPO: [0,0,0])
82d0ccf8 82cc7ba4 868b68c0 00000000 86faad78 nt!PpmCallIdleHandler+0x2e
82d0cd50 82cbf741 00000000 0000000e 00000000 nt!PoIdle+0x2d1
82d0cd54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xd (FPO: [0,0,0])
There it is! Let's move forward:
Code:
0: kd> .trap 82d0cc74
ErrCode = 00000000
eax=86a9a9f0 ebx=00028775 ecx=86af7c40 edx=00000000 esi=868b68c0 edi=86a9aa0c
eip=88717a9c esp=82d0cce8 ebp=82d0ccf8 iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0003 es=0000 fs=0588 gs=1300 efl=00000293
amdk8!C1Halt+0x4:
88717a9c c3 ret
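Spotting the TrapFrame annotation by eye is easy on a short stack like this one. If you ever save kv output to a text file and want to pull the addresses out automatically, a rough sketch could look like this. It assumes the annotation always has the form "TrapFrame @ <hex address>" as it does above, and the function name is hypothetical:

```python
import re

# Assumes the kv annotation looks like "TrapFrame @ 82d0cc74", as in the output above.
TRAP_FRAME = re.compile(r"TrapFrame @ ([0-9a-fA-F]+)")


def trap_frame_addresses(kv_output):
    """Return every trap frame address mentioned in saved kv output."""
    return TRAP_FRAME.findall(kv_output)


sample = (
    "82d0cc64 88717a9c ffffff02 000000d1 82d0ccf8 "
    "nt!KeUpdateSystemTime+0xed (FPO: [0,2] TrapFrame @ 82d0cc74)"
)
for address in trap_frame_addresses(sample):
    print(f".trap {address}")
```

Each printed line is the .trap command you would then type into the debugger yourself.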
Let's dump the stack:
Code:
0: kd> knL
*** Stack trace for last set context - .thread/.cxr resets it
# ChildEBP RetAddr
00 82d0cce4 88718c0b amdk8!C1Halt+0x4
01 82d0cce8 82cc7c56 amdk8!C1Idle+0x5
02 82d0ccf8 82cc7ba4 nt!PpmCallIdleHandler+0x2e
03 82d0cd50 82cbf741 nt!PoIdle+0x2d1
04 82d0cd54 00000000 nt!KiIdleLoop+0xd
^^ What's happening in this stack is that your C1 state kicks in by idling the processor and then halting it. All x86 processors (32-bit) have an instruction known as HLT, which is the 'Halt' we see in the stack here - amdk8!C1Halt. It does exactly what the word 'halt' describes: it idles the CPU and stops it from doing anything until it receives an interrupt. An interrupt is a signal sent by the hardware to the CPU that basically says "Hey, I need this done right now, so stop what you're doing and take care of it".
Why is this done? Well, in simplest terms, to limit power consumption. It's a power management feature.
Let's take a look into Processor #1's call stack like we did Processor #0:
Code:
1: kd> kv
ChildEBP RetAddr Args to Child
803ecce4 88718c0b 82cc7c56 86af5578 803d2f90 amdk8!C1Halt+0x4 (FPO: [0,0,0])
803ecce8 82cc7c56 86af5578 803d2f90 803ecd50 amdk8!C1Idle+0x5 (FPO: [0,0,0])
803eccf8 82cc7ba4 86af4ab0 00000000 87549d78 nt!PpmCallIdleHandler+0x2e
803ecd50 82cbf741 00000000 0000000e 00000000 nt!PoIdle+0x2d1
803ecd54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xd (FPO: [0,0,0])
^^ Same idle path as above, but notice what's missing: there's no clock interrupt frame on top of this stack. Unlike #0, this processor never woke up from its halt to respond to the hardware signal I discussed earlier.
So, what's the summary? Well, it appears that your system is hitting a CLOCK_WATCHDOG_TIMEOUT while the CPU is in a C1 halt, which is never supposed to happen during normal operation.
Both cores of your processor were asleep, but then #0 came out of its halt state to reply to a hardware signal, and said to #1 "Hey, wake up, I need you to do this".... but #1 kept snoozing.
Code:
1: kd> lmvm amdk8
start end module name
88716000 88726000 amdk8 (pdb symbols) c:\localsymbols\amdk8.pdb\05084BEF43034D28A25660D0222439C01\amdk8.pdb
Loaded symbol image file: amdk8.sys
Image path: \SystemRoot\system32\DRIVERS\amdk8.sys
Image name: amdk8.sys
Timestamp: Sat Jan 19 00:27:20 2008
^^ amdk8.sys (the AMD K8 processor driver) is dated January 2008, which appears to be the latest according to MSI's website.
What does this mean? It's likely a very strong indication of a faulty CPU. Given the age of this machine, judging by its hardware, it unfortunately doesn't surprise me at all. We can try the following first, though:
1. Ensure your temperatures are within standard and nothing's overheating. You can use a program such as Speccy if you'd like to monitor temps -
Speccy - System Information - Free Download
2. Clear your CMOS (or load optimized BIOS defaults) to ensure there's no improper BIOS setting -
How To Clear CMOS (Reset BIOS)
3. Ensure your BIOS is up to date via MSI's website.
4. Disable C1 states in the BIOS.
5. If all of the above fail, the only thing left to do is replace your processor, as it is faulty. Given the age of the system, it may be more worthwhile, not only cost-wise but in general, to go ahead and build a new system entirely if possible.
Regards,
Patrick