Thanks very much!
The attached kernel dump is of the CLOCK_WATCHDOG_TIMEOUT (101) bug check.
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
BugCheck 101, {19, 0, fffff88003088180, 5}
^^ The first parameter is the timeout interval in clock ticks. The debugger prints it in hex, so 19 here is 0x19, i.e. 25 ticks.
fffff88003088180 is the PRCB address of the hung processor, and the fourth parameter (5) is that processor's index. Let's keep this address in mind.
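If it helps to see the parameters decoded, here is a minimal Python sketch that splits the BugCheck line into its four values. The function name parse_bugcheck is made up for illustration, and it assumes the debugger prints every value in hex without a 0x prefix.

```python
import re

def parse_bugcheck(line):
    """Split a 'BugCheck CODE, {p1, p2, p3, p4}' line into integers.

    Assumes every value is printed in hex, as WinDbg does here.
    """
    m = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck line")
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    return code, params

code, params = parse_bugcheck("BugCheck 101, {19, 0, fffff88003088180, 5}")
timeout_ticks, _reserved, prcb_address, cpu_index = params

print(f"bug check : 0x{code:X}")
print(f"timeout   : {timeout_ticks} clock ticks (0x{timeout_ticks:X})")
print(f"PRCB      : {prcb_address:016x}")
print(f"processor : {cpu_index}")
```

The point of the exercise is the first parameter: read as hex, 19 is 25 clock ticks.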
Code:
0: kd> !prcb 5
PRCB for Processor 5 at fffff88003088180:
Current IRQL -- 0
Threads-- Current fffff880030930c0 Next fffffa8005503b50 Idle fffff880030930c0
Processor Index 5 Number (0, 5) GroupSetMember 20
Interrupt Count -- 0037aaea
Times -- Dpc 0000037a Interrupt 00000ae4
Kernel 0002fec7 User 0000099c
As the PRCB address matches the 3rd parameter of the bug check, processor #5 is the responsible processor. With the information we have so far, we know that processor #5 went 0x19 (25) clock ticks without responding, and therefore the system crashed. Before we go further, what is a clock tick? A clock interrupt is a periodic interrupt generated by a system timer; each one is a clock tick, and it is what keeps time on the processors in sync. Every processor is expected to respond to these ticks, and when one of them doesn't report in within the timeout, the system crashes.
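To make the "report in or crash" idea concrete, here is a toy Python model of a clock watchdog. It is only a sketch of the concept, not how the Windows kernel implements it; run_watchdog, responds and the tick numbers are all invented for illustration, and the 25-tick timeout is the first bug check parameter read as hex.

```python
def run_watchdog(num_cpus, total_ticks, responds, timeout_ticks):
    """Toy model: return (cpu, tick) for the first CPU that misses
    `timeout_ticks` consecutive clock ticks, or None if all keep up."""
    missed = [0] * num_cpus
    for tick in range(1, total_ticks + 1):
        for cpu in range(num_cpus):
            if responds(cpu, tick):
                missed[cpu] = 0          # CPU reported in, reset its counter
            else:
                missed[cpu] += 1         # another tick went unanswered
                if missed[cpu] >= timeout_ticks:
                    return cpu, tick     # this is where the bug check would fire
    return None

def responds(cpu, tick):
    # Hypothetical scenario: CPU 5 stops answering after tick 10.
    return not (cpu == 5 and tick > 10)

result = run_watchdog(num_cpus=6, total_ticks=100,
                      responds=responds, timeout_ticks=25)
print(result)
```

In this scenario CPU 5 is flagged 25 ticks after it goes quiet, while the other five CPUs never trip the watchdog.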
Let's now look at the stacks of the different processors to see what the threads were involved in:
We can use knL and go through a grueling method of obtaining the trap frame, but we don't like having to put in more work, so let's use kv instead on processor 0:
Code:
0: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff800`00b9bf98 fffff800`030daa4a : 00000000`00000101 00000000`00000019 00000000`00000000 fffff880`03088180 : nt!KeBugCheckEx
fffff800`00b9bfa0 fffff800`0308d6f7 : 00000000`00000000 fffff800`00000005 00000000`00002711 fffff880`0594697c : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff800`00b9c030 fffff800`035fd895 : fffff800`03623460 fffff800`00b9c1e0 fffff800`03623460 fffff800`00000000 : nt!KeUpdateSystemTime+0x377
fffff800`00b9c130 fffff800`03080113 : 00000000`0850f4d3 fffff800`00b9c1e0 fffffa80`06599820 00000000`00000002 : hal!HalpHpetClockInterrupt+0x8d
fffff800`00b9c160 fffff800`0306ac60 : 00000000`00000000 00000000`00000002 00000000`00000000 fffff800`031fee80 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`00b9c160)
fffff800`00b9c2f0 fffff800`03078f2f : fffff800`031fee80 fffffa80`099c1758 fffffa80`099c1758 00000000`00000001 : nt!KxWaitForSpinLockAndAcquire+0x20
fffff800`00b9c320 fffff800`0307f45f : 00000000`00000002 00000000`00000002 fffffa80`06599820 fffff800`030646c6 : nt!KeAcquireSpinLockAtDpcLevel+0x6f
fffff800`00b9c370 fffff880`0569a901 : 00000000`00000000 fffff880`048f2b74 fffff800`00b9c4d0 fffffa80`05522db0 : nt!KeSynchronizeExecution+0x2f
fffff800`00b9c3b0 fffff880`0488178c : 00000000`00000000 00000000`00000000 fffff800`00b9c4d0 fffff800`00b9c4c0 : dxgkrnl!DpSynchronizeExecution+0xd5
fffff800`00b9c3f0 fffff880`048f2901 : fffff880`04881700 00000000`00000000 fffff800`00b9c4d9 00000000`00000002 : nvlddmkm+0x5d78c
fffff800`00b9c490 fffff880`048f27d6 : fffffa80`06b29850 00000000`00000001 fffffa80`09903500 00000000`00000005 : nvlddmkm+0xce901
fffff800`00b9c540 fffff800`0308e85c : 00000000`00000005 00000000`000000ff 00000002`40ce0a01 00000000`00000000 : nvlddmkm+0xce7d6
fffff800`00b9c570 fffff800`0308e6f6 : fffffa80`0744f718 fffffa80`0744f718 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`00b9c5e0 fffff800`0308e5de : 00000007`391768d4 fffff800`00b9cc58 00000000`000308cd fffff800`03202c28 : nt!KiProcessExpiredTimerList+0xc6
fffff800`00b9cc30 fffff800`0308e3c7 : 00000001`4d5753c4 00000001`000308cd 00000001`4d575344 00000000`000000cd : nt!KiTimerExpiration+0x1be
fffff800`00b9ccd0 fffff800`0307b8ca : fffff800`031fee80 fffff800`0320ccc0 00000000`00000001 fffff880`00000000 : nt!KiRetireDpcList+0x277
fffff800`00b9cd80 00000000`00000000 : fffff800`00b9d000 fffff800`00b97000 fffff800`00b9cd40 00000000`00000000 : nt!KiIdleLoop+0x5a
Code:
0: kd> .trap fffff800`00b9c160
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000002 rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880048f2b74 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8000306ac60 rsp=fffff80000b9c2f0 rbp=0000000000000002
r8=fffff80000b9c4d0 r9=0000000000000000 r10=fffff880048f2b74
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
nt!KxWaitForSpinLockAndAcquire+0x20:
fffff800`0306ac60 488b0f mov rcx,qword ptr [rdi] ds:00000000`00000000=????????????????
Code:
0: kd> u @rip
nt!KxWaitForSpinLockAndAcquire+0x20:
fffff800`0306ac60 488b0f mov rcx,qword ptr [rdi]
fffff800`0306ac63 4885c9 test rcx,rcx
fffff800`0306ac66 75e8 jne nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`0306ac50) <--- looks like it may be in a loop
fffff800`0306ac68 f0480fba2f00 lock bts qword ptr [rdi],0
fffff800`0306ac6e 72e0 jb nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`0306ac50)
fffff800`0306ac70 8bc3 mov eax,ebx
fffff800`0306ac72 488b5c2430 mov rbx,qword ptr [rsp+30h]
fffff800`0306ac77 4883c420 add rsp,20h
Code:
0: kd> u fffff800`0306ac50 fffff800`0306ac6e
nt!KxWaitForSpinLockAndAcquire+0x10:
fffff800`0306ac50 ffc3 inc ebx
fffff800`0306ac52 851d70072500 test dword ptr [nt!HvlLongSpinCountMask (fffff800`032bb3c8)],ebx
fffff800`0306ac58 0f848f4dffff je nt! ?? ::FNODOBFM::`string'+0x5de0 (fffff800`0305f9ed)
fffff800`0306ac5e f390 pause
fffff800`0306ac60 488b0f mov rcx,qword ptr [rdi]
fffff800`0306ac63 4885c9 test rcx,rcx
fffff800`0306ac66 75e8 jne nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`0306ac50)
fffff800`0306ac68 f0480fba2f00 lock bts qword ptr [rdi],0
fffff800`0306ac6e 72e0 jb nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`0306ac50)
It appears that at the time of the bug check, the thread was executing a pause (a CPU hint used inside spin-wait loops) over and over, waiting for a spinlock to be released. If we go further down the call stack I showed earlier, it looks like nvlddmkm.sys (nVidia video driver) is the driver that called into dxgkrnl!DpSynchronizeExecution and ended up in this loop.
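The disassembly boils down to a "test, then test-and-set" spin loop. As a rough illustration only, here is a single-threaded Python model of that loop. LockWord, spin_acquire, on_spin and max_spins are made-up names; the real loop has no spin limit (which is exactly why a lock that is never released hangs the processor), and the HvlLongSpinCountMask branch is left out.

```python
class LockWord:
    """Stands in for the spinlock at [rdi]; non-zero means held."""
    def __init__(self, value=0):
        self.value = value

    def bit_test_and_set(self):
        # Models `lock bts qword ptr [rdi],0`: returns the old bit 0
        # (the carry flag) and leaves bit 0 set.
        was_set = bool(self.value & 1)
        self.value |= 1
        return was_set

def spin_acquire(lock, max_spins, on_spin=None):
    """Return the spin count once the lock is taken, or None if it
    is still held after max_spins iterations (model-only limit)."""
    spins = 0
    while spins < max_spins:
        spins += 1                    # inc ebx
        if on_spin is not None:
            on_spin(spins)            # stands in for `pause` and the owner making progress
        if lock.value != 0:           # mov rcx,[rdi] / test rcx,rcx / jne
            continue
        if lock.bit_test_and_set():   # lock bts / jb
            continue
        return spins                  # mov eax,ebx
    return None

# Owner releases the lock on the third spin: the waiter gets it.
released = LockWord(1)
def release_on_third(spins):
    if spins == 3:
        released.value = 0
got_it = spin_acquire(released, max_spins=1000, on_spin=release_on_third)

# Owner never releases the lock: the waiter spins until the model gives up.
stuck = spin_acquire(LockWord(1), max_spins=1000)
print(got_it, stuck)
```

The second case is the situation processor 0 appears to be in: the lock word never goes back to zero, so the loop never exits.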
So, what's the summary so far? The thread on processor #0 is the one that raised the bug check itself, and it must have been interrupted by a clock interrupt in order to trigger the CLOCK_WATCHDOG_TIMEOUT bug check (which we can also see was the case in the call stack). It also appears that the nVidia video driver is what sent it into the spin loop.
I'll spare you the post space/reading and let you know that processors 1-4 were idle.
Let's jump ahead to the problematic processor (5):
Code:
5: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`030b01e0 fffff880`048e88d9 : fffff880`02300000 00000000`00000000 00000000`00000001 fffff880`030b02c0 : nvlddmkm+0xc42de
fffff880`030b0250 fffff880`04944eca : fffffa80`06c5c000 00000000`00000000 00000000`00000000 00000000`00000000 : nvlddmkm+0xc48d9
fffff880`030b0290 fffff800`0307fa1c : fffff880`04944e4c fffffa80`06ac6610 fffff880`030b03a0 fffffa80`08050f00 : nvlddmkm+0x120eca
fffff880`030b0320 fffff880`049bcca5 : fffff880`04c09508 fffffa80`06ac6610 fffffa80`0805b210 fffffa80`0805b210 : nt!KiInterruptDispatch+0x16c (TrapFrame @ fffff880`030b0320)
fffff880`030b04b8 fffff880`04c09508 : fffffa80`06ac6610 fffffa80`0805b210 fffffa80`0805b210 00000000`0041a614 : nvlddmkm+0x198ca5
fffff880`030b04c0 fffff880`04a326fb : fffffa80`06af4ca0 00000000`0041a614 00000000`00000000 fffffa80`07b9c000 : nvlddmkm+0x3e5508
fffff880`030b0500 fffff880`04a35093 : fffffa80`07b9c000 fffffa80`07b9c540 fffffa80`06ac6610 00000000`0041a614 : nvlddmkm+0x20e6fb
fffff880`030b0560 fffff880`04b6c99e : 00000000`00000000 00000000`00409614 00000000`00000000 fffffa80`07b9c000 : nvlddmkm+0x211093
fffff880`030b05c0 fffff880`04b6cf31 : 00000000`00100d74 00000000`00000002 00000000`00000003 fffffa80`07b9c000 : nvlddmkm+0x34899e
fffff880`030b0620 fffff880`0522a2fb : fffffa80`07b9c000 00000000`00000000 fffffa80`0805a000 fffffa80`000033bf : nvlddmkm+0x348f31
fffff880`030b0670 fffff880`04a8f35f : fffffa80`0805a000 fffffa80`07b9c540 00000000`00000000 00000000`00000010 : nvlddmkm!nvDumpConfig+0x3d5f3b
fffff880`030b0710 fffff880`04bfa10a : fffffa80`0805a000 fffffa80`05521d20 fffffa80`07b9c000 00000000`00000010 : nvlddmkm+0x26b35f
fffff880`030b0740 fffff880`04b95acb : fffffa80`07b9c000 fffffa80`0696f000 fffffa80`0805a000 00000000`0000578c : nvlddmkm+0x3d610a
fffff880`030b07a0 fffff880`04b97294 : 00000000`00000010 00000000`00000000 fffffa80`07b9c000 fffff880`030b08c0 : nvlddmkm+0x371acb
fffff880`030b0800 fffff880`04a89001 : 00000000`00000000 00000000`00000000 fffffa80`080565b0 fffffa80`08089f90 : nvlddmkm+0x373294
fffff880`030b0840 fffff880`04a895bf : fffffa80`080565b0 fffff880`030b0910 00000000`00000040 00000000`00000040 : nvlddmkm+0x265001
fffff880`030b0890 fffff880`04b96de2 : 00000000`00000040 fffffa80`07b9c000 fffffa80`07b9c000 fffffa80`080565b0 : nvlddmkm+0x2655bf
fffff880`030b0920 fffff880`04bfd875 : fffffa80`07b9c000 00000000`00000013 00000000`00000001 00000000`00000000 : nvlddmkm+0x372de2
fffff880`030b0960 fffff880`04bfd23c : fffffa80`0805a000 00000000`00000013 00000000`00000006 fffffa80`07b9c000 : nvlddmkm+0x3d9875
fffff880`030b0990 fffff880`04bfd468 : fffffa80`05f04930 00000000`00000014 00000000`00080000 fffffa80`080d61b0 : nvlddmkm+0x3d923c
fffff880`030b09e0 fffff880`04bfdf5d : 00000000`00000000 00000000`00080000 fffffa80`0805a000 fffff880`04a0598b : nvlddmkm+0x3d9468
fffff880`030b0a20 fffff880`04c32a48 : fffffa80`07b9c000 fffff880`030b0b39 fffffa80`07b9c38c 00000000`00000100 : nvlddmkm+0x3d9f5d
fffff880`030b0a50 fffff880`049bd065 : fffffa80`07b9c000 fffff880`030b0b39 fffffa80`07b9c38c fffffa80`07b9c38c : nvlddmkm+0x40ea48
fffff880`030b0a80 fffff880`048e8642 : fffffa80`06c5c000 00000000`00000000 fffffa80`06c5c0d0 00000000`00000000 : nvlddmkm+0x199065
fffff880`030b0ba0 fffff880`04944d4f : fffff880`048e85c3 fffffa80`06c5c000 fffffa80`06c5c000 00000000`00000000 : nvlddmkm+0xc4642
fffff880`030b0c40 fffff800`0308e30c : fffff880`04944cd6 fffff880`03088180 00000001`4d49a300 00000000`000000ad : nvlddmkm+0x120d4f
fffff880`030b0cd0 fffff800`0307b8ca : fffff880`03088180 fffff880`030930c0 00000000`00000000 fffff880`04945654 : nt!KiRetireDpcList+0x1bc
fffff880`030b0d80 00000000`00000000 : fffff880`030b1000 fffff880`030ab000 fffff880`030b0d40 00000000`00000000 : nt!KiIdleLoop+0x5a
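Yep, looks like the nVidia video driver is in quite a loop: nearly every frame on processor 5 belongs to nvlddmkm, including the frames running on top of the KiInterruptDispatch trap frame.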
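For long stacks like this one, a quick tally of frames per module can back up the eyeball impression. Below is a small Python sketch that does so; modules_in_stack is a hypothetical helper, and it assumes the kv layout shown in this post (call site after the last " : "). The sample is just five of the frames from the processor 5 stack.

```python
import re
from collections import Counter

# A frame line starts with a Child-SP value such as fffff880`030b01e0
FRAME = re.compile(r"^[0-9a-f]{8}`[0-9a-f]{8}\s")

def modules_in_stack(kv_output):
    """Count how many frames of pasted kv output belong to each module."""
    counts = Counter()
    for line in kv_output.splitlines():
        line = line.strip()
        if not FRAME.match(line):
            continue                      # skip the header and blank lines
        call_site = line.rsplit(" : ", 1)[-1]
        module = re.match(r"\w+", call_site)
        if module:
            counts[module.group(0)] += 1  # text before '!' or '+'
    return counts

sample = """
Child-SP RetAddr : Args to Child : Call Site
fffff880`030b01e0 fffff880`048e88d9 : fffff880`02300000 00000000`00000000 00000000`00000001 fffff880`030b02c0 : nvlddmkm+0xc42de
fffff880`030b0250 fffff880`04944eca : fffffa80`06c5c000 00000000`00000000 00000000`00000000 00000000`00000000 : nvlddmkm+0xc48d9
fffff880`030b0290 fffff800`0307fa1c : fffff880`04944e4c fffffa80`06ac6610 fffff880`030b03a0 fffffa80`08050f00 : nvlddmkm+0x120eca
fffff880`030b0320 fffff880`049bcca5 : fffff880`04c09508 fffffa80`06ac6610 fffffa80`0805b210 fffffa80`0805b210 : nt!KiInterruptDispatch+0x16c (TrapFrame @ fffff880`030b0320)
fffff880`030b0cd0 fffff800`0307b8ca : fffff880`03088180 fffff880`030930c0 00000000`00000000 fffff880`04945654 : nt!KiRetireDpcList+0x1bc
"""

counts = modules_in_stack(sample)
for module, frames in counts.most_common():
    print(f"{module}: {frames}")
```

Pasting the full kv output into sample gives the per-module breakdown for the whole stack.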
Ensure you have the latest video card drivers. If you are already on the latest video card drivers, uninstall them and install a version one or a few releases older, to rule out an issue that only affects the latest driver. If you have already tried the latest video card driver and several previous versions, please give the beta driver for your card a try.
Regards,
Patrick