Ah, great! Let's begin.
CLOCK_WATCHDOG_TIMEOUT (101)
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
Code:
BugCheck 101, {19, 0, fffff880038f7180, 6}
0x19 (25 decimal) clock ticks is the clock interrupt time-out interval. Bug check parameters are shown in hexadecimal.
fffff880038f7180 is the PRCB address of the hung processor; let's keep this address in mind.
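WinDbg prints bug check parameters in hexadecimal, which makes them easy to misread. The minimal Python sketch below splits the BugCheck line into its four parameters and converts each one from hex. The function name and the field labels (timeout_ticks, reserved, prcb_address, processor_index) are my own illustrative names based on the descriptions above. They are not official identifiers.

```python
import re

# Illustrative labels for the four parameters of bug check 0x101
# (CLOCK_WATCHDOG_TIMEOUT); these are not official identifiers.
PARAM_NAMES = ("timeout_ticks", "reserved", "prcb_address", "processor_index")

def decode_bugcheck_line(line: str) -> dict:
    """Parse a WinDbg line such as 'BugCheck 101, {19, 0, fffff880038f7180, 6}'.

    WinDbg prints the bug check code and every parameter in hexadecimal,
    so each value is converted with base 16.
    """
    match = re.match(r"\s*BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError(f"not a BugCheck line: {line!r}")
    decoded = {"code": int(match.group(1), 16)}
    params = [int(p.strip().replace("`", ""), 16) for p in match.group(2).split(",")]
    decoded.update(zip(PARAM_NAMES, params))
    return decoded

info = decode_bugcheck_line("BugCheck 101, {19, 0, fffff880038f7180, 6}")
print(hex(info["code"]), info["timeout_ticks"], hex(info["prcb_address"]), info["processor_index"])
# prints: 0x101 25 0xfffff880038f7180 6
```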
Code:
0: kd> !prcb 6
PRCB for Processor 6 at fffff880038f7180:
Current IRQL -- 0
Threads-- Current fffff880039020c0 Next fffffa8005955750 Idle fffff880039020c0
Processor Index 6 Number (0, 6) GroupSetMember 40
Interrupt Count -- 003627cd
Times -- Dpc 0000001d Interrupt 00000025
Kernel 000bc5cc User 0000754d
For reference, I did not run !prcb against each processor one at a time, because that would have been very tedious. Instead, you can use !running -it. The "i" argument makes it display idle processors too, and "t" displays the stack trace for the thread running on each processor. Running that extension shows that this is an 8-core box.
Hint: At times, the 4th parameter of the bug check will show you the responsible processor. For example, in your 0x101 here it was correct, as the 4th parameter was 6.
Hint #2: You can also generally tell the number of cores on the box from the failure bucket string, e.g. FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE
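You can pull the processor count out of that bucket name and use it to sanity-check the 4th parameter. The small sketch below assumes the bucket contains a `_<N>_PROC_` token written in decimal. That holds for the bucket quoted above, but I have not verified it for every bucket. The helper name is hypothetical.

```python
import re

def processor_count_from_bucket(bucket_id: str):
    """Return N from a failure bucket like '..._8_PROC_...', or None if absent.

    Assumes the count is written in decimal, as in the bucket quoted above.
    """
    match = re.search(r"_(\d+)_PROC_", bucket_id)
    return int(match.group(1)) if match else None

count = processor_count_from_bucket("X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE")
hung_index = 6  # 4th bug check parameter from this dump
# Sanity check: a valid processor index lies in 0..count-1.
print(count, 0 <= hung_index < count)  # prints: 8 True
```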
Since the PRCB address reported by !prcb 6 matches the 3rd parameter of the bug check, processor #6 is the responsible processor. With the information we have so far, we know that processor #6 went 0x19 (25) clock ticks without responding, so the system crashed. Before we go further, what is a clock tick? The clock interrupt is a periodic timer interrupt that the kernel uses for timekeeping and scheduling. On this box it is driven by the HPET, as hal!HalpHpetClockInterrupt on processor #0's stack shows. On a multi-processor system, every processor is expected to keep servicing these clock ticks. When one processor fails to do so within the time-out interval, the system crashes with this bug check.
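The toy model below makes the idea concrete. It tracks the last tick each processor serviced and flags any processor that has fallen behind by the time-out interval. This is a hypothetical illustration of the concept only, not how the Windows kernel implements the check. The numbers mirror this dump: 8 processors, a time-out of 0x19 ticks, and processor #6 silent.

```python
def unresponsive_processors(last_serviced: dict, current_tick: int, timeout_ticks: int) -> list:
    """Hypothetical clock-watchdog check (a model, not the kernel's code).

    last_serviced maps processor index -> last clock tick it serviced.
    A processor that has gone timeout_ticks or more ticks without servicing
    one is reported as unresponsive.
    """
    return sorted(
        cpu for cpu, tick in last_serviced.items()
        if current_tick - tick >= timeout_ticks
    )

# Eight processors; processor 6 stopped servicing ticks 0x19 (25) ticks ago.
last = {cpu: 1000 for cpu in range(8)}
last[6] = 1000 - 0x19
print(unresponsive_processors(last, current_tick=1000, timeout_ticks=0x19))  # prints: [6]
```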
Let's now look at the stacks of the different processors to see what the threads were involved in:
Code:
0: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff800`00b9c088 fffff800`030e0a4a : 00000000`00000101 00000000`00000019 00000000`00000000 fffff880`038f7180 : nt!KeBugCheckEx
fffff800`00b9c090 fffff800`030936f7 : 0000057f`00000000 fffff800`00000006 00000000`00002710 fffffa80`077ad8f0 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff800`00b9c120 fffff800`03603895 : fffff800`03629460 fffff800`00b9c2d0 fffff800`03629460 fffffa80`00000000 : nt!KeUpdateSystemTime+0x377
fffff800`00b9c220 fffff800`03086113 : 00000000`1ac652cf fffff800`00b9c2d0 fffffa80`06f21128 fffff800`00b9cc58 : hal!HalpHpetClockInterrupt+0x8d
fffff800`00b9c250 fffff800`03070c50 : 00000000`00000000 00000000`000000ff 00000000`00000000 00000000`00000801 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`00b9c250)
fffff800`00b9c3e0 fffff800`03092d19 : 00000000`00000002 00000000`00000000 fffffa80`04f34c58 fffff800`0307fdfe : nt!KxWaitForSpinLockAndAcquire+0x10
fffff800`00b9c410 fffff880`089a1f66 : fffffa80`076f1900 fffff800`00b9cc58 fffff800`00b9c4c0 fffffa80`06f201a0 : nt!KeAcquireSpinLockRaiseToDpc+0x89
fffff800`00b9c460 fffff880`089a315d : fffffa80`071c51a0 fffff880`089b1f11 fffffa80`095512a0 fffffa80`06f20050 : USBPORT!USBPORT_AcquireEpListLock+0x2e
fffff800`00b9c490 fffff880`089bc83f : fffffa80`095512a0 fffffa80`06f201a0 fffffa80`06f20050 fffffa80`08d3d800 : USBPORT!USBPORT_ReferenceEndpoint+0x29
fffff800`00b9c4e0 fffff880`089c5454 : fffffa80`08d3d800 fffffa80`08d3d800 00000000`00000000 fffffa80`08d3d800 : USBPORT!USBPORT_Ev_Rh_IntrEp_Invalidate+0xf3
fffff800`00b9c540 fffff800`0309485c : fffff800`00b9c600 00000000`00000000 00000000`00000001 fffff800`00b9c600 : USBPORT!USBPORT_AsyncTimerDpc+0xb8
fffff800`00b9c570 fffff800`030946f6 : fffffa80`08d3d820 00000000`000c3b64 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`00b9c5e0 fffff800`030945de : 0000001d`1debc6da fffff800`00b9cc58 00000000`000c3b64 fffff800`03207f08 : nt!KiProcessExpiredTimerList+0xc6
fffff800`00b9cc30 fffff800`030943c7 : 00000007`f1271bc1 00000007`000c3b64 00000007`f1271b5d 00000000`00000064 : nt!KiTimerExpiration+0x1be
fffff800`00b9ccd0 fffff800`030818ca : fffff800`03204e80 fffff800`03212cc0 00000000`00000000 fffff880`0899edb0 : nt!KiRetireDpcList+0x277
fffff800`00b9cd80 00000000`00000000 : fffff800`00b9d000 fffff800`00b97000 fffff800`00b9cd40 00000000`00000000 : nt!KiIdleLoop+0x5a
There it is: the trap frame at fffff800`00b9c250, saved when the clock interrupt arrived. Let's switch to it:
Code:
0: kd> .trap fffff800`00b9c250
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000002 rbx=0000000000000000 rcx=0000000000000001
rdx=000000004f444648 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80003070c50 rsp=fffff80000b9c3e0 rbp=fffffa8006f21128
r8=000000004f444648 r9=0000000000000000 r10=fffff80003205801
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
nt!KxWaitForSpinLockAndAcquire+0x10:
fffff800`03070c50 ffc3 inc ebx
Here we can see the registers saved at the time of the interrupt. rip points at nt!KxWaitForSpinLockAndAcquire+0x10.
Next, let's disassemble the instructions starting at rip:
Code:
0: kd> u @rip
nt!KxWaitForSpinLockAndAcquire+0x10:
fffff800`03070c50 ffc3 inc ebx
fffff800`03070c52 851d70072500 test dword ptr [nt!HvlLongSpinCountMask (fffff800`032c13c8)],ebx
fffff800`03070c58 0f848f4dffff je nt! ?? ::FNODOBFM::`string'+0x5de0 (fffff800`030659ed)
fffff800`03070c5e f390 pause
fffff800`03070c60 488b0f mov rcx,qword ptr [rdi]
fffff800`03070c63 4885c9 test rcx,rcx
fffff800`03070c66 75e8 jne nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`03070c50)
fffff800`03070c68 f0480fba2f00 lock bts qword ptr [rdi],0
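Disassembling the first few instructions reveals a jne (jump if not equal) that loops back to KxWaitForSpinLockAndAcquire+0x10. At the time of the interrupt, the processor was spinning. On each pass it executes pause (a spin-wait hint that briefly delays the CPU), re-reads the lock at [rdi], and jumps back while the lock is still held. Only once the lock reads as zero does it try to take it with lock bts. In other words, processor #0 was waiting for a spin lock to be released (USBPORT!USBPORT_AcquireEpListLock on its stack). Why wasn't it released? Let's keep going to find out!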
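The disassembly above follows the test-and-test-and-set spin-lock pattern. The processor spins on a plain read of the lock, with pause, until the lock looks free, and only then tries to take it with an atomic lock bts. The hypothetical Python model below covers only the spinning part. It is fed a sequence of observed lock values instead of real memory. It ignores the HvlLongSpinCountMask test, which appears to branch out of line periodically after enough spins. It also stops where the real code would attempt lock bts. In a test-and-test-and-set lock, a failed attempt typically means going back to spinning.

```python
from itertools import islice, repeat

def spin_until_free(lock_reads, max_spins):
    """Model of the loop at nt!KxWaitForSpinLockAndAcquire+0x10.

    lock_reads yields successive values of the lock word at [rdi].
    Returns how many iterations it took to see the lock free, or None if
    it was still held after max_spins iterations.
    """
    for spins, value in enumerate(islice(lock_reads, max_spins), start=1):
        # inc ebx / pause / mov rcx,[rdi]: count the spin, hint the CPU, re-read
        if value == 0:
            # test rcx,rcx / jne not taken: the lock looks free, so the real
            # code moves on to the atomic attempt (lock bts qword ptr [rdi],0)
            return spins
    return None

print(spin_until_free([1, 1, 1, 0], max_spins=100))   # prints: 4
print(spin_until_free(repeat(1), max_spins=10_000))   # prints: None
```

The second call is the situation in this dump: the lock owner never releases it, so the waiter spins until something else (here, the clock watchdog) intervenes.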
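So, what's the summary so far? Processor #0 is the processor that raised the bug check. It was running a USBPORT timer DPC (USBPORT!USBPORT_AsyncTimerDpc) and spinning while trying to acquire a spin lock. When its clock interrupt arrived, the clock handler (nt!KeUpdateSystemTime) raised CLOCK_WATCHDOG_TIMEOUT because processor #6 had not responded in time.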
Let's take a look into Processor #1's call stack like we did Processor #0:
Code:
1: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`0371bc98 fffff800`03092709 : 00000000`00299ead fffffa80`071f13f8 fffffa80`06f77a02 fffffa80`06f77a18 : intelppm!C1Halt+0x2
fffff880`0371bca0 fffff800`0308189c : fffff880`009e9180 fffff880`00000000 00000000`00000000 fffff880`0899edb0 : nt!PoIdle+0x52a
fffff880`0371bd80 00000000`00000000 : fffff880`0371c000 fffff880`03716000 fffff880`0371bd40 00000000`00000000 : nt!KiIdleLoop+0x2c
Processor #1 was idle/asleep.
Let's check Processor #2:
Code:
2: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`0e5f14e0 fffff800`030a9251 : 00000000`00000000 00000000`00000008 00000000`00000001 fffff880`02025f00 : nt!KeFlushMultipleRangeTb+0x260
fffff880`0e5f15b0 fffff800`030abc98 : 00000000`00000008 fffff880`0e5f1700 fffff8a0`0eba6000 00000000`00000001 : nt!MiFlushTbAsNeeded+0x1d1
fffff880`0e5f16c0 fffff800`031baf86 : 00000000`00008000 fffffa80`04e77000 00000000`00000009 fffff800`0308ca8a : nt!MiAllocatePagedPoolPages+0x4cc
fffff880`0e5f17e0 fffff800`030a99b0 : 00000000`00008000 fffffa80`04e77000 00000000`00000009 20206553`0307f5f2 : nt!MiAllocatePoolPages+0x906
fffff880`0e5f1920 fffff800`031be43e : 00000000`00000000 fffff880`07bf2090 00000000`00000000 00000000`00008000 : nt!ExpAllocateBigPool+0xb0
fffff880`0e5f1a10 fffff800`0309cf56 : 00000000`00000000 00000000`00000009 fffff8a0`025ab060 fffff800`03375c5f : nt!ExAllocatePoolWithTag+0x82e
fffff880`0e5f1b00 fffff800`032f5f86 : 00000000`00000000 00000000`00008000 00000000`00000000 00000000`00000001 : nt!ExAllocatePoolWithQuotaTag+0x56
fffff880`0e5f1b50 fffff800`0334db94 : fffff8a0`0b80f800 fffff800`00008000 fffff880`0e5f1c01 fffff800`03557da0 : nt!PiControlGetInterfaceDeviceList+0x92
fffff880`0e5f1bd0 fffff800`03088e53 : fffffa80`092773b0 00000000`014aede0 fffff880`0e5f1ca0 00000000`014aee68 : nt!NtPlugPlayControl+0x100
fffff880`0e5f1c20 00000000`7778230a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0e5f1c20)
00000000`014aeda8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7778230a
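Processor #2 is in nt!KeFlushMultipleRangeTb, flushing translation lookaside buffer (TLB) entries on behalf of the memory manager while it allocates paged pool (nt!MiAllocatePagedPoolPages calling nt!MiFlushTbAsNeeded). On a multi-processor system, a TLB flush has to reach the other processors as well. That is done with an inter-processor interrupt (IPI), and the initiating processor waits until the targets acknowledge it. So processor #2 is most likely stuck waiting on a processor that is not responding.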
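The dependency described above can be modelled as an initiator that cannot continue until every target processor acknowledges its request. The hypothetical sketch below is not the kernel's actual IPI code. In it, processor #6 never acknowledges, and that alone is enough to leave the initiator stuck.

```python
def unacknowledged(targets, ack_rounds, max_rounds):
    """Hypothetical model of an IPI initiator waiting for acknowledgements.

    targets: processors the request (e.g. a TLB flush) was sent to.
    ack_rounds: for each polling round, the processors that acknowledged.
    Returns the processors still pending after max_rounds rounds; an
    empty list means the initiator could continue.
    """
    pending = set(targets)
    for acks in ack_rounds[:max_rounds]:
        pending -= set(acks)
        if not pending:
            break
    return sorted(pending)

# Processor 2 flushes the TLB on the other seven; processor 6 never answers.
others = [0, 1, 3, 4, 5, 6, 7]
print(unacknowledged(others, [[0, 1, 3], [4, 5, 7], [], []], max_rounds=4))  # prints: [6]
```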
Let's check Processor #3:
Code:
3: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`037fbc98 fffff800`03092709 : 00000000`00299ead fffffa80`071f1738 00000000`ffffffed 00001fc8`b0289b80 : intelppm!C1Halt+0x2
fffff880`037fbca0 fffff800`0308189c : fffff880`037d3180 fffff880`00000000 00000000`00000000 fffff880`00dd9f78 : nt!PoIdle+0x52a
fffff880`037fbd80 00000000`00000000 : fffff880`037fc000 fffff880`037f6000 fffff880`037fbd40 00000000`00000000 : nt!KiIdleLoop+0x2c
Processor #3 was idle/asleep.
Let's check Processor #4:
Code:
4: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`009fcc98 fffff800`03092709 : 00000000`00299ead fffffa80`072031a8 00000000`ffffffed 00001fc8`ca2157d7 : intelppm!C1Halt+0x2
fffff880`009fcca0 fffff800`0308189c : fffff880`009b1180 fffff880`00000000 00000000`00000000 fffff800`03141430 : nt!PoIdle+0x52a
fffff880`009fcd80 00000000`00000000 : fffff880`009fd000 fffff880`009f7000 fffff880`009fcd40 00000000`00000000 : nt!KiIdleLoop+0x2c
Processor #4 was idle/asleep.
Let's check Processor #5:
Code:
5: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`0d481670 fffff800`03345bdf : 00000000`00000000 fffff880`0d481ca0 00000000`00000000 00000000`00000000 : nt!KeFlushProcessWriteBuffers+0x65
fffff880`0d4816e0 fffff800`03395416 : 00000000`032a0090 fffff800`00010400 fffff880`0d481870 00000000`00000000 : nt!ExpGetProcessInformation+0x7f
fffff880`0d481830 fffff800`03395e6d : 00000000`032a0090 fffff960`001555e3 00000000`00000005 00000000`0018f828 : nt!ExpQuerySystemInformation+0xfb4
fffff880`0d481be0 fffff800`03088e53 : fffffa80`08ab3340 00000000`7efdb000 00000000`00000020 00000000`00000000 : nt!NtQuerySystemInformation+0x4d
fffff880`0d481c20 00000000`7778161a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0d481c20)
00000000`0008e218 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7778161a
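Processor #5 is in nt!KeFlushProcessWriteBuffers, called from NtQuerySystemInformation. This routine flushes the write queue of each processor that is running a thread of the current process, which again requires the other processors to respond. It cannot finish yet because at least one processor is not responding. So far we have two processors (#2 and #5) waiting on other processors, plus processor #0 spinning on a lock that is not being released. Given the bug check, the unresponsive processor #6 is the most likely culprit holding them all up.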
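With processors #0 through #5 examined, a rough triage helps keep track of what each one is doing. The hypothetical sketch below maps the call sites seen in these (abbreviated) stacks to a state. Processors #3 and #4 look the same as #1. The rules are distilled from this dump only and are not a general-purpose classifier.

```python
# Hypothetical triage rules distilled from the stacks in this dump only.
RULES = {
    "intelppm!C1Halt": "idle",
    "nt!KxWaitForSpinLockAndAcquire": "spinning on a spin lock",
    "nt!KeFlushMultipleRangeTb": "waiting on other processors (TLB flush)",
    "nt!KeFlushProcessWriteBuffers": "waiting on other processors (write buffers)",
}

def classify(frames: list) -> str:
    """Rough state of a processor from its call sites, listed top frame first."""
    for frame in frames:
        name = frame.split("+", 1)[0]  # drop the "+0x..." offset
        if name in RULES:
            return RULES[name]
    return "unknown"

stacks = {
    0: ["nt!KeBugCheckEx", "nt!KeUpdateSystemTime+0x377",
        "hal!HalpHpetClockInterrupt+0x8d", "nt!KiInterruptDispatchNoLock+0x163",
        "nt!KxWaitForSpinLockAndAcquire+0x10"],
    1: ["intelppm!C1Halt+0x2", "nt!PoIdle+0x52a", "nt!KiIdleLoop+0x2c"],
    2: ["nt!KeFlushMultipleRangeTb+0x260", "nt!MiFlushTbAsNeeded+0x1d1"],
    5: ["nt!KeFlushProcessWriteBuffers+0x65", "nt!ExpGetProcessInformation+0x7f"],
}
for cpu, frames in stacks.items():
    print(cpu, classify(frames))
```

Processor #0 is classified by the frame under its trap frame because the classifier walks past KeBugCheckEx and the clock-interrupt frames until it finds a known wait routine.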
Let's now take a look at the problematic processor (#6):
Code:
6: kd> kv
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
Code:
6: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
00000000`00000000 ?? ???
We have a zeroed stack and zeroed registers, which is problematic. This usually happens on the problem processor for one of two reasons. Either it was running at too high an IRQL to respond when the crash was being captured, or it was too hung to report its information at all.
With that said, we're going to need to dump the raw stack:
Code:
6: kd> !pcr
KPCR for Processor 6 at fffff880038f7000:
Major 1 Minor 1
NtTib.ExceptionList: fffff88003902640
NtTib.StackBase: fffff880038fc040
NtTib.StackLimit: 000000000038f0f8
NtTib.SubSystemTib: fffff880038f7000
NtTib.Version: 00000000038f7180
NtTib.UserPointer: fffff880038f77f0
NtTib.SelfTib: 000000007efad000
SelfPcr: 0000000000000000
Prcb: fffff880038f7180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffff880039020c0
NextThread: fffffa8005955750
IdleThread: fffff880039020c0
DpcQueue:
Code:
6: kd> !thread fffff880039020c0
THREAD fffff880039020c0 Cid 0000.0000 Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 6
Not impersonating
DeviceMap fffff8a000008ca0
Owning Process fffff80003213180 Image: Idle
Attached Process fffffa8004f03040 Image: System
Wait Start TickCount 0 Ticks: 802058 (0:03:28:32.184)
Context Switch Count 7064479 IdealProcessor: 6
UserTime 00:00:00.000
KernelTime 03:18:57.804
Win32 Start Address nt!KiIdleLoop (0xfffff80003081870)
Stack Init fffff8800391fdb0 Current fffff8800391fd40
Base fffff88003920000 Limit fffff8800391a000 Call 0
Priority 16 BasePriority 0 UnusualBoost 0 ForegroundBoost 0 IoPriority 0 PagePriority 0
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
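The Base and Limit values from !thread give the kernel stack range for processor #6's thread. One common way to produce a raw stack like the one below is to run dps over that range. The small sketch below computes the stack size and builds the command. The helper name is hypothetical. The `dps Address1 Address2` form is the standard WinDbg range syntax.

```python
def raw_stack_range(limit: int, base: int):
    """Return (size_in_bytes, command) for dumping a kernel stack with dps.

    The stack grows downward, so Base is the highest address and Limit the
    lowest; dps over Limit..Base covers the whole stack.
    """
    if base <= limit:
        raise ValueError("Base must be above Limit")
    return base - limit, f"dps {limit:x} {base:x}"

size, command = raw_stack_range(limit=0xfffff8800391a000, base=0xfffff88003920000)
print(hex(size), command)  # prints: 0x6000 dps fffff8800391a000 fffff88003920000
```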
Code:
fffff880`0391ec68 fffff880`045982ac klim6+0x22ac
fffff880`0391eca0 fffffa80`080ec800
fffff880`0391eca8 fffff880`01a024d4 ndis!NdisFSendNetBufferLists+0x64
fffff880`0391ecd0 fffffa80`092792a0
fffff880`0391ecd8 fffff880`01f68800 tcpip!Ipv6Global
fffff880`0391ece0 fffffa80`08d9f520
fffff880`0391ece8 fffff880`04571199 pacer!PcFilterSendNetBufferLists+0x29
fffff880`0391ed10 fffffa80`08b121a0
fffff880`0391ed18 fffff880`01a02419 ndis!ndisSendNBLToFilter+0x69
fffff880`0391ed20 00000000`00000028
fffff880`0391ed70 00000000`00000000
fffff880`0391ed78 fffff880`01abe5d5 ndis!NdisSendNetBufferLists+0x85
fffff880`0391ed80 00000000`00000000
fffff880`0391ede0 fffffa80`08b121a0
fffff880`0391ede8 fffff880`01a02419 ndis!ndisSendNBLToFilter+0x69
fffff880`0391edf0 00000000`0000d140
fffff880`0391edf8 00000000`00000003
fffff880`0391ee00 fffff800`0323f180 nt!MiSystemPteInfo
fffff880`0391ee08 00000000`00010600
fffff880`0391ee10 00000000`0000d140
fffff880`0391ee18 fffff800`030a46d7 nt!MmMapLockedPagesSpecifyCache+0x50c
fffff880`0391ee40 00000000`00000000
fffff880`0391ee48 fffff880`01abe5d5 ndis!NdisSendNetBufferLists+0x85
In the raw stack, we can see a lot of network activity, topped off by a call into klim6.sys, the Kaspersky Lab Intermediate Network driver. Kaspersky may very well be causing NETBIOS conflicts and holding a lock, which would prevent the CPU from doing its work and hold up the rest. However, this may be a false lead. Raw stack entries can be stale leftovers, so klim6 may simply have happened to be on the stack.
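When reading a raw stack, the useful entries are the ones WinDbg resolved to a symbol (module!function+offset or module+offset). The hypothetical sketch below lists the modules from dps output in order of first appearance. Keep in mind that raw stack entries can be stale. Some entries, like tcpip!Ipv6Global, are data pointers rather than return addresses. The result is therefore a list of leads, not a call stack.

```python
import re

# dps output: <stack address> <value> [symbol]. Only lines with a third
# column have a value that WinDbg resolved to a symbol.
DPS_LINE = re.compile(r"^\s*([0-9a-f`]+)\s+([0-9a-f`]+)\s+(\S+)", re.IGNORECASE)

def modules_on_raw_stack(dump: str) -> list:
    """Return module names from symbolized dps lines, in order of first appearance."""
    modules = []
    for line in dump.splitlines():
        match = DPS_LINE.match(line)
        if match is None:
            continue  # value did not resolve to a symbol
        module = re.split(r"[!+]", match.group(3), maxsplit=1)[0]
        if module not in modules:
            modules.append(module)
    return modules

sample = """\
fffff880`0391ec68 fffff880`045982ac klim6+0x22ac
fffff880`0391eca0 fffffa80`080ec800
fffff880`0391eca8 fffff880`01a024d4 ndis!NdisFSendNetBufferLists+0x64
fffff880`0391ecd8 fffff880`01f68800 tcpip!Ipv6Global
fffff880`0391ece8 fffff880`04571199 pacer!PcFilterSendNetBufferLists+0x29
fffff880`0391ee18 fffff800`030a46d7 nt!MmMapLockedPagesSpecifyCache+0x50c"""
print(modules_on_raw_stack(sample))  # prints: ['klim6', 'ndis', 'tcpip', 'pacer', 'nt']
```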
As for Processor #7, it also had a zeroed stack, which is not good. When two cores come up with zeroed stacks like this, it's usually bad news. Here is the relevant part of its raw stack:
Code:
fffff880`0398d968 fffff880`07acc6c1*** ERROR: Symbol file could not be found. Defaulted to export symbols for nvlddmkm.sys -
nvlddmkm+0xc76c1
fffff880`0398d970 00000000`00000000
fffff880`0398d978 fffff880`089513fd dxgmms1!VidSchiProcessIsrVSync+0xc9
fffff880`0398da00 fffff880`0398dc80
fffff880`0398da08 fffff880`08951083 dxgmms1!VidSchDdiNotifyInterruptWorker+0xef
fffff880`0398da50 fffff880`0398dc80
fffff880`0398da58 fffff880`08950f82 dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff880`0398da80 fffff880`0398dc80
fffff880`0398da88 fffff880`0885813f dxgkrnl!DxgNotifyInterruptCB+0x83
fffff880`0398da90 fffffa80`06f26480
fffff880`0398da98 fffff880`0895769d dxgmms1!VidSchiUpdateCurrentIsrFrameTime+0x95
fffff880`0398daa0 00000000`02060000
fffff880`0398daa8 fffffa80`07075e28
fffff880`0398dab0 00000000`00000000
fffff880`0398dab8 fffff880`07acc6c1 nvlddmkm+0xc76c1
We have various DirectX graphics kernel (dxgkrnl) and DirectX MMS (dxgmms1) calls related to VSync interrupt handling, topped off by a call into the NVIDIA video driver (nvlddmkm.sys).
1. Remove Kaspersky and replace it with Microsoft Security Essentials for temporary troubleshooting purposes, as it may very well be causing NETBIOS conflicts:
Kaspersky removal - Service articles
MSE - Microsoft Security Essentials - Microsoft Windows
2. Ensure you have the latest video card drivers. If you are already on the latest, uninstall them and install a version or a few versions behind the latest to rule out an issue specific to the latest driver. If you have already tried the latest video card driver and several previous versions, please give the beta driver for your card a try.
3. If the above fails, please uninstall your video card drivers, and then physically remove the video card and use either integrated graphics, or a secondary video card (if available).
4. If the above fails, I still suspect a faulty processor, given the two zeroed stacks.
Regards,
Patrick