DPC_WATCHDOG_TIMEOUT Debugging

Jared · Sep 14, 2014

Patrick made a post about this a while ago, I found a Kernel dump file about this which I thought was interesting.
I'm not going into detail about DPCs or interrupts as I have made blog posts on these in the past.

Code:

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, [COLOR="#FF0000"]A single DPC or ISR exceeded its time allotment. The offending
	component can usually be identified with a stack trace.[/COLOR]
Arg2: 0000000000000[COLOR="#FF0000"]501[/COLOR], The DPC time count (in ticks).
Arg3: 0000000000000[COLOR="#800080"]500[/COLOR], The DPC time allotment (in ticks).
Arg4: 0000000000000000

So here it states that we encountered a DPC which exceeded the allocated time for it to finish executing. The problem is that it went over this time, and as stated before DPCs can hold up the system when taking too long to execute which can result in lagging, a slow system or even sound cutting out.

So lets look at our stack trace.

Code:

ffffd001`50c93c98 fffff800`9238bcc2 : 00000000`00000133 00000000`00000000 00000000`00000501 00000000`00000500 : [COLOR="#0000FF"]nt!KeBugCheckEx[/COLOR]
ffffd001`50c93ca0 fffff800`92271115 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff801`dceabf17 : nt! ?? ::FNODOBFM::`string'+0x18b12
ffffd001`50c93d30 fffff800`929a07b5 : ffffe001`00400a02 fffff800`922fcae6 fffff801`daed3cf8 ffffe001`00008201 : [COLOR="#800080"]nt!KeClockInterruptNotify+0x95[/COLOR]
ffffd001`50c93f40 fffff800`922e80e3 : ffffd001`50c93f60 00000000`00000008 ffff5377`5487cf7d 00000000`0000000c : [COLOR="#800080"]hal!HalpTimerClockIpiRoutine+0x15[/COLOR]
ffffd001`50c93f70 fffff800`9236412a : ffffe001`9c600500 ffffe001`9e8de1a0 00000000`00000000 00000000`00000000 : [COLOR="#800080"]nt!KiCallInterruptServiceRoutine+0xa3[/COLOR]
ffffd001`50c93fb0 fffff800`92364a9b : 44454c49`4146203a 696c6564`206f7420 6e657665`20726576 20212121`20352074 : [COLOR="#800080"]nt!KiInterruptSubDispatchNoLockNoEtw+0xea[/COLOR]
ffffd001`50c853a0 fffff800`922e8383 : ffffe001`9e92d030 ffffe001`9e968030 00000000`02290a8d 00000000`00000018 : [COLOR="#800080"]nt!KiInterruptDispatchNoLockNoEtw+0xfb[/COLOR]
ffffd001`50c85530 fffff801`dcfa5751 : ffffe001`9e66e7a0 ffffe001`00000000 ffffe001`9e92fbe0 00000000`fffff850 : [COLOR="#FF0000"]nt!KeAcquireSpinLockRaiseToDpc+0x13[/COLOR]
ffffd001`50c85560 fffff801`dcfa531d : ffffe001`9e96b840 fffff801`dcf2c48f ffffe001`9eb68490 fffff801`dcf2c550 : [COLOR="#FF0000"]athwbx+0x161751[/COLOR]
ffffd001`50c855f0 fffff801`dcf60c42 : ffffe001`9e96b840 ffffd001`50c85650 ffffd001`50c85654 00000000`00000000 : [COLOR="#FF0000"]athwbx+0x16131d[/COLOR]
ffffd001`50c85630 fffff801`dcf33472 : ffffe001`9e9bf030 fffff801`00000000 ffffd001`50c856d0 fffff801`dd074319 : [COLOR="#FF0000"]athwbx+0x11cc42[/COLOR]
ffffd001`50c85680 fffff801`dd0c129f : ffffe001`9e9bf030 ffffffff`ffffffff ffffe001`9e6d97e8 fffff801`dd011189 : [COLOR="#FF0000"]athwbx+0xef472[/COLOR]
ffffd001`50c856f0 fffff801`dd08679e : ffffe001`9e968030 00000000`00000000 00000000`00000000 00000000`00000000 : [COLOR="#FF0000"]athwbx+0x27d29f[/COLOR]
ffffd001`50c85720 fffff801`dae9e81e : ffffe001`9e961030 00000000`00000000 ffffd001`50c85790 00000000`00000000 : [COLOR="#FF0000"]athwbx+0x24279e[/COLOR]
ffffd001`50c85760 fffff800`92252130 : ffffd001`50c85b00 00000000`00000000 00000000`00000200 fffff800`92274ae0 : [COLOR="#FF0000"]ndis!ndisInterruptDpc+0x269ce[/COLOR]
ffffd001`50c85860 fffff800`9225134b : ffffd001`50c5c180 ffffe001`9e8f4010 ffffe001`9c46b900 ffffe001`a12f3080 : [COLOR="#008000"]nt!KiExecuteAllDpcs+0x1b0[/COLOR]
ffffd001`50c859b0 fffff800`923667ea : ffffd001`50c5c180 ffffd001`50c5c180 ffffd001`50c682c0 ffffe001`9dbbb540 : [COLOR="#008000"]nt!KiRetireDpcList+0xdb[/COLOR]
ffffd001`50c85c60 00000000`00000000 : ffffd001`50c86000 ffffd001`50c80000 00000000`00000000 00000000`00000000 : [COLOR="#808080"]nt!KiIdleLoop+0x5a[/COLOR]

So in this callstack we see our processor in an idle loop, when idle it tends to execute any DPCs if there are any waiting in the DPC queue.
It begins to execute all the DPCs in the queue (also known as draining) when get execute an ndis dpc interrupt, this begins to call network functions and then acquire a spinlock and raise to DPC/Dispatch IRQL level if it hasn't already (this is the standard routine that is used, I can't remember if it is required), we then recieve more interrupts followed by a clock interrupt and a bugcheck.

Okay so we know that we bugchecked because a DPC was taking too long to finish executing and risk holding up the system, especially where spinlocks are concerned.

The main thing that interests me is why is there a clock interrupt?

Code:

3: kd> [COLOR="#008000"]!dpcs[/COLOR]
CPU Type      KDPC       Function
 3: Normal  : 0xffffe0019e66e880 0xfffff801dae78eb0 [COLOR="#800080"]ndis!ndisMTimerObjectDpc[/COLOR]
 3: Normal  : 0xffffd00150c61668 0xfffff80092327b28 nt!PpmPerfAction
 3: Normal  : 0xffffd0015589a280 0xfffff80092258854 nt!PopExecuteProcessorCallback
 3: Threaded: 0xffffd00150c617c0 0xfffff8009231a0a0 nt!KiDpcWatchdog

I believe the ndis dpc interrupt is related to this timer object but I may be wrong, if it is related then the clock interrupt makes sense as the system requires intervals for clock interrupts to take place in order to keep track of system time and logical run time for threads and timers. Processes can modify the clock interrupt interval for their needs to process timers much quicker, I'll not go into detail as I will talk about timers another time.

The only problem is that I ran into a dead end, I couldn't find anything related to the network driver in terms of modifying the clock interrupt timer.

Code:

3: kd> !list "-e -x \"dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution\" nt!ExpTimerResolutionListHead"
dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution 
   +0x438 ImageFileName            : [15]  "???"
   +0x638 RequestedTimerResolution : 0x9c3d2000
   +0x63c SmallestTimerResolution  : 0xffffe001

dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution 
   +0x438 ImageFileName            : [15]  "svchost.exe"
   +0x638 RequestedTimerResolution : 0
   +0x63c SmallestTimerResolution  : 0x2710

So I thought I'd look into this a bit more.

Code:

3: kd> [COLOR="#008000"]u ndis!ndisInterruptDpc+0x269ce[/COLOR]
ndis!ndisInterruptDpc+0x269ce:
fffff801`dae9e81e 488b75a7        mov     rsi,qword ptr [rbp-59h]
fffff801`dae9e822 e9e297fdff      jmp     [COLOR="#FF0000"]ndis!ndisInterruptDpc+0x1b9[/COLOR] (fffff801`dae78009)
fffff801`dae9e827 33d2            xor     edx,edx
fffff801`dae9e829 488d4dd7        lea     rcx,[rbp-29h]
fffff801`dae9e82d 448d420d        lea     r8d,[rdx+0Dh]
fffff801`dae9e831 e8160c0000      call    [COLOR="#800080"]ndis!ndisPcwEndCycleCounter [/COLOR](fffff801`dae9f44c)
fffff801`dae9e836 90              nop
fffff801`dae9e837 e9d797fdff      jmp     [COLOR="#FF0000"]ndis!ndisInterruptDpc+0x1c3[/COLOR] (fffff801`dae78013)

It appears the interrupt routine is looping for some reason.

I can't find anything on the cycle counter function as it is undocumented but I'll take a guess and say that it's keeping track of the time the interrupt has been executing, AFAIK this is don't by using a counter on the currently executing thread to see how long it's running.

Code:

3: kd> [COLOR="#008000"]u nt!KeAcquireSpinLockRaiseToDpc+0x13[/COLOR]
nt!KeAcquireSpinLockRaiseToDpc+0x13:
fffff800`922e8383 f605fcac270021  test    byte ptr [nt!PerfGlobalGroupMask+0x6 (fffff800`92563086)],21h
fffff800`922e838a 751f            jne     [COLOR="#FF0000"]nt!KeAcquireSpinLockRaiseToDpc+0x3b [/COLOR](fffff800`922e83ab)
fffff800`922e838c f0480fba2900    lock bts qword ptr [rcx],0
fffff800`922e8392 7209            jb      [COLOR="#FF0000"]nt!KeAcquireSpinLockRaiseToDpc+0x2d[/COLOR] (fffff800`922e839d)
fffff800`922e8394 0fb6c3          movzx   eax,bl
fffff800`922e8397 4883c420        add     rsp,20h
fffff800`922e839b 5b              pop     rbx
fffff800`922e839c c3              ret

Here we can see the same DPC interrupt routine trying to acquire a spinlock yet it's not managing to do it and therefore looping all whilst it is still running at DPC level and therefore preventing normal thread execution.

Eventually it seems it managed to acquire the spinlock and then call a clock interrupt in order to perform some operation, I suspect updating the system time in order to service the network driver with higher response times.
The system realised that it was taking too long to complete and therefore bugchecked.

Code:

3: kd> [COLOR="#008000"]lmvm athwbx[/COLOR]
start             end                 module name
fffff801`dce44000 fffff801`dd1ff000   athwbx     (no symbols)           
    Loaded symbol image file: athwbx.sys
    Image path: \SystemRoot\system32\DRIVERS\athwbx.sys
    Image name: athwbx.sys
    Timestamp:        [COLOR="#FF0000"]Thu Oct 17 10:46:01 2013[/COLOR] (525FB1D9)
    CheckSum:         003BC161
    ImageSize:        003BB000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

So the network driver was quite outdated, he updated it and the blue screens stopped so it looks like it was an easy fix.

If you have anything to add or correct then feel free to post but I think I've got it correct.
Thanks for reading.

DPC_WATCHDOG_TIMEOUT Debugging

Jared

Sysnative Staff, BSOD Kernel Dump Expert