- May 7, 2013
- 10,400
I actually debugged this particular crash a while ago now and the write up has been on my blog for a considerable amount of time (Jan 2021), although, I thought it would useful to have a copy within the Academy too.
The !dpcwatchdog command is used to dump the DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK structure as shown in the fourth parameter of the bugcheck.
Immediately, we can see which DPC took the longest to execute and therefore lead to the bugcheck. Unfortunately, in most cases, this almost always points to a generic system function and thus we need to do some further investigation work in order to find the true culprit.
As we can see from the call stack, we’re spending an arbitrarily large amount of time processing an I/O request. Since the request is created by a WDF driver, let’s examine the stack frame of the WdfRequestComplete function.
The WdfRequestComplete function takes two arguments: the completion status of the request and a handle to the WdfRequest object. As we can see quite clearly, the completion status which was passed is STATUS_CANCELLED.
Now, let’s dump the WdfRequest object.
Wait? Why didn’t the command work? The answer is due to the fact, the register contains a pointer rather than a handle which the command expects, and therefore we need to convert the pointer to a handle. Fortunately, there is a debugger command which will do this for us automatically. Let’s try again.
We now have the address of the handle we can now pass to the !wdfrequest debugger command.
The associated IRP is null, which makes sense, since when the WDF framework completes a request, it will dispose of the associated IRP as well. So, from what we’ve found so far, we know that a request was issued and then subsequently cancelled. Presumably, there was a timeout condition added to the request which caused the request to be cancelled after a certain amount of time has elapsed.
Let’s examine the IRP and see who may be responsible for holding it up.
From the above stack frame, we can see that the driver is checking to see if the request has completed yet and can be disposed of. Let’s dump the IRP and see who was involved.
As we can see, there is one third-party driver in the last I/O stack location, which then called the next function which see in our call stack. This the completion routine registered for the IRP for that particular driver. Since this is the last driver which touched the IRP before completion, I suspect that is the most likely culprit.
As we can see, the IRP attempts to be cancelled, however, the pending flag is set and the driver’s completion routine checks for this, and delays cancelling the IRP.
I’m not certain what has happened here, but it would seem that the pending returned flag has been set, however, the stack location becomes corrupt and leads to the IRP being in some limbo state unable to be returned to I/O Manager.
From what the user has mentioned, the issue appeared to lie with their webcam. After the webcam was replaced, the user has not experienced any more crashes.
Rich (BB code):
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff806434fa320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding this single DPC timeout
The !dpcwatchdog command is used to dump the DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK structure as shown in the fourth parameter of the bugcheck.
Rich (BB code):
0: kd> !dpcwatchdog
All durations are in seconds (1 System tick = 15.625000 milliseconds)
Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats (total since boot): !intstats /d
DPC and ISR stats (during DPC watchdog period): !intstats /w
--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: Wdf01000!FxDpc::FxDpcThunk (Normal DPC)
Debugger Saved IRQL: 13
Cumulative DPC Time Limit: 120.000 seconds
Current Cumulative DPC Time: 20.000 seconds
Single DPC Time Limit: 20.000 seconds
Current Single DPC Time: 20.016 seconds
dt nt!_ISRDPCSTATS fffff806434f38f0
IsrActive: TRUE
Pending DPCs:
----------------------------------------
CPU Type KDPC Function
0: Normal : 0xfffff80641226290 0xfffff80642a6e680 nt!PpmPerfAction
0: Normal : 0xffffa002729e6bd8 0xfffff80648b876f0 NDIS!ndisInterruptDpc
0: Normal : 0xffffa002726bf740 0xfffff80647da3850 Wdf01000!FxInterrupt::_InterruptDpcThunk
0: Normal : 0xffff84073bd85740 0xfffff80642a05010 nt!PopExecuteProcessorCallback
0: Normal : 0xffff84074118e740 0xfffff80642a05010 nt!PopExecuteProcessorCallback
0: Normal : 0xffffa0027226fe58 0xfffff8064849da84 iaStorA
0: Normal : 0xffffa002723a7998 0xfffff806490aeaa0 tcpip!TcpPeriodicTimeoutHandler
Immediately, we can see which DPC took the longest to execute and therefore lead to the bugcheck. Unfortunately, in most cases, this almost always points to a generic system function and thus we need to do some further investigation work in order to find the true culprit.
Rich (BB code):
0: kd> knL
# Child-SP RetAddr Call Site
00 fffff806`497fae18 fffff806`42c3a988 nt!KeBugCheckEx
01 fffff806`497fae20 fffff806`42a6f9a3 nt!KeAccumulateTicks+0x1c8788
02 fffff806`497fae80 fffff806`42a6f48a nt!KeClockInterruptNotify+0x453
03 fffff806`497faf30 fffff806`42b27ef5 nt!HalpTimerClockIpiRoutine+0x1a
04 fffff806`497faf60 fffff806`42bf722a nt!KiCallInterruptServiceRoutine+0xa5
05 fffff806`497fafb0 fffff806`42bf7797 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
06 fffff806`497ec0c0 fffff806`47eb1259 nt!KiInterruptDispatchNoLockNoEtw+0x37
07 fffff806`497ec250 fffff806`474e8347 WppRecorder!WppAutoLogTrace+0x129
08 fffff806`497ec2c0 fffff806`474e93b6 BTHUSB!WPP_RECORDER_AND_TRACE_SF_DDd+0xc3
09 fffff806`497ec330 fffff806`474e80c1 BTHUSB!BthUsb_ScoReadUrbDone+0x1c6
0a fffff806`497ec3e0 fffff806`474e915e BTHUSB!BthUsb_ScoUrbComplete+0xfd
0b fffff806`497ec450 fffff806`42a4885e BTHUSB!BthUsb_ScoReadTransferCallback+0xbe
0c fffff806`497ec4c0 fffff806`42a48727 nt!IopfCompleteRequest+0x11e
0d fffff806`497ec5b0 fffff806`47da811a nt!IofCompleteRequest+0x17 << Complete the IRP
0e (Inline Function) --------`-------- Wdf01000!FxIrp::CompleteRequest+0x13
<< Wrapper for IoCompleteRequest
0f fffff806`497ec5e0 fffff806`47da5bbf Wdf01000!FxRequest::CompleteInternal+0x23a
<< IOCTL Request
10 (Inline Function) --------`-------- Wdf01000!FxRequest::Complete+0x31
11 fffff806`497ec670 fffff806`5a382180 Wdf01000!imp_WdfRequestComplete+0x8f << Cancel I/O
12 fffff806`497ec6d0 fffff806`5a3809b7 USBXHCI!Isoch_Transfer_CompleteCancelable+0x214
13 fffff806`497ec730 fffff806`5a38162d USBXHCI!Isoch_CompleteTransfers+0x19b
14 fffff806`497ec790 fffff806`5a3824ba USBXHCI!Isoch_ProcessTransferCompletion+0xed
15 fffff806`497ec7f0 fffff806`47db940a USBXHCI!Isoch_WdfEvtDpcForTransferCompletion+0x2a << Driver's DPC callback function
16 (Inline Function) --------`-------- Wdf01000!FxDpc::DpcHandler+0x50
17 fffff806`497ec820 fffff806`42a0781e Wdf01000!FxDpc::FxDpcThunk+0x5a << Our DPC
18 fffff806`497ec860 fffff806`42a06b04 nt!KiExecuteAllDpcs+0x30e
19 fffff806`497ec9d0 fffff806`42bf92ee nt!KiRetireDpcList+0x1f4
1a fffff806`497ecc60 00000000`00000000 nt!KiIdleLoop+0x9e
As we can see from the call stack, we’re spending an arbitrarily large amount of time processing an I/O request. Since the request is created by a WDF driver, let’s examine the stack frame of the WdfRequestComplete function.
Rich (BB code):
0: kd> .frame /r 11
11 fffff806`497ec670 fffff806`5a382180 Wdf01000!imp_WdfRequestComplete+0x8f [minkernel\wdf\framework\shared\core\fxrequestapi.cpp @ 436]
rax=000000000509884d rbx=ffffa0027c621530 rcx=ffffa0027daa2ba0
rdx=0000000000000028 rsi=00000000c0000120 rdi=ffffa002727df020
rip=fffff80647da5bbf rsp=fffff806497ec670 rbp=0000000000000000
r8=ffffa0027ea22b4c r9=ffffa0027ea220a8 r10=ffffa0027ea22b74
r11=ffffa0027ea22000 r12=0000000000000001 r13=ffffa0027c6216d0
r14=00000000ffffffff r15=ffffa0027c621750
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
Wdf01000!imp_WdfRequestComplete+0x8f:
fffff806`47da5bbf 488b7c2470 mov rdi,qword ptr [rsp+70h] ss:0018:fffff806`497ec6e0=ffffa0027c6216d0
The WdfRequestComplete function takes two arguments: the completion status of the request and a handle to the WdfRequest object. As we can see quite clearly, the completion status which was passed is STATUS_CANCELLED.
Rich (BB code):
0: kd> !error 00000000c0000120
Error code: (NTSTATUS) 0xc0000120 (3221225760) - The I/O request was canceled.
Now, let’s dump the WdfRequest object.
Rich (BB code):
0: kd> !wdfrequest ffffa0027c621530
Treating handle as a UMDF handle!
UMDF host process not found in the module list! Either symbols for it are not loaded or the process context is not correct
Wait? Why didn’t the command work? The answer is due to the fact, the register contains a pointer rather than a handle which the command expects, and therefore we need to convert the pointer to a handle. Fortunately, there is a debugger command which will do this for us automatically. Let’s try again.
Rich (BB code):
0: kd> !wdfobject ffffa0027c621530
The type for object 0xffffa0027c621530 is FxRequest
State: FxObjectStateDisposed (0x2)
!wdfhandle 0x00005ffd839deac8
dt Wdf01000!FxRequest 0xffffa0027c621530
Contexts:
context: dt 0xffffa0027c6216d0 USBXHCI!REQUEST_DATA (size is 0x300 bytes)
<no associated attribute callbacks>
Owning device: !wdfdevice 0x00005ffd8d820fd8
We now have the address of the handle we can now pass to the !wdfrequest debugger command.
Rich (BB code):
0: kd> !wdfrequest 0x00005ffd839deac8
Treating handle as a KMDF handle!
!irp 0x0000000000000000
irp is NULL, the remaining results may not be correct(Reserved Requests may have a NULL IRP)...
State: Completed, Allocated by WDF for incoming IRP
The associated IRP is null, which makes sense, since when the WDF framework completes a request, it will dispose of the associated IRP as well. So, from what we’ve found so far, we know that a request was issued and then subsequently cancelled. Presumably, there was a timeout condition added to the request which caused the request to be cancelled after a certain amount of time has elapsed.
Let’s examine the IRP and see who may be responsible for holding it up.
Rich (BB code):
0: kd> .frame /r 0c
0c fffff806`497ec4c0 fffff806`42a48727 nt!IopfCompleteRequest+0x11e
rax=000000000509884d rbx=ffffa00280777010 rcx=ffffa0027daa2ba0
rdx=0000000000000028 rsi=0000000000000000 rdi=ffffa002807775f3
rip=fffff80642a4885e rsp=fffff806497ec4c0 rbp=fffff806497ec549
r8=ffffa0027ea22b4c r9=ffffa0027ea220a8 r10=ffffa0027ea22b74
r11=ffffa0027ea22000 r12=ffffffffc0000120 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
nt!IopfCompleteRequest+0x11e:
fffff806`42a4885e 3d160000c0 cmp eax,0C0000016h
Rich (BB code):
0: kd> !error 0C0000016
Error code: (NTSTATUS) 0xc0000016 (3221225494) - {Still Busy} The specified I/O request packet (IRP) cannot be disposed of because the I/O operation is not complete.
From the above stack frame, we can see that the driver is checking to see if the request has completed yet and can be disposed of. Let’s dump the IRP and see who was involved.
Rich (BB code):
0: kd> !irp ffffa00280777010
Irp is active with 20 stacks 20 is current (= 0xffffa00280777638)
No Mdl: No System Buffer: Thread 00000000: Irp stack trace. Pending has been returned
cmd flg cl Device File Completion-Context
[N/A(0), N/A(0)]
0 2 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 ffffffffc0000120
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[...]
Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[IRP_MJ_INTERNAL_DEVICE_CONTROL(f), N/A(0)]
0 2 ffffa002727dda80 00000000 00000000-00000000
\Driver\USBXHCI
Args: 00000000 00000000 0x0 00000000
[IRP_MJ_INTERNAL_DEVICE_CONTROL(f), N/A(0)]
0 0 ffffa002728b2dd0 00000000 00000000-00000000
\Driver\USBXHCI
Args: 00000000 00000000 0x0 00000000
[IRP_MJ_INTERNAL_DEVICE_CONTROL(f), N/A(0)]
0 0 ffffa002728b2dd0 00000000 00000000-00000000
\Driver\USBXHCI
Args: 00000000 00000000 0x0 00000000
[IRP_MJ_INTERNAL_DEVICE_CONTROL(f), N/A(0)]
0 0 ffffa0027cfd8de0 00000000 00000000-00000000
\Driver\USBHUB3
Args: 00000000 00000000 0x0 00000000
[IRP_MJ_INTERNAL_DEVICE_CONTROL(f), N/A(0)]
0 0 ffffa0027dae9030 00000000 fffff806474e90a0-ffffa0027b4376e0
\Driver\bcbtums BTHUSB!BthUsb_ScoReadTransferCallback
Args: 00000000 00000000 0x0 00000000
>[N/A(0), N/A(0)]
0 0 00000000 00000000 00000000-00000000
Args: ffffa00279892510 00000000 00000000 00000000
As we can see, there is one third-party driver in the last I/O stack location, which then called the next function which see in our call stack. This the completion routine registered for the IRP for that particular driver. Since this is the last driver which touched the IRP before completion, I suspect that is the most likely culprit.
As we can see, the IRP attempts to be cancelled, however, the pending flag is set and the driver’s completion routine checks for this, and delays cancelling the IRP.
Rich (BB code):
0: kd> dt _IO_STATUS_BLOCK ffffa00280777010+0x30
nt!_IO_STATUS_BLOCK
+0x000 Status : 0n-1073741536 << ffffffff`c0000120
+0x000 Pointer : 0x00000000`c0000120 Void
+0x008 Information : 0
Rich (BB code):
0: kd> !error c0000120
Error code: (NTSTATUS) 0xc0000120 (3221225760) - The I/O request was canceled.
I’m not certain what has happened here, but it would seem that the pending returned flag has been set, however, the stack location becomes corrupt and leads to the IRP being in some limbo state unable to be returned to I/O Manager.
From what the user has mentioned, the issue appeared to lie with their webcam. After the webcam was replaced, the user has not experienced any more crashes.