Hello everyone! In this post, we're going to be discussing
DPC_WATCHDOG_VIOLATION (0x133) bug check debugging. I'll go as in detail as I can regarding the different parameters, why the bug check itself occurs, why it came to be, etc, as my personal knowledge permits. As always, if you have any information to add that you see I may have missed, or a correction, please do
not hesitate to tell me!
Disclaimer: This is not a 'live' debugging tutorial, as in, it's a case I worked on weeks ago, and I no longer have access to the kernel-dump. The information I have here is purely what I managed to save for reference.
Before we get into the actual debugging of the bug check itself, let's talk about
why this bug check 'came to be', etc, so you have a little more than a basic understanding. I guess you could call it a little history lesson! This is something I believe I am going to try and do with all of my future debugging posts. Not only is it neat & fun for me to get to talk about it, but it's always great to learn not just how to solve the bugcheck, but
why it came to be an actual bugcheck.
Starting with Windows 8 (Server 2012, etc) we had the famous (or infamous depending on your stance with Windows 8!) touch-optimized user interface (GUI) that was based off of Microsoft's Metro. We all know the good & bad that has been said about Metro, so let's not get into that. Regardless, that's not what we're really here to discuss anyway. What we
can say regarding the Metro GUI is that is was developed primarily in mind to improve the tablet-based computing experience as Microsoft was competing with mobile OS' such as Android and iOS.
Now, with that said (we'll get back to it) let's move on to the Interrupt handler, or otherwise known as ISR (Interrupt Service Routine). In its simplest definition, this is a callback subroutine in microcontroller firmware, operating system or device driver whose execution is triggered by the reception of an interrupt. Interrupt handlers have a multitude of functions, which vary based on the reason the interrupt was generated and the speed at which the interrupt handler completes its task.
ISR runs at a very high level (15 levels on x64-based processor, and 32 levels on x86-based processor). Programs/Applications run at
0, DPCs run at
1, and interrupts will go ahead and fire at a higher level than that. Let's say for example that you have IRQ 4 and IRQ 5 waiting to be serviced. IRQ 5 holds a higher priority than IRQ 4 does, therefore it will shove its own function to the top of the stack and continue running. Afterwards, it will come back, unwind, and then IRQ 4 will continue having
no idea that it was stopped entirely. With this said, interrupts will interrupt each other (where the word
interrupt comes from).
ISRs will queue work, because hogging of the CPU is not good. Obviously the CPU needs to address all of the interrupts, work its assigned, etc, however, it needs to be 'fair'. This queued work is turned into what is known as a DPC (Deferred Procedure Call). This mechanism allows high-priority tasks to defer required but lower-priority tasks for later execution from the ISR. This permits device drivers and other low-level event consumers to perform the high-priority part of their processing quickly, and schedule non-critical additional processing for execution at a lower priority.
DPCs block
all applications from running (or being serviced by the CPU) as they take the absolute highest priority in being serviced. Absolutely no applications whatsoever (not even
explorer.exe) can run when a DPC is
PENDING and waiting to be serviced. Now, let's use a neat example for fun! I would go as far to say that if you've used a computer in your life for an extended period of time, you've at least
once had a system hang, but you were still able to move your mouse cursor. However, even though you were able to use your mouse cursor,
NO applications worked. You weren't able to
Ctrl+Shift+Esc to bring up Task Manager, you weren't able to do anything at all regarding any applications. I always laugh when I think of this, because I just imagine tons and tons of frustration and attempting to open Task Manager over and over again, or trying to open the Start Menu to Restart the system.
Anyway, the reason no applications work, but the mouse cursor still does and is not frozen with the applications, is because the mouse cursor is actually an interrupt that's firing a scheduled DPC. With this said, we still have the drawing of the mouse cursor. So, with that said, if you've ever wondered why your system froze but you still had a moving mouse cursor, now you know why. Some pretty fun food for thought, I think!
Now that we're (hopefully) smiling about mouse cursors still moving during DPC hangs, we can return back to why we discussed Metro as a GUI. Remember, as I said, Metro was developed primarily in mind to improve the tablet-based computing experience. Now, on a tablet-based device, you don't have a keyboard. Okay, we don't have a keyboard on a tablet, big deal. Well, because we don't have a keyboard, if a DPC hang occurs, we cannot force a system crash (and/or break-out) to call a bugcheck and ultimately safely bring down the system. You may be asking "Couldn't I just power down the tablet"? Yes, absolutely. However, if you power down the tablet, you risk data corruption.
-- If you're interested in reading about forcing a bugcheck with the keyboard, take a look at this article - Forcing a System Crash from the Keyboard (Windows Debuggers)
Due to the inability to force a bugcheck if a DPC hang occurs on a tablet-based device due to the lack of a keyboard, the wonderful
DPC_WATCHDOG_VIOLATION bug check was created. If we didn't have this bug check, if a DPC hang occurred on your tablet-based device, you'd either inevitably wait for the device to lose battery and simply turn off because it'd hang forever, or the DPC(s)
MIGHT (depending - and I use might and depending very loosely) be serviced and Windows will continue.
To my knowledge, the rules that apply to calling the bugcheck in a DPC hang-based scenario are as follows:
1. If one specific DPC has been running for ~10 seconds, the system will bugcheck.
2. If an accumulation of DPCs have been running for ~15 seconds, the system will bugcheck.
(Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds, however the actual timeout values on the system are set much higher).
-- I'd also like to go and make note that depending on your tablet device, there are sometimes alternative methods of performing an ACPI 'reset' by usually hitting two buttons simultaneously.
Fantastic! Now that we know all of the cool information above, let's proceed with the debugging. Remember, as with all bug checks, there is no same culprit. You will see 0x133's being caused by all sorts of things! With this said, just because you see what you see here in my debugging example, doesn't mean you'll see it in another! This is not a 'how to solve' tutorial, but more-so a look into how to generally debug this type of bug check.
Code:
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
^^ Here we have the basic bug check information. First off, the
DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways:
1. If a single DPC exceeds a specified number of ticks, the system will bugcheck with 0x133, and the first parameter of the bug check will be set to
0. If this is the case, the system's time limit for single DPC will be in the third parameter of the bug check, with the number of ticks taken by this DPC in the 2nd parameter of the bug check.
2. If the system exceeds a larger timeout of time spent cumulatively in all DPCs since the IRQL was raised to DPC level, the system will bugcheck with a 0x133, and the first parameter will be set to
1.
BugCheck 133, {
0,
501,
500, 0}
^^ In this case, the first parameter = 0. We'd refer to #1. This specific DPC has run for 0x501 ticks, when the limit was 0x500.
In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. Rather than running a
kv as we usually do, let's go ahead and run
knL:
Code:
0: kd> knL
# Child-SP RetAddr Call Site
00 fffff803`a2459408 fffff803`a044bf4b nt!KeBugCheckEx
01 fffff803`a2459410 fffff803`a0310774 nt! ?? ::FNODOBFM::`string'+0x145a4
02 fffff803`a2459490 fffff803`a0228eca nt!KeUpdateTime+0x2ec
03 fffff803`a2459670 fffff803`a02c501e hal!HalpTimerClockInterrupt+0x86
04 fffff803`a24596a0 fffff880`18564dd2 nt!KiInterruptDispatchLBControl+0x1ce
05 fffff803`a2459830 fffff803`a02f51ea usb80236!CancelSendsTimerDpc+0xa6
06 fffff803`a2459870 fffff803`a02f3655 nt!KiProcessExpiredTimerList+0x22a
07 fffff803`a24599a0 fffff803`a02f5668 nt!KiExpireTimerTable+0xa9
08 fffff803`a2459a40 fffff803`a02f4a06 nt!KiTimerExpiration+0xc8
09 fffff803`a2459af0 fffff803`a02f59ba nt!KiRetireDpcList+0x1f6
0a fffff803`a2459c60 00000000`00000000 nt!KiIdleLoop+0x5a
k = Displays the call stack of the given thread.
n = Displays frame numbers.
L (important you capitalize it) = Hides source lines in the display.
This makes a nice, neat, and informative call stack for 0x133 debugging.
As we can see above in the call stack, the offending driver is
usb80236.sys, which is the Remote NDIS USB Driver. It calls into
KiInterruptDispatchLBControl.
Let’s view the driver’s unassembled DPC routine:
Code:
0: kd> ub fffff880`18564dd2
usb80236!CancelSendsTimerDpc+0x81:
fffff880`18564dad 488bcd mov rcx,rbp
fffff880`18564db0 c6465001 mov byte ptr [rsi+50h],1
fffff880`18564db4 ff1556230000 call qword ptr [usb80236!_imp_KeReleaseSpinLockFromDpcLevel (fffff880`18567110)]
fffff880`18564dba 488bcb mov rcx,rbx
fffff880`18564dbd ff15a5220000 call qword ptr [usb80236!_imp_IoCancelIrp (fffff880`18567068)]
fffff880`18564dc3 488bcd mov rcx,rbp
fffff880`18564dc6 ff15e4220000 call qword ptr [usb80236!_imp_KeAcquireSpinLockAtDpcLevel (fffff880`185670b0)]
fffff880`18564dcc ff8708030000 inc dword ptr [rdi+308h]
1. The driver started by calling the
KeAcquireSpinLockAtDpcLevel routine which acquires a spin lock when the caller is already running at
IRQL >= DISPATCH_LEVEL.
The caller should release the spin lock with
KeReleaseSpinLockFromDpcLevel as quickly as possible.
2. The driver then called the
IoCancelIrp routine which sets the cancel bit in a given IRP and calls the cancel routine for the IRP if there is one.
-- If the IRP has a cancel routine,
IoCancelIrp sets the cancel bit and calls the cancel routine.
3. The driver then called the
KeReleaseSpinLockFromDpcLevel routine (as mentioned above) which releases an executive spin lock without changing the IRQL.
Going a bit deeper...
Code:
0: kd> u fffff880`18564dd2
usb80236!CancelSendsTimerDpc+0xa6:
fffff880`18564dd2 488b36 mov rsi,qword ptr [rsi]
fffff880`18564dd5 493bf6 cmp rsi,r14
fffff880`18564dd8 75a9 jne usb80236!CancelSendsTimerDpc+0x57 (fffff880`18564d83)
fffff880`18564dda 4c8d87b0020000 lea r8,[rdi+2B0h]
fffff880`18564de1 488d8f70020000 lea rcx,[rdi+270h]
fffff880`18564de8 48c7c2001f0afa mov rdx,0FFFFFFFFFA0A1F00h
fffff880`18564def ff1553220000 call qword ptr [usb80236!_imp_KeSetTimer (fffff880`18567048)]
fffff880`18564df5 eb12 jmp usb80236!CancelSendsTimerDpc+0xdd (fffff880`18564e09)
1. The driver called the
CancelSendsTimerDpc routine. I do not know exactly what this routine does, however, it's certainly something in regards to a timer on and/or for a DPC. According to Harry (x BlueRobot), he believes that the driver may use a Custom DPC associated with a Timer object.
2. The driver then calls the
KeSetTimer routine which sets the absolute or relative interval at which a timer object is to be set to a signaled state and, optionally, supplies a
CustomTimerDpc routine to be executed when that interval expires.
3. The driver then calls the
CancelSendsTimerDpc routine again. As far as I know, what should be going on here is the
CustomTimerDpc routine should be called, but
CancelSendsTimerDpc may be in a loop.
Overall, what seems to be occurring is the DPC may be looping itself by gathering a Spinlock at DPC Level, cancelling the Timer, and then finally releasing the Spinlock again. This is happening over and over again, therefore we have a loop.
If we run a !pcr to show us the queued DPCs for the processor:
-- Essentially, !pcr will display the current status of the Processor Control Region (PCR) on a specific processor.
Code:
0: kd> !pcr
KPCR for Processor 0 at fffff803a056a000:
Major 1 Minor 1
NtTib.ExceptionList: fffff803a2452000
NtTib.StackBase: fffff803a2453080
NtTib.StackLimit: 0000000004ccee48
NtTib.SubSystemTib: fffff803a056a000
NtTib.Version: 00000000a056a180
NtTib.UserPointer: fffff803a056a7f0
NtTib.SelfTib: 000000007ef44000
SelfPcr: 0000000000000000
Prcb: fffff803a056a180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffff803a05c4880
NextThread: fffffa80036e8040
IdleThread: fffff803a05c4880
DpcQueue: 0xfffffa80073595e0 0xfffff8800400c960 [Normal] dxgkrnl!DpiFdoDpcForIsr
0xfffffa8007be9b68 0xfffff88001efb380 [Normal] ndis!ndisInterruptDpc
0xfffff803a04fcfe0 0xfffff803a027c71c [Normal] nt!PpmCheckPeriodicStart
0xfffff803a0545d60 0xfffff803a031e45c [Normal] nt!KiBalanceSetManagerDeferredRoutine
1. ndis.sys:
(Network Driver Interface Specification driver) interrupt routine. The Network Driver Interface Specification (NDIS) is an application programming interface (API) for network interface cards (NICs). The NDIS forms the Logical Link Control (LLC) sublayer, which is the upper sublayer of the OSI data link layer (layer 2). Therefore, the NDIS acts as the interface between the Media Access Control (MAC) sublayer, which is the lower sublayer of the data link layer, and the network layer (layer 3).
The NDIS is a library of functions often referred to as a "wrapper" that hides the underlying complexity of the NIC hardware and serves as a standard interface for level 3 network protocol drivers and hardware level MAC drivers. Another common LLC is the Open Data-Link Interface (ODI).
2. dxgkrnl.sys - Direct X Kernel.
So, with all of this said, we know that something is causing
usb80236.sys to call into a loop, and it may be anything that's working with and/or possibility interfering with Windows' networking, or Direct X. We'll need to do some detective work to determine what is causing this, as it's a system driver and is being faulted by something else. At this point, since we're at quite the wall, I recommend enabling Driver Verifier so we can see what's going on. The user enabled DV, and sure enough, they had an 0xC4 crash! Let's have a look below!
Code:
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000001011, Invariant MDL buffer contents for Read Irp were modified during dispatch or buffer backed by dummy pages.
Arg2: fffffa8006219060, Device object to which the Read IRP was issued.
Arg3: fffff980098a8c60, The address of the IRP.
Arg4: fffff8801a5b3000, System-Space Virtual Address for the buffer that the MDL describes.
Here we have the basic bug check info, with the second/third parameter highlighted as they will be useful later on. Let's go ahead and take a look at the call stack first:
Code:
0: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`1af96888 fffff803`07a31fcc : 00000000`000000c4 00000000`00001011 fffffa80`06219060 fffff980`098a8c60 : nt!KeBugCheckEx
fffff880`1af96890 fffff803`07ebfa51 : 00000000`00000000 fffff880`1af96a00 fffff980`098a8c60 fffff803`07cba000 : nt!MdlInvariantPreProcessing1+0x200
fffff880`1af96900 fffff803`07ebdc51 : 00000000`00000000 fffffa80`0dd877d0 fffffa80`06218060 00000000`00000001 : nt!IovpCallDriver1+0x1cd
fffff880`1af96a60 fffff803`07eb4cde : fffff980`098a8c60 00000000`00000002 fffff880`1af96c40 00000000`c0000089 : nt!VfBeforeCallDriver+0x141
fffff880`1af96a90 fffff880`012ccdab : fffff980`098a8c60 fffffa80`06219060 fffffa80`062191b0 fffffa80`0dd877d0 : nt!IovCallDriver+0x35e
fffff880`1af96ae0 fffff880`012ccb7e : fffff980`098a8c60 fffff880`1af96ba0 fffff980`098a8e50 fffffa80`06219060 : intmsd+0x3dab
fffff880`1af96b20 fffff880`012d4402 : 00000000`00000000 fffff880`1af96c40 00000000`00000000 fffff803`07928d44 : intmsd+0x3b7e
fffff880`1af96b60 fffff880`012ccaf3 : fffff980`098a8c60 fffffa80`062191b0 fffffa80`06219060 00000000`00000000 : intmsd+0xb402
fffff880`1af96bf0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0bf1b790 fffffa80`06219b10 : intmsd+0x3af3
fffff880`1af96cc0 fffff803`07eb4d66 : fffff980`098a8c60 fffffa80`06219b10 00000000`00000002 fffffa80`0bf1b790 : nt!IovCallDriver+0x3e6
fffff880`1af96d10 fffff880`0127b14e : fffffa80`05356de0 fffffa80`05356c90 00000000`00000002 fffffa80`0d745ba0 : nt!IovCallDriver+0x3e6
fffff880`1af96d60 fffff803`07eb4d66 : 00000000`00000002 fffff980`098a8c60 fffffa80`05356c90 fffffa80`0dd66e00 : volmgr!VmReadWrite+0x13e
fffff880`1af96da0 fffff880`01f63faa : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffffa80`0dd66e00 : nt!IovCallDriver+0x3e6
fffff880`1af96df0 fffff880`01f64236 : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffff803`07ebdc51 : fvevol!FveReadWrite+0x3e
fffff880`1af96e30 fffff803`07eb4d66 : fffffa80`06227040 00000000`00000002 fffffa80`0d9eee40 fffff880`01c18329 : fvevol!FveFilterRundownReadWrite+0x1b6
fffff880`1af96e80 fffff880`01c01af2 : fffffa80`06228190 fffffa80`06228040 00000000`00000002 fffffa80`0d9eee40 : nt!IovCallDriver+0x3e6
fffff880`1af96ed0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 00000000`00000030 00000000`00000000 : volsnap!VolSnapReadFilter+0x112
fffff880`1af96f00 fffff880`01817b69 : fffff880`1af79828 fffff880`1af798f0 fffffa80`0c543080 fffffa80`045faf40 : nt!IovCallDriver+0x3e6
fffff880`1af96f50 fffff803`078c8b67 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs!NtfsStorageDriverCallout+0x16
fffff880`1af96f80 fffff803`078c8b2d : 00000000`00000000 00000000`00000000 00000000`00000002 fffff803`07923ad8 : nt!KxSwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`1af96e40)
fffff880`1af79690 fffff803`07923ad8 : 00000000`00000006 00000000`00000000 00000000`00000006 00000000`00000000 : nt!KiSwitchKernelStackContinue
fffff880`1af796b0 fffff803`07926405 : fffff880`01817b54 fffff880`1af79828 00000000`00055000 00000000`00000001 : nt!KeExpandKernelStackAndCalloutInternal+0x218
fffff880`1af797b0 fffff880`01817aac : fffff880`1af79c60 fffff980`098a8c60 fffff8a0`01299b90 00000000`0005d000 : nt!KeExpandKernelStackAndCalloutEx+0x25
fffff880`1af797f0 fffff880`0181646a : fffff880`1af798a0 fffff880`1af79c60 fffff8a0`01299b90 fffff880`018770b0 : Ntfs!NtfsMultipleAsync+0xac
fffff880`1af79860 fffff880`01825b26 : 00000000`00000000 fffff880`1af79df0 fffff8a0`01299be8 fffff8a0`01299b90 : Ntfs!NtfsNonCachedIo+0x26a
fffff880`1af79a70 fffff880`0182742b : fffff880`1af79c60 fffff980`098a8c60 fffffa80`0c169701 00000000`00000001 : Ntfs!NtfsCommonRead+0x896
fffff880`1af79c30 fffff803`07eb4d66 : fffff980`098a8c60 fffff980`098a8c60 00000000`00000002 fffffa80`0dd800c0 : Ntfs!NtfsFsdRead+0x1db
fffff880`1af79e70 fffff880`013924ee : fffffa80`0fb303f0 fffff880`1af79f00 fffff980`098a8c60 fffffa80`0dd800c0 : nt!IovCallDriver+0x3e6
fffff880`1af79ec0 fffff880`013900b6 : fffffa80`0538bb00 00000000`00000002 fffff980`098a8c60 fffffa80`0d579e28 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e
fffff880`1af79f60 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0d5f47b0 fffffa80`0398e598 : FLTMGR!FltpDispatch+0xb6
fffff880`1af79fc0 fffff803`0794b43e : fffff980`098a8c60 00000000`00000000 fffffa80`0d5f4878 fffffa80`0d579d80 : nt!IovCallDriver+0x3e6
fffff880`1af7a010 fffff803`0794fa61 : fffffa80`0d5f47b0 fffffa80`0d5f47d8 fffffa80`0d5f4818 fffffa80`0d5f4808 : nt!IoPageRead+0x21e
fffff880`1af7a060 fffff803`0794ab20 : 00000000`00000002 fffff880`1af7a0d0 fffffa80`0c543080 fffffa80`0d5f47b0 : nt!MiIssueHardFaultIO+0xc9
fffff880`1af7a0a0 fffff803`07908d8f : fffffa80`0c543080 fffff803`07bc5f40 00000000`c0033333 00000000`00000000 : nt!MiIssueHardFault+0x170
fffff880`1af7a130 fffff803`0792369b : 00000000`00000000 fffffa80`0c543080 8de00001`15efb900 00000000`00000000 : nt!MmAccessFault+0x81f
fffff880`1af7a270 fffff803`07939a37 : 00000000`00000000 00000000`00051000 00000000`00055000 fffff880`1af7a438 : nt!MmCheckCachedPageStates+0x8db
fffff880`1af7a400 fffff803`07938e62 : fffffa80`0c16b010 0000003c`a4737c90 fffff880`1af7a550 fffff880`00000000 : nt!CcMapAndCopyInToCache+0x397
fffff880`1af7a4f0 fffff880`018d4070 : 00000000`000551b8 fffffa80`0538bb00 fffffa80`0c5acbe8 fffffa80`0c1697b0 : nt!CcCopyWriteEx+0x1b2
fffff880`1af7a590 fffff880`01394415 : 00000000`00000000 00000000`000002b0 fffff880`1af7a9b0 00000000`00000000 : Ntfs!NtfsCopyWriteA+0x290
fffff880`1af7a7f0 fffff880`01394b53 : fffff880`1af7a8e0 00000000`00000000 fffffa80`0c169700 fffffa80`0c5aca90 : FLTMGR!FltpPerformFastIoCall+0x155
fffff880`1af7a850 fffff880`013bcabd : 00000000`000000d8 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpPassThroughFastIo+0xc3
fffff880`1af7a8b0 fffff803`07caa249 : fffffa80`0c1697b0 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpFastIoWrite+0x19d
fffff880`1af7a960 fffff803`078cd453 : fffffa80`0c543001 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWriteFile+0x5b8
fffff880`1af7aa90 000007f8`a88c2c6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`1af7ab00)
0000003c`a744fa28 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007f8`a88c2c6a
As we can see, we have
many different file system related routines (Ntfs, FLTMGR, etc). Why? Well, as we move up the stack, we eventually see four
intmsd.sys calls. This is the IntelliMemory Storage Filter Driver from Condusiv Technologies.
Now, let's go ahead and refer to the 2nd/3rd parameters as I've highlighted above!
If we run an !irp on the 3rd parameter:
Code:
[ 3, 0] 10 e0 fffffa8006219060 00000000 fffff880012ccbc8-fffffa8006219060 Success Error Cancel
\Driver\intmsd intmsd
Args: 00008000 00000000 1cec3f000 00000000
>[ 3, 0] 10 e1 fffffa8006219060 00000000 fffff880012183c0-00000000 Success Error Cancel pending
\Driver\intmsd partmgr!PmIoCompletion
If we run a !devobj on the 2nd parameter:
Code:
0: kd> !devobj fffffa8006219060
Device object (fffffa8006219060) is for:
intmsd0 \Driver\intmsd DriverObject fffffa80049241b0
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000850
Vpb fffffa800534d8f0 Dacl fffff9a10052d360 DevExt fffffa80062191b0 DevObjExt fffffa8006219930 Dope fffffa800534d880
ExtensionFlags (0x80000800) DOE_DEFAULT_SD_PRESENT, DOE_DESIGNATED_FDO
Characteristics (0x00000100) FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) fffffa8006219b10 \Driver\partmgr
AttachedTo (Lower) fffffa8006218060 \Driver\disk
Device queue is not busy.
That's why we're seeing so many file system and storage related routines being called. After this was found, I recommend disabling and/or preferably uninstalling IntelliMemory. After uninstalling IntelliMemory, the crashes ceased. Why?
First off, IntelliMemory is an intelligent data caching technology that provides faster access to frequently used files. IntelliMemory is supposed to improve latency and throughput by reducing disk I/O requests as active files are predicatively cached within the server to preempt round trips between VMs and network storage.
Remember how we saw various network related routines, etc, during the 0x133 debugging? Well, it's because IntelliMemory was the driver that was causing the loop.
I hope this helps!
-- I will also discuss scenario #2 (when the first parameter is set to
1) when I have debugged such a situation myself!