Vir Gnarus
BSOD Kernel Dump Expert
- Mar 2, 2012
- 474
This is a simple article for those unfamiliar with 0x9F crashdumps, aka DRIVER_POWER_STATE_FAILURE.
9F bugchecks are particularly easy diagnose in most cases. Usually they involve a subcode of 3, which is a device object has been holding up an IRP for too long and things aren't moving forward. I could go into details about what an IRP is or a device object but I'll leave that for another scenario (I might explain here via request). If you want to dive into it, check out Chapter 7 of Windows Internals book, entitled, I/O System. It'll give you the beef on IRPs, IRQLs, etc. Anyways, let's take an example case:
Ok, so right now the analysis engine blames usbccgp. This is an MS standard driver for USB, so we figure this is rather inaccurate. Instead, let's look at the IRP that it's complaining is being held up. Fortunately it retained the address of it as Arg 4 of the bugcheck, so take that and use the !irp extension to get the nitty gritty on it:
Viola. So we see a third-party driver being suspect here. The carrot (>) symbol shows what was active at the time of the crash. To add, the symbols error actually helps us determine the exact driver name that is holding things up. In a kernel dump, you could also use !devobj on Arg 2 as well as !devstack on the same argument to get more info on it, but minidumps only have partial information and would not help us much.
Also, for those perceptive, you'll realize our answer has also been given to us via the bucket ID in the !analyze -v output, despite the engine thinking it's usbccgp:
So there you go. Quite a quick one, eh? I'll explain a bit more on what is actually going on here if people are interested. Have fun!
Bonus:
for another far more interesting and exciting adventure into IRPs that are being held up for whatever reason, you wanna take a look at this thread that was the result of assistance offered by a Mr. Snrub and our good ole friend Cluberti. This is a very good example of how easy it is to run far off track from the real cause by not interpreting all the data available.
It starts with a misconception that the drive is bad because there's an I/O error that is occurring during a paging operation. Commonly, it is accurate to say that this is caused by some sort of hardware failure, especially attributing it to a disk or disk controller problem. That's because an operation during a paging operation involves shoving stuff from memory into disk (specifically the paging file). However, Mr. Snrub catches that the subcode for the error is actually a resource exhaustion error. He and Cluberti then work together with the client to discover what resource is exhausted, and finds nonpaged pool being exhausted (note that this is dealing with Windows XP x86, which puts a restrictive hard limit on nonpaged pool, compared to its newer x64 brethren). Then they work through it to find what's filling up nonpaged pool, and ends up discovering that unfinished IRPs, a lot of them, are filling it up. This is where Cluberti comes in and starts diving through the IRPs to conclude that an outdated AV driver wasn't finishing its job, and therefore the IRPs waiting on it keeps piling up till there's no room left. It's definitely worth a read through the whole thing, and is just overall good solid debugging put to use.
I want to finish by mentioning that this could not be possible through minidumps. You're lucky to even get a single IRP out a minidump let alone other information like memory allocations and whatnot. It is imperative that if you look through a minidump and find hints of something that needs further investigating, don't hesitate to ask for a kernel dump!
Btw, for those who haven't already read Mark Russinovich's blog article on nonpaged pool and are too lazy to check my link on BSOD Method & Tips, it's available here.
9F bugchecks are particularly easy diagnose in most cases. Usually they involve a subcode of 3, which is a device object has been holding up an IRP for too long and things aren't moving forward. I could go into details about what an IRP is or a device object but I'll leave that for another scenario (I might explain here via request). If you want to dive into it, check out Chapter 7 of Windows Internals book, entitled, I/O System. It'll give you the beef on IRPs, IRQLs, etc. Anyways, let's take an example case:
Code:
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_POWER_STATE_FAILURE (9f)
A driver is causing an inconsistent power state.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa800ba6c060, Physical Device Object of the stack
Arg3: fffff80005cc23d8, Functional Device Object of the stack
Arg4: fffffa800fa37b40, The blocked IRP
Debugging Details:
------------------
DRVPOWERSTATE_SUBCODE: 3
IMAGE_NAME: usbccgp.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4d8c0aea
MODULE_NAME: usbccgp
FAULTING_MODULE: fffff88008200000 usbccgp
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0x9F
PROCESS_NAME: System
CURRENT_IRQL: 2
STACK_TEXT:
fffff800`05cc2388 fffff800`030eacd2 : 00000000`0000009f 00000000`00000003 fffffa80`0ba6c060 fffff800`05cc23d8 : nt!KeBugCheckEx
fffff800`05cc2390 fffff800`030885fc : fffff800`05cc24d8 fffff800`05cc24d8 00000000`00000000 00000000`00000009 : nt! ?? ::FNODOBFM::`string'+0x34a90
fffff800`05cc2430 fffff800`03088496 : fffffa80`0b782148 fffffa80`0b782148 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`05cc24a0 fffff800`0308837e : 00000016`cc7d230b fffff800`05cc2b18 00000000`000993e8 fffff800`031f6f88 : nt!KiProcessExpiredTimerList+0xc6
fffff800`05cc2af0 fffff800`03088167 : 00000004`e1eef2cb 00000004`000993e8 00000004`e1eef22f 00000000`000000e8 : nt!KiTimerExpiration+0x1be
fffff800`05cc2b90 fffff800`0307496a : fffff800`031f2e80 fffff800`03200cc0 00000000`00000002 fffff880`00000000 : nt!KiRetireDpcList+0x277
fffff800`05cc2c40 00000000`00000000 : fffff800`05cc3000 fffff800`05cbd000 fffff800`05cc2c00 00000000`00000000 : nt!KiIdleLoop+0x5a
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
FAILURE_BUCKET_ID: X64_0x9F_3_cxcir64_IMAGE_usbccgp.sys
BUCKET_ID: X64_0x9F_3_cxcir64_IMAGE_usbccgp.sys
Followup: MachineOwner
Ok, so right now the analysis engine blames usbccgp. This is an MS standard driver for USB, so we figure this is rather inaccurate. Instead, let's look at the IRP that it's complaining is being held up. Fortunately it retained the address of it as Arg 4 of the bugcheck, so take that and use the !irp extension to get the nitty gritty on it:
Code:
0: kd> !irp fffffa800fa37b40
Irp is active with 8 stacks 6 is current (= 0xfffffa800fa37d78)
No Mdl: No System Buffer: Thread 00000000: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[B]>[ 16, 2] 0 e0 fffffa800ba6d060 00000000 fffff880082362ec-fffffa800ba6d1b0 Success Error Cancel
*** WARNING: Unable to verify timestamp for cxcir64.sys
*** ERROR: Module load completed but symbols could not be loaded for cxcir64.sys
\Driver\CXIR HIDCLASS!HidpFdoPowerCompletion
Args: 00014400 00000001 00000004 00000002[/B]
[ 16, 2] 0 e1 fffffa800ba6d060 00000000 fffff80003069430-fffffa800f8bf870 Success Error Cancel pending
\Driver\CXIR nt!IopUnloadSafeCompletion
Args: 00014400 00000001 00000004 00000002
[ 0, 0] 0 0 00000000 00000000 00000000-fffffa80079d9ce0
Args: 00000000 00000000 00000000 00000000
Viola. So we see a third-party driver being suspect here. The carrot (>) symbol shows what was active at the time of the crash. To add, the symbols error actually helps us determine the exact driver name that is holding things up. In a kernel dump, you could also use !devobj on Arg 2 as well as !devstack on the same argument to get more info on it, but minidumps only have partial information and would not help us much.
Also, for those perceptive, you'll realize our answer has also been given to us via the bucket ID in the !analyze -v output, despite the engine thinking it's usbccgp:
Code:
FAILURE_BUCKET_ID: X64_0x9F_3_cxcir64_IMAGE_usbccgp.sys
BUCKET_ID: X64_0x9F_3_cxcir64_IMAGE_usbccgp.sys
So there you go. Quite a quick one, eh? I'll explain a bit more on what is actually going on here if people are interested. Have fun!
Bonus:
for another far more interesting and exciting adventure into IRPs that are being held up for whatever reason, you wanna take a look at this thread that was the result of assistance offered by a Mr. Snrub and our good ole friend Cluberti. This is a very good example of how easy it is to run far off track from the real cause by not interpreting all the data available.
It starts with a misconception that the drive is bad because there's an I/O error that is occurring during a paging operation. Commonly, it is accurate to say that this is caused by some sort of hardware failure, especially attributing it to a disk or disk controller problem. That's because an operation during a paging operation involves shoving stuff from memory into disk (specifically the paging file). However, Mr. Snrub catches that the subcode for the error is actually a resource exhaustion error. He and Cluberti then work together with the client to discover what resource is exhausted, and finds nonpaged pool being exhausted (note that this is dealing with Windows XP x86, which puts a restrictive hard limit on nonpaged pool, compared to its newer x64 brethren). Then they work through it to find what's filling up nonpaged pool, and ends up discovering that unfinished IRPs, a lot of them, are filling it up. This is where Cluberti comes in and starts diving through the IRPs to conclude that an outdated AV driver wasn't finishing its job, and therefore the IRPs waiting on it keeps piling up till there's no room left. It's definitely worth a read through the whole thing, and is just overall good solid debugging put to use.
I want to finish by mentioning that this could not be possible through minidumps. You're lucky to even get a single IRP out a minidump let alone other information like memory allocations and whatnot. It is imperative that if you look through a minidump and find hints of something that needs further investigating, don't hesitate to ask for a kernel dump!
Btw, for those who haven't already read Mark Russinovich's blog article on nonpaged pool and are too lazy to check my link on BSOD Method & Tips, it's available here.