This is a simple article for those unfamiliar with 0x9F crashdumps, aka DRIVER_POWER_STATE_FAILURE.
9F bugchecks are easy to diagnose in most cases. Usually they carry a subcode of 3 (Arg1), which means a device object has been holding up an IRP for too long and things aren't moving forward. I could go into detail about what an IRP or a device object is, but I'll leave that for another time (I might explain here on request). If you want to dive into it, check out Chapter 7 of the Windows Internals book, entitled "I/O System". It'll give you the beef on IRPs, IRQLs, etc. Anyway, let's take an example case:
0: kd> !analyze -v
* Bugcheck Analysis *
A driver is causing an inconsistent power state.
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa800ba6c060, Physical Device Object of the stack
Arg3: fffff80005cc23d8, Functional Device Object of the stack
Arg4: fffffa800fa37b40, The blocked IRP
FAULTING_MODULE: fffff88008200000 usbccgp
fffff800`05cc2388 fffff800`030eacd2 : 00000000`0000009f 00000000`00000003 fffffa80`0ba6c060 fffff800`05cc23d8 : nt!KeBugCheckEx
fffff800`05cc2390 fffff800`030885fc : fffff800`05cc24d8 fffff800`05cc24d8 00000000`00000000 00000000`00000009 : nt! ?? ::FNODOBFM::`string'+0x34a90
fffff800`05cc2430 fffff800`03088496 : fffffa80`0b782148 fffffa80`0b782148 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`05cc24a0 fffff800`0308837e : 00000016`cc7d230b fffff800`05cc2b18 00000000`000993e8 fffff800`031f6f88 : nt!KiProcessExpiredTimerList+0xc6
fffff800`05cc2af0 fffff800`03088167 : 00000004`e1eef2cb 00000004`000993e8 00000004`e1eef22f 00000000`000000e8 : nt!KiTimerExpiration+0x1be
fffff800`05cc2b90 fffff800`0307496a : fffff800`031f2e80 fffff800`03200cc0 00000000`00000002 fffff880`00000000 : nt!KiRetireDpcList+0x277
fffff800`05cc2c40 00000000`00000000 : fffff800`05cc3000 fffff800`05cbd000 fffff800`05cc2c00 00000000`00000000 : nt!KiIdleLoop+0x5a
OK, so right now the analysis engine blames usbccgp. This is a standard Microsoft driver for USB, so we figure the blame is rather inaccurate. Instead, let's look at the IRP that it's complaining is being held up. Fortunately the bugcheck retained its address as Arg4, so take that and pass it to the !irp extension (here, !irp fffffa800fa37b40) to get the nitty-gritty on it.
Voilà. The !irp output shows a third-party driver as the suspect. The > marker in that output shows which stack location was active at the time of the crash. On top of that, the symbols error actually helps us, because it gives the exact name of the driver that is holding things up. In a kernel dump, you could also run !devobj and !devstack on Arg2 to get more info on the device, but minidumps only hold partial information, so those commands usually would not help us much there.
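If you end up triaging a lot of these, the step from bugcheck arguments to follow-up commands can be scripted. What follows is only a minimal Python sketch of my own, not anything WinDbg ships: the followup_commands name and the regex are assumptions for illustration, and it only understands the "ArgN: value, description" layout shown in the example above, for subcode 3.

```python
import re

# Matches lines such as "Arg4: fffffa800fa37b40, The blocked IRP".
ARG_RE = re.compile(r"^\s*Arg([1-4]):\s*([0-9a-fA-F`]+)", re.MULTILINE)


def followup_commands(analyze_text):
    """Build the follow-up commands for a 0x9F dump with subcode 3.

    Hypothetical helper: it only reads pasted !analyze -v text and
    never talks to the debugger itself.
    """
    args = {int(num): value.replace("`", "").lower()
            for num, value in ARG_RE.findall(analyze_text)}
    # Bail out unless the needed arguments were found and Arg1 is subcode 3.
    if not all(n in args for n in (1, 2, 4)) or int(args[1], 16) != 3:
        return []
    return [
        f"!irp {args[4]}",       # Arg4: the blocked IRP
        f"!devobj {args[2]}",    # Arg2: physical device object of the stack
        f"!devstack {args[2]}",  # same PDO, walked as a device stack
    ]


# Argument lines copied from the example dump in this article.
sample = """\
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa800ba6c060, Physical Device Object of the stack
Arg3: fffff80005cc23d8, Functional Device Object of the stack
Arg4: fffffa800fa37b40, The blocked IRP
"""

for command in followup_commands(sample):
    print(command)
```

Paste the printed commands into the debugger yourself. On a minidump, expect !devobj and !devstack to come back with little or nothing.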
Also, the perceptive among you will notice that the bucket ID in the !analyze -v output already hands us the answer, even though the engine thinks it's usbccgp.
So there you go. Quite a quick one, eh? I'll explain a bit more on what is actually going on here if people are interested. Have fun!
For another, far more interesting and exciting adventure into IRPs that are being held up for whatever reason, take a look at this thread, which came out of assistance offered by a Mr. Snrub and our good ole friend Cluberti. It is a very good example of how easy it is to run far off track from the real cause by not interpreting all the data available.
It starts with a misconception that the drive is bad because an I/O error is occurring during a paging operation. It is commonly accurate to say that such an error is caused by some sort of hardware failure, especially a disk or disk controller problem. That's because a paging operation involves shoving stuff from memory onto disk (specifically into the paging file). However, Mr. Snrub catches that the subcode for the error is actually a resource exhaustion error. He and Cluberti then work together with the client to discover which resource is exhausted, and find that it is nonpaged pool (note that this is dealing with Windows XP x86, which puts a restrictive hard limit on nonpaged pool compared to its newer x64 brethren). Then they work through it to find what's filling up nonpaged pool, and end up discovering that unfinished IRPs, a lot of them, are filling it up. This is where Cluberti comes in and starts diving through the IRPs to conclude that an outdated AV driver wasn't finishing its job, so the IRPs waiting on it keep piling up till there's no room left. It's definitely worth a read through the whole thing, and it's just overall good, solid debugging put to use.
I want to finish by mentioning that this would not have been possible with minidumps. You're lucky to even get a single IRP out of a minidump, let alone other information like memory allocations and whatnot. If you look through a minidump and find hints of something that needs further investigating, don't hesitate to ask for a kernel dump!
Btw, for those who haven't already read Mark Russinovich's blog article on nonpaged pool and are too lazy to check my link on BSOD Method & Tips, it's available here.
Thank you very much. I want to forewarn, though, that not all cases are as easy as this. There are obviously going to be some in which a 3rd-party driver does not show up in the IRP stacks (or the one that shows up is not exactly the one causing the crashes), in which case you most likely will need to start deep diving (which will most likely require a kernel dump at least). In the original thread that discussed this article, JC and others noticed that the WER reports were actually more perceptive in figuring out a culprit driver than the analyze engine in WinDbg. A prime example is from usasma, who reported:
Bump. I added a most delicious piece of bonus material that I found perusing the internets, made possible by none other than good ole Cluberti. I highly advise reading it, as it covers a lot on the necessity of going deeper to find a cause. Suffice it to say, if they had been stuck just looking at the minidump or the initial output given by the client, that person may have ended up replacing hardware without ever fixing the problem! That's why we're here to learn about forensic troubleshooting and debugging: to avoid going caveman on problems, and instead to discover and fix 'em with pinpoint accuracy.
Note: if anyone needs clarification on what is taking place in the thread I linked to, such as "Why did Cluberti do this?" or "What does he mean by this?", I'd be happy to help.
I'm not sure if it's of any use in a training scenario, but here's one that was 'sorted' without any assistance from WER (it wouldn't start). I almost called here for reinforcements by post #38/39; luckily the OP was pretty patient and kept track of what I was trying to do: http://forums.majorgeeks.com/showthread.php?t=258111
This was a brilliant article on diagnosing the root cause of this fault. Working from a provided minidump and sysdata file, I was able to help a friend find the cause of their regular blue screen on a completely patched and up-to-date Win7/x86 box. And I'm a programmer, but I'm no driver/WinDbg expert. I did find a 'gotcha' though: my !irp command didn't identify the driver that was causing it (ditto with the bucket). It identified 0x9F_3_IMAGE_usbccgp.sys as the fault. Because I only had a minidump and not a full dump, I almost gave up at that point (the instructions implied I wouldn't get much further without the full dump), but I thought I'd try the two commands also mentioned (!devobj and !devstack). While !devobj just brought up nonsense (I assume because of the lack of information), !devstack identified the vendor and product ID of the faulty device. At that point it was simple enough to go through the sysdata.xml and figure out the true cause. Lesson to learn: don't give up because you don't have a full dump. !devstack may prove particularly useful even if the only information it can give you is the vendor and product ID, since these are listed in the sysdata file. - Grotty.
Thanks, grotlek, and glad to hear it worked out for ya! Yes, this particular case was very minimal and only the !irp was necessary. You were lucky to see anything coming from either !devobj or !devstack for the devobjs on the IRP stacks because that information is not always present in a minidump (more often absent than present) so they aren't very reliable. That's kinda the reason why I didn't go as far as to note those in this particular article. Thank you for covering them, though!
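For anyone repeating the sysdata.xml lookup grotlek describes, here is a rough Python sketch of the search step. It is an illustration under assumptions, not a documented tool: the find_device_lines name is made up, the sample lines and IDs are invented, and it simply assumes the IDs show up in the usual VID_xxxx&PID_xxxx hardware-ID form (with the & possibly written as &amp; inside XML). It does not rely on any particular layout of the sysdata file.

```python
import re

# USB hardware IDs conventionally embed the IDs as VID_xxxx&PID_xxxx
# (four hex digits each). Inside XML the "&" may appear as "&amp;".
VID_PID_RE = re.compile(
    r"VID_([0-9A-F]{4})&(?:amp;)?PID_([0-9A-F]{4})", re.IGNORECASE
)


def find_device_lines(lines, vid, pid):
    """Return (line_number, text) for every line naming the given VID/PID.

    Hypothetical helper: plain text matching only, no XML parsing.
    """
    wanted = (vid.upper(), pid.upper())
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in VID_PID_RE.finditer(line):
            if (match.group(1).upper(), match.group(2).upper()) == wanted:
                hits.append((number, line.strip()))
                break  # one hit per line is enough
    return hits


# Invented sample lines, standing in for the contents of a sysdata file.
sample_lines = [
    "USB\\VID_1234&amp;PID_ABCD",
    "USB\\VID_5678&amp;PID_0001",
    "USB\\Vid_1234&Pid_abcd\\5&1a2b3c",
]

for number, text in find_device_lines(sample_lines, "1234", "abcd"):
    print(number, text)
```

To use it for real, read the lines of the sysdata file into a list and pass in the vendor and product ID that !devstack gave you. The lines around each hit should tell you which device it is.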