I find it frustrating that, due to my lack of knowledge, when a user gets blue screens only while playing modern games, I can't dig deeply enough to understand what the game is trying to do, and from there, the reason for the crash.
Does anyone have pointers on more things to read or try (generally speaking)?
I have the Windows Internals books, and I'm also reading The Rootkit Arsenal. I know people get concerned when they see that title, but my theory is that hackers must have a great grasp of drivers to get their malicious programs to work.
The full memory dump below is an example from a user who has a Saitek keyboard and another game controller, and as far as I can work out, it crashes in HID. What I can't understand is why. BTW, he has told me that ALL drivers, firmware, BIOS and applications are up to date, and his computer meets the game's specs.
Very generally, if the BSOD only shows up during gaming and it's not the game itself, it's often hardware related: games push the hardware, temperatures rise, and fully loading components drags the voltage down. If temps and voltages look good, look at the video drivers.
Finding symbols for games is like finding symbols for third-party drivers: impossible.
I'm no BSOD expert, but I've had my system crash multiple times from voltage drops when overclocking...
I know John dug some information out of the dumps I sent him long ago, but I have no clue where it is.
As much as it pains me to say it... if everything checks out, the PSU may be the culprit. It may not be able to feed the system properly under full load. On a side note, you could then run tools like Prime95 or FurMark to test under load.
Yes, forensic analysis books like the ones you have are excellent material to get into. OS design books are also good choices. Unfortunately, I have no personal recommendations to give you at this time.
Unless you have the know-how to figure out to a T everything that caused the crash, you'll eventually reach a point where you have to make an educated guess. While this isn't the optimal solution, the good thing is that your current knowledge should at least let you rule out what it could not be. Someone without that knowledge would look at a BSOD and have no idea what could have generated it. You, however, have been able to narrow it down to the keyboard. As broad an estimate as that is, it's still quite an improvement over being completely clueless! Someone else could be tinkering with their video card, not realizing it has absolutely nothing to do with this BSOD. So don't get worked up just because you've only been able to go so far with your estimate. You've accomplished one step successfully; now take the next one by figuring out just what happened with the I/O involving this keyboard that got all out of whack.
What I did was start with the DV (Driver Verifier) error: some I/O problem occurred that DV caught, meaning it detected a misbehaving driver. OK, what did DV catch? Check the minor error code in Arg1 of the bugcheck: 0x23b, or "The caller has changed the status field of an IRP it does not understand." Hold it! What does this mean? This is where understanding IRP handling is important, which you should read up on extensively, both in the MSDN articles on IRPs and in the Windows Internals chapter on I/O. Basically, an IRP (I/O request packet) gets passed down a stack of drivers relevant to that I/O, such as disk drivers and filter drivers (like A/V drivers) for file I/O. One of the elements of an IRP is the status field (IoStatus.Status), which tells the drivers handling it the current status of the I/O that this IRP pertains to. Here's a breakdown of an IRP for ya:
Both Pointer and Status hold the same value, just presented differently (one in decimal, one in hex). We're clearly dealing with a valid error status code, because it fits the "c000XXXX" pattern of NTSTATUS error codes. Let's look it up:
2: kd> !error c0000010
Error code: (NTSTATUS) 0xc0000010 (3221225488) - The specified request is not a valid operation for the target device.
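The "c000XXXX" pattern isn't arbitrary: an NTSTATUS is a 32-bit value whose top two bits give the severity (0b11 means error), bit 29 is the customer flag, bits 16-27 are the facility, and the low 16 bits are the code. That's also why the same bits show up as 3221225488 in the !error output (read as unsigned) but as a negative number when read as a signed 32-bit LONG. Here's a minimal Python sketch of that decoding; the helper names are my own, not anything from WinDbg or the WDK.

```python
# Decode an NTSTATUS into its fields:
# bits 31-30 severity, bit 29 customer flag, bit 28 reserved,
# bits 27-16 facility, bits 15-0 code.
SEVERITY_NAMES = {0: "SUCCESS", 1: "INFORMATIONAL", 2: "WARNING", 3: "ERROR"}


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a signed LONG (NTSTATUS is a LONG)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_ntstatus(status: int) -> dict:
    status &= 0xFFFFFFFF
    return {
        "hex": f"0x{status:08x}",
        "unsigned": status,
        "signed": to_signed32(status),
        "severity": SEVERITY_NAMES[status >> 30],
        "customer": bool(status & 0x20000000),
        "facility": (status >> 16) & 0x0FFF,
        "code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # The value reported by !error above.
    for key, value in decode_ntstatus(0xC0000010).items():
        print(f"{key:>9}: {value}")
```

For 0xc0000010 this gives severity ERROR, facility 0 and code 0x10, with 3221225488 as the unsigned reading and -1073741808 as the signed one.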
Aha, so the status that ultimately got passed along says the request was not valid for the target device. This is where we hit a crossroads in our analysis: we can either figure out why the request was not valid, or which device didn't like it. Let's start with the device, by looking up the IRP using !irp:
I added "1" to get extra details on the IRP, which really gives us nothing beyond what we got earlier by dumping the nt!_IRP data structure. Note that the IoStatus.Status subfield is easily visible here. Evidently this is the easier way to check it, but I went the other route first to show how to view data structures properly.
Now, what we're looking for is the device listed in the !irp output. You'll find there are two devices here. Let's start with the bottom one:
2: kd> !devobj fffffa800a389aa0
Device object (fffffa800a389aa0) is for:
\DRIVER\VERIFIER_FILTER DriverObject fffffa8008583cc0
Current Irp 00000000 RefCount 0 Type 00000022 Flags 00002010
DevExt fffffa800a389bf0 DevObjExt fffffa800a389c30
ExtensionFlags (0xc0000800) DOE_BOTTOM_OF_FDO_STACK, DOE_DESIGNATED_FDO
Unknown flags 0x00000800
AttachedTo (Lower) fffffa800a38c060 \Driver\SaiMini
Device queue is not busy.
This is the VERIFIER_FILTER device object, so we evidently don't need to worry about it. Since Driver Verifier was involved, this is most likely what started the IRP, which also suggests the IRP is a fake one created by DV to test drivers for bugs. We'll expand on that later. For now, judging from "Current Irp" being null, it's no longer involved. Let's move to the next device object. Note that it is listed as the lower device ("AttachedTo (Lower)") in the device stack relative to the one we're looking at now, meaning it's "closer" to the actual device doing the I/O than this DV device object is. Remember that an OS is designed to be the medium through which applications (software) interact with hardware to accomplish stuff. Anyway, let's move on:
2: kd> !devobj fffffa800a38c060
Device object (fffffa800a38c060) is for:
_HID00000001 \Driver\SaiMini DriverObject fffffa800a389e70
Current Irp fffffa800a83fcf0 RefCount 0 Type 00000022 Flags 00002050
Dacl fffff9a100083df1 DevExt fffffa800a38c1b0 DevObjExt fffffa800a38c648
ExtensionFlags (0xe0000800) DOE_RAW_FDO, DOE_BOTTOM_OF_FDO_STACK,
Unknown flags 0x00000800
AttachedDevice (Upper) fffffa800a389aa0 \DRIVER\VERIFIER_FILTER
AttachedTo (Lower) fffffa800a3893d0 \DRIVER\VERIFIER_FILTER
Device queue is not busy.
Getting warmer. We can tell this device object belongs to the SaiMini driver and is currently handling the faulting IRP (its Current Irp is fffffa800a83fcf0). We can also tell from the "_HID00000001" name that it's dealing with a HID (Human Interface Device) such as a mouse or keyboard. Now for the bigger picture: let's look at the entire device stack this device is part of. Any of the device object addresses mentioned above will do:
All right! Just from the device node info, we can easily determine that this is a keyboard and not a mouse. We also get the big picture of the entire device stack involved with this IRP and how the I/O flows. From the top: DV has a filter driver watching activity between SaiMini and everything above it (ultimately user land, the user-mode environment of applications and services); then we have SaiMini, which I'd guess is a minidriver of some sort (read up on driver types in Windows Internals), followed by yet another DV filter driver that verifies activity between SaiMini and the lowest driver in the stack, SaiNtBus.
OK, now that we've figured out the device involved, it's time to determine what the request was and why it was invalid. So let's look back at the IRP, reprinted here for convenience:
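If you end up doing this a lot, the reasoning so far (find the device object whose Current Irp is non-null, then follow the "AttachedTo (Lower)" links downward) can be scripted. The Python sketch below is only an illustration: it parses trimmed copies of the two !devobj outputs quoted above with regular expressions, assuming the output format exactly as shown (WinDbg's formatting may differ between versions), and the function names are my own. It can't reach SaiNtBus, because that device's !devobj output isn't among the excerpts.

```python
import re

# Trimmed copies of the two !devobj outputs quoted above (only the lines parsed below).
DEVOBJ_DUMPS = [
    r"""Device object (fffffa800a389aa0) is for:
\DRIVER\VERIFIER_FILTER DriverObject fffffa8008583cc0
Current Irp 00000000 RefCount 0 Type 00000022 Flags 00002010
AttachedTo (Lower) fffffa800a38c060 \Driver\SaiMini""",
    r"""Device object (fffffa800a38c060) is for:
_HID00000001 \Driver\SaiMini DriverObject fffffa800a389e70
Current Irp fffffa800a83fcf0 RefCount 0 Type 00000022 Flags 00002050
AttachedDevice (Upper) fffffa800a389aa0 \DRIVER\VERIFIER_FILTER
AttachedTo (Lower) fffffa800a3893d0 \DRIVER\VERIFIER_FILTER""",
]


def parse_devobj(text):
    """Pull the fields we care about out of one !devobj output."""
    address = re.search(r"Device object \((\w+)\)", text).group(1)
    driver = re.search(r"(\\Driver\\\w+)\s+DriverObject", text, re.IGNORECASE).group(1)
    irp = re.search(r"Current Irp (\w+)", text).group(1)
    lower = re.search(r"AttachedTo \(Lower\) (\w+) (\S+)", text)
    return {
        "address": address,
        "driver": driver,
        "current_irp": irp if int(irp, 16) != 0 else None,  # null means no IRP in flight
        "lower": (lower.group(1), lower.group(2)) if lower else None,
    }


def stack_from_top(dumps):
    """Follow AttachedTo (Lower) links from the topmost dumped device downward."""
    devices = {d["address"]: d for d in map(parse_devobj, dumps)}
    lowers = {d["lower"][0] for d in devices.values() if d["lower"]}
    top = next(addr for addr in devices if addr not in lowers)
    chain, addr, driver = [], top, devices[top]["driver"]
    while True:
        chain.append((addr, driver))
        dev = devices.get(addr)
        if dev is None or dev["lower"] is None:
            break  # no !devobj output for this device, so we can't follow it further
        addr, driver = dev["lower"]
    return chain


def irp_holder(dumps):
    """Return (driver, IRP address) for the device whose Current Irp is non-null."""
    for dev in map(parse_devobj, dumps):
        if dev["current_irp"]:
            return dev["driver"], dev["current_irp"]
    return None


if __name__ == "__main__":
    for depth, (addr, driver) in enumerate(stack_from_top(DEVOBJ_DUMPS)):
        print(f"{'  ' * depth}{addr} {driver}")
    print("Current IRP held by:", irp_holder(DEVOBJ_DUMPS))
```

Run against those two dumps, it walks VERIFIER_FILTER, SaiMini, VERIFIER_FILTER from the top and reports SaiMini as the holder of IRP fffffa800a83fcf0, which matches the manual analysis.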
What we're looking for are the major (MJ) and minor (MN) function codes, printed on the left of the !irp output: 17,ff for major and minor, respectively. Let's look them up in the WinDbg help for !irp. The MJ function code turns out to be IRP_MJ_SYSTEM_CONTROL, which has its own reference article on MSDN. Judging by the description, it's a WMI (Windows Management Instrumentation) request, so look at the WinDbg help for !irp again, this time at the WMI minor function code table (MSDN also describes each of those codes). Now this is funny: our MN code of 0xff is not in the list. What's the deal? Time to look at the MSDN article on the WMI minor IRP codes for an idea. From there we discover:
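For reference, here's a small Python lookup of the codes discussed here, using the values defined in the WDK's wdm.h. Only a handful of major codes are included, and the table and helper names are my own. It just mechanises what the WinDbg help tells us: 0x17 is IRP_MJ_SYSTEM_CONTROL, and 0xff isn't any documented WMI minor code.

```python
# Subset of IRP major function codes (values from wdm.h).
IRP_MJ_NAMES = {
    0x0E: "IRP_MJ_DEVICE_CONTROL",
    0x0F: "IRP_MJ_INTERNAL_DEVICE_CONTROL",
    0x16: "IRP_MJ_POWER",
    0x17: "IRP_MJ_SYSTEM_CONTROL",
    0x1B: "IRP_MJ_PNP",
}

# WMI minor function codes used with IRP_MJ_SYSTEM_CONTROL (0x0A is reserved).
WMI_MN_NAMES = {
    0x00: "IRP_MN_QUERY_ALL_DATA",
    0x01: "IRP_MN_QUERY_SINGLE_INSTANCE",
    0x02: "IRP_MN_CHANGE_SINGLE_INSTANCE",
    0x03: "IRP_MN_CHANGE_SINGLE_ITEM",
    0x04: "IRP_MN_ENABLE_EVENTS",
    0x05: "IRP_MN_DISABLE_EVENTS",
    0x06: "IRP_MN_ENABLE_COLLECTION",
    0x07: "IRP_MN_DISABLE_COLLECTION",
    0x08: "IRP_MN_REGINFO",
    0x09: "IRP_MN_EXECUTE_METHOD",
    0x0B: "IRP_MN_REGINFO_EX",
}


def describe(mj: int, mn: int):
    """Return (major name, minor name); minor is None if it isn't a known WMI code."""
    major = IRP_MJ_NAMES.get(mj, f"0x{mj:02x}")
    minor = WMI_MN_NAMES.get(mn) if mj == 0x17 else None
    return major, minor


if __name__ == "__main__":
    print(describe(0x17, 0xFF))  # the "17,ff" pair from the !irp output
```

A None minor for a system-control IRP is exactly the "any other IRP minor function code" case that the WMI documentation quoted next tells drivers to forward.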
Drivers that do not register as WMI data providers must forward all WMI requests to the next-lower driver.
If the driver receives an IRP containing any other IRP minor function code [outside of what's listed], it should forward the IRP to the next-lower driver.
The second line there is what really matters. Since the MN code 0xff doesn't match any of the listed options, any driver handling the IRP should not fiddle with it and should just pass it down the device stack. However, remember the c0000010 error about the device not understanding the request? This is the request it's referring to. What appears to be happening is that DV sends this fake request, which should be passed down the line untouched, but SaiMini interferes with it by declaring that it doesn't understand it, and then passes it down with the altered IoStatus. That's exactly what the bugcheck description says. So, apparently, the driver wasn't coded to follow the WMI rule that a driver must not tamper with IRPs it doesn't handle, and DV caught it and issued a bugcheck. Whether this bug is actually responsible for the user's original crashes I'm not sure, but it is a genuine bug in this driver, and given that this and the related Saitek drivers date from 2010, I'd say it's probably out of date, too. If there are no updates for it, the only options are probably to have the user contact Saitek about it, or to uninstall the Saitek drivers and software for the keyboard and rely on the standard Windows drivers, which should suffice in most cases.
Hope that helps. I didn't understand much of this bugcheck either until I did a bit of research. It took me a couple of hours, but now I know a bit more about IRP handling, which will help me in the future (unless I'm misguided somehow!).
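To make the rule concrete, here's a toy model in Python (not driver code) contrasting a dispatch routine that forwards an unrecognised WMI minor code untouched with one that stamps STATUS_INVALID_DEVICE_REQUEST (0xc0000010) on it before passing it down, plus a check in the spirit of DV's 0x23b test. Everything here is hypothetical: the Irp class, the function names, and the choice of STATUS_NOT_SUPPORTED as the IRP's starting status are assumptions for illustration, and this is not how Driver Verifier actually implements the check. In a real WDM driver, forwarding untouched is typically done with IoSkipCurrentIrpStackLocation followed by IoCallDriver.

```python
from dataclasses import dataclass

STATUS_NOT_SUPPORTED = 0xC00000BB           # assumed starting status, illustration only
STATUS_INVALID_DEVICE_REQUEST = 0xC0000010  # the status found in this dump
IRP_MJ_SYSTEM_CONTROL = 0x17
KNOWN_WMI_MINORS = set(range(0x00, 0x0A)) | {0x0B}


@dataclass
class Irp:  # hypothetical stand-in for the real IRP structure
    major: int
    minor: int
    status: int


def lower_driver(irp: Irp) -> int:
    """Hypothetical rest of the stack: leaves the IRP alone."""
    return irp.status


def correct_dispatch(irp: Irp) -> int:
    # Not our request: pass it down without touching the status.
    return lower_driver(irp)


def buggy_dispatch(irp: Irp) -> int:
    # What the analysis suggests SaiMini does: rewrite the status of an
    # IRP it doesn't understand, then forward it anyway.
    if irp.major == IRP_MJ_SYSTEM_CONTROL and irp.minor not in KNOWN_WMI_MINORS:
        irp.status = STATUS_INVALID_DEVICE_REQUEST
    return lower_driver(irp)


def verifier_check(dispatch) -> bool:
    """Toy check: send a bogus WMI minor code and report whether the status survived."""
    irp = Irp(IRP_MJ_SYSTEM_CONTROL, 0xFF, STATUS_NOT_SUPPORTED)
    before = irp.status
    dispatch(irp)
    return irp.status == before


if __name__ == "__main__":
    print("correct driver passes:", verifier_check(correct_dispatch))
    print("buggy driver passes:  ", verifier_check(buggy_dispatch))
```

The point of the contrast is that the buggy routine does forward the IRP; the violation is purely that it changed a status it had no business touching.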
I've heard that those who are trained to detect and root out counterfeit currency do so by studying every minute detail of the legit thing. By doing so, they are able to detect any and all discrepancies that show up in the fake copies.
It's the same here. The whole idea is to study how the OS operates and how things should function, rather than trying to figure out every way they could fail. So yes, it's a lot like learning to be a driver developer: you have to learn how things should happen, rather than how they shouldn't. Of course, I personally couldn't code to save my grandmother, but I can at least learn the concepts and mechanics of everything without figuring out how to actually write code to put them to use.