Okay, so for 124's we can !errrec and for 9F's we can !irp, etc. On 24's, I've noticed it mentions the possibility to do a .cxr and then kb to obtain more info. First of all, what the hell does this mean? Second of all, how do I go about figuring this out?
If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
parameters are the exception record and context record. Do a .cxr
on the 3rd parameter and then kb to obtain a more informative stack trace.
So since there is no NtfsExceptionFilter on the stack, I assume this is not a dump where the commands noted in the analysis output apply? Either way, can anyone show me, in a dump that does have the exception filter, what you would do and what the outcome would be?
The .cxr command displays the registers saved in the context record you point it at, I believe (a context record, not a context switch). You can also use the !thread extension together with the dps command to obtain a raw stack for the entire thread.
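As a rough sketch of the raw stack approach: <Limit> and <Base> below are placeholders, not literal syntax, for the two stack addresses that !thread prints for the thread.

```
$$ Show the current thread, the output includes Base and Limit values for its kernel stack
!thread
$$ Dump the raw stack with symbols, from Limit (lowest address) up to Base
$$ <Limit> and <Base> are placeholders for the values printed by !thread
dps <Limit> <Base>
```

The dps output is long and contains stale return addresses from earlier calls, so treat it as a list of hints rather than an exact call chain.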
You are correct about reading the stack from bottom to top.
.cxr Remarks from Windows Debugger -
The information from a context record can be used to assist in debugging a system halt where an unhandled exception has occurred and an exact stack trace is not available. The .cxr command displays the important registers for the specified context record.
This command also instructs the debugger to use the specified context record as the register context. After this command is executed, the debugger will have access to the most important registers and the stack trace for this thread. This register context persists until you allow the target to execute or use another register context command (.thread, .ecxr, .trap, or .cxr again). In user mode, it will also be reset if you change the current process or thread. See Register Context for details.
The .cxr command is often used to debug bug check 0x1E. For more information and an example, see Bug Check 0x1E (KMODE_EXCEPTION_NOT_HANDLED).
It's not really any different from any other typical bugcheck: an exception occurs, the context and exception records of the occurrence are saved (viewed with .cxr and .exr, respectively), and then the bugcheck is triggered. The context record saves the registers, and through them the stack trace, of the event that triggered the exception (the place where it went wrong). The stack trace !analyze -v starts you off with is just what happened in response to that event (the event, btw, being an illegal instruction in this case). So you were going in the right direction with the context record, but of course that's only the first step in the journey really.
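A minimal sketch of that first step, assuming a 0x24 dump where NtfsExceptionFilter does appear on the stack, so that parameters 2 and 3 really are the exception record and context record. <Arg2> and <Arg3> are placeholders for the values your own !analyze -v prints, not literal syntax.

```
$$ Print the bugcheck parameters and the default stack trace
!analyze -v
$$ Display the exception record (exception code and faulting address)
$$ <Arg2> is a placeholder for bugcheck parameter 2
.exr <Arg2>
$$ Switch the register context to the saved context record
$$ <Arg3> is a placeholder for bugcheck parameter 3
.cxr <Arg3>
$$ Stack trace as of the moment the exception was raised
kb
```

If the filter is not on the stack, the parameters may not point at valid records, and .cxr will likely show garbage registers.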
As for this certain instance, in case you wish to know, I personally can't see anything problematic with the cmp instruction it faulted on. I would venture to guess that the CPU actually executed something else besides what the code says, which means a CPU/Mobo/PSU issue.
That call stack should now be within the context of the context record. Run the .thread command to return to the current thread's context, and then do a kb; the call stack should be slightly different. That's how I tell whether I've done it right.
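Roughly, the comparison looks like this, assuming you have already run .cxr on the context record address:

```
$$ Stack while the saved context record is active
kb
$$ Reset the register context to the current thread
.thread
$$ Stack in the default context, expect it to differ from the one above
kb
```

If both kb outputs are identical, the .cxr most likely didn't take, or the address wasn't a valid context record.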
With the call stack, I usually try to look up what the routines are doing, and then attempt to work out what went wrong within the thread.
Ntfs!NtfsPrepareFcbForRemoval refers to the File Control Block (FCB) structure, so I'm guessing this structure is being removed by the file system. The FCB is what NTFS uses to keep track of an open file, I think; as far as I know it is a kernel-side structure owned by the file system driver rather than something kept in the program's own address space.
Ntfs!NtfsRemoveScb and Ntfs!NtfsDeleteScb refer to the Stream Control Block (SCB), which I believe tracks an open file stream; a file stream holds the data for a particular open file. Each open file has its own file stream. I'm guessing that the file stream was closed, and therefore the structure wasn't needed anymore.
Going back to the other information you stated earlier, an access violation occurred, so I'm wondering: what if the address belonging to the file that had been closed was referenced afterwards?
I've had the user enable Verifier and there's nothing really of value. At this point in the analysis, I am assuming it's an issue with the hard drive itself, even though it passed WD diagnostics, sfc, chkdsk, etc. The user has also done a clean install.