I've discussed some 0xD1 debugging here, but I figured I'd also go through a different 0xD1 scenario and show it from a few different angles, using NotMyFault to force a bug check.
Download NotMyfault here.
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.
We're all familiar with this bug check, so let's move on to what I wanted to talk about.
Let's go ahead and do an !analyze -v:
Code:
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: [COLOR=#ff0000]fffff8a0066eb800[/COLOR], memory referenced
Arg2: 000000000000000[COLOR=#0000ff]2[/COLOR], IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff88002af7385, address which referenced memory
fffff8a0066eb800 was the memory that was referenced. Either the address is completely invalid, or it's pageable and was referenced at an IRQL that was too high (Arg2 shows we were at IRQL 2).
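One quick way to see whether this address is even a candidate for "pageable" is to check which region of the kernel address space it falls in. The sketch below is a minimal illustration that assumes the fixed Windows 7 x64 system address space layout. Later Windows versions randomize these regions, so the table and the helper name region_of are hypothetical and only meant for dumps like this one.

```python
# Hypothetical helper: look up which Windows 7 x64 kernel region an address
# belongs to. ASSUMPTION: the static Windows 7 x64 layout; newer Windows
# versions randomize these regions, so this table does not apply there.
WIN7_X64_REGIONS = [
    (0xFFFFF680_00000000, 0xFFFFF6FF_FFFFFFFF, "page tables"),
    (0xFFFFF880_00000000, 0xFFFFF89F_FFFFFFFF, "system PTEs"),  # driver images, kernel stacks
    (0xFFFFF8A0_00000000, 0xFFFFF8BF_FFFFFFFF, "paged pool"),
    (0xFFFFF900_00000000, 0xFFFFF97F_FFFFFFFF, "session space"),
]


def region_of(addr: int) -> str:
    """Return the name of the region containing addr, if it is in the table."""
    for start, end, name in WIN7_X64_REGIONS:
        if start <= addr <= end:
            return name
    return "not in table"


arg1 = 0xFFFFF8A0066EB800  # Arg1 from !analyze -v (memory referenced)
print(f"{arg1:#018x} -> {region_of(arg1)}")  # 0xfffff8a0066eb800 -> paged pool
```

Paged pool is, by definition, memory that is allowed to be paged out, which fits what !pte reports for this address.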
Code:
kd> !pte fffff8a0066eb800
VA fffff8a0066eb800
PXE at FFFFF6FB7DBEDF88 PPE at FFFFF6FB7DBF1400 PDE at FFFFF6FB7E280198 PTE at FFFFF6FC50033758
contains 000000007AC84863 contains 000000000367B863 contains 000000006B4C6863 contains 00003B5000000000
pfn 7ac84 ---DA--KWEV pfn 367b ---DA--KWEV pfn 6b4c6 ---DA--KWEV [COLOR=#ff0000]not valid[/COLOR]
[COLOR=#4b0082]PageFile: 0[/COLOR]
Offset: 3b50
Protect: 0
Using our handy !pte command, which displays the page table and page directory entries for an address, we can see that it is not a valid address, despite looking like one at first glance. Why is it not valid? As we can see above, and as I highlighted in purple, it's because the page backing this address is currently out in the pagefile (PageFile: 0, Offset: 3b50).
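To make the !pte output a little less magic, here is a minimal sketch that splits the raw PTE values back into fields. It assumes the architectural x86-64 bits for valid PTEs, and the Windows 7 x64 software-PTE layout for invalid ones (PageFileLow in bits 1-4, Protection in bits 5-9, PageFileHigh in bits 32-63). Field positions change between Windows versions, and decode_pte is a hypothetical name.

```python
# Sketch: split raw 64-bit PTEs from !pte into fields.
# ASSUMPTION: Windows 7 x64 software-PTE layout for invalid entries.


def decode_pte(pte: int) -> dict:
    if pte & 0x1:
        # Valid: architectural x86-64 hardware PTE bits.
        return {
            "valid": True,
            "pfn": (pte >> 12) & 0xF_FFFF_FFFF,  # bits 12-47
            "writable": bool(pte & (1 << 1)),     # R/W bit
            "user": bool(pte & (1 << 2)),         # U/S bit (False = kernel)
            "accessed": bool(pte & (1 << 5)),
            "dirty": bool(pte & (1 << 6)),
            "no_execute": bool(pte & (1 << 63)),
        }
    # Not valid: Windows software PTE fields.
    return {
        "valid": False,
        "pagefile": (pte >> 1) & 0xF,       # PageFileLow, bits 1-4
        "protection": (pte >> 5) & 0x1F,    # bits 5-9
        "prototype": bool(pte & (1 << 10)),
        "transition": bool(pte & (1 << 11)),
        "offset": pte >> 32,                # PageFileHigh, shown by !pte as "Offset"
    }


# The PXE entry and the final PTE from the !pte output above.
for raw in (0x000000007AC84863, 0x00003B5000000000):
    print(hex(raw), decode_pte(raw))
```

The valid entry decodes to pfn 7ac84, kernel-owned, writable, dirty and accessed (the ---DA--KWEV flags), while the invalid one decodes to pagefile 0 at offset 3b50, matching what !pte printed.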
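Why can't we just page it in? That's not how the Windows memory manager's kernel-mode rules work. Paging the data back in means reading it from disk, and the faulting thread would have to wait for that I/O to finish. Code running at DISPATCH_LEVEL (IRQL 2) or higher can't wait, so a page fault at that level can't be resolved. We're at IRQL 2 (see Arg2), therefore we bug check.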
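The rule above can be captured in a deliberately simplified model. This is not kernel code: the function name and its three inputs are hypothetical, and the real fault path (nt!KiPageFault calling into the memory manager) handles many more cases. It only encodes the decision this crash hit: a non-resident page, IRQL 2, and faulting code inside a driver, which is why this shows up as 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL) rather than the generic 0xA (IRQL_NOT_LESS_OR_EQUAL).

```python
# Simplified model, NOT kernel code: what happens when code touches a
# page at a given IRQL. Names and inputs are hypothetical.
DISPATCH_LEVEL = 2


def page_fault_outcome(irql: int, pte_valid: bool, code_in_driver: bool) -> str:
    if pte_valid:
        return "no fault: page is resident"
    if irql < DISPATCH_LEVEL:
        # Below DISPATCH_LEVEL the faulting thread may wait for disk I/O.
        return "fault resolved: page read back in from the pagefile"
    # At DISPATCH_LEVEL or above the thread cannot wait, so the fault is fatal.
    return "bug check 0xD1" if code_in_driver else "bug check 0xA"


# This crash: PTE not valid (on the pagefile), Arg2 = IRQL 2, rip in myfault.
print(page_fault_outcome(irql=2, pte_valid=False, code_in_driver=True))  # bug check 0xD1
```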
Great, so we know why the system crashed. However, what caused it?
Let's go ahead and dump the stack:
Code:
kd> k
Child-SP RetAddr Call Site
fffff880`032f4448 fffff800`02a912a9 nt!KeBugCheckEx
fffff880`032f4450 fffff800`02a8ff20 nt!KiBugCheckDispatch+0x69
fffff880`032f4590 fffff880`02af7385 [COLOR=#4b0082]nt!KiPageFault+0x260[/COLOR] [COLOR=#008000]<-- Calling into a pagefault.[/COLOR]
fffff880`032f4720 fffff880`02af7727 [COLOR=#ff0000]myfault+0x1385[/COLOR] [COLOR=#008000]<-- Same as before.[/COLOR]
fffff880`032f4870 fffff800`02dac127 [COLOR=#ff0000]myfault+0x1727[/COLOR] [COLOR=#008000]<-- Ending up in myfault.[/COLOR]
fffff880`032f48d0 fffff800`02dac986 nt!IopXxxControlFile+0x607 [COLOR=#008000]<--- Same as before.[/COLOR]
fffff880`032f4a00 fffff800`02a90f93 nt!NtDeviceIoControlFile+0x56 [COLOR=#008000]<--- Going through this function in kernel-mode.[/COLOR]
fffff880`032f4a70 00000000`76df138a nt!KiSystemServiceCopyEnd+0x13 [COLOR=#008000]<--- Calling down into kernel-mode.[/COLOR]
00000000`0023edc8 00000000`00000000 [COLOR=#0000ff]0x[/COLOR][COLOR=#ff0000]7[/COLOR][COLOR=#0000ff]6df138a[/COLOR] [COLOR=#008000]<-- Something in user-mode.[/COLOR]
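We start out with something in user mode that we don't have symbols for, which is why it shows up as 0x76df138a rather than a resolved name that we can understand. Why did I make the 7 in the address red, and how did I know we started out with something going on in user mode? Good question! When the first digit of an 8-digit address like that is 7 or lower, it's below 0x80000000, which is user mode on 32-bit Windows with the default 2 GB split. On x64 the user-mode range is much larger: the full value here is 00000000`76df138a, and on Windows 7 x64 anything at or below 000007ff`ffffffff is user mode, while kernel-mode addresses sit in the upper half of the address space (in this dump, they all begin with fffff8).
We also can't resolve it because this is a kernel summary dump, which we can see towards the top of our crash dump within WinDbg: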
Code:
Kernel Summary Dump File: [COLOR=#ff0000]Only kernel address space is available[/COLOR]
Because of that, we can't see what the application was doing before it called down into kernel mode.
So we know that some application (0x76df138a) did something and called down into kernel mode. Everything above 0x76df138a in the stack is kernel mode. On x64, you can tell because the Child-SP values for those frames (fffff880`032f4a70 and up) all begin with fffff880, which puts them in kernel-mode address space.
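Here's a minimal sketch of that user-versus-kernel check, written as hypothetical helpers that accept addresses the way WinDbg prints them (with the ` separator). It assumes 48-bit canonical x64 addresses and the Windows 7 x64 user-mode range, which ends at 000007ff`ffffffff. Windows 8.1 and later raise that limit to 00007fff`ffffffff, so the constant would need changing for newer dumps.

```python
# Hypothetical helpers for classifying x64 addresses as printed by WinDbg.
# ASSUMPTIONS: 48-bit canonical addressing; Windows 7 x64 user range.
USER_END_WIN7_X64 = 0x000007FF_FFFFFFFF      # end of the 8 TB user-mode range
CANONICAL_LOW_END = 0x00007FFF_FFFFFFFF      # top of the lower canonical half
CANONICAL_HIGH_START = 0xFFFF8000_00000000   # bottom of the upper canonical half


def parse_windbg_addr(text: str) -> int:
    """Turn 'fffff880`032f4a00' or '0x76df138a' into an int."""
    return int(text.replace("`", ""), 16)


def classify(addr: int) -> str:
    if addr <= USER_END_WIN7_X64:
        return "user"
    if addr >= CANONICAL_HIGH_START:
        return "kernel"
    if addr <= CANONICAL_LOW_END:
        return "unused"          # canonical, but above Windows 7's user range
    return "non-canonical"       # bits 48-63 are not copies of bit 47


# The user-mode return address, a kernel Child-SP, and Arg1 from this dump.
for text in ("00000000`76df138a", "fffff880`032f4a70", "fffff8a0`066eb800"):
    print(text, classify(parse_windbg_addr(text)))
```

What matters is the numeric value, not how many digits the debugger prints: an address such as 000007fe`ff000000 is far above 0x80000000 but still classifies as user mode on Windows 7 x64.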
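We can see it goes through a few functions (the system call path for a DeviceIoControl request: nt!NtDeviceIoControlFile, then nt!IopXxxControlFile) and then ends up in myfault. Shortly afterwards, we hit a page fault because the memory myfault touched had to be paged in from the pagefile at IRQL 2, which is a big no-no.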
If we take a look at the trap frame:
Code:
kd> .trap 0xfffff880032f4590
[COLOR=#ff0000]NOTE: The trap frame does not contain all registers.[/COLOR]
[COLOR=#ff0000]Some register values may be zeroed or incorrect.[/COLOR]
rax=0000000005000000 [COLOR=#4b0082]rbx=0000000000000000[/COLOR] rcx=0000000000002481
rdx=fffffa8001810000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88002af7385 rsp=fffff880032f4720 rbp=fffff880032f4b60
r8=0000000000012408 r9=0000000000000810 r10=fffff80002a12000
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
myfault+0x1385:
fffff880`02af7385 8b03 mov eax,dword ptr [rbx] ds:00000000`00000000=????????
The first very important thing to note is the warning that the trap frame does not contain all registers, and that some of them may be zeroed out or incorrect. The big question is why? Well, the trap frame generation code on x64 versions of Windows does not save the contents of non-volatile registers.
As a result, registers such as rbx, rdi, rsi, and r12 through r15 are either zeroed out or incorrect. The kernel can get away with this because, under the x64 calling convention, any function that runs after the trap frame is generated must save any non-volatile register it uses in its own stack frame and restore it before returning. Saving them in the trap frame as well would be an unnecessary step in a hot path within the kernel.
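As a rough aid for reading .trap output, the sketch below sorts the registers from the dump above into the ones treated as captured and the ones that aren't guaranteed. The sets are an assumption based on the x64 Windows calling convention: volatile registers are saved, and rip and rsp are pushed by the processor when the fault is taken. rbp is treated as not guaranteed purely because it is non-volatile, which is conservative. split_registers is a hypothetical helper.

```python
import re

# ASSUMPTION: x64 Windows calling convention register classes.
VOLATILE = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"}
NONVOLATILE = {"rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"}
CPU_PUSHED = {"rip", "rsp"}  # pushed by the processor when the fault is taken

TRAP_DUMP = """\
rax=0000000005000000 rbx=0000000000000000 rcx=0000000000002481
rdx=fffffa8001810000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88002af7385 rsp=fffff880032f4720 rbp=fffff880032f4b60
r8=0000000000012408 r9=0000000000000810 r10=fffff80002a12000
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
"""


def split_registers(dump: str):
    """Return (trusted, suspect) dicts of register name -> value."""
    regs = {name: int(value, 16)
            for name, value in re.findall(r"\b(r\w+)=([0-9a-f]{16})", dump)}
    trusted = {r: v for r, v in regs.items() if r in VOLATILE | CPU_PUSHED}
    suspect = {r: v for r, v in regs.items() if r in NONVOLATILE}
    return trusted, suspect


trusted, suspect = split_registers(TRAP_DUMP)
print("zeroed and not guaranteed:", sorted(r for r, v in suspect.items() if v == 0))
```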
Extremely detailed article with much more info here.
Moving on, let's look at the instruction we failed on. It loads a 32-bit value (dword) from the memory address held in rbx into the eax register:
Code:
[COLOR=red]mov[/COLOR] eax,dword [COLOR=blue]ptr[/COLOR] [[COLOR=purple]rbx[/COLOR]]
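If you want to confirm the disassembly by hand, the instruction bytes 8b03 from the .trap output can be decoded with a few shifts. 8B is MOV r32, r/m32, and the ModRM byte 03 splits into mod=00, reg=000 (eax) and r/m=011 (rbx), which means "read a dword from the address in rbx". The helper below is a hypothetical, deliberately narrow sketch that only handles this one form (no REX prefix, no SIB byte, no displacement).

```python
# Hypothetical, deliberately narrow decoder for the faulting bytes 8b03.
# Handles only opcode 8B (MOV r32, r/m32) with mod=00 and a plain base
# register: no REX prefix, no SIB byte, no displacement.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_8b(code: bytes) -> str:
    opcode, modrm = code[0], code[1]
    if opcode != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    mod = modrm >> 6             # addressing mode
    reg = (modrm >> 3) & 0b111   # destination register
    rm = modrm & 0b111           # base register holding the address
    if mod != 0b00 or rm in (0b100, 0b101):
        # rm=100 needs a SIB byte; rm=101 is RIP-relative in 64-bit mode.
        raise ValueError("addressing form not handled by this sketch")
    return f"mov {REG32[reg]},dword ptr [{REG64[rm]}]"


print(decode_mov_8b(bytes.fromhex("8b03")))  # mov eax,dword ptr [rbx]
```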
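Uh oh, rbx is zeroed out in the trap frame, so we can't !pte the register's value to double-check it. The bug check arguments fill that gap, though: Arg4 (fffff88002af7385) matches rip, so it's this exact instruction, and Arg3 is 0, a read, which is exactly what mov eax,dword ptr [rbx] does. The address rbx actually held was therefore Arg1, fffff8a0066eb800, which we already confirmed is paged out. So this all occurred because myfault attempted to access memory that was paged out (which it did) at IRQL 2.
If you wanted extra proof, or to see whether NotMyFault was behind the crash, you could dump all of the processes at the time of the crash to see if there was any correlation. In this case, you'd use !process 0 0: the first 0 selects all processes, and the second 0 is the flags value, which keeps the output to basic information for each process. Flags are important in this case, and as always you can check the WinDbg help file or MSDN for the other values.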
Code:
PROCESS fffffa80040a7060
SessionId: 1 Cid: 0654 Peb: 7fffffd4000 ParentCid: 0708
DirBase: 670ea000 ObjectTable: fffff8a00666c330 HandleCount: 68.
Image: [COLOR=#ff0000]NotMyfault.exe[/COLOR]
We can see that we did indeed have a NotMyFault process running at the time of the crash, so at this point we can conclude that it is very likely the cause of the crash.
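A full !process 0 0 listing can run to dozens of entries, so if you've saved the output to text, a small script can pick out the image you care about. This is a minimal sketch that assumes the PROCESS/Cid/Image layout shown above. The regex and the function names are hypothetical, not part of WinDbg.

```python
import re

# ASSUMPTION: the `!process 0 0` layout shown above, i.e. a "PROCESS <addr>"
# line followed by indented lines containing "Cid:" and "Image:".
PROCESS_RE = re.compile(
    r"^PROCESS\s+([0-9a-fA-F]+).*?\bCid:\s*([0-9a-fA-F]+).*?\bImage:\s*(\S+)",
    re.MULTILINE | re.DOTALL,
)

SAMPLE = """\
PROCESS fffffa80040a7060
    SessionId: 1  Cid: 0654    Peb: 7fffffd4000  ParentCid: 0708
    DirBase: 670ea000  ObjectTable: fffff8a00666c330  HandleCount:  68.
    Image: NotMyfault.exe
"""


def list_processes(dump: str):
    """Return (EPROCESS address, Cid, image name) for each process entry."""
    return [(int(ep, 16), int(cid, 16), image)
            for ep, cid, image in PROCESS_RE.findall(dump)]


def find_image(dump: str, name: str):
    """Case-insensitive match, since Windows file names are case-insensitive."""
    return [p for p in list_processes(dump) if p[2].lower() == name.lower()]


for eprocess, cid, image in find_image(SAMPLE, "notmyfault.exe"):
    print(f"{image}: EPROCESS {eprocess:#x}, Cid {cid:#x}")
```

The `\bCid` anchor keeps the pattern from matching ParentCid, which appears later on the same line.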
Hope you enjoyed reading!