Jared
Sysnative Staff, BSOD Kernel Dump Expert
- Feb 3, 2014
- 1,591
This is an interesting crash, as I caused it myself by leaking non-paged pool with NotMyFault.
Code:
BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}
The bugcheck is a double fault (the first parameter, 8), meaning a second fault occurred while the processor was trying to call the handler for a previous exception.
In this example the guard page at the end of the kernel stack is hit. The kernel then tries to push a trap frame to handle that fault, but there is no stack left to push it onto (I believe because all of the memory has been used), so the kernel stack overflows and that is what produces the double fault. As far as I know Windows cannot recover from a double fault in kernel mode, so it always results in a bugcheck.
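As a rough illustration of how the first bugcheck parameter identifies the trap, here is a minimal Python sketch. The table only lists the trap numbers I am sure are documented for bugcheck 0x7F, and the helper name `describe_bugcheck_7f` is made up for this example.

```python
# Trap numbers documented for parameter 1 of bugcheck 0x7F
# (UNEXPECTED_KERNEL_MODE_TRAP). Not an exhaustive list.
TRAP_NAMES = {
    0x0: "Divide by Zero Error",
    0x4: "Overflow",
    0x5: "Bounds Check Fault",
    0x6: "Invalid Opcode",
    0x8: "Double Fault",
}

def describe_bugcheck_7f(params):
    """Describe a 0x7F bugcheck from its four parameters (hypothetical helper)."""
    trap = params[0]
    name = TRAP_NAMES.get(trap, "unlisted trap")
    return f"UNEXPECTED_KERNEL_MODE_TRAP, trap 0x{trap:X}: {name}"

# Parameters copied from the BugCheck line of this dump.
params = (0x8, 0x80050033, 0x406F8, 0xFFFFF80002E69F2C)
description = describe_bugcheck_7f(params)
print(description)
```

In this dump the fourth parameter is the same value as the rip saved in the trap frame further down, so it is worth noting before moving on.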
I'm not 100% familiar with how to deal with these types of errors, so if anyone has any suggestions then feel free to post them.
First of all, the call stack is interesting: it points at my Nvidia driver as the cause.
Code:
3: kd> knL
# Child-SP RetAddr Call Site
00 fffff880`02fddce8 fffff800`02ec7169 nt!KeBugCheckEx
01 fffff880`02fddcf0 fffff800`02ec5632 nt!KiBugCheckDispatch+0x69
02 fffff880`02fdde30 fffff800`02e69f2c nt!KiDoubleFaultAbort+0xb2
03 fffff880`009ab000 fffff800`02ff947c nt!MiExpandNonPagedPool+0x14
04 fffff880`009ab020 fffff800`02ffbf26 nt!MiAllocatePoolPages+0xdfd
05 fffff880`009ab160 fffff880`04a1ea55 nt!ExAllocatePoolWithTag+0x316
06 fffff880`009ab250 fffff880`04a1b6e8 nvlddmkm+0x1bfa55
07 fffff880`009ab280 fffff880`04ae392a nvlddmkm+0x1bc6e8
08 fffff880`009ab2e0 fffff880`04b9f804 nvlddmkm+0x28492a
09 fffff880`009ab310 fffff880`04b9f827 nvlddmkm+0x340804
0a fffff880`009ab350 fffff880`04b9f827 nvlddmkm+0x340827
0b fffff880`009ab390 fffff880`04b9f827 nvlddmkm+0x340827
0c fffff880`009ab3d0 fffff880`04b9f827 nvlddmkm+0x340827
0d fffff880`009ab410 fffff880`04b9f827 nvlddmkm+0x340827
0e fffff880`009ab450 fffff880`04b9f827 nvlddmkm+0x340827
So here we see the Nvidia driver (nvlddmkm) all over the stack. There are lots of lines from it, and frames 0a through ff (255 in decimal, so a lot of them) all show the same call site, nvlddmkm+0x340827, which suggests the driver is calling itself over and over. At the top of that chain it calls ExAllocatePoolWithTag, and because the non-paged pool has run dry the memory manager tries to expand it (MiExpandNonPagedPool). That is where the double fault hits, which invokes the bugcheck function.
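To make the repeated frames easier to see, here is a small Python sketch that parses a few lines of the knL output, finds the longest run of identical call sites, and works out how much stack each of those frames takes from the gap between neighbouring Child-SP values. The function names `parse_frames` and `longest_run` are my own, and only the frames pasted in this post are included, so the run length is for this excerpt rather than the whole stack.

```python
import re
from itertools import groupby

# A few frames copied from the knL output above (colour tags removed).
STACK = """\
08 fffff880`009ab2e0 fffff880`04b9f804 nvlddmkm+0x28492a
09 fffff880`009ab310 fffff880`04b9f827 nvlddmkm+0x340804
0a fffff880`009ab350 fffff880`04b9f827 nvlddmkm+0x340827
0b fffff880`009ab390 fffff880`04b9f827 nvlddmkm+0x340827
0c fffff880`009ab3d0 fffff880`04b9f827 nvlddmkm+0x340827
0d fffff880`009ab410 fffff880`04b9f827 nvlddmkm+0x340827
0e fffff880`009ab450 fffff880`04b9f827 nvlddmkm+0x340827
"""

FRAME_RE = re.compile(r"^([0-9a-f]+)\s+(\S+)\s+(\S+)\s+(\S+)$")

def parse_frames(text):
    """Return (frame number, Child-SP, call site) for each stack line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            num, child_sp, _ret, site = m.groups()
            frames.append((int(num, 16), int(child_sp.replace("`", ""), 16), site))
    return frames

def longest_run(frames):
    """Return the call site with the longest run of consecutive frames."""
    best_site, best_len = None, 0
    for site, group in groupby(frames, key=lambda f: f[2]):
        n = len(list(group))
        if n > best_len:
            best_site, best_len = site, n
    return best_site, best_len

frames = parse_frames(STACK)
site, run = longest_run(frames)
# Child-SP difference between neighbouring frames at the repeated call site.
gaps = {b[1] - a[1] for a, b in zip(frames, frames[1:]) if a[2] == b[2] == site}
print(site, run, [hex(g) for g in gaps])
```

Every repeated frame in this excerpt takes the same amount of stack, which is what you would expect from a function calling itself.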
For more information on how the bugcheck is actually called, I recommend reading Patrick's post here:
https://www.sysnative.com/forums/bs...e-screen-of-death-actually-works-why-etc.html
So let's get back to this...
Code:
3: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 1036418 ( 4145672 Kb)
Page File: \??\C:\pagefile.sys
Current: 4145672 Kb Free Space: 3702732 Kb
Minimum: 4145672 Kb Maximum: 12437016 Kb
Available Pages: 100902 ( 403608 Kb)
ResAvail Pages: 209219 ( 836876 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 33504448 ( 134017792 Kb)
Modified Pages: 4479 ( 17916 Kb)
Modified PF Pages: 4364 ( 17456 Kb)
NonPagedPool Usage: 764909 ( 3059636 Kb)
NonPagedPool Max: 764972 ( 3059888 Kb)
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 54754 ( 219016 Kb)
PagedPool 1 Usage: 4432 ( 17728 Kb)
PagedPool 2 Usage: 466 ( 1864 Kb)
PagedPool 3 Usage: 474 ( 1896 Kb)
PagedPool 4 Usage: 559 ( 2236 Kb)
PagedPool Usage: 60685 ( 242740 Kb)
PagedPool Maximum: 33554432 ( 134217728 Kb)
********** 724 pool allocations have failed **********
Session Commit: 9348 ( 37392 Kb)
Shared Commit: 41683 ( 166732 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 6529 ( 26116 Kb)
Pages For MDLs: 957 ( 3828 Kb)
PagedPool Commit: 60749 ( 242996 Kb)
Driver Commit: 8281 ( 33124 Kb)
Committed pages: 1064679 ( 4258716 Kb)
Commit limit: 2072370 ( 8289480 Kb)
Here we can see the non-paged pool has been almost completely exhausted, and with that 724 pool allocations have failed, so the reason for the bugcheck is pretty obvious.
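To put numbers on how close to the limit the pool was, here is a quick bit of Python arithmetic on the two NonPagedPool lines from the !vm output. The 4 KB page size is taken from the ratio between the page counts and the Kb figures in that output.

```python
PAGE_KB = 4  # page size in KB, matching the pages-to-Kb ratio in the !vm output

nonpaged_used_pages = 764909   # "NonPagedPool Usage"
nonpaged_max_pages = 764972    # "NonPagedPool Max"

used_kb = nonpaged_used_pages * PAGE_KB
max_kb = nonpaged_max_pages * PAGE_KB
free_kb = max_kb - used_kb
percent_used = 100 * nonpaged_used_pages / nonpaged_max_pages

print(f"used {used_kb} KB of {max_kb} KB ({percent_used:.2f}%), {free_kb} KB left")
```

That leaves only 63 pages of non-paged pool before the maximum is reached.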
I'm not great with memory management at this moment in time, so I'm still stumped as to how the Nvidia driver is being blamed. Memory leakage is caused by pages being allocated (by a driver in this case) and not being released afterwards.
Basically, when a driver allocates memory it must release it afterwards so that it is available for other allocations; the memory manager cannot reuse the memory until the driver that owns it has freed it.
What I think has happened is that NotMyFault has been allocating memory and not freeing it. The Nvidia driver then comes along and tries to allocate memory, but there is none left in the pool, so the allocation fails and that leads to the bugcheck.
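The idea can be sketched with a toy model in Python. This is only an illustration of the reasoning and not how the Windows pool allocator works; the `ToyPool` class and the "Victim" tag are invented for the example.

```python
class ToyPool:
    """A fixed-size pool that only hands out memory it still has free."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = {}      # tag -> bytes currently held
        self.failed = 0     # number of failed allocations

    def allocate(self, tag, size):
        if sum(self.used.values()) + size > self.capacity:
            self.failed += 1
            return False
        self.used[tag] = self.used.get(tag, 0) + size
        return True

    def free(self, tag, size):
        self.used[tag] -= size

pool = ToyPool(capacity=1000)

# "Leak" keeps allocating and never calls free().
while pool.allocate("Leak", 100):
    pass

# A well-behaved client now fails even though it did nothing wrong.
victim_ok = pool.allocate("Victim", 50)
print(pool.used, victim_ok, pool.failed)
```

The point of the sketch is that the caller whose allocation fails is whoever happens to ask next, not the one holding all the memory, so the leaker never has to appear on the stack at the time of the crash.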
I might be wrong, so if I am please correct me on that.
I decided to do a little disassembling around the registers, which doesn't help in finding the cause but can help (I find) in understanding it.
Code:
nt!KiDoubleFaultAbort+0xb2 (TrapFrame @ fffff880`02fdde30)
Let's look at the trap frame that was pushed.
Code:
3: kd> .trap fffff880`02fdde30
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000000bac2c rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880009ab0b8 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002e69f2c rsp=fffff880009ab000 rbp=fffff880009ab080
r8=ffffffffffffffff r9=fffffa80035eb5b8 r10=00000000ffffffff
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156 push r14
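Here we can see a push instruction, which is used to add data to the stack. rsp is fffff880`009ab000, right on a page boundary, so this push would have to write into the page below it, which fits with the kernel stack having run out. This is the start of MiExpandNonPagedPool: because the pool had run dry, the memory manager was trying to expand the non-paged pool when the fault hit.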
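The stack arithmetic for that push can be checked with a few lines of Python. The only assumption is the 4 KB page size; whether the page below really is the guard page is my reading of the crash, not something the arithmetic proves.

```python
PAGE_SIZE = 0x1000  # 4 KB pages

rsp = 0xFFFFF880009AB000   # rsp from the .trap output
push_target = rsp - 8      # a 64-bit push writes 8 bytes below rsp

on_page_boundary = rsp % PAGE_SIZE == 0
rsp_page = rsp & ~(PAGE_SIZE - 1)
target_page = push_target & ~(PAGE_SIZE - 1)

print(hex(push_target), on_page_boundary, hex(target_page))
```

So the write lands in a different page from the one rsp currently points into, which is the kind of access you would expect to see when a stack overflows.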
On a different note, one thing I have noticed, and I remember asking Patrick about a long time ago when I first started debugging: "Why is the faulting instruction always in the rip register?" The answer is in the name. rip (no, not Rest In Peace!) is the 64-bit instruction pointer, the register that holds the address of the instruction being executed. For a fault, the rip saved in the trap frame points at the instruction that faulted, which is why that is always where we look.
Code:
3: kd> u @rip
nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156 push r14
fffff800`02e69f2e 4157 push r15
fffff800`02e69f30 4881ecd0000000 sub rsp,0D0h
fffff800`02e69f37 488db9ff010000 lea rdi,[rcx+1FFh]
fffff800`02e69f3e 4881e700feffff and rdi,0FFFFFFFFFFFFFE00h
fffff800`02e69f45 483bf9 cmp rdi,rcx
fffff800`02e69f48 0f82cfbffeff jb nt! ?? ::FNODOBFM::`string'+0x1e009 (fffff800`02e55f1d)
fffff800`02e69f4e 488b0513762100 mov rax,qword ptr [nt!MiSystemVaTypeCount+0x28 (fffff800`03081568)]
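These instructions are still part of the start of the function: it saves r14 and r15 on the stack and then reserves another 0xD0 bytes of stack space (sub rsp,0D0h), so the first thing it does is ask for more stack.
I can't find any indication at all of NotMyFault causing the problem. I know it was definitely the cause, but I'm not too sure why it isn't showing up. Does anyone know why?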
If anyone wants the dump file I could upload it to my OneDrive.
EDIT: Looking at Mark Russinovich's blog on memory leaks, I can see that the pool tag 'Leak' is associated with NotMyFault. I must have forgotten to post the pools used.
Code:
3: kd> !poolused 2
....
Sorting by NonPaged Pool Consumed
NonPaged Paged
Tag Allocs Used Allocs Used
Leak 32707 3088705168 0 0 UNKNOWN pooltag 'Leak', please update pooltag.txt
There it is: we can see it has allocated around 2.9 GB of non-paged pool, which is nearly all of the non-paged pool available on my 4 GB system.
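To put the 'Leak' figure in perspective, here is the arithmetic in Python, using the byte and allocation counts from the !poolused output and the NonPagedPool Usage figure from the earlier !vm output.

```python
leak_bytes = 3088705168       # "Used" column for tag 'Leak'
leak_allocs = 32707           # "Allocs" column for tag 'Leak'
nonpaged_used_kb = 3059636    # "NonPagedPool Usage" from !vm

leak_gib = leak_bytes / 2**30
avg_alloc_kb = leak_bytes / leak_allocs / 1024
share = 100 * (leak_bytes / 1024) / nonpaged_used_kb

print(f"{leak_gib:.2f} GiB, ~{avg_alloc_kb:.0f} KB per allocation, "
      f"{share:.1f}% of the non-paged pool in use")
```

In other words, this one tag accounts for almost everything in the non-paged pool, which leaves very little for anything else.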
On a different note, I do intend to set up a virtual machine to do this on at some point, as NotMyFault can cause data loss, although it's quite rare. You should close any open applications before running it. The risk of data loss varies with the system and the type of error you cause.