0x7F memory leak

Jared · Jul 6, 2014

This is an interesting crash, as I caused it myself by using NotMyFault to leak non-paged pool memory.

Code:

BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}

The bugcheck is a double fault (the first parameter, 8), meaning a fault occurred while the handler for a previous exception was being called.
In this example, a guard page, which monitors memory access, is hit. The kernel then tries to push a trap frame to catch the fault, but because all the memory has been used there is no stack left, so a kernel stack overflow is produced, which causes the double fault. A double fault in the kernel is not recoverable on Windows, so it always ends in a bugcheck.
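For reference, the first bugcheck parameter identifies which trap fired. Below is a minimal Python sketch of that lookup, using a handful of the trap numbers documented for bug check 0x7F. The names `TRAP_NAMES` and `describe_trap` are made up for illustration, and the table is deliberately partial.

```python
# Partial table of trap numbers reported as parameter 1 of
# bugcheck 0x7F (UNEXPECTED_KERNEL_MODE_TRAP). Not exhaustive.
TRAP_NAMES = {
    0x0: "divide error",
    0x4: "overflow",
    0x5: "bounds check fault",
    0x6: "invalid opcode",
    0x8: "double fault",
}


def describe_trap(param1: int) -> str:
    """Return a readable name for the trap number (hypothetical helper)."""
    return TRAP_NAMES.get(param1, f"trap 0x{param1:X} (not in this partial table)")


# Parameters from this dump: BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}
params = [0x8, 0x80050033, 0x406F8, 0xFFFFF80002E69F2C]
print(describe_trap(params[0]))
```

Only the first parameter is decoded here; the sketch makes no claim about what the other three mean.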

I'm not 100% familiar with how to deal with these types of errors, so if anyone has any suggestions then feel free to post them.

First of all, the call stack is interesting: it points at my Nvidia driver as the cause.

Code:

3: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff880`02fddce8 fffff800`02ec7169 nt!KeBugCheckEx
01 fffff880`02fddcf0 fffff800`02ec5632 nt!KiBugCheckDispatch+0x69
02 fffff880`02fdde30 fffff800`02e69f2c nt!KiDoubleFaultAbort+0xb2
03 fffff880`009ab000 fffff800`02ff947c nt!MiExpandNonPagedPool+0x14
04 fffff880`009ab020 fffff800`02ffbf26 nt!MiAllocatePoolPages+0xdfd
05 fffff880`009ab160 fffff880`04a1ea55 nt!ExAllocatePoolWithTag+0x316
06 fffff880`009ab250 fffff880`04a1b6e8 nvlddmkm+0x1bfa55
07 fffff880`009ab280 fffff880`04ae392a nvlddmkm+0x1bc6e8
08 fffff880`009ab2e0 fffff880`04b9f804 nvlddmkm+0x28492a
09 fffff880`009ab310 fffff880`04b9f827 nvlddmkm+0x340804
0a fffff880`009ab350 fffff880`04b9f827 nvlddmkm+0x340827
0b fffff880`009ab390 fffff880`04b9f827 nvlddmkm+0x340827
0c fffff880`009ab3d0 fffff880`04b9f827 nvlddmkm+0x340827
0d fffff880`009ab410 fffff880`04b9f827 nvlddmkm+0x340827
0e fffff880`009ab450 fffff880`04b9f827 nvlddmkm+0x340827

So here we see the Nvidia driver allocating memory. There are lots of frames from the driver, although from frame 0a through ff (246 frames, which is a lot) they all have the same offset. It looks like it then tried to allocate more memory and, because the non-paged pool had run dry, it could no longer allocate anything else, causing the double fault which invokes the bugcheck function.
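To put a number on those repeated frames, here is a small Python sketch that works out how much kernel stack they occupy. The Child-SP values are copied from frames 09 to 0e of the listing. It assumes the frames not shown in the listing are the same size as the visible ones, which the dump output above does not confirm.

```python
# Child-SP values copied from frames 09..0e of the knL listing.
child_sp = [
    0xFFFFF880009AB310,
    0xFFFFF880009AB350,
    0xFFFFF880009AB390,
    0xFFFFF880009AB3D0,
    0xFFFFF880009AB410,
    0xFFFFF880009AB450,
]

# Size of one frame = distance between consecutive stack pointers.
frame_sizes = {b - a for a, b in zip(child_sp, child_sp[1:])}
assert frame_sizes == {0x40}  # every visible frame is 64 bytes
frame_size = frame_sizes.pop()

# Frames 0x0a through 0xff inclusive share the same return address.
repeated_frames = 0xFF - 0x0A + 1

# ASSUMPTION: the frames not shown in the listing are the same size.
stack_bytes = repeated_frames * frame_size
print(repeated_frames, "frames,", stack_bytes, "bytes of stack")
```

Under that assumption the 246 repeated frames account for 15744 bytes of stack. That is a lower bound if the listing stops at frame ff only because of the debugger's frame limit.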

For more information on how the bugcheck is actually raised, I recommend reading Patrick's post here.

https://www.sysnative.com/forums/bs...e-screen-of-death-actually-works-why-etc.html

So let's get back to this...

Code:

3: kd> !vm


*** Virtual Memory Usage ***
    Physical Memory:     1036418 (   4145672 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   4145672 Kb  Free Space:   3702732 Kb
      Minimum:   4145672 Kb  Maximum:     12437016 Kb
    Available Pages:      100902 (    403608 Kb)
    ResAvail Pages:       209219 (    836876 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:   33504448 ( 134017792 Kb)
    Modified Pages:         4479 (     17916 Kb)
    Modified PF Pages:      4364 (     17456 Kb)
    NonPagedPool Usage:   764909 (   3059636 Kb)
    NonPagedPool Max:     764972 (   3059888 Kb)
    ********** Excessive NonPaged Pool Usage *****
    PagedPool 0 Usage:     54754 (    219016 Kb)
    PagedPool 1 Usage:      4432 (     17728 Kb)
    PagedPool 2 Usage:       466 (      1864 Kb)
    PagedPool 3 Usage:       474 (      1896 Kb)
    PagedPool 4 Usage:       559 (      2236 Kb)
    PagedPool Usage:       60685 (    242740 Kb)
    PagedPool Maximum:  33554432 ( 134217728 Kb)


    ********** 724 pool allocations have failed **********


    Session Commit:         9348 (     37392 Kb)
    Shared Commit:         41683 (    166732 Kb)
    Special Pool:              0 (         0 Kb)
    Shared Process:         6529 (     26116 Kb)
    Pages For MDLs:          957 (      3828 Kb)
    PagedPool Commit:      60749 (    242996 Kb)
    Driver Commit:          8281 (     33124 Kb)
    Committed pages:     1064679 (   4258716 Kb)
    Commit limit:        2072370 (   8289480 Kb)

Here we can see the non-paged pool has been almost completely exhausted (764909 of 764972 pages in use), and 724 pool allocations have failed as a result, so the reason for the bugcheck is pretty obvious.
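As a quick check of how close to the limit the pool is, the following Python sketch parses the two NonPagedPool lines from the `!vm` output and works out the headroom. The helper name `parse_vm` and the regular expression are illustrative choices, not part of any debugger tooling.

```python
import re

# Two lines copied from the !vm output (colour markup removed).
vm_output = """\
    NonPagedPool Usage:   764909 (   3059636 Kb)
    NonPagedPool Max:     764972 (   3059888 Kb)
"""

PAGE_SIZE_KB = 4  # 4 KB pages, consistent with 764909 pages == 3059636 Kb


def parse_vm(text: str) -> dict:
    """Map each '!vm' label to its page count (hypothetical helper)."""
    pattern = re.compile(r"^\s*(.+?):\s+(\d+)\s+\(\s*(\d+)\s+Kb\)\s*$", re.M)
    return {label: int(pages) for label, pages, _kb in pattern.findall(text)}


pages = parse_vm(vm_output)
used, limit = pages["NonPagedPool Usage"], pages["NonPagedPool Max"]
free_pages = limit - used

print(f"used {used / limit:.2%} of non-paged pool")
print(f"{free_pages} pages ({free_pages * PAGE_SIZE_KB} Kb) left")
```

Only 63 pages (252 Kb) of non-paged pool remain below the maximum, which fits with the failed allocations reported further down the same output.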

I'm not great with memory management at the moment, so I'm still stumped as to why the Nvidia driver is being blamed. A memory leak is caused by pages being allocated (by a driver in this case) and not being released afterwards.

Basically, when a driver allocates memory it must release that memory afterwards so it can be used for other allocations; the memory manager can only reclaim memory once it has been freed by whatever was using it.
What I think has happened is that NotMyFault has been allocating memory and not freeing it. The Nvidia driver then comes along and tries to allocate memory, but there is none left, so the allocation fails and causes the bugcheck.
I might be wrong, so if I am please correct me on that.

I decided to do a little disassembling around the saved registers, which doesn't help in finding the cause but can help (I find) in understanding it.

Code:

nt!KiDoubleFaultAbort+0xb2 (TrapFrame @ fffff880`02fdde30)

Let's look at the trap frame that was pushed.

Code:

3: kd> .trap fffff880`02fdde30
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000000bac2c rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880009ab0b8 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002e69f2c rsp=fffff880009ab000 rbp=fffff880009ab080
 r8=ffffffffffffffff  r9=fffffa80035eb5b8 r10=00000000ffffffff
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156            push    r14

Here we can see a push instruction, which writes data onto the stack. The function it belongs to, MiExpandNonPagedPool, fits with the memory leak: because the pool has run dry, the kernel tried to expand the non-paged pool but eventually failed.
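One detail worth working through is where that push would actually write. The Python sketch below uses the rsp value from the trap frame and assumes 4 KB pages. Whether the page below really is the stack's guard page is an assumption, since the output above does not show the stack limits.

```python
PAGE_SIZE = 0x1000        # ASSUMPTION: 4 KB pages
rsp = 0xFFFFF880009AB000  # rsp from the trap frame


def page_base(addr: int) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


# 'push r14' first subtracts 8 from rsp, then stores r14 at the new rsp.
write_addr = rsp - 8

print(hex(write_addr))
print(hex(page_base(write_addr)))
print(page_base(write_addr) != page_base(rsp))
```

Because rsp sits exactly on a page boundary, the 8-byte store lands in the page below it. That is consistent with the fault being raised on this particular instruction.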

On a different note, one thing I have noticed, and I remember asking Patrick about a long time ago when I first started debugging: "Why is the faulting instruction always in the rip register?" I believe the answer lies in what RIP is (no, not Rest In Peace!): it is the 64-bit instruction pointer register, which holds the address of the instruction the processor is executing. When a fault is raised, the saved rip points at the faulting instruction, which is why that is always where it shows up.

Code:

3: kd> u @rip

nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156            push    r14
fffff800`02e69f2e 4157            push    r15
fffff800`02e69f30 4881ecd0000000  sub     rsp,0D0h
fffff800`02e69f37 488db9ff010000  lea     rdi,[rcx+1FFh]
fffff800`02e69f3e 4881e700feffff  and     rdi,0FFFFFFFFFFFFFE00h
fffff800`02e69f45 483bf9          cmp     rdi,rcx
fffff800`02e69f48 0f82cfbffeff    jb      nt! ?? ::FNODOBFM::`string'+0x1e009 (fffff800`02e55f1d)
fffff800`02e69f4e 488b0513762100  mov     rax,qword ptr [nt!MiSystemVaTypeCount+0x28 (fffff800`03081568)]

It appears to be the function's prologue: it pushes r14 and r15 onto the stack and then reserves a further 0xD0 bytes of stack with sub rsp.
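The `lea` / `and` / `cmp` / `jb` sequence in that listing is a common idiom: round a value up to the next multiple of 0x200 and branch away if the addition wrapped around. Here is a Python sketch of the same arithmetic, with 64-bit wrap-around emulated by a mask. What the value in rcx represents is not something the listing tells us, so the function name `round_up_512` is purely descriptive.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register wrap-around


def round_up_512(rcx: int):
    """Mirror lea/and/cmp/jb from the listing; returns (rdi, overflowed)."""
    rdi = (rcx + 0x1FF) & MASK64  # lea rdi,[rcx+1FFh]
    rdi &= 0xFFFFFFFFFFFFFE00     # and rdi,0FFFFFFFFFFFFFE00h
    overflowed = rdi < rcx        # cmp rdi,rcx / jb (unsigned "below")
    return rdi, overflowed


print(round_up_512(1))       # rcx=1, as shown in the trap frame
print(round_up_512(0x200))   # already a multiple of 0x200
print(round_up_512(MASK64))  # wraps around, so the jb branch would be taken
```

With rcx=1 as shown in the trap frame (which WinDbg warns may not be fully accurate), the rounded value is 0x200 and the overflow branch is not taken.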

I can't find any indication at all of NotMyFault causing the problem. I know it was definitely the cause, but I'm not sure why it isn't showing up. Does anyone know why?

If anyone wants the dump file I could upload it to my OneDrive.

EDIT: Looking at Mark Russinovich's blog on memory leaks, I can see that the pool tag 'Leak' is associated with NotMyFault. I must have forgotten to post the pools used.

Code:

3: kd> !poolused 2
....
 Sorting by NonPaged Pool Consumed


               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used


 Leak     32707   3088705168          0            0	UNKNOWN pooltag 'Leak', please update pooltag.txt

There it is: we can see it has allocated around 3 GB of non-paged pool, which is most of the 4 GB of RAM I have on my system.
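To relate the 'Leak' tag to the pool totals from `!vm`, here is a short Python sketch of the arithmetic. All three input numbers are copied from the debugger output in this post; the variable names are just for illustration.

```python
leak_bytes = 3088705168     # 'Leak' NonPaged Used, from !poolused 2
leak_allocs = 32707         # 'Leak' NonPaged Allocs, from !poolused 2
nonpaged_used_kb = 3059636  # NonPagedPool Usage, from !vm

leak_gib = leak_bytes / 2**30
share = leak_bytes / (nonpaged_used_kb * 1024)
avg_alloc = leak_bytes / leak_allocs

print(f"Leak tag: {leak_gib:.2f} GiB")
print(f"share of non-paged pool in use: {share:.1%}")
print(f"average allocation: {avg_alloc:.0f} bytes")
```

So the 'Leak' tag alone accounts for nearly all of the non-paged pool that was in use at the time of the crash.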

On a different note, I do intend to set up a virtual machine to do this on at some point, as NotMyFault can cause data loss. It's quite rare, but you should close any open applications before running it. The risk of data loss varies with the system and the type of error you cause.

x BlueRobot · Jul 9, 2014

Double faults generally mean that an exception occurred while another exception was being processed. There's also a triple fault, which results in a complete system shutdown.

Yes, RIP is the current instruction pointer on x64; it's EIP on x86 processors. The nVidia driver is most likely being blamed because the stack is based upon the last saved context which is the exception.

Jared · Jul 9, 2014

Thanks for the add Harry, I've heard of triple faults before but never really looked into them.

x BlueRobot · Jul 9, 2014

If you want to check where the stack was gathered from, then using !analyze -v and looking at the STACK_COMMAND should suffice.

Code:

STACK_COMMAND:  kb

Jared · Jul 9, 2014

Well, the stack is huge: from the nvlddmkm+0x340827 frame downwards it's the same frame repeated lots of times, with all the arguments exactly the same.

Code:

3: kd> kb
RetAddr           : Args to Child                                                           : Call Site
fffff800`02ec7169 : 00000000`0000007f 00000000`00000008 00000000`80050033 00000000`000406f8 : nt!KeBugCheckEx
fffff800`02ec5632 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff800`02e69f2c : fffffa80`035d4000 00000000`00000000 00000000`00000000 fffff800`02ff947c : nt!KiDoubleFaultAbort+0xb2
fffff800`02ff947c : 00000000`00000000 fffff880`009ab080 00000000`00000000 00000000`00000000 : nt!MiExpandNonPagedPool+0x14
fffff800`02ffbf26 : fffff800`030586c0 00000000`00000003 00000000`00000000 fffff880`049f9c05 : nt!MiAllocatePoolPages+0xdfd
fffff880`04a1ea55 : 00000000`00000000 00000000`00000001 fffff880`009ab2b8 fffff880`00000000 : nt!ExAllocatePoolWithTag+0x316
fffff880`04a1b6e8 : fffffa80`05b75000 00000000`00000002 00000000`00000002 fffffa80`036a7000 : nvlddmkm+0x1bfa55
fffff880`04ae392a : fffff880`009ab318 fffffa80`00000018 fffffa80`036a7000 fffffa80`05b75000 : nvlddmkm+0x1bc6e8
fffff880`04b9f804 : 00000000`00100005 00000000`00000000 00000000`00100006 fffffa80`05b75000 : nvlddmkm+0x28492a
fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340804
fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827

x BlueRobot · Jul 9, 2014

Did you check the stack command field?

Jared · Jul 9, 2014

Yeah there's nothing else unless I'm looking in the wrong place

x BlueRobot · Jul 9, 2014

Checked here?

Jared · Jul 9, 2014

Yeah nothing else.

x BlueRobot · Jul 10, 2014

That's strange, but I suspect that it's the context of the trap frame and not a thread.

Jared

Sysnative Staff, BSOD Kernel Dump Expert

x BlueRobot

Administrator
