Tuesday, 29 July 2014

0x7F (memory leak)

In this post, we will be looking at a memory leak caused by a program called NotMyFault, which is supplied by Sysinternals. They have some excellent tools that are worth checking out if you're interested.
NotMyFault is available to download from the Sysinternals website.


Let's take a look.
BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}
This bugcheck indicates that the kernel encountered a trap which it is not allowed to catch, which means the trap cannot be resolved and the system must bugcheck. The first parameter is 8, so in this case the cause of the crash was a double fault.
A double fault occurs when an exception takes place during the processing of another exception. If yet another exception occurs while processing the double fault, a triple fault can occur.
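
As a small illustration of reading the first bugcheck parameter, here is a sketch that maps the trap number to a name. The table is deliberately partial and only covers a handful of x86/x64 trap numbers; the function name `describe_trap` is made up for this example.

```python
# Partial table of x86/x64 trap numbers that can show up as
# parameter 1 of bugcheck 0x7F. Not a complete list.
TRAP_NAMES = {
    0x0: "Divide by Zero Error",
    0x4: "Overflow",
    0x5: "Bounds Check Fault",
    0x6: "Invalid Opcode",
    0x8: "Double Fault",
}


def describe_trap(param1: int) -> str:
    """Return a readable name for the trap number (hypothetical helper)."""
    return TRAP_NAMES.get(param1, f"trap 0x{param1:X} (not in this partial table)")


# First parameter from: BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}
print(describe_trap(0x8))
```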

Looking at the callstack, this is what we see. Do note that this is only a small snippet, as the callstack is massive, with the same Nvidia driver function repeated at the same address.

fffff880`02fddce8 fffff800`02ec7169 : 00000000`0000007f 00000000`00000008 00000000`80050033 00000000`000406f8 : nt!KeBugCheckEx
fffff880`02fddcf0 fffff800`02ec5632 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff880`02fdde30 fffff800`02e69f2c : fffffa80`035d4000 00000000`00000000 00000000`00000000 fffff800`02ff947c : nt!KiDoubleFaultAbort+0xb2
fffff880`009ab000 fffff800`02ff947c : 00000000`00000000 fffff880`009ab080 00000000`00000000 00000000`00000000 : nt!MiExpandNonPagedPool+0x14
fffff880`009ab020 fffff800`02ffbf26 : fffff800`030586c0 00000000`00000003 00000000`00000000 fffff880`049f9c05 : nt!MiAllocatePoolPages+0xdfd
fffff880`009ab160 fffff880`04a1ea55 : 00000000`00000000 00000000`00000001 fffff880`009ab2b8 fffff880`00000000 : nt!ExAllocatePoolWithTag+0x316
fffff880`009ab250 fffff880`04a1b6e8 : fffffa80`05b75000 00000000`00000002 00000000`00000002 fffffa80`036a7000 : nvlddmkm+0x1bfa55
fffff880`009ab280 fffff880`04ae392a : fffff880`009ab318 fffffa80`00000018 fffffa80`036a7000 fffffa80`05b75000 : nvlddmkm+0x1bc6e8
fffff880`009ab2e0 fffff880`04b9f804 : 00000000`00100005 00000000`00000000 00000000`00100006 fffffa80`05b75000 : nvlddmkm+0x28492a
fffff880`009ab310 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340804
fffff880`009ab350 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab390 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab3d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab410 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab450 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab490 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab4d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
So what is happening is that the Nvidia driver is being blamed (probably because it was on the stack when the last context was saved, which was the exception). It calls the same function over and over, apparently allocating more pages, until a double fault is raised. I suspect the double fault occurred because memory could not be allocated, which caused an exception, and then another exception occurred while that one was being handled.
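
To put a number on the repetition in the snippet above, here is a rough sketch that counts the symbols and measures the spacing between the repeated frames. The symbol names and stack addresses are copied from the callstack shown; the variable names are just for illustration.

```python
from collections import Counter

# Symbol column of the stack snippet above, top of the stack first.
frames = [
    "nt!KeBugCheckEx",
    "nt!KiBugCheckDispatch+0x69",
    "nt!KiDoubleFaultAbort+0xb2",
    "nt!MiExpandNonPagedPool+0x14",
    "nt!MiAllocatePoolPages+0xdfd",
    "nt!ExAllocatePoolWithTag+0x316",
    "nvlddmkm+0x1bfa55",
    "nvlddmkm+0x1bc6e8",
    "nvlddmkm+0x28492a",
    "nvlddmkm+0x340804",
] + ["nvlddmkm+0x340827"] * 7

symbol, hits = Counter(frames).most_common(1)[0]
print(symbol, hits)

# Stack pointers (first column) of the repeated nvlddmkm+0x340827 frames.
child_sps = [
    0xFFFFF880009AB350, 0xFFFFF880009AB390, 0xFFFFF880009AB3D0,
    0xFFFFF880009AB410, 0xFFFFF880009AB450, 0xFFFFF880009AB490,
    0xFFFFF880009AB4D0,
]
strides = {b - a for a, b in zip(child_sps, child_sps[1:])}
print([hex(s) for s in strides])  # bytes of stack used per repeated frame
```

In this snippet alone the same return address shows up seven times, and each repeat sits 0x40 bytes further up the stack than the last.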

So looking at the virtual memory usage we can see the following.

3: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:     1036418 (   4145672 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   4145672 Kb  Free Space:   3702732 Kb
      Minimum:   4145672 Kb  Maximum:     12437016 Kb
    Available Pages:      100902 (    403608 Kb)
    ResAvail Pages:       209219 (    836876 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:   33504448 ( 134017792 Kb)
    Modified Pages:         4479 (     17916 Kb)
    Modified PF Pages:      4364 (     17456 Kb)
    NonPagedPool Usage:   764909 (   3059636 Kb)
    NonPagedPool Max:     764972 (   3059888 Kb)
    ********** Excessive NonPaged Pool Usage *****

We can see that the non paged pool has been completely depleted, which caused the system to crash.
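
The numbers in the !vm output can be checked with a bit of arithmetic. This is only a sketch using the two NonPagedPool figures shown above and assuming the usual 4 KB page size, which matches the Kb column in the output.

```python
PAGE_KB = 4  # assumed page size in KB; consistent with the Kb column of !vm

usage_pages = 764909  # NonPagedPool Usage
max_pages = 764972    # NonPagedPool Max

free_pages = max_pages - usage_pages

print(usage_pages * PAGE_KB)             # usage in KB
print(free_pages, free_pages * PAGE_KB)  # pages and KB left before the limit
print(f"{usage_pages / max_pages:.4%}")  # share of the maximum already in use
```

Only 63 pages (252 KB) of non paged pool were left out of roughly 3 GB, which is why the debugger flags the usage as excessive.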
Now you might be asking, can't it just put the memory onto disk to stop it crashing?
Well, moving memory from RAM onto disk is known as paging, which is used to free up RAM when memory usage is high. However, kernel pool memory is divided into two main categories:

-Paged Pool
-Non Paged Pool

Paged pool is for kernel memory allocations that can be moved to disk when not in use, which frees up physical memory. Non paged pool, on the other hand, can't be moved to disk under any circumstances: device drivers and other critical operating system components use these dynamic memory allocations to function correctly, so they must be available immediately for use.

So why can't they just page the memory back from disk when needed?

Well, it's not that simple. Paging can be very expensive, in that it takes time and puts a lot of pressure on the drive, which is much slower than RAM.
Not only that, but the IRQL must be at 1 (APC_LEVEL) or below for memory to be paged in; when the IRQL is 2 (DISPATCH_LEVEL) or higher, paging is not allowed. Say, for example, we get a request that needs servicing quickly at an IRQL of 7. That may need the device driver to perform certain tasks, but it can't if the memory it needs is paged out, and we can't page it in because the IRQL needs to be at 1 or below.
We can't just lower the IRQL either, because the higher the IRQL, the higher the priority of the work being done. Touching paged-out memory at an elevated IRQL instead causes a bugcheck of 0xA or 0xD1.
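
The IRQL rule can be written down as a tiny check. This is a toy model of the rule described above, not how the kernel implements it; the function name `page_fault_allowed` is made up for the example.

```python
# The three lowest IRQLs on Windows.
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2


def page_fault_allowed(irql: int) -> bool:
    """Toy rule: paged-out memory can only be brought back in below DISPATCH_LEVEL."""
    return irql < DISPATCH_LEVEL


for irql in (PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL, 7):
    verdict = "can page in" if page_fault_allowed(irql) else "must use non paged memory"
    print(f"IRQL {irql}: {verdict}")
```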

So why is the memory being leaked and what is it?

A memory leak occurs when an object acquires memory but doesn't free it after it has finished using it. Those pages can't be allocated to anything else, because they still need to be freed even though they're no longer in use.
If the object keeps calling ExAllocatePool, it keeps allocating memory without using it, and just because that memory isn't in use doesn't mean it can be used by anything else.
So when the last of the non paged pool has been used up, the system cannot function anymore, because critical objects cannot allocate the memory they need, and the system crashes.
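
The leak pattern described above can be sketched with a toy fixed-size pool. `ToyPool` and `leak_until_exhausted` are hypothetical names and this is a simplified model, not the real pool allocator; the point is only that allocating without ever freeing makes every later allocation fail.

```python
class ToyPool:
    """Toy fixed-size pool standing in for non paged pool (not the real allocator)."""

    def __init__(self, max_pages: int) -> None:
        self.max_pages = max_pages
        self.used = 0

    def allocate(self, pages: int) -> bool:
        """Reserve pages; return False once the pool cannot satisfy the request."""
        if self.used + pages > self.max_pages:
            return False
        self.used += pages
        return True

    def free(self, pages: int) -> None:
        """Give pages back to the pool. The leaking caller never does this."""
        self.used = max(0, self.used - pages)


def leak_until_exhausted(pool: ToyPool, pages_per_call: int) -> int:
    """Allocate repeatedly without freeing; return how many calls succeeded."""
    calls = 0
    while pool.allocate(pages_per_call):
        calls += 1
    return calls


pool = ToyPool(max_pages=1000)
calls = leak_until_exhausted(pool, pages_per_call=10)
print(calls, pool.used)   # successful calls, pages now stuck in the pool
print(pool.allocate(1))   # any other caller now fails as well
```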

We can look at the assembly instructions to see what is happening.

3: kd> .trap fffff880`02fdde30
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000000bac2c rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880009ab0b8 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002e69f2c rsp=fffff880009ab000 rbp=fffff880009ab080
 r8=ffffffffffffffff  r9=fffffa80035eb5b8 r10=00000000ffffffff
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
fffff800`02e69f2c 4156            push    r14
So it's calling a function which, I believe, tries to expand non paged pool when it has become too small to satisfy an allocation.

3: kd> u nt!MiExpandNonPagedPool+0x14
fffff800`02e69f2c 4156            push    r14
fffff800`02e69f2e 4157            push    r15
fffff800`02e69f30 4881ecd0000000  sub     rsp,0D0h
fffff800`02e69f37 488db9ff010000  lea     rdi,[rcx+1FFh]
fffff800`02e69f3e 4881e700feffff  and     rdi,0FFFFFFFFFFFFFE00h
fffff800`02e69f45 483bf9          cmp     rdi,rcx
fffff800`02e69f48 0f82cfbffeff    jb      nt! ?? ::FNODOBFM::`string'+0x1e009 (fffff800`02e55f1d)
fffff800`02e69f4e 488b0513762100  mov     rax,qword ptr [nt!MiSystemVaTypeCount+0x28 (fffff800`03081568)]
So here we can see push instructions, which add data onto the stack, but because there is no more memory left, the push cannot complete and the system crashes.
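
One detail worth checking is where the faulting push would actually write. The sketch below uses the rsp value from the .trap output and assumes 4 KB pages; it only shows which page the push lands in, not why that page was unavailable.

```python
PAGE_SIZE = 0x1000  # assumed 4 KB pages

rsp = 0xFFFFF880009AB000  # rsp from the .trap output above
after_push = rsp - 8      # a 64-bit push moves rsp down 8 bytes, then writes there

print(hex(after_push))
print(rsp % PAGE_SIZE == 0)                         # is rsp exactly on a page boundary?
print(after_push // PAGE_SIZE != rsp // PAGE_SIZE)  # does the push land in a different page?
```

Since rsp sits exactly on a page boundary, the push r14 has to write into the page below it, so that is the page the instruction could not touch.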
