Friday, 20 June 2014

0x1A debugging

This is my first blog so I'll see how this goes.

So what exactly is this blog about?

Well on a daily basis I go on forums and help people out with what is commonly known as the Blue Screen of Death or BSOD, I like to go in detail to analyse the exact cause.

I will try to explain everything as best as I can throughout this blog to try and help you understand what I'm rambling on about.

I thought I'd decide to look through some old Kernel Memory Dump files in my downloads and see what I can find.
Lets begin.

BugCheck 1A, {41790, fffffa80015c69c0, ffff, 0}
So what does this bugcheck mean?
I can guess what you're probably thinking if you're new to debugging

"It just looks like a bunch of random numbers and letters. How can you work with that?"

Well in Windows Debugger we can use the !analyze -v command to make some sense.

MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000041790, A page table page has been corrupted. On a 64 bit OS, parameter 2
contains the address of the PFN for the corrupted page table page.
On a 32 bit OS, parameter 2 contains a pointer to the number of used
PTEs, and parameter 3 contains the number of used PTEs.
Arg2: fffffa80015c69c0
Arg3: 000000000000ffff
Arg4: 0000000000000000
So what does this mean?

Well, to put it simply a page table page has been corrupt.

What's a Page Table?

A page table is the data structure which maps virtual memory addresses to physical memory address stored in RAM, it helps manage these entries by making the memory look like a flat continuous line of virtual addresses when in fact these addresses could be spread out all over the place.

So the second parameter contains the address of the Page Frame Number for the corrupted table Page Table Page.

A Page Frame Database is a way to track physical pages of memory, it keeps track of pages allocate to working sets, free, available etc.
So it's an efficient way for the memory manager to know which pages are in use and which are available to use.

So lets take a look at the Page Table Page that's been corrupted.

2: kd> dt nt!_MMPFN fffffa80015c69c0
   +0x000 u1               : <unnamed-tag>
   +0x008 u2               : <unnamed-tag>
   +0x010 PteAddress       : 0xfffff6fb`400001e0 _MMPTE
   +0x010 VolatilePteAddress : 0xfffff6fb`400001e0 Void
   +0x010 Lock             : 0n1073742304
   +0x010 PteLong          : 0xfffff6fb`400001e0
   +0x018 u3               : <unnamed-tag>
   +0x01c UsedPageTableEntries : 0xffff
   +0x01e VaType           : 0 ''
   +0x01f ViewCount        : 0 ''
   +0x020 OriginalPte      : _MMPTE
   +0x020 AweReferenceCount : 0n128
   +0x028 u4               : <unnamed-tag>
This indicates that the used paged table entry count has actually fallen below zero which is normally caused by drivers calling the MmUnlockPages function too many times on a linked list data structure.

So lets look at the callstack which contains a list of functions before the bugcheck.
The callstack contains all functions made starting at the bottom and working its way up to the most recent, it's basically like a timeline.

2: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff880`138be698 fffff800`03b45d50 nt!KeBugCheckEx <-- BSOD
01 fffff880`138be6a0 fffff800`03b077d9 nt! ?? ::FNODOBFM::`string'+0x35084
02 fffff880`138be860 fffff800`03dee0f1 nt!MiRemoveMappedView+0xd9
03 fffff880`138be980 fffff960`00099d06 nt!MiUnmapViewOfSection+0x1b1
04 fffff880`138bea40 fffff960`002c194b win32k!EngUnmapFontFileFD+0x8a
05 fffff880`138beab0 fffff960`00288ade win32k!ttfdSemDestroyFont+0x8b
06 fffff880`138beae0 fffff960`00286d0a win32k!PDEVOBJ::DestroyFont+0xf2
07 fffff880`138beb50 fffff960`000a933f win32k!RFONTOBJ::vDeleteRFONT+0x4a
08 fffff880`138bebc0 fffff960`000a8d73 win32k!RFONTOBJ::bMakeInactiveHelper+0x427
09 fffff880`138bec40 fffff960`000aa324 win32k!RFONTOBJ::vMakeInactive+0xa3
0a fffff880`138bece0 fffff960`00062a95 win32k!RFONTOBJ::bInit+0x1ec
0b fffff880`138bee00 fffff960`0006223f win32k!GreExtTextOutWLocked+0x7e5
0c fffff880`138bf220 fffff960`00062125 win32k!GreExtTextOutWInternal+0x10f
0d fffff880`138bf2d0 fffff960`00055c37 win32k!GreExtTextOutW+0x3d
0e fffff880`138bf330 fffff960`0006c90a win32k!DrawIt+0xd7
0f fffff880`138bf390 fffff960`00067560 win32k!DrawFrameControl+0x324
10 fffff880`138bf4b0 fffff960`00067224 win32k!CreateBitmapStrip+0x308
11 fffff880`138bf510 fffff960`00073677 win32k!xxxSetWindowNCMetrics+0x354
12 fffff880`138bf790 fffff960`00072e6e win32k!xxxUpdatePerUserSystemParameters+0x7f3
13 fffff880`138bfbf0 fffff800`03ad3e53 win32k!NtUserUpdatePerUserSystemParameters+0x2a
14 fffff880`138bfc20 00000000`76ea3d4a nt!KiSystemServiceCopyEnd+0x13
15 00000000`00abf7b8 00000000`00000000 0x76ea3d4a
 So we see a lot of win32k functions which may or may not be related to the bugcheck.

Win32k.sys is a Kernel Mode device driver that contains the window manager, graphics device interface and wrappers for DirectX support.

The window manager controls all windows, screen output displays, mouse and keyboard inputs as well as passing information to user mode applications.

The Graphics Device Interface (GDI) is a library of functions for graphics device output devices, it communicates via device drivers.
Basically applications call user mode functions for requests such as windows and buttons. The window manager communicates these requests to the GDI which are sent formatted and sent to the device driver, the device driver is then paired up with a video miniport driver to complete the display display output.

So there's not much revealing in the callstack but there's something else that sticks out.
WARNING: !chkimg output was truncated to 50 lines. Invoke !chkimg without '-lo [num_lines]' to view  entire output.
Page 31a1d9 not present in the dump file. Type ".hh dbgerr004" for details
Page 3199f1 not present in the dump file. Type ".hh dbgerr004" for details
Page 31a167 not present in the dump file. Type ".hh dbgerr004" for details
483 errors : !win32k (fffff96000056248-fffff9600023e2b9)
What does this gibberish mean?

Well !chkimg is a way of copying executable images such as .dll, .exes etc to memory whenever a process is ran, this prevents the files from disk being altered. It's a little more complicated than that which I need to look into but that's the basics.

Now these images can be corrupted for various reasons but we're not getting too many clues besides possibly bad RAM.

I decided to look at all the IRPs present in the system to see if anything else cropped up.

fffffa801620dd00 [fffffa8016215580] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016210a60 [fffffa8016241060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016213680 [fffffa8016243060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa80162162f0 [fffffa8016243640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016219400 [fffffa8016244b50] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016219c60 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa801621aa50 [fffffa8016246640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa801621eb10 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa80162207b0 [fffffa8016006b50] irpStack: ( e, 0)  fffffa800b044ba0 [ \FileSystem\FltMgr]
fffffa80162209e0 [fffffa8016006b50] irpStack: ( e, 0)  fffffa800b044ba0 [ \FileSystem\FltMgr]
fffffa8016228c60 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa8016237ee0 [fffffa8016247640] irpStack: ( d, 0)  fffffa800b1a3df0 [ \FileSystem\Npfs]
fffffa8016238a00 [fffffa8016248640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa80162399b0 [fffffa8016246b50] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa801623e6d0 [fffffa8016244060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
aswNdisFlt is the Avast Anti Virus firewall driver that appears to be calling a lot of IRPs which makes me believe this is part of the problem, given that Avast is problematic.

I won't post more code as it fills the page but I'm seeing a lot of HID USB and other USB IRPs being called, not only that but Logitech drivers specifically the keyboard and possibly mouse causing the problems.

Finally Intel Rapid Storage Technology is being flagged which isn't surprising given that this driver is very problematic.

With all this said it looks like the cause is strong with possibly a mixture of causes and possibly bad RAM which can easily be tested using Memtest86+.

No comments:

Post a Comment