Monday, 22 September 2014

Device/Driver Objects and Stacks

Today I thought I'd write a bit about device stacks and driver stacks and how they implement IRPs.
I'm not going into detail on how drivers function and the types of drivers as I would be here all day so I'll save that for another time.

What is a device object and a driver object?

A device object is an opaque structure that represents a device or function. A device object is an instance of the DEVICE_OBJECT data structure which is used by the operating system to represent a device.
Some device objects don't always represent a physical device, they can represent a logical device.

A driver object is just a Kernel image to represent the Kernel mode driver which includes a pointer to the driver's routines.
When a driver initialises it creates a device object to represent physical or logical devices.

Device stacks and Device nodes

The Kernel organises drivers into a tree structure called the Plug and Play device tree containing device nodes that represent devices, do note that some nodes represent software components which don't have any physical devices attached to them.

A device stack contains a PDO (Physical Device Object) which represents the physical device connected to a physical bus on the motherboard, in this case I'll talk about the PCI bus as an example.
The PCI bus enumerates the child devices which are connected to the PCI bus on the motherboard, this creates the PDO for each device and is then represented by a device node in the PnP device tree.
Do note that depending on your perspective determines what type of driver the pci.sys driver is, for example if you're looking at the PCI bus device node then it's the function driver but if you're looking at one of the PCI bus node child devices associated with it then it's the bus driver.

After the device node has been associated with the new PDO the PnP manager then searches the registry for the driver(s) which needs to be part of the device stack, these drivers are called Function Drivers.

Here's a small point about the drivers usually found in device stacks:

  • Bus drivers detect and inform the PnP manager about its devices on its bus as well as controlling the power to the bus. There is only allowed to be one bus driver at once and Microsoft normally supplies them.
  • Function drivers on the other hand are the main driver that represents the device and performs the basic operations for reading and writing, it's the driver that knows the most about its device.
  • Filter drivers modify the device behaviour when needed and it's located above or below the function driver. It normally fixes errors that are detected before it reaches the function driver on the stack.

0: kd> !devstack fffffa8004615680
  !DevObj           !DrvObj            !DevExt           ObjectName
  fffffa8005f8c150  \DRIVER\VERIFIER_FILTERfffffa8005f8c2a0 
  fffffa8005f8c390 *** ERROR: Module load completed but symbols could not be loaded for GEARAspiWDM.sys
  fffffa8005f202e0  \DRIVER\VERIFIER_FILTERfffffa8005f20430 
  fffffa8005eec060  \Driver\cdrom      fffffa8005f77b80  CdRom0
  fffffa80057379b0  \Driver\ACPI       fffffa80047f6a00 
> fffffa8004615680  \Driver\atapi      fffffa80046157d0  IdeDeviceP0T1L0-5
!DevNode fffffa8005742900 :
  DeviceInst is "IDE\CdRomATAPI_iHAS124___B_______________________AL0R____\5&f437ab5&0&0.1.0"
  ServiceName is "cdrom"

This is the device stack for the cd drive in the computer which shows the associated device objects and driver objects within it.

  • Atapi provides the interface to enable support for cd players.
  • ACPI is the bus filter driver that enables Power Management for the operating system so when devices are not in use (In this case the cd player) it will be powered off.
  • cdrom is the function driver for the cd drive that allows discs to be read and written to.
  • GEARAspiWDM.sys is the cdrom 3rd party filter driver.
  • VERIFIER_FILTER are filter drivers used by Driver Verifier which is enabled to monitor driver routines and operations to make sure everything is working correctly.

For more information on Driver Verifier see here: Driver Verifier (Windows Drivers)

Driver Stacks are determined by how many drivers are present when processing an IRP by passing it down a device stack or in some cases multiple device stacks.
A driver object can be associated with multiple different device objects and therefore lots of device stacks, this shows that an IRP can be passed down lots of device stacks but only being serviced by a few drivers.

0: kd> !drvobj \Driver\ACPI
Driver object (fffffa80039a6af0) is for:
Driver Extension List: (id , addr)

Device Object list:
fffffa80057379b0  fffffa800573a9b0  fffffa80057399b0  fffffa800572fc20
fffffa800572fe40  fffffa800572ea00  fffffa800572ec20  fffffa800572ee40
fffffa800572da00  fffffa800572dc20  fffffa800572de40  fffffa8005819e40
fffffa8005814e40  fffffa800580fe40  fffffa800572a9b0  fffffa80057289b0
fffffa8005720e40  fffffa800571c920  fffffa800571cb20  fffffa800571bc40
fffffa800571be40  fffffa8005713bc0  fffffa8005700e40  fffffa80056ffa40
fffffa80056ffc40  fffffa80056ffe40  fffffa80056fea40  fffffa80056fec40
fffffa80056fee40  fffffa80056fda40  fffffa80056fdc40  fffffa80056fde40
fffffa80056fca40  fffffa80056fcc40  fffffa80056fce40  fffffa8004616330
fffffa8004616040  fffffa8004616c20  fffffa8004616e40  fffffa80047f9770
fffffa80039eadb0  fffffa8004be1060  fffffa80047fe170  fffffa80047fe390
fffffa80047fe5b0  fffffa80047fe7d0  fffffa80047fe9f0  fffffa80047fec10

So as proven here we can clearly see that the ACPI.sys driver is associated with a lot of device objects as it can't just represent one device otherwise one hardware component would use ACPI and everything else would be powered on all the time, think about how many USB devices would be turned on.
So our CD drive is just one component that uses ACPI.

Finally we can see information about the IRP being sent by looking at the IRP data structure.

0: kd> dt nt!_IRP fffff9801c458dc0
   +0x000 Type             : 0n6
   +0x002 Size             : 0x238
   +0x008 MdlAddress       : (null)
   +0x010 Flags            : 0x40000000
   +0x018 AssociatedIrp    : <unnamed-tag>
   +0x020 ThreadListEntry  : _LIST_ENTRY [ 0xfffff980`1c458de0 - 0xfffff980`1c458de0 ]
   +0x030 IoStatus         : _IO_STATUS_BLOCK
   +0x040 RequestorMode    : 0 ''
   +0x041 PendingReturned  : 0 ''
   +0x042 StackCount       : 5 ''
   +0x043 CurrentLocation  : 1 ''
   +0x044 Cancel           : 0 ''
   +0x045 CancelIrql       : 0 ''
   +0x046 ApcEnvironment   : 0 ''
   +0x047 AllocationFlags  : 0x80 ''
   +0x048 UserIosb         : (null)
   +0x050 UserEvent        : (null)
   +0x058 Overlay          : <unnamed-tag>
   +0x068 CancelRoutine    : (null)
   +0x070 UserBuffer       : (null)
   +0x078 Tail             : <unnamed-tag>

Some of the entries are pretty obvious from the name and some aren't documented, the ones that are can be found here:

IRP (Windows Drivers)

Power IRPs

I found an old dump file which was a 0x9F Kernel dump file caused by a power IRP not synchronising with the pnp manager.
Power IRPs are used to change the power state for a device and therefore they must reach the bottom of the device stack which is the physical device object.

A driver has failed to complete a power IRP within a specific time.
Arg1: 0000000000000004, The power transition timed out waiting to synchronize with the Pnp

Arg2: 0000000000000258, Timeout in seconds.
Arg3: fffffa8007005660, The thread currently holding on to the Pnp lock.
Arg4: fffff800053e83d0, nt!TRIAGE_9F_PNP on Win7 and higher

So we can see our 0x9F bugcheck with a power IRP failing to synchronise with the PnP manager because the IRP hasn't reached the bottom of the stack.

0: kd> !locks
KD: Scanning for held locks..

Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0)    Shared 1 owning threads
    Contention Count = 1
     Threads: fffffa8007005660-01<*>
KD: Scanning for held locks.

Resource @ nt!PiEngineLock (0xfffff80003492be0)    Exclusively owned
    Contention Count = 21
    NumberOfExclusiveWaiters = 1
     Threads: fffffa8007005660-01<*>
     Threads Waiting On Exclusive Access:

KD: Scanning for held locks............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
18855 total locks, 2 locks currently held

We can see two locks have been held, IopDeviceTreeLock is to synchronise the device tree as a spinlock and the PiEngineLock which is a pnp and power management lock. The PiEngineLock is being owned by the ZTEusbnet driver in order to pass down the power IRP.

0: kd> !thread fffffa80`07005660
THREAD fffffa8007005660  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
    fffffa800d035ee8  NotificationEvent
IRP List:
    fffffa8008f5cc10: (0006,03e8) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a000008c10
Owning Process            fffffa8006f8d890       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      396427         Ticks: 38463 (0:00:10:00.026)
Context Switch Count      44059          IdealProcessor: 2  NoStackSwap
UserTime                  00:00:00.000
KernelTime                00:00:00.343
Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
Stack Init fffff88003bd2c70 Current fffff88003bd2280
Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string'+0xddf
fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : nt!IopRemoveDevice+0x101
fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : nt!PnpSurpriseRemoveLockedDeviceNode+0x128
fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : nt!PnpProcessQueryRemoveAndEject+0x6cf
fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16

0: kd> !irp fffffa8008f5cc10
Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
 No Mdl: No System Buffer: Thread fffffa8007005660:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
>[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000   
            Args: 00000000 00000000 00000000 00000000

I'm not sure why but the ZTEusbnet driver isn't processing the power IRP, it's just leaving it and that's what caused the system to crash.
It'd be nice to know exactly why it didn't pass the power IRP on.
I'm not suprised though given the date of the driver.

0: kd> !devstack fffffa800c089050
  !DevObj           !DrvObj            !DevExt           ObjectName
> fffffa800c089050  \Driver\ZTEusbnet  fffffa800c0891a0  NDMP14
  fffffa80070bfa00  \Driver\usbccgp    fffffa80070bfb50  000000a8
!DevNode fffffa800dc19990 :
  DeviceInst is "USB\VID_19D2&PID_0063&MI_04\6&200b5242&0&0004"
  ServiceName is "ZTEusbnet"

We can see that it was meant to pass the power IRP down to the USB common class generic parent driver which, to put it simply exposes each USB composite device in order to seperate it to a single device. Passing it down to the USB bus driver should change the power state.

0: kd> lmvm ZTEusbnet
start             end                 module name
fffff880`0ae8e000 fffff880`0aebc000   ZTEusbnet   (no symbols)          
    Loaded symbol image file: ZTEusbnet.sys
    Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
    Image name: ZTEusbnet.sys
    Timestamp:        Mon Oct 13 06:50:10 2008 (48F2E192)
    CheckSum:         000329ED
    ImageSize:        0002E000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

This is a dump file from quite a while ago but if memory serves me correctly I think an update solved the issue.

Any other questions feel free to ask, I believe I've covered most things without going into detail about drivers.

Sources: Device nodes and device stacks (Windows Drivers)
Driver stacks (Windows Drivers)

Thursday, 18 September 2014


I've not posted in a while but I found an interesting case on a forum and managed to acquire a Kernel memory dump.
I'm not going into detail about DPCs or interrupts as I have made blog posts on these in the past.

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: 0000000000000000

So here it states that we encountered a DPC which exceeded the allocated time for it to finish executing. The problem is that it went over this time, and as stated before DPCs can hold up the system when taking too long to execute which can result in lagging, a slow system or even sound cutting out.

So lets look at our stack trace.

ffffd001`50c93c98 fffff800`9238bcc2 : 00000000`00000133 00000000`00000000 00000000`00000501 00000000`00000500 : nt!KeBugCheckEx
ffffd001`50c93ca0 fffff800`92271115 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff801`dceabf17 : nt! ?? ::FNODOBFM::`string'+0x18b12
ffffd001`50c93d30 fffff800`929a07b5 : ffffe001`00400a02 fffff800`922fcae6 fffff801`daed3cf8 ffffe001`00008201 : nt!KeClockInterruptNotify+0x95
ffffd001`50c93f40 fffff800`922e80e3 : ffffd001`50c93f60 00000000`00000008 ffff5377`5487cf7d 00000000`0000000c : hal!HalpTimerClockIpiRoutine+0x15
ffffd001`50c93f70 fffff800`9236412a : ffffe001`9c600500 ffffe001`9e8de1a0 00000000`00000000 00000000`00000000 : nt!KiCallInterruptServiceRoutine+0xa3
ffffd001`50c93fb0 fffff800`92364a9b : 44454c49`4146203a 696c6564`206f7420 6e657665`20726576 20212121`20352074 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd001`50c853a0 fffff800`922e8383 : ffffe001`9e92d030 ffffe001`9e968030 00000000`02290a8d 00000000`00000018 : nt!KiInterruptDispatchNoLockNoEtw+0xfb
ffffd001`50c85530 fffff801`dcfa5751 : ffffe001`9e66e7a0 ffffe001`00000000 ffffe001`9e92fbe0 00000000`fffff850 : nt!KeAcquireSpinLockRaiseToDpc+0x13
ffffd001`50c85560 fffff801`dcfa531d : ffffe001`9e96b840 fffff801`dcf2c48f ffffe001`9eb68490 fffff801`dcf2c550 : athwbx+0x161751
ffffd001`50c855f0 fffff801`dcf60c42 : ffffe001`9e96b840 ffffd001`50c85650 ffffd001`50c85654 00000000`00000000 : athwbx+0x16131d
ffffd001`50c85630 fffff801`dcf33472 : ffffe001`9e9bf030 fffff801`00000000 ffffd001`50c856d0 fffff801`dd074319 : athwbx+0x11cc42
ffffd001`50c85680 fffff801`dd0c129f : ffffe001`9e9bf030 ffffffff`ffffffff ffffe001`9e6d97e8 fffff801`dd011189 : athwbx+0xef472
ffffd001`50c856f0 fffff801`dd08679e : ffffe001`9e968030 00000000`00000000 00000000`00000000 00000000`00000000 : athwbx+0x27d29f
ffffd001`50c85720 fffff801`dae9e81e : ffffe001`9e961030 00000000`00000000 ffffd001`50c85790 00000000`00000000 : athwbx+0x24279e
ffffd001`50c85760 fffff800`92252130 : ffffd001`50c85b00 00000000`00000000 00000000`00000200 fffff800`92274ae0 : ndis!ndisInterruptDpc+0x269ce
ffffd001`50c85860 fffff800`9225134b : ffffd001`50c5c180 ffffe001`9e8f4010 ffffe001`9c46b900 ffffe001`a12f3080 : nt!KiExecuteAllDpcs+0x1b0
ffffd001`50c859b0 fffff800`923667ea : ffffd001`50c5c180 ffffd001`50c5c180 ffffd001`50c682c0 ffffe001`9dbbb540 : nt!KiRetireDpcList+0xdb
ffffd001`50c85c60 00000000`00000000 : ffffd001`50c86000 ffffd001`50c80000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a

So in this callstack we see our processor in an idle loop, when idle it tends to execute any DPCs if there are any waiting in the DPC queue.
It begins to execute all the DPCs in the queue (also known as draining) when get execute an [B]ndis dpc interrupt[/B], this begins to call network functions and then acquire a spinlock and raise to DPC/Dispatch IRQL level if it hasn't already (this is the standard routine that is used, I can't remember if it is required), we then recieve more interrupts followed by a clock interrupt and a bugcheck.

Okay so we know that we bugchecked because a DPC was taking too long to finish executing and risk holding up the system, especially where spinlocks are concerned.

The main thing that interests me is why is there a clock interrupt?

3: kd> !dpcs
CPU Type      KDPC       Function
 3: Normal  : 0xffffe0019e66e880 0xfffff801dae78eb0 ndis!ndisMTimerObjectDpc
 3: Normal  : 0xffffd00150c61668 0xfffff80092327b28 nt!PpmPerfAction
 3: Normal  : 0xffffd0015589a280 0xfffff80092258854 nt!PopExecuteProcessorCallback
 3: Threaded: 0xffffd00150c617c0 0xfffff8009231a0a0 nt!KiDpcWatchdog

I believe the ndis dpc interrupt is related to this timer object but I may be wrong, if it is related then the clock interrupt makes sense as the system requires intervals for clock interrupts to take place in order to keep track of system time and logical run time for threads and timers. Processes can modify the clock interrupt interval for their needs to process timers much quicker, I'll not go into detail as I will talk about timers another time.

The only problem is that I ran into a dead end, I couldn't find anything related to the network driver in terms of modifying the clock interrupt timer.

3: kd> !list "-e -x \"dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution\" nt!ExpTimerResolutionListHead"
dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  "???"
   +0x638 RequestedTimerResolution : 0x9c3d2000
   +0x63c SmallestTimerResolution  : 0xffffe001

dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  "svchost.exe"
   +0x638 RequestedTimerResolution : 0
   +0x63c SmallestTimerResolution  : 0x2710

So I thought I'd look into this a bit more.

3: kd> u ndis!ndisInterruptDpc+0x269ce
fffff801`dae9e81e 488b75a7        mov     rsi,qword ptr [rbp-59h]
fffff801`dae9e822 e9e297fdff      jmp     ndis!ndisInterruptDpc+0x1b9 (fffff801`dae78009)
fffff801`dae9e827 33d2            xor     edx,edx
fffff801`dae9e829 488d4dd7        lea     rcx,[rbp-29h]
fffff801`dae9e82d 448d420d        lea     r8d,[rdx+0Dh]
fffff801`dae9e831 e8160c0000      call    ndis!ndisPcwEndCycleCounter (fffff801`dae9f44c)
fffff801`dae9e836 90              nop
fffff801`dae9e837 e9d797fdff      jmp     ndis!ndisInterruptDpc+0x1c3 (fffff801`dae78013)

It appears the interrupt routine is looping for some reason.

I can't find anything on the cycle counter function as it is undocumented but I'll take a guess and say that it's keeping track of the time the interrupt has been executing, AFAIK this is don't by using a counter on the currently executing thread to see how long it's running.

3: kd> u nt!KeAcquireSpinLockRaiseToDpc+0x13
fffff800`922e8383 f605fcac270021  test    byte ptr [nt!PerfGlobalGroupMask+0x6 (fffff800`92563086)],21h
fffff800`922e838a 751f            jne     nt!KeAcquireSpinLockRaiseToDpc+0x3b (fffff800`922e83ab)
fffff800`922e838c f0480fba2900    lock bts qword ptr [rcx],0
fffff800`922e8392 7209            jb      nt!KeAcquireSpinLockRaiseToDpc+0x2d (fffff800`922e839d)
fffff800`922e8394 0fb6c3          movzx   eax,bl
fffff800`922e8397 4883c420        add     rsp,20h
fffff800`922e839b 5b              pop     rbx
fffff800`922e839c c3              ret

Here we can see the same DPC interrupt routine trying to acquire a spinlock yet it's not managing to do it and therefore looping all whilst it is still running at DPC level and therefore preventing normal thread execution.

Eventually it seems it managed to acquire the spinlock and then call a clock interrupt in order to perform some operation, I suspect updating the system time in order to service the network driver with higher response times.
The system realised that it was taking too long to complete and therefore bugchecked.

3: kd> lmvm athwbx
start             end                 module name
fffff801`dce44000 fffff801`dd1ff000   athwbx     (no symbols)          
    Loaded symbol image file: athwbx.sys
    Image path: \SystemRoot\system32\DRIVERS\athwbx.sys
    Image name: athwbx.sys
    Timestamp:        Thu Oct 17 10:46:01 2013 (525FB1D9)
    CheckSum:         003BC161
    ImageSize:        003BB000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

So the network driver was quite outdated, he updated it and the blue screens stopped so it looks like it was an easy fix.

Thanks for reading.

Tuesday, 2 September 2014

DPCs and APCs

I got back off holiday yesterday and I feel it's time for a blog post, this might be a long one as I have spent some time reading Windows Internals and looking at various subjects of interrupts DPCs and APCs.

So a DPC or Deferred Procedure Call is a way for the Kernel to defer execution until a later date, this is a good way for Windows to stop threads that take too long executing or are running for long periods of time at a high IRQL from holding up the system.

DPCs and dispatch interrupts

Well when the Kernel needs to synchronise access to shared data structures the IRQL is raised to DPC/dispatch level or above, therefore thread dispatching is disabled along with system service calling and page faults aren't permitted. When thread dispatching needs to take place the Kernel requests a software interrupt but masks it until the the processor finishes the work it's currently executing, afterwards the processor lowers the IRQL to APC or PASSIVE_LEVEL but first checks if there are any pending dispatch interrupts waiting to be executed, if there are then the IRQL drops to DPC/Dispatch level and executes them. Using dispatch interrupts is a way to defer thread dispatching until the right conditions are met, this is useful as the current activity could be more important than the thread wanting to be executed and therefore makes more sense to finish more important work first.

DPCs are mainly used to defer systems tasks for later execution, ones that are less important and can be executed later on with little repercusions than the task being executed. Device drivers also use DPCs to process interrupts, Windows and device drivers try to keep the IRQL below device IRQLs to prevent the system from running into problems by holding up normal execution. DPCs are used to process timer expiration and to release threads that are waiting on a timer which has already expired, if a thread doesn't release a thread waiting on a timer then the thread will stay in the wait state forever which AFAIK causes a deadlock.
 To view DPCs queued on a certain processor then you can use the !dpcs extension.

DPCs Advanced

DPCs do not run in the context of a particular thread being executed and therefore are not allowed to call system services, generate page faults and cannot create/wait for dispatcher objects. I will talk about dispatcher objects in another post as they are a long topic but simply, they are synchronisation mechanisms such as mutex objects, event objects, semaphores and timer objects.However, DPCs can access nonpaged memory as it is mapped in the system address space at all times.
DPCs are put in per processor queues meaning each processor has their own queue which are all independent from each other. A DPC is represented by a DPC object which contains code for the thread being executed, the most important information is the location of the thread in memory being executed when the DPC is initiated. Think of it as a shortcut on the desktop, without the shortcut containing the exact memory location of the actual program it can't run it.
I know what you're thinking now, "why can't the DPC just find the thread in memory if it isn't in the code?"
It's possible to do but it would take that long it would hold everything up and would cause more problems if it did so.

DPCs have priorities (low, medium, medium-high and high), these priorities determine when the DPC is executed and how it is added to the queue, by default device drivers set the DPC at medium but it can be overwritten. When the DPC priority is high it is placed at the front of the DPC queue for that particular processor, if the priority is medium-high or lower it places the DPC at the back of the queue. DPCs also use targeting which determines which CPU will execute the DPC, unfortunately most device drivers don't specify a processor which results in processor 0 being overwhelmed with DPCs.
DPCs are then pulled out of the queue and executed one by one until the queue is empty, this is known as draining, once the queue is drained the IRQL can drop below DPC/Dispatch level.

DPC draining takes place when the DPC priority is higher than low and is targeted at the current processor executing, if the priority is low then DPC draining takes place when the amount of DPCs in the queue exceed a certain threshold.
If the targeted CPU is different from the one running the ISR then depending on the priority depends on how the DPC is handled. If the priority is high or medium-high then the current processor sends out an IPI (Inter Processor Interrupt) to the targeted processor to drain the DPC queue, this happens only when the targeted CPU is idle.
If the priority is low or medium then the targeted processor's DPC queue must exceed a threshold for the draining to be initiated or the system is idle.

To view the Inter Processor Interrupts pending then you can use the !ipi extension.

The Kernel uses DPCs paired with clock interrupts, when a clock interrupt occurs (at every system clock tick count) the IRQL is raised to Clock level where the clock interrupt handler updates the system time, it then decrements a counter to track how long the current thread as run and when the counter reaches 0 the thread's quantum has expired and the kernel might need to reschedule the processor which is a low priority task and should be done later on which is when it gets implemented as a DPC. When the clock interrupt handler finishes its work the Kernel lowers the IRQL and executes the DPC.

The main problem with DPCs is that they don't run in the context of a particular thread and therefore don't care what thread is currently executing meaning even the highest priority threads can be interrupted which is why you can experience slowness and sound lagging. This can be worked around by using Threaded DPCs which run at Passive level IRQL and run as a real time thread (level 31), this allows the DPC to preempt most user mode threads as they don't run at real time thread priority but allows nonthreaded DPCs, APCs and higher priority threads to preempt the routine, preventing this problem.

For more information on interrupts then take a look here

Asynchronous Procedure Calls

Asynchronous Procedure Calls or APCs provide a way for user mode and system code to execute in context to a particular thread. Due to the fact that APCs run in context of a particular thread and therefore in a process address space they don't have the same restrictions to a DPC in that they can acquire objects, wait for object handles, incur page faults and call system services.

APCs are described by a Kernel object called an APC object and APCs waiting to run are placed in an APC queue which unlike DPC queues (which are per processor) are in the context of a thread rather than per processor meaning every thread has its own APC queue. The Kernel requests a software interrupt at APC level and when the thread is ran the APC is executed.

There are two kinds of APCs, Kernel APCs and User APCs which differ in that User APCs need permission from the thread it is trying to run on where as Kernel APCs run without permission.
There are then two types of Kernel mode APCs, normal and special; Special APCs run at APC IRQL and allow the modification of the normal APC, the normal APC runs and uses the altered parameters from the special APC or the original ones if there was no special APC.

Special APCs can be disabled by calling the KeEnterGuardedRegion which sets the SpecialApcDelivery field in the KTHREAD data structure shown here.

You can also see the KernelApcDisable and CombinedApcDisable entries, these are self-explanatory.
You can view the KTHREAD data structure by typing in dt nt!_KTHREAD followed by the thread's address in Windbg.

Special Kernel APCs are used, for example to show results after an asynchronous I/O operation has completed from and is then saved in the thread's address space.
Kernel APCs are also used mainly for thread suspension and termination as well as to query the context of a particular thread, this can be ran from arbitrary threads so device drivers often block APCs when holding a lock otherwise the lcok might never be released resulting in a system deadlock.

There are lots of different APIs that are user mode APCs such as ReadFileEx and WriteFileEx which allow the caller to initiate certain routines when an I/O operation completes. User mode threads however can't recieve APCs unless they're in a wait state by waiting for an object handle or volunatry entering a wait state by calling WaitForMultipleObjectsEx, this allows the kernel to alert the thread which can then let the thread recieve the APC and allows it to be executed. After the APC has completed the thread can resume execution as normal.

Unfortunately when APCs are initiated and the thread is in a wait state, when the APC completes and the thread checks whether the wait is resolved, if it isn't then it is put at the back of the wait queue for the objects it's waiting for. In other words when APCs are initiated the wait on a thread is removed until the APC completes, then it's put at the back of the thread wait list for the objects it's trying to access.

The APCs have different priorities like DPCs:
-Special Kernel APCs which are inserted at the tail of the Kernel mode APC list, theyare ran at APC level and when the thread isn't in a critical region, it's then given pointers to the parameters of the Kernel APC associated with it.
- Kernel APCs are implemented after the last Special Kernel APC (ahead of the rest of the Kernel APCS) and ran at Passive level , it recieves the arguements given from the Special Kernel APC associated with it (or none if there wasn't any alterations).
- User APCs are inserted at the tail of the user mode APC list, they are ran at passive level and when the thread isn't in a guarded region and the thread is in an alerted state, it is then given the arguments from the Special APC.

0x1 bugchecks (APC_INDEX_MISMATCH) can be caused by APCs not entering and leaving guarded or critical states correctly, there are mainly two reasons:
-The thread didn't leave the guarded/critical region which is needed as the I/O manager and Object manager use APCs to perform a lot of operations.
-The thread entered and left guarded/critical regions too many times.

The best option for finding the thread at fault is to enable Driver Verifier.
I believe the option to monitor APCs is Automatic Checks but I'm not too sure.
For more information on Driver Verifier see here:

I was going to talk a lot about timers but I've already written a lot here so I think I'll save that for a seperate post, a lot of this is from Windows Internals but I've tried to explain it a bit differently as it can be overwhelming.

Monday, 25 August 2014

Debugging and my story

I thought is write a blot post about myself and how I started debugging as I haven't posted in a long time, the reason is I'm actually on holiday in Greece so I can't write much about debugging as I don't have access to my computer.

So here's my story on how I got to where I am today...

Before time began... Wait, that's not right.
Okay, I first got my own computer in March 2013 from a local computer shop, I saved up from Xmas and birthday money, my parents didn't agree getting my own computer was a good idea but I insisted so I went to this shop and poured out a sum of £590 to pay for a gaming computer.
At the time I knew nothing, literally... I mean nothing about computers, I just wanted to play better games than what was on my Xbox at the time.
So here I carried it upstairs (my Uncle fetched the computer in his car for me at the time) into my bedroom, here comes the phone call of my mum ringing up and to her disbelief I told her I bought a computer.
She came home and to no surprise wasn't happy at all, we had no where to put it at all, eventually she decided to move everything around to put a desk in.
Here's where the irony comes in, she said it was a bad idea to buy the computer and especially from this local shop, so here I was setting it up, buying a few games on Steam, starting playing the games and... (You guessed it) blue screen!

Oh no, what am I going to do?
Panic, panic...
I eventually ring the shop and say this has happened, so I took it down to the shop, he looked at it and told me to come back in a few days, I came back and it ran fine until it happened again.
I remember him saying it was a driver issue and I'm installing something that's causing problems, funny how he never told me what it was.

This is the point where I decided to go online and see if I can find any solutions, all gibberish as I knew nothing of computing.
That's when I stumbled upon

I asked for help and I got all these solutions that wouldn't work, from somebody called Arc, I then contacted somebody else called X Blue Robot to help me which happens to be a good friend now over at

Anyway, the problems still persisted so joined another forum which was and made a thread, Vir Gnarus and two others helped me and it was appearing to be a hardware failure, especially given it was a 0x124, (discussed in another post).

I still couldn't fix the problem, I got numerous errors to the point where I contacted the supplier where the local shop got the computers from. They told me to send it to them free of charge, they contacted me days after and said they couldn't replicate the issue but replaced the PSU and GPU as the GPU had a loose bearing as wasn't actually supplied with the computer, the local shop bought that seperate so the 450w PSU couldn't really handle it so that was replaced to an el cheapo 750w Ace switching PSU which can be bought from around £10, that's good...

I gave up and changed what software I could as the supplier still couldn't find any problems.

I started then posting BSOD instructions over at to help some BSOD analysts.
I got a few thanks, I then installed the Windows Debugger to take a look at some files, still gibberish, no luck in finding useful commands on my own, thanks to some of x BlueRobot's posts (Harry Miller) I managed to use some commands to find simply BSOD cases.
I then started learning basics and reading blog posts by Harry on learning debugging. Afterwards I bought Windows Internals and read a bit of that.
Without causing more wars I got into a large disagreement over at and got banned, after already having an account on Sysnative about freezing I decided to take my knowledge over there.
Shorty after I made friends with two fellow BSOD analysts Patrick Barker and John Griffith.
I've been there ever since at my new online home, I then joined and helped people out there (which I still do).
And now here I am (in Greece) helping people out with BSODs and hopefully starting a Computer Science degree at Sheffield Hallam University this time next year.
Oh and over at Vir Gnarus who is now a good friend but he recently switched to IT Infrastructure as apposed to debugging but I'm hoping he'll return soon.

I'm undecided in what to do as a career at the moment but an Escalation Engineer at Microsoft looks like a very interesting job.

So that's my story so far, debugging is very interesting and so hard to believe that if I hadn't had bought that specific computer I wouldn't be here today.

Tuesday, 5 August 2014

Memory Management - Stacks

In this blog I'll talk about stacks, what they are and how they are used in Windows.
We've come across the term before but we don't know that much about them unless you really look into them.

So a stack is an abstract data type that is implemented as a LIFO structure which means Last In First Out.

So from good old wikipedia here's a very good simple picture of a LIFO mechanism, we can see it uses Push to add data onto the stack and Pop to remove it, so now you know how the simple stack works lets go a bit more advanced.

A stack has a fixed origin within memory called (you know it) stack origin, it then uses a the push instruction to initialize the stack. It then contains a stack pointer which points to the address of the last item added to the stack. The pointer moves further away from the origin as more data is added, although this doesn't necessarily mean it's moving up, it can move down.
Now the pointer cannot cross the origin margine at any time, if this happens a stack underrun occurs, this is normally caused by using pop more times than it should be.
A stack overflow occurs when Push is used more times than allowed so the pointer moves into the boundary of another stack, in other words it spills data outside of the allocated region and goes into another stack.
This is a very big problem on Kernel stacks as there are no process address spaces to protect the memory, in Kernel mode everything is ran from a single system memory space that has access to the entire Kernel system of the OS. When this overflow happens it can and will corrupt data on another stack elsewhere that can be executing a thread completely different to the stack overflowing, the culprit on the current stack essentially flees and the stack being corrupt blames somebody else and a bugcheck is called once the corruption is detected, if this is the case Driver Verifier should be enabled.

A good picture to show how this works is as follows.

I rambled a bit here but I just tried to briefly explain how stack overflows cause corruption that can bring the system to a halt so device drivers and other kernel objects should be written carefully and correctly to prevent these situations from happening.

Stacks can also be implemented within arrays which involves the first element at offset zero being the stack origin and it builds from there.
Implementing stacks in linked lists differs in that AFAIK it doesn't involve using the LIFO mechanism but rather removing nodes and replacing them with different ones in order to change the bottom element of the stack, I need to look into that a bit more though.

There are generally three major types of stacks: User Stacks, Kernel Stacks and DPC Stacks.

In User Stacks when a thread is created by the memory manager 1MB of memory is reserved which can be altered when calling the CreateThread function. Once the thread is created only the first page and a guard page is created, more data can then be added to the page until the guard page is hitwhen an exception occurs, this then allows it to grow with demand but it will never shrink back.

Kernel stacks are a lot smaller than user stacks, they typically range from 12KB (x86) to 16KB (x64), this excludes a guard page table entry which consumes an additional 4KB.
Kernel running code tends to have less recursion than user mode code and therefore contains more efficient code which keeps stack buffer sizes smaller. As stated before, Kernel code has a much larger impact on the system as it runs in a single system address space.

However, interactions between the graphics system and win32k.sys subsequent calls back into user mode are recursive the Kernel implements a way for stacks to be added when nearing the guard page, these stacks contain an additional 16KB, when calls are returned the memory manager frees the stacks afterwards.

 The DPC stack contains a processor stack (One for each processor) which is available for use everytime DPCs are executed, they stay in their own stack as it's generally unrelated to the current kernel stack's operations as it runs in an arbitary thread context.

I believe I've covered pretty much everything on stacks, I hope that's helped your understanding.

Windows Internals

Saturday, 2 August 2014

Interrupt dispatching and handling

In this post I'll talk about interrupt dispatching and the type of interrupts. Interrupts have always been interesting yet slightly confusing at the same time so I'll try and explain what they are and the different types they come in.

So what is an interrupt?

It's kind of in the name, it's an asynchronous event that diverts the processors flow of control.
They generally come in two forms, hardware interrupts and software interrupts.
 Interrupts can occur from I/O devices, timers or processor clocks.

Hardware Interrupts

These interrupts are external I/Os that come from lines in the interrupt controllers, so when an IRQ (Interrupt Request) is received it enters through a line on the interrupt controller which converts the IRQ into a number which is matched with the IDT index (Interrupt Dispatch Table), then the ISR (Interrupt Service Routine) trap handler is invoked to save the context of the currently executing thread, once the interrupt is completed the context is restored so the thread continues execution like nothing has ever happened.

Interrupt controllers

Hardware interrupts use interrupt controllers which generally speaking come in two forms, PIC (Programmable Interrupt Controller) and APIC (Advanced Programmable Interrupt Controller). The PIC is a uniprocessor controller that is generally used on x86 systems and uses 8 lines. However another PIC can be added called a slave which can add an additional 7 lines to the controller adding to a total of 15 lines.
The APIC is multiprossor interrupt controller which is generally used on x64 systems that contains 256 lines, with this in play the PIC is quickly being phased out.

Here is an example of the IDT which contains lots of different entries for specific interrupts, trap handlers for exceptions also use the IDT for events such as page faults.
I will discuss later on how page faults come into play with bugchecks and IRQs but for a more indepth explanation on how page faults are handled take a look at my friend Patrick's post over at

Software Interrupts

Although interrupt controllers implement their own prioritisation mechanisms Windows uses it's own technique for doing so, IRQLs.

These are IRQLs for x64, IA64 and x86 systems.
IRQLs are a way for interrupts to be prioritised appropriately, IRQLs aren't implemented in a first in first serve technique but rather the higher the IRQL the higher the priority so an IRQL at 15 would get serviced before one at IRQL 2.

To put this into perspective an IRQ that is at IRQL 2 would have to wait for any IRQs at 3 or above to get serviced before the IRQL is lowered for it to be serviced as the level cannot be lowered when a new interrupt has occurred.
For example, if an interrupt is being serviced and another interrupt needs servicing two things can happen.

One is the current IRQ is but on a waiting list and the new one is serviced.
Two is the current IRQ is finished being serviced then the next IRQL further down the list is next.

It depends on the IRQL of the interrupts.

Back to page faults,
A page fault occurs when a request to memory that is not present happens, when a page fault occurs the page fault handler requests the memory being referenced is brought into memory but in order to do that the IRQL must be at 1 or below as this is when pageable memory can be accessed.
Now when the IRQL is higher than this servicing and interrupt and a page fault occurs this is when we bugcheck with either 0xA or 0xD1 (DRIVER_)IRQL_NOT_LESS_OR_EQUAL

So why can't we just lower the IRQL to service the page fault or wait for the current interrupt to finish?

Well IRQLs cannot be lowered when an interrupt at that level is being serviced as that has priority, a page fault cannot wait as it must be serviced immediately.
You see the problem?
It's an endless cycle so the system crashes as it can't compute anything else.

I hope I've covered pretty much everything and I hope you've learned something.

I forgot to add, hardware interrupts (IRQs) can only be serviced above DPC/dispatch level, so anything at that level or below will not allow hardware interrupts to be serviced.

Wednesday, 30 July 2014

Instruction pointer misalignments

This time I'll talk about instruction pointer misalignment.
So what is an instruction pointer misalignment?

Well, when an object references memory it uses a pointer to (you guessed it) point to a certain memory address, once it references the data inside that address it grabs the data from inside the address which is known as dereferencing.

When a pointer is misaligned it grabs data from the wrong address which causes a lot of problems by causing severe memory corruption depending on the contents of the address being referenced, if allowed to write it can completely corrupt the address, the culprit can escape and some innocent pointer comes along, tries to use the address and gets blamed by the computer police.
This is why bugchecks are called to prevent such memory corruption, now the way data structures are arranged and accessed it will read/write in chunks of 4 bytes (sometimes larger) so the memory offset size will be a multiple of the word size, the reason this is done is to maximise the performance by utilising way the CPU handles memory.

When the memory being referenced isn't a multiple of 4 then that's when things go wrong, it generally results in an alignment fault which is also known a bus error, a good example is this.

This instruction taken from a crash dump can explain this a little bit.

 So the nt!Mm is the module, in this case it's a Memory Management Windows function.
The CleanProcessAddressSpace is the actual function, in this case it's scrubbing a memory address ready for allocation.
 The +0xe6 is the offset which is like the address on a street, it's the location which the function takes place.

 I was actually looking at the differences between a segmentation fault and a bus error as they both involve the CPU not physically being capable of addressing the memory being referenced.

  • The segmentation error (or access violation exception) occurs when memory outside of the allowed location is referenced (not to be confused with buffer overruns which involve writing outside allocated memory into another buffer).
  • The bus error occurs when an address which is not alligned is referenced, by this as you know is when they aren't multiples of 4.
Another thing to note, in dump files you can see where it says misaligned pointer it mentions it's probably caused by hardware. As I've mentioned, it's probably due to the fact that the CPU cannot address memory that isn't alligned with multiples of 4 so it looks like it's due to the CPU not being able to read it at all.
Misaligned IPs don't always result in a bus error, they can be caused by drivers writing more data in a buffer on a stack which results in a stack overflow, this also results in a bugcheck to prevent critical memory corruption.
 I hope this has helped you understand the differences and more about instruction pointers.