Monday, 22 September 2014

Device/Driver Objects and Stacks

Today I thought I'd write a bit about device stacks and driver stacks and how they implement IRPs.
I'm not going into detail on how drivers function and the types of drivers as I would be here all day so I'll save that for another time.

What is a device object and a driver object?

A device object is an opaque structure that represents a device or function. A device object is an instance of the DEVICE_OBJECT data structure which is used by the operating system to represent a device.
Some device objects don't always represent a physical device, they can represent a logical device.

A driver object is just a Kernel image to represent the Kernel mode driver which includes a pointer to the driver's routines.
When a driver initialises it creates a device object to represent physical or logical devices.

Device stacks and Device nodes

The Kernel organises drivers into a tree structure called the Plug and Play device tree containing device nodes that represent devices, do note that some nodes represent software components which don't have any physical devices attached to them.

A device stack contains a PDO (Physical Device Object) which represents the physical device connected to a physical bus on the motherboard, in this case I'll talk about the PCI bus as an example.
The PCI bus enumerates the child devices which are connected to the PCI bus on the motherboard, this creates the PDO for each device and is then represented by a device node in the PnP device tree.
Do note that depending on your perspective determines what type of driver the pci.sys driver is, for example if you're looking at the PCI bus device node then it's the function driver but if you're looking at one of the PCI bus node child devices associated with it then it's the bus driver.

After the device node has been associated with the new PDO the PnP manager then searches the registry for the driver(s) which needs to be part of the device stack, these drivers are called Function Drivers.

Here's a small point about the drivers usually found in device stacks:

  • Bus drivers detect and inform the PnP manager about its devices on its bus as well as controlling the power to the bus. There is only allowed to be one bus driver at once and Microsoft normally supplies them.
  • Function drivers on the other hand are the main driver that represents the device and performs the basic operations for reading and writing, it's the driver that knows the most about its device.
  • Filter drivers modify the device behaviour when needed and it's located above or below the function driver. It normally fixes errors that are detected before it reaches the function driver on the stack.

0: kd> !devstack fffffa8004615680
  !DevObj           !DrvObj            !DevExt           ObjectName
  fffffa8005f8c150  \DRIVER\VERIFIER_FILTERfffffa8005f8c2a0 
  fffffa8005f8c390 *** ERROR: Module load completed but symbols could not be loaded for GEARAspiWDM.sys
  fffffa8005f202e0  \DRIVER\VERIFIER_FILTERfffffa8005f20430 
  fffffa8005eec060  \Driver\cdrom      fffffa8005f77b80  CdRom0
  fffffa80057379b0  \Driver\ACPI       fffffa80047f6a00 
> fffffa8004615680  \Driver\atapi      fffffa80046157d0  IdeDeviceP0T1L0-5
!DevNode fffffa8005742900 :
  DeviceInst is "IDE\CdRomATAPI_iHAS124___B_______________________AL0R____\5&f437ab5&0&0.1.0"
  ServiceName is "cdrom"

This is the device stack for the cd drive in the computer which shows the associated device objects and driver objects within it.

  • Atapi provides the interface to enable support for cd players.
  • ACPI is the bus filter driver that enables Power Management for the operating system so when devices are not in use (In this case the cd player) it will be powered off.
  • cdrom is the function driver for the cd drive that allows discs to be read and written to.
  • GEARAspiWDM.sys is the cdrom 3rd party filter driver.
  • VERIFIER_FILTER are filter drivers used by Driver Verifier which is enabled to monitor driver routines and operations to make sure everything is working correctly.

For more information on Driver Verifier see here: Driver Verifier (Windows Drivers)

Driver Stacks are determined by how many drivers are present when processing an IRP by passing it down a device stack or in some cases multiple device stacks.
A driver object can be associated with multiple different device objects and therefore lots of device stacks, this shows that an IRP can be passed down lots of device stacks but only being serviced by a few drivers.

0: kd> !drvobj \Driver\ACPI
Driver object (fffffa80039a6af0) is for:
Driver Extension List: (id , addr)

Device Object list:
fffffa80057379b0  fffffa800573a9b0  fffffa80057399b0  fffffa800572fc20
fffffa800572fe40  fffffa800572ea00  fffffa800572ec20  fffffa800572ee40
fffffa800572da00  fffffa800572dc20  fffffa800572de40  fffffa8005819e40
fffffa8005814e40  fffffa800580fe40  fffffa800572a9b0  fffffa80057289b0
fffffa8005720e40  fffffa800571c920  fffffa800571cb20  fffffa800571bc40
fffffa800571be40  fffffa8005713bc0  fffffa8005700e40  fffffa80056ffa40
fffffa80056ffc40  fffffa80056ffe40  fffffa80056fea40  fffffa80056fec40
fffffa80056fee40  fffffa80056fda40  fffffa80056fdc40  fffffa80056fde40
fffffa80056fca40  fffffa80056fcc40  fffffa80056fce40  fffffa8004616330
fffffa8004616040  fffffa8004616c20  fffffa8004616e40  fffffa80047f9770
fffffa80039eadb0  fffffa8004be1060  fffffa80047fe170  fffffa80047fe390
fffffa80047fe5b0  fffffa80047fe7d0  fffffa80047fe9f0  fffffa80047fec10

So as proven here we can clearly see that the ACPI.sys driver is associated with a lot of device objects as it can't just represent one device otherwise one hardware component would use ACPI and everything else would be powered on all the time, think about how many USB devices would be turned on.
So our CD drive is just one component that uses ACPI.

Finally we can see information about the IRP being sent by looking at the IRP data structure.

0: kd> dt nt!_IRP fffff9801c458dc0
   +0x000 Type             : 0n6
   +0x002 Size             : 0x238
   +0x008 MdlAddress       : (null)
   +0x010 Flags            : 0x40000000
   +0x018 AssociatedIrp    : <unnamed-tag>
   +0x020 ThreadListEntry  : _LIST_ENTRY [ 0xfffff980`1c458de0 - 0xfffff980`1c458de0 ]
   +0x030 IoStatus         : _IO_STATUS_BLOCK
   +0x040 RequestorMode    : 0 ''
   +0x041 PendingReturned  : 0 ''
   +0x042 StackCount       : 5 ''
   +0x043 CurrentLocation  : 1 ''
   +0x044 Cancel           : 0 ''
   +0x045 CancelIrql       : 0 ''
   +0x046 ApcEnvironment   : 0 ''
   +0x047 AllocationFlags  : 0x80 ''
   +0x048 UserIosb         : (null)
   +0x050 UserEvent        : (null)
   +0x058 Overlay          : <unnamed-tag>
   +0x068 CancelRoutine    : (null)
   +0x070 UserBuffer       : (null)
   +0x078 Tail             : <unnamed-tag>

Some of the entries are pretty obvious from the name and some aren't documented, the ones that are can be found here:

IRP (Windows Drivers)

Power IRPs

I found an old dump file which was a 0x9F Kernel dump file caused by a power IRP not synchronising with the pnp manager.
Power IRPs are used to change the power state for a device and therefore they must reach the bottom of the device stack which is the physical device object.

A driver has failed to complete a power IRP within a specific time.
Arg1: 0000000000000004, The power transition timed out waiting to synchronize with the Pnp

Arg2: 0000000000000258, Timeout in seconds.
Arg3: fffffa8007005660, The thread currently holding on to the Pnp lock.
Arg4: fffff800053e83d0, nt!TRIAGE_9F_PNP on Win7 and higher

So we can see our 0x9F bugcheck with a power IRP failing to synchronise with the PnP manager because the IRP hasn't reached the bottom of the stack.

0: kd> !locks
KD: Scanning for held locks..

Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0)    Shared 1 owning threads
    Contention Count = 1
     Threads: fffffa8007005660-01<*>
KD: Scanning for held locks.

Resource @ nt!PiEngineLock (0xfffff80003492be0)    Exclusively owned
    Contention Count = 21
    NumberOfExclusiveWaiters = 1
     Threads: fffffa8007005660-01<*>
     Threads Waiting On Exclusive Access:

KD: Scanning for held locks............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
18855 total locks, 2 locks currently held

We can see two locks have been held, IopDeviceTreeLock is to synchronise the device tree as a spinlock and the PiEngineLock which is a pnp and power management lock. The PiEngineLock is being owned by the ZTEusbnet driver in order to pass down the power IRP.

0: kd> !thread fffffa80`07005660
THREAD fffffa8007005660  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
    fffffa800d035ee8  NotificationEvent
IRP List:
    fffffa8008f5cc10: (0006,03e8) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a000008c10
Owning Process            fffffa8006f8d890       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      396427         Ticks: 38463 (0:00:10:00.026)
Context Switch Count      44059          IdealProcessor: 2  NoStackSwap
UserTime                  00:00:00.000
KernelTime                00:00:00.343
Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
Stack Init fffff88003bd2c70 Current fffff88003bd2280
Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string'+0xddf
fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : nt!IopRemoveDevice+0x101
fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : nt!PnpSurpriseRemoveLockedDeviceNode+0x128
fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : nt!PnpProcessQueryRemoveAndEject+0x6cf
fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16

0: kd> !irp fffffa8008f5cc10
Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
 No Mdl: No System Buffer: Thread fffffa8007005660:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
>[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000   
            Args: 00000000 00000000 00000000 00000000

I'm not sure why but the ZTEusbnet driver isn't processing the power IRP, it's just leaving it and that's what caused the system to crash.
It'd be nice to know exactly why it didn't pass the power IRP on.
I'm not suprised though given the date of the driver.

0: kd> !devstack fffffa800c089050
  !DevObj           !DrvObj            !DevExt           ObjectName
> fffffa800c089050  \Driver\ZTEusbnet  fffffa800c0891a0  NDMP14
  fffffa80070bfa00  \Driver\usbccgp    fffffa80070bfb50  000000a8
!DevNode fffffa800dc19990 :
  DeviceInst is "USB\VID_19D2&PID_0063&MI_04\6&200b5242&0&0004"
  ServiceName is "ZTEusbnet"

We can see that it was meant to pass the power IRP down to the USB common class generic parent driver which, to put it simply exposes each USB composite device in order to seperate it to a single device. Passing it down to the USB bus driver should change the power state.

0: kd> lmvm ZTEusbnet
start             end                 module name
fffff880`0ae8e000 fffff880`0aebc000   ZTEusbnet   (no symbols)          
    Loaded symbol image file: ZTEusbnet.sys
    Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
    Image name: ZTEusbnet.sys
    Timestamp:        Mon Oct 13 06:50:10 2008 (48F2E192)
    CheckSum:         000329ED
    ImageSize:        0002E000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

This is a dump file from quite a while ago but if memory serves me correctly I think an update solved the issue.

Any other questions feel free to ask, I believe I've covered most things without going into detail about drivers.

Sources: Device nodes and device stacks (Windows Drivers)
Driver stacks (Windows Drivers)

Thursday, 18 September 2014


I've not posted in a while but I found an interesting case on a forum and managed to acquire a Kernel memory dump.
I'm not going into detail about DPCs or interrupts as I have made blog posts on these in the past.

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: 0000000000000000

So here it states that we encountered a DPC which exceeded the allocated time for it to finish executing. The problem is that it went over this time, and as stated before DPCs can hold up the system when taking too long to execute which can result in lagging, a slow system or even sound cutting out.

So lets look at our stack trace.

ffffd001`50c93c98 fffff800`9238bcc2 : 00000000`00000133 00000000`00000000 00000000`00000501 00000000`00000500 : nt!KeBugCheckEx
ffffd001`50c93ca0 fffff800`92271115 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff801`dceabf17 : nt! ?? ::FNODOBFM::`string'+0x18b12
ffffd001`50c93d30 fffff800`929a07b5 : ffffe001`00400a02 fffff800`922fcae6 fffff801`daed3cf8 ffffe001`00008201 : nt!KeClockInterruptNotify+0x95
ffffd001`50c93f40 fffff800`922e80e3 : ffffd001`50c93f60 00000000`00000008 ffff5377`5487cf7d 00000000`0000000c : hal!HalpTimerClockIpiRoutine+0x15
ffffd001`50c93f70 fffff800`9236412a : ffffe001`9c600500 ffffe001`9e8de1a0 00000000`00000000 00000000`00000000 : nt!KiCallInterruptServiceRoutine+0xa3
ffffd001`50c93fb0 fffff800`92364a9b : 44454c49`4146203a 696c6564`206f7420 6e657665`20726576 20212121`20352074 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd001`50c853a0 fffff800`922e8383 : ffffe001`9e92d030 ffffe001`9e968030 00000000`02290a8d 00000000`00000018 : nt!KiInterruptDispatchNoLockNoEtw+0xfb
ffffd001`50c85530 fffff801`dcfa5751 : ffffe001`9e66e7a0 ffffe001`00000000 ffffe001`9e92fbe0 00000000`fffff850 : nt!KeAcquireSpinLockRaiseToDpc+0x13
ffffd001`50c85560 fffff801`dcfa531d : ffffe001`9e96b840 fffff801`dcf2c48f ffffe001`9eb68490 fffff801`dcf2c550 : athwbx+0x161751
ffffd001`50c855f0 fffff801`dcf60c42 : ffffe001`9e96b840 ffffd001`50c85650 ffffd001`50c85654 00000000`00000000 : athwbx+0x16131d
ffffd001`50c85630 fffff801`dcf33472 : ffffe001`9e9bf030 fffff801`00000000 ffffd001`50c856d0 fffff801`dd074319 : athwbx+0x11cc42
ffffd001`50c85680 fffff801`dd0c129f : ffffe001`9e9bf030 ffffffff`ffffffff ffffe001`9e6d97e8 fffff801`dd011189 : athwbx+0xef472
ffffd001`50c856f0 fffff801`dd08679e : ffffe001`9e968030 00000000`00000000 00000000`00000000 00000000`00000000 : athwbx+0x27d29f
ffffd001`50c85720 fffff801`dae9e81e : ffffe001`9e961030 00000000`00000000 ffffd001`50c85790 00000000`00000000 : athwbx+0x24279e
ffffd001`50c85760 fffff800`92252130 : ffffd001`50c85b00 00000000`00000000 00000000`00000200 fffff800`92274ae0 : ndis!ndisInterruptDpc+0x269ce
ffffd001`50c85860 fffff800`9225134b : ffffd001`50c5c180 ffffe001`9e8f4010 ffffe001`9c46b900 ffffe001`a12f3080 : nt!KiExecuteAllDpcs+0x1b0
ffffd001`50c859b0 fffff800`923667ea : ffffd001`50c5c180 ffffd001`50c5c180 ffffd001`50c682c0 ffffe001`9dbbb540 : nt!KiRetireDpcList+0xdb
ffffd001`50c85c60 00000000`00000000 : ffffd001`50c86000 ffffd001`50c80000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a

So in this callstack we see our processor in an idle loop, when idle it tends to execute any DPCs if there are any waiting in the DPC queue.
It begins to execute all the DPCs in the queue (also known as draining) when get execute an [B]ndis dpc interrupt[/B], this begins to call network functions and then acquire a spinlock and raise to DPC/Dispatch IRQL level if it hasn't already (this is the standard routine that is used, I can't remember if it is required), we then recieve more interrupts followed by a clock interrupt and a bugcheck.

Okay so we know that we bugchecked because a DPC was taking too long to finish executing and risk holding up the system, especially where spinlocks are concerned.

The main thing that interests me is why is there a clock interrupt?

3: kd> !dpcs
CPU Type      KDPC       Function
 3: Normal  : 0xffffe0019e66e880 0xfffff801dae78eb0 ndis!ndisMTimerObjectDpc
 3: Normal  : 0xffffd00150c61668 0xfffff80092327b28 nt!PpmPerfAction
 3: Normal  : 0xffffd0015589a280 0xfffff80092258854 nt!PopExecuteProcessorCallback
 3: Threaded: 0xffffd00150c617c0 0xfffff8009231a0a0 nt!KiDpcWatchdog

I believe the ndis dpc interrupt is related to this timer object but I may be wrong, if it is related then the clock interrupt makes sense as the system requires intervals for clock interrupts to take place in order to keep track of system time and logical run time for threads and timers. Processes can modify the clock interrupt interval for their needs to process timers much quicker, I'll not go into detail as I will talk about timers another time.

The only problem is that I ran into a dead end, I couldn't find anything related to the network driver in terms of modifying the clock interrupt timer.

3: kd> !list "-e -x \"dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution\" nt!ExpTimerResolutionListHead"
dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  "???"
   +0x638 RequestedTimerResolution : 0x9c3d2000
   +0x63c SmallestTimerResolution  : 0xffffe001

dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  "svchost.exe"
   +0x638 RequestedTimerResolution : 0
   +0x63c SmallestTimerResolution  : 0x2710

So I thought I'd look into this a bit more.

3: kd> u ndis!ndisInterruptDpc+0x269ce
fffff801`dae9e81e 488b75a7        mov     rsi,qword ptr [rbp-59h]
fffff801`dae9e822 e9e297fdff      jmp     ndis!ndisInterruptDpc+0x1b9 (fffff801`dae78009)
fffff801`dae9e827 33d2            xor     edx,edx
fffff801`dae9e829 488d4dd7        lea     rcx,[rbp-29h]
fffff801`dae9e82d 448d420d        lea     r8d,[rdx+0Dh]
fffff801`dae9e831 e8160c0000      call    ndis!ndisPcwEndCycleCounter (fffff801`dae9f44c)
fffff801`dae9e836 90              nop
fffff801`dae9e837 e9d797fdff      jmp     ndis!ndisInterruptDpc+0x1c3 (fffff801`dae78013)

It appears the interrupt routine is looping for some reason.

I can't find anything on the cycle counter function as it is undocumented but I'll take a guess and say that it's keeping track of the time the interrupt has been executing, AFAIK this is don't by using a counter on the currently executing thread to see how long it's running.

3: kd> u nt!KeAcquireSpinLockRaiseToDpc+0x13
fffff800`922e8383 f605fcac270021  test    byte ptr [nt!PerfGlobalGroupMask+0x6 (fffff800`92563086)],21h
fffff800`922e838a 751f            jne     nt!KeAcquireSpinLockRaiseToDpc+0x3b (fffff800`922e83ab)
fffff800`922e838c f0480fba2900    lock bts qword ptr [rcx],0
fffff800`922e8392 7209            jb      nt!KeAcquireSpinLockRaiseToDpc+0x2d (fffff800`922e839d)
fffff800`922e8394 0fb6c3          movzx   eax,bl
fffff800`922e8397 4883c420        add     rsp,20h
fffff800`922e839b 5b              pop     rbx
fffff800`922e839c c3              ret

Here we can see the same DPC interrupt routine trying to acquire a spinlock yet it's not managing to do it and therefore looping all whilst it is still running at DPC level and therefore preventing normal thread execution.

Eventually it seems it managed to acquire the spinlock and then call a clock interrupt in order to perform some operation, I suspect updating the system time in order to service the network driver with higher response times.
The system realised that it was taking too long to complete and therefore bugchecked.

3: kd> lmvm athwbx
start             end                 module name
fffff801`dce44000 fffff801`dd1ff000   athwbx     (no symbols)          
    Loaded symbol image file: athwbx.sys
    Image path: \SystemRoot\system32\DRIVERS\athwbx.sys
    Image name: athwbx.sys
    Timestamp:        Thu Oct 17 10:46:01 2013 (525FB1D9)
    CheckSum:         003BC161
    ImageSize:        003BB000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

So the network driver was quite outdated, he updated it and the blue screens stopped so it looks like it was an easy fix.

Thanks for reading.

Tuesday, 2 September 2014

DPCs and APCs

I got back off holiday yesterday and I feel it's time for a blog post, this might be a long one as I have spent some time reading Windows Internals and looking at various subjects of interrupts DPCs and APCs.

So a DPC or Deferred Procedure Call is a way for the Kernel to defer execution until a later date, this is a good way for Windows to stop threads that take too long executing or are running for long periods of time at a high IRQL from holding up the system.

DPCs and dispatch interrupts

Well when the Kernel needs to synchronise access to shared data structures the IRQL is raised to DPC/dispatch level or above, therefore thread dispatching is disabled along with system service calling and page faults aren't permitted. When thread dispatching needs to take place the Kernel requests a software interrupt but masks it until the the processor finishes the work it's currently executing, afterwards the processor lowers the IRQL to APC or PASSIVE_LEVEL but first checks if there are any pending dispatch interrupts waiting to be executed, if there are then the IRQL drops to DPC/Dispatch level and executes them. Using dispatch interrupts is a way to defer thread dispatching until the right conditions are met, this is useful as the current activity could be more important than the thread wanting to be executed and therefore makes more sense to finish more important work first.

DPCs are mainly used to defer systems tasks for later execution, ones that are less important and can be executed later on with little repercusions than the task being executed. Device drivers also use DPCs to process interrupts, Windows and device drivers try to keep the IRQL below device IRQLs to prevent the system from running into problems by holding up normal execution. DPCs are used to process timer expiration and to release threads that are waiting on a timer which has already expired, if a thread doesn't release a thread waiting on a timer then the thread will stay in the wait state forever which AFAIK causes a deadlock.
 To view DPCs queued on a certain processor then you can use the !dpcs extension.

DPCs Advanced

DPCs do not run in the context of a particular thread being executed and therefore are not allowed to call system services, generate page faults and cannot create/wait for dispatcher objects. I will talk about dispatcher objects in another post as they are a long topic but simply, they are synchronisation mechanisms such as mutex objects, event objects, semaphores and timer objects.However, DPCs can access nonpaged memory as it is mapped in the system address space at all times.
DPCs are put in per processor queues meaning each processor has their own queue which are all independent from each other. A DPC is represented by a DPC object which contains code for the thread being executed, the most important information is the location of the thread in memory being executed when the DPC is initiated. Think of it as a shortcut on the desktop, without the shortcut containing the exact memory location of the actual program it can't run it.
I know what you're thinking now, "why can't the DPC just find the thread in memory if it isn't in the code?"
It's possible to do but it would take that long it would hold everything up and would cause more problems if it did so.

DPCs have priorities (low, medium, medium-high and high), these priorities determine when the DPC is executed and how it is added to the queue, by default device drivers set the DPC at medium but it can be overwritten. When the DPC priority is high it is placed at the front of the DPC queue for that particular processor, if the priority is medium-high or lower it places the DPC at the back of the queue. DPCs also use targeting which determines which CPU will execute the DPC, unfortunately most device drivers don't specify a processor which results in processor 0 being overwhelmed with DPCs.
DPCs are then pulled out of the queue and executed one by one until the queue is empty, this is known as draining, once the queue is drained the IRQL can drop below DPC/Dispatch level.

DPC draining takes place when the DPC priority is higher than low and is targeted at the current processor executing, if the priority is low then DPC draining takes place when the amount of DPCs in the queue exceed a certain threshold.
If the targeted CPU is different from the one running the ISR then depending on the priority depends on how the DPC is handled. If the priority is high or medium-high then the current processor sends out an IPI (Inter Processor Interrupt) to the targeted processor to drain the DPC queue, this happens only when the targeted CPU is idle.
If the priority is low or medium then the targeted processor's DPC queue must exceed a threshold for the draining to be initiated or the system is idle.

To view the Inter Processor Interrupts pending then you can use the !ipi extension.

The Kernel uses DPCs paired with clock interrupts, when a clock interrupt occurs (at every system clock tick count) the IRQL is raised to Clock level where the clock interrupt handler updates the system time, it then decrements a counter to track how long the current thread as run and when the counter reaches 0 the thread's quantum has expired and the kernel might need to reschedule the processor which is a low priority task and should be done later on which is when it gets implemented as a DPC. When the clock interrupt handler finishes its work the Kernel lowers the IRQL and executes the DPC.

The main problem with DPCs is that they don't run in the context of a particular thread and therefore don't care what thread is currently executing meaning even the highest priority threads can be interrupted which is why you can experience slowness and sound lagging. This can be worked around by using Threaded DPCs which run at Passive level IRQL and run as a real time thread (level 31), this allows the DPC to preempt most user mode threads as they don't run at real time thread priority but allows nonthreaded DPCs, APCs and higher priority threads to preempt the routine, preventing this problem.

For more information on interrupts then take a look here

Asynchronous Procedure Calls

Asynchronous Procedure Calls or APCs provide a way for user mode and system code to execute in context to a particular thread. Due to the fact that APCs run in context of a particular thread and therefore in a process address space they don't have the same restrictions to a DPC in that they can acquire objects, wait for object handles, incur page faults and call system services.

APCs are described by a Kernel object called an APC object and APCs waiting to run are placed in an APC queue which unlike DPC queues (which are per processor) are in the context of a thread rather than per processor meaning every thread has its own APC queue. The Kernel requests a software interrupt at APC level and when the thread is ran the APC is executed.

There are two kinds of APCs, Kernel APCs and User APCs which differ in that User APCs need permission from the thread it is trying to run on where as Kernel APCs run without permission.
There are then two types of Kernel mode APCs, normal and special; Special APCs run at APC IRQL and allow the modification of the normal APC, the normal APC runs and uses the altered parameters from the special APC or the original ones if there was no special APC.

Special APCs can be disabled by calling the KeEnterGuardedRegion which sets the SpecialApcDelivery field in the KTHREAD data structure shown here.

You can also see the KernelApcDisable and CombinedApcDisable entries, these are self-explanatory.
You can view the KTHREAD data structure by typing in dt nt!_KTHREAD followed by the thread's address in Windbg.

Special Kernel APCs are used, for example to show results after an asynchronous I/O operation has completed from and is then saved in the thread's address space.
Kernel APCs are also used mainly for thread suspension and termination as well as to query the context of a particular thread, this can be ran from arbitrary threads so device drivers often block APCs when holding a lock otherwise the lcok might never be released resulting in a system deadlock.

There are lots of different APIs that are user mode APCs such as ReadFileEx and WriteFileEx which allow the caller to initiate certain routines when an I/O operation completes. User mode threads however can't recieve APCs unless they're in a wait state by waiting for an object handle or volunatry entering a wait state by calling WaitForMultipleObjectsEx, this allows the kernel to alert the thread which can then let the thread recieve the APC and allows it to be executed. After the APC has completed the thread can resume execution as normal.

Unfortunately when APCs are initiated and the thread is in a wait state, when the APC completes and the thread checks whether the wait is resolved, if it isn't then it is put at the back of the wait queue for the objects it's waiting for. In other words when APCs are initiated the wait on a thread is removed until the APC completes, then it's put at the back of the thread wait list for the objects it's trying to access.

The APCs have different priorities like DPCs:
-Special Kernel APCs which are inserted at the tail of the Kernel mode APC list, theyare ran at APC level and when the thread isn't in a critical region, it's then given pointers to the parameters of the Kernel APC associated with it.
- Kernel APCs are implemented after the last Special Kernel APC (ahead of the rest of the Kernel APCS) and ran at Passive level , it recieves the arguements given from the Special Kernel APC associated with it (or none if there wasn't any alterations).
- User APCs are inserted at the tail of the user mode APC list, they are ran at passive level and when the thread isn't in a guarded region and the thread is in an alerted state, it is then given the arguments from the Special APC.

0x1 bugchecks (APC_INDEX_MISMATCH) can be caused by APCs not entering and leaving guarded or critical states correctly, there are mainly two reasons:
-The thread didn't leave the guarded/critical region which is needed as the I/O manager and Object manager use APCs to perform a lot of operations.
-The thread entered and left guarded/critical regions too many times.

The best option for finding the thread at fault is to enable Driver Verifier.
I believe the option to monitor APCs is Automatic Checks but I'm not too sure.
For more information on Driver Verifier see here:

I was going to talk a lot about timers but I've already written a lot here so I think I'll save that for a seperate post, a lot of this is from Windows Internals but I've tried to explain it a bit differently as it can be overwhelming.