- May 7, 2013
Within the past week or so, I've noticed a number of Stop 0xCA crashes caused by two drivers in particular: dptf_acpi.sys and dptf_cpu.sys. Both appear to belong to the same driver package, Intel Dynamic Platform Thermal Framework (DPTF). Despite the DRT stating that the driver is available from Intel, there don't appear to be any updates, and from what I understand, this driver is delivered exclusively through Windows Update. Most users have reported that using a system restore point resolved the issue for them.
I've been looking at the processors of the affected machines and there doesn't appear to be any discernible pattern, other than that they all belong to the same processor family, 0x06. This family includes a wide variety of different processors. If you're able to, please check the processor information using the !sysinfo cpuinfo command.
The main purpose of this post is to provide some information on what is happening with this crash. At the moment, I'm not aware of any definitive solution other than waiting for the driver to be patched by Intel.
Rich (BB code):
PNP_DETECTED_FATAL_ERROR (ca)
PnP encountered a severe error, either as a result of a problem in a driver or
a problem in PnP itself. The first argument describes the nature of the
problem, the second argument is the address of the PDO. The other arguments
vary depending on argument 1.
Arguments:
Arg1: 000000000000000a, Incorrect notify callback behavior
Driver failed to preserve IRQL or combined APC disable across
a PlugPlay notification.
Arg2: ffffd68b8f2e6a70, Driver Object.
Arg3: 0000000000000000, IRQL after returning from driver callback.
Arg4: 000000000000ffff, Combined APC disable count after returning from driver callback.
As we can see from the bugcheck description and first parameter, a driver has failed to maintain a legal value within the combined APC disable field for the associated thread. We'll discuss this a little later on, but I would recommend reading about APCs before continuing with this post. They're quite a lengthy topic and therefore I won't discuss them here.
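To make the reading of Arg3 and Arg4 a little more concrete, here is a minimal Python sketch. The function name is my own invention, and it assumes the callback was entered at PASSIVE_LEVEL with a combined APC disable count of 0, so anything else on return counts as a problem. It only restates the bugcheck description and is not how the kernel performs the check.

```python
# Hypothetical helper: interprets Arg3/Arg4 of a Stop 0xCA with Arg1 = 0xA.
# The meaning of each argument is taken from the bugcheck description above.

PASSIVE_LEVEL = 0  # IRQL that PnP notification callbacks run at


def describe_notify_failure(arg3_irql, arg4_combined_apc_disable):
    """Return a list of human-readable problems found in Arg3/Arg4."""
    problems = []
    if arg3_irql != PASSIVE_LEVEL:
        problems.append(f"IRQL was {arg3_irql} on return, expected {PASSIVE_LEVEL}")
    if arg4_combined_apc_disable != 0:
        # Assumption: the count was 0 when the callback was entered.
        problems.append(
            f"combined APC disable was {arg4_combined_apc_disable:#x} on return"
        )
    return problems


# Values from the dump discussed in this post.
print(describe_notify_failure(0x0, 0xFFFF))
```

For this dump only the APC disable count is flagged, which matches where the rest of the post ends up.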
Firstly, let's dump the driver object.
Rich (BB code):
1: kd> !drvobj ffffd68b8f2e6a70
Driver object (ffffd68b8f2e6a70) is for:
\Driver\dptf_acpi
Driver Extension List: (id , addr)
(fffff80582304d50 ffffd68b90a34950)
Device Object list:
ffffd68b90b4ad00 ffffd68b90b4cd00 ffffd68b90b4dd00 ffffd68b90b4ed00
Looks like the issue is with the Intel DPTF ACPI driver (dptf_acpi). Each of the associated device objects and its stack appears to be related to a slightly different thermal control device, e.g. fans.
Rich (BB code):
1: kd> !devstack ffffd68b90b4ad00
!DevObj !DrvObj !DevExt ObjectName
> ffffd68b90b4ad00 \Driver\dptf_acpi ffffd68b90f62b30
ffffd68b901b7cf0 \Driver\ACPI ffffd68b9012ebe0 00000033
!DevNode ffffd68b90184310 :
DeviceInst is "ACPI\INT3402\TMEM"
ServiceName is "dptf_acpi"
So, we know which driver is responsible; however, we don't know exactly how the crash was caused. Let's look at the call stack for the thread.
Rich (BB code):
1: kd> knL
# Child-SP RetAddr Call Site
00 fffff900`78aa7998 fffff805`7e64bf58 nt!KeBugCheckEx
01 fffff900`78aa79a0 fffff805`7e522f26 nt!PnpNotifyDriverCallback+0x13ee78 << Crash here!
02 fffff900`78aa7a50 fffff805`7e50a400 nt!PnpNotifyDeviceClassChange+0x18e << The event category which the callback belongs to
03 fffff900`78aa7af0 fffff805`7e025975 nt!PnpDeviceEventWorker+0x290
04 fffff900`78aa7b70 fffff805`7e117e85 nt!ExpWorkerThread+0x105
05 fffff900`78aa7c10 fffff805`7e1fd2a8 nt!PspSystemThreadStartup+0x55
06 fffff900`78aa7c60 00000000`00000000 nt!KiStartSystemThread+0x28
I've highlighted the two most important parts of the call stack, but before we delve into those a little further, it would be best if I briefly explained what PnP notifications are. Much like registry filter callbacks, the PnP Manager provides a number of different PnP events which drivers are able to subscribe to and then act upon. For example, you may wish to listen to a particular event for debugging purposes and then log to a file each time it happens. Each PnP event belongs to a particular event category; these are stored in an enumeration called IO_NOTIFICATION_EVENT_CATEGORY.
Rich (BB code):
1: kd> dt IO_NOTIFICATION_EVENT_CATEGORY
Wdf01000!IO_NOTIFICATION_EVENT_CATEGORY
EventCategoryReserved = 0n0
EventCategoryHardwareProfileChange = 0n1
EventCategoryDeviceInterfaceChange = 0n2
EventCategoryTargetDeviceChange = 0n3
EventCategoryKernelSoftRestart = 0n4
A driver registers its callback routine (the function called when the event happens) by calling IoRegisterPlugPlayNotification. Once the event which the driver has subscribed to has fired, the driver's registered callback routine will be executed. There are a number of rules which a notification routine must follow. The routine is only called at PASSIVE_LEVEL (IRQL 0) and, as the bugcheck description states, it must preserve both the IRQL and the combined APC disable count across the call. These rules are described in Guidelines for Writing PnP Notification Callback Routines (listed in the references below).
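If the subscribe-then-callback idea is new to you, the following Python sketch shows the general shape of it. Everything here (ToyPnpManager, register, notify, the dictionary used as a notification) is made up for illustration and is not the kernel's implementation or the real signature of IoRegisterPlugPlayNotification. Only the category values come from the dump above.

```python
from collections import defaultdict
from enum import IntEnum


class EventCategory(IntEnum):
    # Values copied from the IO_NOTIFICATION_EVENT_CATEGORY dump above.
    HARDWARE_PROFILE_CHANGE = 1
    DEVICE_INTERFACE_CHANGE = 2
    TARGET_DEVICE_CHANGE = 3


class ToyPnpManager:
    """Hypothetical stand-in for the PnP Manager's notification bookkeeping."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, category, callback, context=None):
        # Loosely analogous to IoRegisterPlugPlayNotification: remember which
        # routine to call, and with which context, for a given category.
        self._callbacks[category].append((callback, context))

    def notify(self, category, notification):
        # Call every routine subscribed to this category, in registration order.
        return [cb(notification, ctx) for cb, ctx in self._callbacks[category]]


log = []


def on_interface_change(notification, context):
    # Example callback: just record what happened, as a logging driver might.
    log.append((context, notification["event"]))
    return 0  # stands in for STATUS_SUCCESS


manager = ToyPnpManager()
manager.register(EventCategory.DEVICE_INTERFACE_CHANGE, on_interface_change, "demo")
manager.notify(EventCategory.DEVICE_INTERFACE_CHANGE, {"event": "arrival"})
manager.notify(EventCategory.TARGET_DEVICE_CHANGE, {"event": "ignored"})
print(log)
```

Only the first notification reaches the callback, since nothing was registered for the target device change category.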
We can find the event and the event category by examining the second parameter passed to the PnpNotifyDriverCallback function. The callback routine takes a notification structure object which describes which event the callback is registered for.
Rich (BB code):
1: kd> !stack -p
Call Stack : 7 frames
## Stack-Pointer Return-Address Call-Site
00 fffff90078aa7998 fffff8057e64bf58 nt!KeBugCheckEx+0
Parameter[0] = 00000000000000ca
Parameter[1] = 000000000000000a
Parameter[2] = ffffd68b8f2e6a70
Parameter[3] = 0000000000000000
01 fffff90078aa79a0 fffff8057e522f26 nt!PnpNotifyDriverCallback+13ee78 (perf)
Parameter[0] = ffffe688ce3d71f0
Parameter[1] = fffff90078aa7a78 << Notification structure
Parameter[2] = fffff90078aa7a70
Parameter[3] = (unknown)
[...]
Rich (BB code):
1: kd> dt _DEVICE_INTERFACE_CHANGE_NOTIFICATION fffff90078aa7a78
Wdf01000!_DEVICE_INTERFACE_CHANGE_NOTIFICATION
+0x000 Version : 1
+0x002 Size : 0x30
+0x004 Event : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL
+0x014 InterfaceClassGuid : _GUID {ee27098e-1b22-472a-89d8-5ccce16b1356}
+0x028 SymbolicLinkName : 0xfffff900`78aa7b28 _UNICODE_STRING "\??\ACPI#INT3400#2&daba3ff&0#{ee27098e-1b22-472a-89d8-5ccce16b1356}"
As we can see, the notification callback routine was registered for the device interface change event category, which covers a device interface being enabled or disabled. A device interface is a broad identifier for a number of different devices. For example, all of your disk storage devices would have a device interface identifier which indicates that they are storage devices.
The event in question can be found in the Event field. In this case, we were listening for an interface arrival event, which is the enabling of a new device with that particular device interface. For instance, imagine that you inserted a USB device; this would likely trigger a device interface arrival event as shown above.
I would just like to point out that if you don't know the type of notification structure the callback routine takes, you can use the _PLUGPLAY_NOTIFICATION_HEADER structure first and then determine the appropriate structure from the Event field, since the notification header structure is cast to the more specific structure based on that field value.
Rich (BB code):
1: kd> dt _PLUGPLAY_NOTIFICATION_HEADER fffff90078aa7a78
Wdf01000!_PLUGPLAY_NOTIFICATION_HEADER
+0x000 Version : 1
+0x002 Size : 0x30
+0x004 Event : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL
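The "read the header first, then decide" approach can be sketched in Python as below. The layout (Version at +0x000, Size at +0x002, Event at +0x004) is taken from the dt output above. The helper name and the sample bytes are my own. The removal GUID does not appear in this dump, so I've included it as the documented counterpart to the arrival GUID, to the best of my knowledge.

```python
import struct
import uuid

# The arrival GUID is from the debugger output above; the removal GUID is an
# addition for illustration and does not appear in this dump.
EVENT_NAMES = {
    uuid.UUID("cb3a4004-46f0-11d0-b08f-00609713053f"): "GUID_DEVICE_INTERFACE_ARRIVAL",
    uuid.UUID("cb3a4005-46f0-11d0-b08f-00609713053f"): "GUID_DEVICE_INTERFACE_REMOVAL",
}

HEADER = struct.Struct("<HH16s")  # Version, Size, Event GUID (20 bytes in total)


def parse_header(raw):
    """Decode the common header that starts every PnP notification structure."""
    version, size, guid_bytes = HEADER.unpack_from(raw)
    event = uuid.UUID(bytes_le=guid_bytes)  # Windows stores GUIDs mixed-endian
    return {"version": version, "size": size, "event": EVENT_NAMES.get(event, str(event))}


# Build sample bytes matching the header dumped above (Version 1, Size 0x30).
arrival = uuid.UUID("cb3a4004-46f0-11d0-b08f-00609713053f")
sample = HEADER.pack(1, 0x30, arrival.bytes_le) + bytes(0x30 - HEADER.size)
print(parse_header(sample))
```

Once the event name is known, the remaining bytes can be interpreted with the matching structure, which is what the debugger does for us when we pick _DEVICE_INTERFACE_CHANGE_NOTIFICATION.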
In fact, the same information can be obtained by using the !pnpevent command.
Rich (BB code):
1: kd> !pnpevent
********************************************************************************
Dumping PnP DeviceEvent Queue @ 0xffffd68b8f2a9eb0
********************************************************************************
List = 0xffffe688cf13f280, 0xffffe688cf13f280
Dumping DeviceEventEntry @ 0xffffe688cf13f280
ListEntry = 0xffffd68b8f2a9f28, 0xffffd68b8f2a9f28, Argument = 0x00000000
CallerEvent = 0x00000000, Callback = 0x00000000, Context = 0x00000000
VetoType = 0x00000000, VetoName = 0x00000000
Dumping PlugPlayEventBlock @ 0xCF13F2F0
EventGuid = GUID_DEVICE_INTERFACE_ARRIVAL
Category = DeviceClassChangeEvent
Result = 0x00000000, Flags = 0x00000000, TotalSize = 214
DeviceObject = 0x00000000
ClassGuid = 09A5554B-DA25-4461-8D80-5BC6C96DD932
SymbolicLinkName = \??\ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}
Total events in the list: 1
You may have noticed that the event category is shown as a slightly different value (DeviceClassChangeEvent rather than EventCategoryDeviceInterfaceChange); this is because the _PLUGPLAY_EVENT_BLOCK structure uses the _PLUGPLAY_EVENT_CATEGORY enumeration instead.
Rich (BB code):
1: kd> dt _PLUGPLAY_EVENT_CATEGORY
nt!_PLUGPLAY_EVENT_CATEGORY
HardwareProfileChangeEvent = 0n0
TargetDeviceChangeEvent = 0n1
DeviceClassChangeEvent = 0n2
CustomDeviceEvent = 0n3
DeviceInstallEvent = 0n4
DeviceArrivalEvent = 0n5
VetoEvent = 0n6
BlockedDriverEvent = 0n7
InvalidIDEvent = 0n8
DevicePropertyChangeEvent = 0n9
DeviceInstanceRemovalEvent = 0n10
DeviceInstanceStartedEvent = 0n11
MaxPlugEventCategory = 0n12
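To compare the two enumerations side by side, here is a short Python sketch. The member names and values are copied from the two dt outputs in this post. The pairing between them is my own assumption, based on the names and on what !pnpevent reported for this event, so treat it as illustrative only.

```python
from enum import IntEnum


class IoNotificationEventCategory(IntEnum):
    # From the IO_NOTIFICATION_EVENT_CATEGORY dump earlier in the post.
    EventCategoryReserved = 0
    EventCategoryHardwareProfileChange = 1
    EventCategoryDeviceInterfaceChange = 2
    EventCategoryTargetDeviceChange = 3
    EventCategoryKernelSoftRestart = 4


class PlugPlayEventCategory(IntEnum):
    # First four members of the _PLUGPLAY_EVENT_CATEGORY dump above.
    HardwareProfileChangeEvent = 0
    TargetDeviceChangeEvent = 1
    DeviceClassChangeEvent = 2
    CustomDeviceEvent = 3


# Hypothetical pairing by meaning rather than by number (an assumption).
DRIVER_TO_INTERNAL = {
    IoNotificationEventCategory.EventCategoryHardwareProfileChange:
        PlugPlayEventCategory.HardwareProfileChangeEvent,
    IoNotificationEventCategory.EventCategoryDeviceInterfaceChange:
        PlugPlayEventCategory.DeviceClassChangeEvent,
    IoNotificationEventCategory.EventCategoryTargetDeviceChange:
        PlugPlayEventCategory.TargetDeviceChangeEvent,
}

for public, internal in DRIVER_TO_INTERNAL.items():
    same = "same" if int(public) == int(internal) else "different"
    print(f"{public.name} = {int(public)}, {internal.name} = {int(internal)} ({same} value)")
```

The interface change pair happens to share the number 2, while the other two pairs do not, so it is safer to go by the name the debugger prints than by the raw number.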
I mentioned earlier the combined APC disable count and the fact that it can be obtained from the thread itself. In order to do so, you simply have to dump the CombinedApcDisable field like so:
Rich (BB code):
1: kd> dt _KTHREAD CombinedApcDisable ffffd68b90eb5040
nt!_KTHREAD
+0x1e4 CombinedApcDisable : 0xffff
As you can see, the field has an invalid value of 0xFFFF. This value is actually the combination of two fields, KernelApcDisable and SpecialApcDisable, both of which are part of the _KTHREAD structure.
Rich (BB code):
1: kd> dt _KTHREAD SpecialApcDisable ffffd68b90eb5040
nt!_KTHREAD
+0x1e6 SpecialApcDisable : 0n0
Rich (BB code):
1: kd> dt _KTHREAD KernelApcDisable ffffd68b90eb5040
nt!_KTHREAD
+0x1e4 KernelApcDisable : 0n-1
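CombinedApcDisable is then calculated using the following expression. Note that the debugger sign-extends 0n-1 to 64 bits here; only the low 16 bits (0xFFFF) come from KernelApcDisable, which is why the field itself reads 0xffff: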
Rich (BB code):
1: kd> ? 0n0 << 16 | 0n-1
Evaluate expression: -1 = ffffffff`ffffffff
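The same calculation can be reproduced outside the debugger with the field widths taken into account. This is a small Python sketch of my own. It assumes the layout shown by the dt output (KernelApcDisable at +0x1e4, SpecialApcDisable at +0x1e6, both 16-bit and signed, overlaid by the 32-bit CombinedApcDisable at +0x1e4) on a little-endian machine.

```python
import struct


def combined_apc_disable(kernel_apc_disable, special_apc_disable):
    """Overlay two signed 16-bit counters with one unsigned 32-bit value."""
    # KernelApcDisable sits at +0x1e4 and SpecialApcDisable at +0x1e6, so on a
    # little-endian machine the kernel count forms the low 16 bits.
    raw = struct.pack("<hh", kernel_apc_disable, special_apc_disable)
    return struct.unpack("<I", raw)[0]


print(hex(combined_apc_disable(-1, 0)))  # values from the dump, prints 0xffff
```

Because the value is truncated to 32 bits, the result is 0xffff rather than the sign-extended ffffffff`ffffffff that the expression evaluator prints.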
As we can see, the KernelApcDisable field has a value of -1, which isn't valid on return from the callback. Entering a critical region decrements this field and leaving one increments it, so a value of -1 means that a driver has entered a critical region (disabling normal kernel APCs) without leaving it again. This is a critical error and therefore the system crashes with the bugcheck shown. We know that the IRQL was correct since Arg3 is 0 (PASSIVE_LEVEL), which is the level the callback routine must be called at.
This crash seems to commonly occur during boot, when the PnP Manager is enumerating the devices currently connected to the system.
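To show how an unbalanced critical region ends up as 0xFFFF, here is a toy Python model. It assumes the usual convention that entering a critical region decrements KernelApcDisable and leaving one increments it. The class and function names are hypothetical, and the before/after comparison only mimics what the bugcheck description implies. I haven't reproduced the actual logic inside nt!PnpNotifyDriverCallback.

```python
class ToyThread:
    """Hypothetical model of the two per-thread APC disable counters."""

    def __init__(self):
        self.kernel_apc_disable = 0
        self.special_apc_disable = 0

    def enter_critical_region(self):
        self.kernel_apc_disable -= 1  # entering disables normal kernel APCs

    def leave_critical_region(self):
        self.kernel_apc_disable += 1  # leaving re-enables them

    @property
    def combined(self):
        # Low 16 bits: kernel count; high 16 bits: special count.
        return ((self.special_apc_disable & 0xFFFF) << 16) | (self.kernel_apc_disable & 0xFFFF)


def call_and_check(thread, callback):
    """Run a callback and report whether it left the APC state untouched."""
    before = thread.combined
    callback(thread)
    return thread.combined == before


def balanced(thread):
    thread.enter_critical_region()
    thread.leave_critical_region()


def leaky(thread):
    thread.enter_critical_region()  # no matching leave


ok_thread, bad_thread = ToyThread(), ToyThread()
print(call_and_check(ok_thread, balanced))  # True
print(call_and_check(bad_thread, leaky), hex(bad_thread.combined))  # False 0xffff
```

The leaky callback leaves the kernel count at -1, which shows up as 0xffff in the combined value, the same figure reported in Arg4.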
References:
Handling Device Interface Change Events - Windows drivers
Guidelines for Writing PnP Notification Callback Routines - Windows drivers
Using PnP Notification - Windows drivers