Debugging Stop 0xCA - dptf_acpi.sys & dptf_cpu.sys

x BlueRobot · Apr 13, 2021

Within the past week or so, I've noticed a number of Stop 0xCA crashes caused by two drivers in particular: dptf_acpi.sys and dptf_cpu.sys. Both appear to belong to the same driver package, Intel Dynamic Platform Thermal Framework (DPTF). Although the DRT states that the driver is available from Intel, there don't appear to be any updates there and, from what I understand, this driver is delivered exclusively through Windows Update. Most users have reported that rolling back to a system restore point resolved the issue for them.

I've been looking at the processors of the affected machines and there doesn't appear to be any discernible pattern, other than that they all belong to the same processor family, 0x06. This family includes a wide variety of different processors. If you're able to, please check the processor information using the !sysinfo cpuinfo command.

The main purpose of this post is to provide some information on what is happening with this crash. At the moment, I'm not aware of any definitive solution other than waiting for the driver to be patched by Intel.

Rich (BB code):

PNP_DETECTED_FATAL_ERROR (ca)
PnP encountered a severe error, either as a result of a problem in a driver or
a problem in PnP itself.  The first argument describes the nature of the
problem, the second argument is the address of the PDO.  The other arguments
vary depending on argument 1.
Arguments:
Arg1: 000000000000000a, Incorrect notify callback behavior
    Driver failed to preserve IRQL or combined APC disable across
    a PlugPlay notification.
Arg2: ffffd68b8f2e6a70, Driver Object.
Arg3: 0000000000000000, IRQL after returning from driver callback.
Arg4: 000000000000ffff, Combined APC disable count after returning from driver callback.

As we can see from the bugcheck description and the first parameter, a driver has failed to maintain a legal value in the combined APC disable field of the associated thread. We'll discuss this a little later on, but I would recommend reading about APCs before continuing with this post. They're quite a lengthy topic, so I won't discuss them here.

Firstly, let's dump the driver object.

Rich (BB code):

1: kd> !drvobj ffffd68b8f2e6a70
Driver object (ffffd68b8f2e6a70) is for:
\Driver\dptf_acpi

Driver Extension List: (id , addr)
(fffff80582304d50 ffffd68b90a34950) 
Device Object list:
ffffd68b90b4ad00  ffffd68b90b4cd00  ffffd68b90b4dd00  ffffd68b90b4ed00

Looks like the issue is with the Intel ACPI driver. Each of the associated devices, along with its device stack, appears to relate to a slightly different thermal control device, e.g. fans.

Rich (BB code):

1: kd> !devstack ffffd68b90b4ad00
  !DevObj           !DrvObj            !DevExt           ObjectName
> ffffd68b90b4ad00  \Driver\dptf_acpi  ffffd68b90f62b30 
  ffffd68b901b7cf0  \Driver\ACPI       ffffd68b9012ebe0  00000033
!DevNode ffffd68b90184310 :
  DeviceInst is "ACPI\INT3402\TMEM"
  ServiceName is "dptf_acpi"

So, we know which driver is responsible; however, we don't know exactly how the crash was caused. Let's look at the call stack for the thread.

Rich (BB code):

1: kd> knL
# Child-SP          RetAddr           Call Site
00 fffff900`78aa7998 fffff805`7e64bf58 nt!KeBugCheckEx
01 fffff900`78aa79a0 fffff805`7e522f26 nt!PnpNotifyDriverCallback+0x13ee78 << Crash here!
02 fffff900`78aa7a50 fffff805`7e50a400 nt!PnpNotifyDeviceClassChange+0x18e << The event category which the callback belongs to
03 fffff900`78aa7af0 fffff805`7e025975 nt!PnpDeviceEventWorker+0x290
04 fffff900`78aa7b70 fffff805`7e117e85 nt!ExpWorkerThread+0x105
05 fffff900`78aa7c10 fffff805`7e1fd2a8 nt!PspSystemThreadStartup+0x55
06 fffff900`78aa7c60 00000000`00000000 nt!KiStartSystemThread+0x28

I've highlighted the two most important parts of the call stack, but before we delve into those a little further, it would be best if I briefly explained what PnP notifications are. Much like registry filter callbacks, the PnP Manager provides a number of different PnP events which drivers can subscribe to and then act on accordingly. For example, you may wish to listen for a particular event for debugging purposes and log to a file each time it happens. Each PnP event belongs to a particular event category; these are stored in an enumeration called IO_NOTIFICATION_EVENT_CATEGORY.

Rich (BB code):

1: kd> dt IO_NOTIFICATION_EVENT_CATEGORY
Wdf01000!IO_NOTIFICATION_EVENT_CATEGORY
   EventCategoryReserved = 0n0
   EventCategoryHardwareProfileChange = 0n1
   EventCategoryDeviceInterfaceChange = 0n2
   EventCategoryTargetDeviceChange = 0n3
   EventCategoryKernelSoftRestart = 0n4

A driver registers its callback routine (the function called when the event happens) by calling IoRegisterPlugPlayNotification. Once the event which the driver has subscribed to fires, the driver's registered callback routine is executed. There are a number of rules which a notification routine must follow: it is called at PASSIVE_LEVEL (IRQL 0), and it must return with the same IRQL and APC disable state it was called with. These rules are described in Guidelines for Writing PnP Notification Callback Routines (listed in the references).
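The check behind this bugcheck can be modelled in a few lines. This is only a Python sketch of the idea stated in the bugcheck description (state is preserved across the callback, or the system stops), not the kernel's actual implementation; ThreadState, PnpDetectedFatalError and notify_driver_callback are hypothetical names made up for illustration.

```python
from dataclasses import dataclass

PASSIVE_LEVEL = 0


@dataclass
class ThreadState:
    """Hypothetical stand-in for the two per-thread values Stop 0xCA reports."""
    irql: int = PASSIVE_LEVEL
    combined_apc_disable: int = 0


class PnpDetectedFatalError(Exception):
    """Models bugcheck 0xCA with Arg1 = 0xA (not a real Windows API)."""


def notify_driver_callback(thread, callback, notification):
    # Snapshot the state the callback is required to preserve.
    irql_before = thread.irql
    apc_before = thread.combined_apc_disable

    status = callback(thread, notification)

    # Any difference means the callback broke the notification rules.
    if thread.irql != irql_before or thread.combined_apc_disable != apc_before:
        raise PnpDetectedFatalError(
            f"Arg1=0xa Arg3={thread.irql:#x} Arg4={thread.combined_apc_disable:#x}"
        )
    return status


def well_behaved(thread, notification):
    return 0  # leaves IRQL and APC state untouched


def unbalanced(thread, notification):
    # Returns with the APC disable state changed, as in the dump (0xffff).
    thread.combined_apc_disable = 0xFFFF
    return 0
```

Calling notify_driver_callback(ThreadState(), unbalanced, None) raises the modelled error with Arg3=0x0 and Arg4=0xffff, which mirrors the third and fourth bugcheck arguments in the dump above.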

We can find the event and the event category by examining the second parameter passed to the PnpNotifyDriverCallback function. The callback routine takes a notification structure object which describes which event the callback is registered for.

Rich (BB code):

1: kd> !stack -p
Call Stack : 7 frames
## Stack-Pointer    Return-Address   Call-Site      
00 fffff90078aa7998 fffff8057e64bf58 nt!KeBugCheckEx+0
    Parameter[0] = 00000000000000ca
    Parameter[1] = 000000000000000a
    Parameter[2] = ffffd68b8f2e6a70
    Parameter[3] = 0000000000000000
01 fffff90078aa79a0 fffff8057e522f26 nt!PnpNotifyDriverCallback+13ee78 (perf)
    Parameter[0] = ffffe688ce3d71f0
    Parameter[1] = fffff90078aa7a78 << Notification structure
    Parameter[2] = fffff90078aa7a70
    Parameter[3] = (unknown)

[...]

Rich (BB code):

1: kd> dt _DEVICE_INTERFACE_CHANGE_NOTIFICATION fffff90078aa7a78
Wdf01000!_DEVICE_INTERFACE_CHANGE_NOTIFICATION
   +0x000 Version          : 1
   +0x002 Size             : 0x30
   +0x004 Event            : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL
   +0x014 InterfaceClassGuid : _GUID {ee27098e-1b22-472a-89d8-5ccce16b1356}
   +0x028 SymbolicLinkName : 0xfffff900`78aa7b28 _UNICODE_STRING "\??\ACPI#INT3400#2&daba3ff&0#{ee27098e-1b22-472a-89d8-5ccce16b1356}"

As we can see, the notification callback routine was registered for the device interface change event category, which covers a device interface being enabled or disabled. A device interface is a broad identifier shared by a number of different devices. For example, all of your disk storage devices would have a device interface identifier which indicates that they are storage devices.

The event in question can be found in the Event field. In this case, the driver was listening for an interface arrival event, which fires when a new device with that particular device interface is enabled. For instance, inserting a USB device would likely trigger a device interface arrival event like the one shown above.

I would just like to point out that if you don't know which type of notification structure the callback routine takes, you can use the _PLUGPLAY_NOTIFICATION_HEADER structure first and then determine the appropriate structure from the Event field, since the notification header is cast to the more specific structure based on that field's value.

Rich (BB code):

1: kd> dt _PLUGPLAY_NOTIFICATION_HEADER fffff90078aa7a78
Wdf01000!_PLUGPLAY_NOTIFICATION_HEADER
   +0x000 Version          : 1
   +0x002 Size             : 0x30
   +0x004 Event            : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL
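The same header decode can be done by hand on raw bytes. Below is a minimal Python sketch, assuming the offsets shown in the dump (Version at +0x0, Size at +0x2, Event at +0x4) and the usual little-endian in-memory GUID layout. The arrival GUID is the one from the dump; the removal GUID is from my recollection of the WDK headers and is worth verifying. parse_notification_header is a hypothetical helper name.

```python
import struct
import uuid

# Event GUIDs for the device interface change category.
KNOWN_EVENTS = {
    uuid.UUID("cb3a4004-46f0-11d0-b08f-00609713053f"): "GUID_DEVICE_INTERFACE_ARRIVAL",
    # Assumption: value recalled from the WDK headers, not shown in the dump.
    uuid.UUID("cb3a4005-46f0-11d0-b08f-00609713053f"): "GUID_DEVICE_INTERFACE_REMOVAL",
}


def parse_notification_header(raw: bytes):
    """Decode Version (+0x0), Size (+0x2) and Event (+0x4) from raw bytes."""
    version, size = struct.unpack_from("<HH", raw, 0)
    event = uuid.UUID(bytes_le=bytes(raw[4:20]))
    # Fall back to the GUID string when the event is not in the table.
    return version, size, KNOWN_EVENTS.get(event, str(event))


# Rebuild the first 0x14 bytes of the structure from the values in the dump.
arrival = uuid.UUID("cb3a4004-46f0-11d0-b08f-00609713053f")
raw = struct.pack("<HH", 1, 0x30) + arrival.bytes_le
print(parse_notification_header(raw))
```

Once the Event GUID is known, the remaining bytes can be interpreted with the more specific structure, which is what the dt _DEVICE_INTERFACE_CHANGE_NOTIFICATION command did earlier.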

In fact, the same information can be obtained by using the !pnpevent command.

Rich (BB code):

1: kd> !pnpevent

********************************************************************************
Dumping PnP DeviceEvent Queue @ 0xffffd68b8f2a9eb0
********************************************************************************

List = 0xffffe688cf13f280, 0xffffe688cf13f280

Dumping DeviceEventEntry @ 0xffffe688cf13f280
  ListEntry = 0xffffd68b8f2a9f28, 0xffffd68b8f2a9f28, Argument = 0x00000000
  CallerEvent = 0x00000000, Callback = 0x00000000, Context = 0x00000000
  VetoType = 0x00000000, VetoName = 0x00000000

  Dumping PlugPlayEventBlock @ 0xCF13F2F0
    EventGuid = GUID_DEVICE_INTERFACE_ARRIVAL
    Category = DeviceClassChangeEvent
    Result = 0x00000000, Flags = 0x00000000, TotalSize = 214
    DeviceObject = 0x00000000
      ClassGuid = 09A5554B-DA25-4461-8D80-5BC6C96DD932
      SymbolicLinkName = \??\ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}

  Total events in the list: 1

You may have noticed that the event category has a slightly different name. This is because the _PLUGPLAY_EVENT_BLOCK structure uses the _PLUGPLAY_EVENT_CATEGORY enumeration instead.

Rich (BB code):

1: kd> dt _PLUGPLAY_EVENT_CATEGORY
nt!_PLUGPLAY_EVENT_CATEGORY
   HardwareProfileChangeEvent = 0n0
   TargetDeviceChangeEvent = 0n1
   DeviceClassChangeEvent = 0n2
   CustomDeviceEvent = 0n3
   DeviceInstallEvent = 0n4
   DeviceArrivalEvent = 0n5
   VetoEvent = 0n6
   BlockedDriverEvent = 0n7
   InvalidIDEvent = 0n8
   DevicePropertyChangeEvent = 0n9
   DeviceInstanceRemovalEvent = 0n10
   DeviceInstanceStartedEvent = 0n11
   MaxPlugEventCategory = 0n12
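To keep the two enumerations apart, it can help to write them side by side. The sketch below transcribes the values from the two dt outputs in this post (only the first few entries of the second one). The pairing table is my own assumption, inferred from the names and from PnpNotifyDeviceClassChange delivering a device interface change notification in the call stack; it is not taken from any documentation.

```python
from enum import IntEnum


class IoNotificationEventCategory(IntEnum):
    # Transcribed from the dt IO_NOTIFICATION_EVENT_CATEGORY output.
    EventCategoryReserved = 0
    EventCategoryHardwareProfileChange = 1
    EventCategoryDeviceInterfaceChange = 2
    EventCategoryTargetDeviceChange = 3
    EventCategoryKernelSoftRestart = 4


class PlugPlayEventCategory(IntEnum):
    # First entries of the dt _PLUGPLAY_EVENT_CATEGORY output.
    HardwareProfileChangeEvent = 0
    TargetDeviceChangeEvent = 1
    DeviceClassChangeEvent = 2
    CustomDeviceEvent = 3


# Assumed pairing between the internal event block category and the
# category a driver registers for (inferred, not documented).
EVENT_BLOCK_TO_NOTIFICATION = {
    PlugPlayEventCategory.HardwareProfileChangeEvent:
        IoNotificationEventCategory.EventCategoryHardwareProfileChange,
    PlugPlayEventCategory.TargetDeviceChangeEvent:
        IoNotificationEventCategory.EventCategoryTargetDeviceChange,
    PlugPlayEventCategory.DeviceClassChangeEvent:
        IoNotificationEventCategory.EventCategoryDeviceInterfaceChange,
}

category = PlugPlayEventCategory.DeviceClassChangeEvent
print(category.name, "->", EVENT_BLOCK_TO_NOTIFICATION[category].name)
```

Note that the numeric values only line up for the device class/interface pair (both 2); the other categories differ, so comparing raw numbers across the two enumerations is not safe.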

Earlier I mentioned the combined APC disable count, which can be read from the thread itself. To do so, simply dump the CombinedApcDisable field like so:

Rich (BB code):

1: kd> dt _KTHREAD CombinedApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e4 CombinedApcDisable : 0xffff

As you can see, the field has an invalid value of 0xFFFF. This value is actually the combination of two fields, KernelApcDisable and SpecialApcDisable, both of which are part of the _KTHREAD structure.

Rich (BB code):

1: kd> dt _KTHREAD SpecialApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e6 SpecialApcDisable : 0n0

Rich (BB code):

1: kd> dt _KTHREAD KernelApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e4 KernelApcDisable : 0n-1

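CombinedApcDisable is the two 16-bit fields viewed as a single 32-bit value, with SpecialApcDisable in the upper half and KernelApcDisable in the lower half. The debugger evaluates with sign-extended 64-bit integers, so the expression below prints -1 rather than the 0xffff stored in the field: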

Rich (BB code):

1: kd> ? 0n0 << 16 | 0n-1
Evaluate expression: -1 = ffffffff`ffffffff
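The packing can also be checked outside the debugger. This is a minimal Python sketch, assuming the little-endian overlay implied by the offsets in the dumps (KernelApcDisable at +0x1e4, SpecialApcDisable at +0x1e6, CombinedApcDisable spanning both); the function names are hypothetical.

```python
import struct


def combine_apc_disable(kernel_apc_disable: int, special_apc_disable: int) -> int:
    """Pack two signed 16-bit counters the way they overlay the 32-bit field."""
    # KernelApcDisable is the low half, SpecialApcDisable the high half.
    raw = struct.pack("<hh", kernel_apc_disable, special_apc_disable)
    return struct.unpack("<I", raw)[0]


def split_apc_disable(combined: int):
    """Recover (KernelApcDisable, SpecialApcDisable) from the combined value."""
    return struct.unpack("<hh", struct.pack("<I", combined & 0xFFFFFFFF))


print(hex(combine_apc_disable(-1, 0)))  # 0xffff
print(split_apc_disable(0xFFFF))        # (-1, 0)
```

Because each half is masked to 16 bits before packing, a KernelApcDisable of -1 with a SpecialApcDisable of 0 gives 0x0000ffff, matching both the _KTHREAD dump and Arg4 of the bugcheck.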

As we can see, the KernelApcDisable field has a value of -1, which is not a legal value to return with. KeEnterCriticalRegion decrements this field and KeLeaveCriticalRegion increments it, so -1 means the callback disabled normal kernel APCs (for example, by entering a critical region) and returned without re-enabling them. The PnP Manager treats this as a fatal error, and the system crashes with the bugcheck shown. The IRQL was not the problem: Arg3 shows it was 0 (PASSIVE_LEVEL) after the callback returned.

This crash seems to commonly occur during boot, when the PnP Manager is enumerating the devices currently connected to the system.

References:

Handling Device Interface Change Events - Windows drivers
Guidelines for Writing PnP Notification Callback Routines - Windows drivers
Using PnP Notification - Windows drivers

x BlueRobot · Apr 19, 2021

Addendum:

Rich (BB code):

1: kd> !object \Global??\ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}
Object: ffffe688cef960d0  Type: (ffffd68b8f2acce0) SymbolicLink
    ObjectHeader: ffffe688cef960a0 (new version)
    HandleCount: 1  PointerCount: 2
    Directory Object: ffffe688c9c19ad0  Name: ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}
    Flags: 00000000 ( Local )
    Target String is '\Device\0000002f'

I just remembered that you can resolve symbolic links with the !object command.

Rich (BB code):

1: kd> !object \Device\0000002f
Object: ffffd68b901f18f0  Type: (ffffd68b8f2fb980) Device
    ObjectHeader: ffffd68b901f18c0 (new version)
    HandleCount: 0  PointerCount: 9
    Directory Object: ffffe688c9c65060  Name: 0000002f

Rich (BB code):

1: kd> !devobj ffffd68b901f18f0
Device object (ffffd68b901f18f0) is for:
 0000002f \Driver\ACPI DriverObject ffffd68b8f5fb4e0
Current Irp 00000000 RefCount 1 Type 00000032 Flags 00001040
SecurityDescriptor ffffe688ca0bf060 DevExt ffffd68b9012e010 DevObjExt ffffd68b901f1a40 DevNode ffffd68b8f2e9050 
ExtensionFlags (0000000000)  
Characteristics (0x00000180)  FILE_AUTOGENERATED_DEVICE_NAME, FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) ffffd68b984a2de0 \Driver\esif_lf
Device queue is not busy.

esif_lf.sys is another driver which is part of the Intel Dynamic Platform Thermal Framework.

jcgriff2 · Apr 22, 2021

x BlueRobot said:
Within the past week or so, I've noticed a number of Stop 0xCA crashes caused by two drivers in particular: dptf_acpi.sys and dptf_cpu.sys.

Me too.

This reminds me of the Asus asacpi.sys fiasco that brought down (BSOD'd them to death) millions of systems back around 2010/2011 and continued for years.

For some reason, a still-unknown Windows driver was updated during Windows Update and clashed with asacpi.sys in most (but not all!) systems, thus causing massive amounts of BSODs and as I recall, never naming asacpi.sys as the "Probably caused by:" driver, either. I believe that most named NT or win32k.sys.

Former Sysnative Admin and MVP usasma (John Carrona) figured the fix out fairly quickly as he had asacpi.sys on all of his desktop systems (but not all BSOD'd !) -- update the 2004/2005 asacpi.sys driver in Vista and Windows 7 systems directly from Asus.

John

x BlueRobot · Apr 22, 2021

I wonder if this is very similar then? Interestingly, my old laptop hasn't had any updates or crashes related to those drivers.

jcgriff2 · Apr 22, 2021

Well, asacpi.sys was not the driver updated by WU; it was the driver that needed to be updated manually from the Asus driver site to stop the BSODs.

The actual driver(s) that caused the BSODs were never identified - it/they apparently adversely reacted with the outdated/older asacpi.sys.

usasma picked up on it because of the timestamp - 2004 or 2005 in a Vista and/or Windows 7 system, which should not have had drivers dated after late 2006 or after early 2009 respectively (Vista/W7).

Are you seeing any old(er) 3rd party drivers in the loaded driver listing (lmnt or lmntsm)?

John

x BlueRobot · Apr 22, 2021

I haven't checked, to be honest. There doesn't appear to be a specific pattern, and Intel usually states that the user has all of the latest drivers.
