Rich (BB code):
INTERNAL_POWER_ERROR (a0)
The power policy manager experienced a fatal error.
Arguments:
Arg1: 0000000000004001, (INTERNAL_POWER_ERROR_KE_SUBCODE) An internal failure has
occured in kernel executive during a power operation.
Arg2: 0000000000000102, (INTERNAL_POWER_ERROR_KE_SETDESTINATION_FAILED)
Failed to change the target destination of an
interrupt line.
Arg3: ffff868ca9b03c30, (reserved)
Arg4: ffffffffc0350057, (reserved)
This bugcheck was ultimately caused by a bug between the integrated graphics card and the IOMMU. The issue lies with the UMA (Unified Memory Architecture). To resolve it, you will need to explicitly set the UMA frame buffer size to either 1GB or 2GB. It is recommended that you check your processor documentation first, since the appropriate value may vary between devices. The rest of this post will attempt to explain why this fix works.
Let's begin with the call stack:
Rich (BB code):
10: kd> knL
# Child-SP RetAddr Call Site
00 ffffab8f`96cc7578 fffff800`3be825f4 nt!KeBugCheckEx
01 ffffab8f`96cc7580 fffff800`3bc1e4d8 nt!KiIntSteerSetDestination+0x130d94
02 ffffab8f`96cc75c0 fffff800`3bc1e2f2 nt!KiIntSteerDistributeInterrupts+0xd8
03 ffffab8f`96cc7600 fffff800`3bc1e047 nt!KeIntSteerPeriodic+0xd2
04 ffffab8f`96cc7710 fffff800`3bc1f5c0 nt!PpmParkSteerInterrupts+0x447
05 ffffab8f`96cc7b40 fffff800`3bc9a32e nt!PpmCheckRun+0x40
06 ffffab8f`96cc7bb0 fffff800`3bc99614 nt!KiExecuteAllDpcs+0x30e
07 ffffab8f`96cc7d20 fffff800`3bdfe115 nt!KiRetireDpcList+0x1f4
08 ffffab8f`96cc7fb0 fffff800`3bdfdf00 nt!KxRetireDpcList+0x5
09 ffffab8f`9a1676a0 fffff800`3bdfd5ce nt!KiDispatchInterruptContinue
0a ffffab8f`9a1676d0 fffff800`3bca181d nt!KiDpcInterrupt+0x2ee
0b ffffab8f`9a167860 fffff800`3bca0677 nt!MiUnlockWorkingSetShared+0xad
0c ffffab8f`9a167890 fffff800`3bc9f1da nt!MiUserFault+0xf27
0d ffffab8f`9a167920 fffff800`3be0525e nt!MmAccessFault+0x16a
0e ffffab8f`9a167ac0 00007ffb`5e8a028e nt!KiPageFault+0x35e
0f 0000005e`7af18df8 00000000`00000000 0x00007ffb`5e8a028e
The key point here is the last few calls made before the bugcheck (frames 01 to 05). These are related to the processor power management (PPM) infrastructure, which is why we've experienced a power-related bugcheck. To be more specific, the calls implement a key feature known as core parking, which aims to improve power efficiency by reducing the number of processor cores required to service interrupts.
The PPM and the thread scheduler determine how many cores are required and then "park" the cores which are not. The parked cores are set to a low power state in order to reduce power consumption. The cores which remain unparked then have work such as interrupts, DPCs and threads "steered" towards them. This is decided by an internal algorithm implemented by the nt!PpmCheckRun function, which is called periodically from a timer that fires at set intervals.
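To make the idea of parking and steering a little more concrete, here is a minimal Python sketch. It is a toy model only: the real algorithm is internal to Windows and undocumented, so the utilisation-based ranking, the round-robin retargeting and every name below (Core, park_cores, steer_interrupts) are my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    index: int
    utilisation: float  # fraction of time busy, 0.0 to 1.0 (made-up metric)
    parked: bool = False


def park_cores(cores, needed):
    """Keep the `needed` busiest cores unparked and park the rest.

    Returns the indexes of the unparked cores in ascending order.
    """
    ranked = sorted(cores, key=lambda c: c.utilisation, reverse=True)
    for position, core in enumerate(ranked):
        core.parked = position >= needed
    return sorted(c.index for c in cores if not c.parked)


def steer_interrupts(interrupt_targets, unparked):
    """Retarget interrupts aimed at a parked core to an unparked core.

    `interrupt_targets` maps a (hypothetical) interrupt source name to the
    core index it currently targets. Retargeting is plain round robin.
    """
    steered = {}
    next_slot = 0
    for source, target in interrupt_targets.items():
        if target in unparked:
            steered[source] = target  # already on an unparked core
        else:
            steered[source] = unparked[next_slot % len(unparked)]
            next_slot += 1
    return steered


cores = [Core(0, 0.70), Core(1, 0.05), Core(2, 0.40), Core(3, 0.02)]
unparked = park_cores(cores, needed=2)
steered = steer_interrupts({"gpu": 1, "nic": 2, "usb": 3}, unparked)
print(unparked, steered)
```

In this model, the retargeting step is the part that corresponds to nt!KiIntSteerSetDestination in the call stack: it is the point where an interrupt is given a new destination core, and it is where the crash occurred.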
Most of the PPM statistics can be found by running the !ppm debugger command or by examining the PowerState field of the processor's KPRCB structure:
Rich (BB code):
0: kd> dt _KPRCB -y PowerState -v
nt!_KPRCB
struct _KPRCB, 376 elements, 0xbf00 bytes
+0x8340 PowerState : struct _PROCESSOR_POWER_STATE, 50 elements, 0x230 bytes
The !ppmstate debugger command will give you the address of this structure immediately if you're interested in examining it, as shown below.
Rich (BB code):
0: kd> !ppmstate
Prcb.PowerState - 0xfffff8072061e4c0
IdleStates: 0x0000000000000000
IdleTimeLast: 0.000.000us (0x0 )
IdleTimeTotal: 0.000.000us (0x0 )
IdleAccounting: 0x0000000000000000
[...]
There are a few other commands associated with processor power management, including !ppmcheck and !ppmsettings.
At this point you may be wondering what UMA has to do with any of this. Integrated graphics cards and controllers do not have their own dedicated video memory (VRAM) and therefore share a portion of the system RAM. This is implemented and managed by the IOMMU through the use of GPU isolation, which ensures that the graphics controller is only able to access the portions of RAM allocated to it. This special portion is called the frame buffer. These frame buffers are typically accessed during power transitions.
Now, in order to utilise the IOMMU technology, Intel and AMD introduced hardware virtualisation, which was partly intended for virtual machines and partly to prevent DMA attacks by ensuring that devices could only access the portions of memory allocated to them by the IOMMU addressing tables. The IOMMU is also used by hardware virtualisation for interrupt remapping; the pieces should now be fitting together quite nicely.
For the processor to determine where an interrupt originated from, each device is associated with a source or device ID. If we examine the fourth parameter of the bugcheck, we see the following error:
Rich (BB code):
10: kd> !error ffffffffc0350057
Error code: (NTSTATUS) 0xc0350057 (3224698967) - The supplied device ID is invalid.
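If you want to see how that value breaks down, here is a short Python sketch which splits the fourth bugcheck argument into the standard NTSTATUS fields (severity in bits 30 to 31, customer flag in bit 29, facility in bits 16 to 27, code in bits 0 to 15). The function name decode_ntstatus is just something I made up for this example.

```python
def decode_ntstatus(arg):
    """Split a bugcheck argument holding an NTSTATUS into its bit fields."""
    status = arg & 0xFFFFFFFF  # drop the 64-bit sign extension (ffffffff...)
    return {
        "status": status,
        "severity": status >> 30,            # 3 means error
        "customer": (status >> 29) & 1,      # 0 means Microsoft-defined
        "facility": (status >> 16) & 0xFFF,  # which component raised it
        "code": status & 0xFFFF,             # error code within the facility
    }


fields = decode_ntstatus(0xFFFFFFFFC0350057)
for name, value in fields.items():
    print(f"{name:9}: {value:#x}")
```

The facility comes out as 0x35, which as far as I'm aware is the hypervisor facility in ntstatus.h. That fits with the status having come from the hardware virtualisation and interrupt remapping side of things rather than from the power manager itself.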
Most modern PCIe devices, if not all, use MSIs (message-signalled interrupts) instead of the traditional line-based interrupt format. With an MSI, the device writes to a particular memory-mapped I/O address which, in this case, would be part of the frame buffer for the GPU. The interrupt is then remapped by the IOMMU through an interrupt routing table, which keeps track of where the interrupt originated from and the destination processor core. If the device ID is not recognised then the interrupt is discarded and a bugcheck is thrown, as shown above.
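The remapping step can be pictured with another small Python sketch. Again, this is only a toy model under my own assumptions: a real IOMMU uses hardware table formats which are vendor-specific, and the class names, the device ID values and the dictionary layout here are all hypothetical.

```python
class InvalidDeviceId(Exception):
    """Stands in for the 'supplied device ID is invalid' status."""


class InterruptRemapTable:
    """Toy remapping table keyed by device ID (not a real IOMMU layout)."""

    def __init__(self):
        self._entries = {}  # device_id -> {"vector": int, "core": int}

    def program(self, device_id, vector, core):
        """Register a device so its interrupts can be remapped."""
        self._entries[device_id] = {"vector": vector, "core": core}

    def set_destination(self, device_id, core):
        """Steer a device's interrupt to another core.

        Fails if the table has no entry for the device ID.
        """
        if device_id not in self._entries:
            raise InvalidDeviceId(hex(device_id))
        self._entries[device_id]["core"] = core

    def deliver(self, device_id):
        """Return (vector, core) for an incoming interrupt from the device."""
        if device_id not in self._entries:
            raise InvalidDeviceId(hex(device_id))
        entry = self._entries[device_id]
        return entry["vector"], entry["core"]


table = InterruptRemapTable()
table.program(device_id=0x0800, vector=0x51, core=3)  # made-up values
table.set_destination(0x0800, core=0)                 # steering succeeds
print(table.deliver(0x0800))

try:
    table.set_destination(0x0808, core=0)             # unknown device ID
except InvalidDeviceId as exc:
    print("steering failed for device", exc)
```

In the toy model the failure is a Python exception which we can catch. In the kernel there is nothing sensible to fall back on when the destination cannot be changed, so the failure is fatal and results in the bugcheck.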
References:
https://www.amd.com/en/support/kb/faq/pa-280
Understanding the Windows I/O System | Microsoft Press Store
IOMMU-based GPU isolation - Windows drivers
GitHub - 3dgie/AMD-Vi: Some documentation on AMD IOMMU emulation in Qemu
Life of Interrupts: Remapping | Cloud, Computing, Chaos