Rich (BB code):
VIDEO_MEMORY_MANAGEMENT_INTERNAL (10e)
The video memory manager encountered a condition that it can't recover from. By crashing,
the video memory manager is attempting to get enough information into the minidump such that
somebody can pinpoint what lead to this condition.
Arguments:
Arg1: 0000000000000017, Unexpected system command failure.
Arg2: ffffb5070e258000 << Related to video memory management
Arg3: 0000000000000000
Arg4: ffffffffc00002b6 << NTStatus code
This is a generic bugcheck which covers a number of different exceptions that the video memory manager may encounter. The first parameter of the bugcheck indicates the type of exception which has been thrown. From experience and from looking at other threads, 0x17 appears to be the most common value for this parameter: a system command has unexpectedly encountered an error which it couldn't resolve. The call stack is identical for each bugcheck which has 0x17 as the first parameter, and therefore this tutorial should be applicable to every instance of this variant.
Let's begin by examining the bugcheck parameters. We've already established the meaning behind the first parameter, so we'll focus on the second and fourth parameters. The second parameter is a pointer to some internal video memory manager structure or object. Unfortunately, I haven't seen anyone document exactly what it is, although !pool does at least show which pool allocation it belongs to:
Rich (BB code):
10: kd> !pool ffffb5070e258000
Pool page ffffb5070e258000 region is Nonpaged pool
*ffffb5070e258000 : large page allocation, tag is Vi15, size is 0xaf80 bytes
Pooltag Vi15 : Video memory manager global state, Binary : dxgmms2.sys
Fortunately, the second parameter doesn't affect our debugging efforts and can be ignored. The fourth parameter is far more interesting and describes the type of exception which has occurred.
Rich (BB code):
10: kd> !error c00002b6
Error code: (NTSTATUS) 0xc00002b6 (3221226166) - The device has been removed.
It appears that an exception was thrown by the video memory manager because a device was removed. Since we know the video memory manager is related to the graphics card, we can safely assume that the device being removed was likely a component in the graphics card device stack. Please note that device removal doesn't necessarily equate to physical device removal. It can be the device being removed from the device tree.
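To make the link between Arg4 and the value passed to !error explicit, here is a minimal Python sketch. It assumes the bugcheck argument is the 32-bit NTSTATUS sign-extended to 64 bits, and it uses the documented NTSTATUS field layout (severity in the top two bits, facility in bits 16 to 27, code in the low 16 bits). The function name decode_ntstatus is my own and isn't part of any debugger API.

```python
SEVERITY_NAMES = ["success", "informational", "warning", "error"]

def decode_ntstatus(arg: int) -> dict:
    """Split a (possibly sign-extended) NTSTATUS value into its fields."""
    status = arg & 0xFFFFFFFF            # drop the sign-extended upper 32 bits
    return {
        "status": status,
        "severity": (status >> 30) & 0x3,
        "customer": (status >> 29) & 0x1,
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }

# Arg4 from the bugcheck above
fields = decode_ntstatus(0xFFFFFFFFC00002B6)
print(hex(fields["status"]), SEVERITY_NAMES[fields["severity"]])
```

The masked value is the same c00002b6 which was given to !error, and the severity bits mark it as an error rather than a warning or informational status.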
If we take a look at the call stack, we'll notice a few clues as to what may be happening:
Rich (BB code):
10: kd> knL
# Child-SP RetAddr Call Site
00 fffffe83`4754a888 fffff806`2185529a nt!KeBugCheckEx
01 fffffe83`4754a890 fffff806`232f1c73 watchdog!WdLogSingleEntry5+0x39aa
02 fffffe83`4754a920 fffff806`2323c009 dxgmms2!VIDMM_GLOBAL::RestoreFromPurge+0x2a2cb << Crash here!
03 fffffe83`4754a9e0 fffff806`215cd225 dxgmms2!VidMmRestoreFromPurge+0x9
04 fffffe83`4754aa10 fffff806`21552aaa dxgkrnl!ADAPTER_RENDER::RestoreFromPurgeSegments+0x35
05 fffffe83`4754aa70 fffff806`21552981 dxgkrnl!DXGADAPTER::ReleaseCoreSync+0xfa
06 fffffe83`4754aad0 fffff806`215f17ab dxgkrnl!DxgkReleaseAdapterCoreSync+0x4d
07 fffffe83`4754ab50 fffff806`11d5b615 dxgkrnl!DpiPowerArbiterThread+0x29b
08 fffffe83`4754abb0 fffff806`11e16c24 nt!PspSystemThreadStartup+0x55
09 fffffe83`4754ac00 00000000`00000000 nt!KiStartSystemThread+0x34
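Since the most recent frame is listed first, it can help to read the stack from the bottom up, in the order the calls were actually made. The following is only a rough Python sketch for doing that with pasted knL output; the regular expression is my own assumption about the column layout shown above and hasn't been tested against other debugger output formats.

```python
import re

KN_OUTPUT = """\
00 fffffe83`4754a888 fffff806`2185529a nt!KeBugCheckEx
01 fffffe83`4754a890 fffff806`232f1c73 watchdog!WdLogSingleEntry5+0x39aa
02 fffffe83`4754a920 fffff806`2323c009 dxgmms2!VIDMM_GLOBAL::RestoreFromPurge+0x2a2cb
03 fffffe83`4754a9e0 fffff806`215cd225 dxgmms2!VidMmRestoreFromPurge+0x9
04 fffffe83`4754aa10 fffff806`21552aaa dxgkrnl!ADAPTER_RENDER::RestoreFromPurgeSegments+0x35
05 fffffe83`4754aa70 fffff806`21552981 dxgkrnl!DXGADAPTER::ReleaseCoreSync+0xfa
06 fffffe83`4754aad0 fffff806`215f17ab dxgkrnl!DxgkReleaseAdapterCoreSync+0x4d
07 fffffe83`4754ab50 fffff806`11d5b615 dxgkrnl!DpiPowerArbiterThread+0x29b
08 fffffe83`4754abb0 fffff806`11e16c24 nt!PspSystemThreadStartup+0x55
09 fffffe83`4754ac00 00000000`00000000 nt!KiStartSystemThread+0x34
"""

# frame number, two address columns, then module!symbol (offset ignored)
FRAME_RE = re.compile(
    r"^(?P<num>[0-9a-f]{2})\s+\S+\s+\S+\s+(?P<module>\w+)!(?P<symbol>[\w:]+)"
)

def parse_frames(text):
    """Return (frame number, module, symbol) for every frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match["num"], 16), match["module"], match["symbol"]))
    return frames

frames = parse_frames(KN_OUTPUT)
# Print oldest call first so the sequence reads chronologically
for num, module, symbol in reversed(frames):
    print(f"{num:02x} {module:<9} {symbol}")
```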
We can see that some kind of power thread has been started with the dxgkrnl!DpiPowerArbiterThread call. Shortly afterwards, there is a call to dxgkrnl!DXGADAPTER::ReleaseCoreSync and then a final call into the video memory manager with dxgmms2!VIDMM_GLOBAL::RestoreFromPurge. These three calls are the most important clues to what has happened, and to understand them we need to briefly discuss some of the features of post-Vista video memory management.
Since the introduction of Windows Vista, there have been a number of changes to the WDDM and, subsequently, to how the DirectX graphics kernel works as well. The most notable change for our case is the introduction of virtual video memory and the ability for the video memory manager to move pages to and from RAM. The paging scheme for video memory is almost analogous to that of system memory, with a hierarchical page table structure and each process having its own page table. The memory which backs these GPU virtual addresses is organised into memory segments. The following diagram from the MSDN documentation demonstrates the address translation process for a GPU address:
[Diagram: GPU virtual address translation, from the MSDN documentation]
Please note the above diagram is only applicable to GPUs which are utilising the GpuMmu memory model.
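To illustrate the general idea of a hierarchical, per-process page table, here is a toy two-level walk in Python. Please treat everything in it as hypothetical: the 4 KB page size, the 10-bit index per level and the dictionary-based tables are my own choices for illustration and do not reflect the actual GpuMmu layout, which depends on the GPU and driver.

```python
PAGE_SHIFT = 12                      # hypothetical 4 KB pages
INDEX_BITS = 10                      # hypothetical 10-bit index per level
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

def translate(root, gpu_va):
    """Walk a toy two-level page table; return None if a level is missing."""
    dir_index = (gpu_va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (gpu_va >> PAGE_SHIFT) & INDEX_MASK
    offset = gpu_va & OFFSET_MASK

    table = root.get(dir_index)      # first level: pick a page table
    if table is None:
        return None                  # table not resident (e.g. evicted)
    frame = table.get(table_index)   # second level: pick a page frame
    if frame is None:
        return None
    return (frame << PAGE_SHIFT) | offset

# Each process gets its own root table (hypothetical contents)
process_a = {1: {2: 0x5A}}
va = (1 << 22) | (2 << 12) | 0x34
print(hex(translate(process_a, va)))
print(translate({}, va))             # no tables at all: translation fails
```

The second call shows why a missing page table matters: once a level of the hierarchy is gone, every address which depended on it can no longer be translated.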
Now, there is a very important consideration to take note of, and that is when the page table is moved or destroyed. One of the circumstances is when a device has become idle or been removed. The MSDN documentation states the following:
"Page tables may be relocated or evicted by the video memory manager when a device is idle or suspended."
If we refer back to our call stack, then we can see that this is possibly the case. However, to find some further evidence to support our assumption, let's check the device state of our graphics card. We can find the device object by examining the second parameter of the nt!PspSystemThreadStartup function.
Rich (BB code):
10: kd> !stack -p
Call Stack : 10 frames
## Stack-Pointer Return-Address Call-Site
00 fffffe834754a888 fffff8062185529a nt!KeBugCheckEx+0
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
01 fffffe834754a890 fffff806232f1c73 watchdog!WdLogSingleEntry5+39aa (perf)
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
[...]
08 fffffe834754abb0 fffff80611e16c24 nt!PspSystemThreadStartup+55
Parameter[0] = ffffb50702cd7080
Parameter[1] = ffffb50702c38030
Parameter[2] = (unknown)
Parameter[3] = (unknown)
09 fffffe834754ac00 0000000000000000 nt!KiStartSystemThread+34
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
Rich (BB code):
10: kd> !devobj ffffb50702c38030
Device object (ffffb50702c38030) is for:
\Driver\nvlddmkm DriverObject ffffb50702c30e20
Current Irp 00000000 RefCount 0 Type 00000023 Flags 00002004
SecurityDescriptor ffffdc85f56d0f20 DevExt ffffb50702c38180 DevObjExt ffffb50702c397e8
ExtensionFlags (0000000000)
Characteristics (0x00000100) FILE_DEVICE_SECURE_OPEN
AttachedTo (Lower) ffffb506fb2e2a90 \Driver\ACPI
Device queue is not busy.
Let's check the device power state using the !podev command.
Rich (BB code):
10: kd> !podev ffffb50702c38030
Device object is for:
DriverObject 02c30e20
Current Irp 00000000 RefCount 0 Type 00000023 DevFlags 00002004 DO_POWER_PAGABLE
Device queue is not busy.
Device Object Extension: ffffb50702c397e8:
PowerFlags: 00000010 =>SystemState=0 DeviceState=1
Dope: 00000000:
The device state is a member of an enumeration type called _DEVICE_POWER_STATE, so let's check what the value of 1 corresponds to:
Rich (BB code):
10: kd> dt _DEVICE_POWER_STATE
nt!_DEVICE_POWER_STATE
PowerDeviceUnspecified = 0n0
PowerDeviceD0 = 0n1
PowerDeviceD1 = 0n2
PowerDeviceD2 = 0n3
PowerDeviceD3 = 0n4
PowerDeviceMaximum = 0n5
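The mapping from PowerFlags to the two state values can be sketched in a few lines of Python. The nibble layout used here (system state in bits 0 to 3, device state in bits 4 to 7) is an assumption inferred from the !podev output above, where 00000010 decodes to SystemState=0 and DeviceState=1; the enumeration names are copied from the dt output.

```python
# Names taken from the dt _DEVICE_POWER_STATE output above
DEVICE_POWER_STATE = {
    0: "PowerDeviceUnspecified",
    1: "PowerDeviceD0",
    2: "PowerDeviceD1",
    3: "PowerDeviceD2",
    4: "PowerDeviceD3",
    5: "PowerDeviceMaximum",
}

def decode_power_flags(flags: int):
    """Assumed layout: low nibble = system state, next nibble = device state."""
    system_state = flags & 0xF
    device_state = (flags >> 4) & 0xF
    return system_state, device_state

system_state, device_state = decode_power_flags(0x00000010)
print(system_state, device_state, DEVICE_POWER_STATE[device_state])
```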
As we can see, a DeviceState of 1 corresponds to PowerDeviceD0, so the device is reporting a fully on power state even though it appears to have been suspended. If we refer back to the MSDN documentation, we know that the page tables will be either relocated or evicted if the device becomes idle or is suspended. Additionally, we can check the device node and its state history; it appears that the device has been stopped for some reason, which would likely explain the status code shown in the fourth parameter of the bugcheck.
Rich (BB code):
10: kd> !devnode ffffb506fb8e2a60
DevNode 0xffffb506fb8e2a60 for PDO 0xffffb506fb2e20a0
Parent 0xffffb506f9febb60 Sibling 0xffffb506fcff2010 Child 0000000000
InstancePath is "PCI\VEN_10DE&DEV_249D&SUBSYS_11BC1043&REV_A1\4&30920240&0&0009"
ServiceName is "nvlddmkm"
TargetDeviceNotify List - f 0xffffdc85f6368490 b 0xffffdc85f6368490
State = DeviceNodeStopped (0x30a)
Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
StateHistory[12] = DeviceNodeAwaitingQueuedRemoval (0x30f)
StateHistory[11] = DeviceNodeAwaitingQueuedDeletion (0x30e)
StateHistory[10] = DeviceNodeStopped (0x30a)
StateHistory[09] = DeviceNodeAwaitingQueuedRemoval (0x30f)
StateHistory[08] = DeviceNodeAwaitingQueuedDeletion (0x30e)
StateHistory[07] = DeviceNodeStopped (0x30a)
StateHistory[06] = DeviceNodeQueryStopped (0x309)
StateHistory[05] = DeviceNodeStarted (0x308)
StateHistory[04] = DeviceNodeStartPostWork (0x307)
StateHistory[03] = DeviceNodeStartCompletion (0x306)
StateHistory[02] = DeviceNodeStartPending (0x305)
StateHistory[01] = DeviceNodeResourcesAssigned (0x304)
StateHistory[00] = DeviceNodeUninitialized (0x301)
StateHistory[19] = Unknown State (0x0)
StateHistory[18] = Unknown State (0x0)
StateHistory[17] = Unknown State (0x0)
StateHistory[16] = Unknown State (0x0)
StateHistory[15] = Unknown State (0x0)
StateHistory[14] = Unknown State (0x0)
StateHistory[13] = Unknown State (0x0)
Flags (0x6c000030) DNF_ENUMERATED, DNF_IDS_QUERIED,
DNF_NO_LOWER_DEVICE_FILTERS, DNF_NO_LOWER_CLASS_FILTERS,
DNF_NO_UPPER_DEVICE_FILTERS, DNF_NO_UPPER_CLASS_FILTERS
CapabilityFlags (0x00040618) EjectSupported, Removable,
SurpriseRemovalOK, WakeFromD0,
ReservedCap1
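The state history is easier to follow when it is read in order. The short Python sketch below lists what happened after the device node reached DeviceNodeStarted. It assumes that index 0 is the oldest entry and index 12 is the newest, which seems consistent with Previous State matching StateHistory[12] in the output above; the state names and codes are copied from that output.

```python
# Codes and names as they appear in the !devnode output above
STATE_NAMES = {
    0x301: "DeviceNodeUninitialized",
    0x304: "DeviceNodeResourcesAssigned",
    0x305: "DeviceNodeStartPending",
    0x306: "DeviceNodeStartCompletion",
    0x307: "DeviceNodeStartPostWork",
    0x308: "DeviceNodeStarted",
    0x309: "DeviceNodeQueryStopped",
    0x30A: "DeviceNodeStopped",
    0x30E: "DeviceNodeAwaitingQueuedDeletion",
    0x30F: "DeviceNodeAwaitingQueuedRemoval",
}

# StateHistory[00] .. StateHistory[12], assumed oldest to newest
STATE_HISTORY = [0x301, 0x304, 0x305, 0x306, 0x307, 0x308, 0x309,
                 0x30A, 0x30E, 0x30F, 0x30A, 0x30E, 0x30F]

def states_after(history, state):
    """Names of every state recorded after the first occurrence of `state`."""
    index = history.index(state)
    return [STATE_NAMES[code] for code in history[index + 1:]]

after_start = states_after(STATE_HISTORY, 0x308)
for name in after_start:
    print(name)
```

Read this way, the device node was started normally and then went through a stop followed by queued deletion and removal twice over.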
So, we've established that the graphics card went into a suspended state incorrectly and that this caused the video memory manager to throw an exception. As a result, the best course of action would be to update or roll back the graphics card driver.
Rich (BB code):
10: kd> lmvm nvlddmkm
Browse full module list
start end module name
fffff806`24600000 fffff806`26a66000 nvlddmkm (deferred)
Image path: \SystemRoot\System32\DriverStore\FileRepository\nvami.inf_amd64_df6745aaa4048565\nvlddmkm.sys
Image name: nvlddmkm.sys
Browse all global symbols functions data
Timestamp: Tue Sep 14 00:52:22 2021 (613FE436)
CheckSum: 023C784A
ImageSize: 02466000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Information from resource tables:
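When deciding whether to update or roll back, it is useful to know how old the loaded driver is. The hexadecimal value in brackets on the Timestamp line is the build time as seconds since 1 January 1970 UTC, so it can be converted with a short Python sketch. The helper name is my own, and the date shown by the debugger appears to be in the local time zone of the machine running it, so it may differ from the UTC value by an hour or more.

```python
from datetime import datetime, timezone

def pe_timestamp_to_utc(stamp_hex: str) -> datetime:
    """Convert the hex timestamp shown by lmvm into a UTC datetime."""
    return datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)

# Value from the Timestamp line above
built = pe_timestamp_to_utc("613FE436")
print(built.isoformat())
```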
References:
Paging Video Memory Resources - Windows drivers
GPU Architecture Overview | Better Tomorrow with Computer Science
GPU segments - Windows drivers
GPU virtual memory in WDDM 2.0 - Windows drivers
GPU virtual address - Windows drivers