Debugging Stop 0x113 - VIDEO_DXGKRNL_FATAL_ERROR

x BlueRobot

Rich (BB code):
VIDEO_DXGKRNL_FATAL_ERROR (113)
The dxgkrnl has detected that a violation has occurred. This resulted
in a condition that dxgkrnl can no longer progress.  By crashing, dxgkrnl
is attempting to get enough information into the minidump such that somebody
can pinpoint the crash cause. Any other values after parameter 1 must be
individually examined according to the subtype.
Arguments:
Arg1: 0000000000000019, The subtype of the bugcheck:
Arg2: 0000000000000001
Arg3: 00000000000010de << Vendor Id
Arg4: 0000000000001e84 << Device Id

Along with Stop 0x116, Stop 0x117 and Stop 0x119, this is another video-related bugcheck and probably one of the rarest among them. I've only seen it on a handful of occasions, and the dumps usually don't provide much information. However, I was curious about the meaning of the bugcheck parameters, and while searching for clues I found a possible reason why the bugcheck was caused.

Let's begin with the bugcheck parameters, since that is where I usually start when debugging a crash. The first two parameters are unknown to me and I still haven't been able to find out what they mean. Fortunately, I have been able to find the meaning of the other two parameters: the vendor ID and device ID respectively. Together, these values identify the device and can be looked up in an online PCI database to find which device they belong to.

The raw stack also contains the following string which corresponds to the values seen in the parameters.

Rich (BB code):
PCI\VEN_10DE&DEV_1E84&SUBSYS_C7261462&REV_A1\4&2f503515&0&0019
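
As a quick cross-check, the vendor and device IDs can be pulled out of that string and compared against Arg3 and Arg4. This is only a minimal sketch in Python: the helper name `parse_pci_ids` is hypothetical, and the constants are copied by hand from the dump output above.

```python
import re

# Values copied by hand from the bugcheck arguments (Arg3 and Arg4).
ARG3_VENDOR_ID = 0x10DE
ARG4_DEVICE_ID = 0x1E84

# Device instance path found on the raw stack.
INSTANCE_PATH = r"PCI\VEN_10DE&DEV_1E84&SUBSYS_C7261462&REV_A1\4&2f503515&0&0019"


def parse_pci_ids(path):
    """Return (vendor_id, device_id) as integers, or None if the path has no VEN/DEV pair.

    Hypothetical helper for illustration; it is not part of any debugger extension.
    """
    match = re.search(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})", path)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


ids = parse_pci_ids(INSTANCE_PATH)
ids_match = ids == (ARG3_VENDOR_ID, ARG4_DEVICE_ID)
print("IDs from string:", ids, "match bugcheck args:", ids_match)
```

If the two don't match, the string on the raw stack probably belongs to a different device than the one named in the bugcheck parameters.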

Rich (BB code):
15: kd> !load cmkd; !stack -p
Call Stack : 7 frames
## Stack-Pointer    Return-Address   Call-Site      
00 ffff880e514f43d8 fffff8048cc241a0 nt!KeBugCheckEx+0
    Parameter[0] = 0000000000000113
    Parameter[1] = 0000000000000019
    Parameter[2] = 0000000000000001
    Parameter[3] = 00000000000010de
01 ffff880e514f43e0 fffff8048cb41427 watchdog!WdLogEvent5_WdCriticalError+e0
    Parameter[0] = ffffa381e4df2ee0
    Parameter[1] = (unknown)      
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)      
02 ffff880e514f4420 fffff8048ca1dfa8 dxgkrnl!DpiFdoHandleSurpriseRemoval+167 
    Parameter[0] = ffffa381e5441030 << Device Object
    Parameter[1] = ffffa381ea99fda0 << IRP Address
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)      
03 ffff880e514f4460 fffff8048c978029 dxgkrnl!DpiFdoDispatchPnp+b8
    Parameter[0] = ffffa381e5441030
    Parameter[1] = ffffa381ea99fda0
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)      
04 ffff880e514f4510 fffff8048e7565e4 dxgkrnl!DpiDispatchPnp+e9
    Parameter[0] = ffffa381e5441030
    Parameter[1] = ffffa381ea99fda0
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)      
05 ffff880e514f4630 ffffa381e5666000 nvlddmkm+d65e4 (leaf)
    Parameter[0] = (unknown)      
    Parameter[1] = (unknown)      
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)

Now, let's look at the call stack dumped above and examine what was happening just before the crash. We can see that a third-party driver is present, the graphics card driver, which fits the context of the bugcheck description. However, that alone doesn't tell us why the crash occurred, which is what matters when diagnosing an issue. As we can see in the call stack, stack frames 4 and 3 are related to dispatching a PnP IRP; stack frame 2 indicates that the IRP was a surprise removal IRP, which is a form of PnP IRP. If we look closely, we can see that this function takes two parameters: a device object and an IRP.

If we dump the device object, we can see that it is related to the graphics card and the driver which was mentioned in the call stack.

Rich (BB code):
15: kd> !devobj ffffa381e5441030
Device object (ffffa381e5441030) is for:
  \Driver\nvlddmkm DriverObject ffffa381e49ceba0
Current Irp 00000000 RefCount 0 Type 00000023 Flags 00002004
SecurityDescriptor ffffc60f81a9ed60 DevExt ffffa381e5441180 DevObjExt ffffa381e5442808
ExtensionFlags (0000000000) 
Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
AttachedTo (Lower) ffffa381df8ee6e0 Name paged out

Let's dump the IRP stack and examine who is currently processing it.

Rich (BB code):
15: kd> !irp ffffa381ea99fda0
Irp is active with 3 stacks 3 is current (= 0xffffa381ea99ff00)
No Mdl: No System Buffer: Thread ffffa381e4b770c0:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
[N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_PNP(1b), IRP_MN_SURPRISE_REMOVAL(17)]
            0  0 ffffa381e5441030 00000000 00000000-00000000   
           \Driver\nvlddmkm
            Args: 00000000 00000000 00000000 00000000

Notice the same device object which was passed as a parameter earlier? We can clearly see that the IRP is a surprise removal IRP. This IRP can be issued for a number of different reasons, including the device being suddenly removed from the system.
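
For anyone who wants to script this check, here is a rough Python sketch that finds the current stack location (the entry marked with `>`) in pasted `!irp` output and compares its device object with the one passed to `dxgkrnl!DpiFdoHandleSurpriseRemoval`. The function name `current_stack_location` and the line layout it expects are my own assumptions based on the output above, not a documented format.

```python
import re

# A few lines pasted from the !irp output above (the two empty stack locations are trimmed to one).
IRP_LINES = [
    "[N/A(0), N/A(0)]",
    "            0  0 00000000 00000000 00000000-00000000",
    ">[IRP_MJ_PNP(1b), IRP_MN_SURPRISE_REMOVAL(17)]",
    "            0  0 ffffa381e5441030 00000000 00000000-00000000",
    r"           \Driver\nvlddmkm",
]

# Device object passed as Parameter[0] in stack frame 2.
STACK_DEVICE_OBJECT = 0xFFFFA381E5441030

# Matches the entry the debugger marks as current with a leading '>'.
CURRENT_ENTRY = re.compile(r"^>\[(\w+)\(([0-9a-f]+)\), (\w+)\(([0-9a-f]+)\)\]")


def current_stack_location(lines):
    """Return (major, major_code, minor, minor_code, device) for the '>' entry, or None."""
    for index, line in enumerate(lines):
        match = CURRENT_ENTRY.match(line)
        if match and index + 1 < len(lines):
            # Assumed layout of the next line: cmd flags, control, device, file, completion-context.
            fields = lines[index + 1].split()
            return (
                match.group(1),
                int(match.group(2), 16),
                match.group(3),
                int(match.group(4), 16),
                int(fields[2], 16),
            )
    return None


location = current_stack_location(IRP_LINES)
same_device = location is not None and location[4] == STACK_DEVICE_OBJECT
print(location, same_device)
```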

Next, let's examine the IRP dispatch table for the driver and see what we can find. The IRP dispatch table contains the dispatch routines which are called when a device receives an IRP of a particular type. In our case, we're particularly interested in the PnP IRP.

Rich (BB code):
15: kd> !drvobj ffffa381e49ceba0 7
fffff8047fe47b58: Unable to get value of ObpRootDirectoryObject
fffff8047fe47b58: Unable to get value of ObpRootDirectoryObject
Driver object (ffffa381e49ceba0) is for:
\Driver\nvlddmkm

Driver Extension List: (id , addr)

Couldn't read extension at 0xffffa381e5426570

Device Object list:
ffffa381e8195b20  ffffa381e8194b20: Could not read device object


DriverEntry:   fffff8048fd191b8    nvlddmkm
DriverStartIo: 00000000   
DriverUnload:  fffff8048e758960    nvlddmkm
AddDevice:     fffff8048e755ba8    nvlddmkm

Dispatch routines:
[00] IRP_MJ_CREATE                      fffff8048e755fb8    nvlddmkm+0xd5fb8
[01] IRP_MJ_CREATE_NAMED_PIPE           fffff8048e755fb8    nvlddmkm+0xd5fb8
[02] IRP_MJ_CLOSE                       fffff8048e755fb8    nvlddmkm+0xd5fb8
[03] IRP_MJ_READ                        fffff8048e755fb8    nvlddmkm+0xd5fb8
[04] IRP_MJ_WRITE                       fffff8048e755fb8    nvlddmkm+0xd5fb8
[05] IRP_MJ_QUERY_INFORMATION           fffff8048e755fb8    nvlddmkm+0xd5fb8
[06] IRP_MJ_SET_INFORMATION             fffff8048e755fb8    nvlddmkm+0xd5fb8
[07] IRP_MJ_QUERY_EA                    fffff8048e755fb8    nvlddmkm+0xd5fb8
[08] IRP_MJ_SET_EA                      fffff8048e755fb8    nvlddmkm+0xd5fb8
[09] IRP_MJ_FLUSH_BUFFERS               fffff8048e755fb8    nvlddmkm+0xd5fb8
[0a] IRP_MJ_QUERY_VOLUME_INFORMATION    fffff8048e755fb8    nvlddmkm+0xd5fb8
[0b] IRP_MJ_SET_VOLUME_INFORMATION      fffff8048e755fb8    nvlddmkm+0xd5fb8
[0c] IRP_MJ_DIRECTORY_CONTROL           fffff8048e755fb8    nvlddmkm+0xd5fb8
[0d] IRP_MJ_FILE_SYSTEM_CONTROL         fffff8048e755fb8    nvlddmkm+0xd5fb8
[0e] IRP_MJ_DEVICE_CONTROL              fffff8048e755fb8    nvlddmkm+0xd5fb8
[0f] IRP_MJ_INTERNAL_DEVICE_CONTROL     fffff8048e755fb8    nvlddmkm+0xd5fb8
[10] IRP_MJ_SHUTDOWN                    fffff8048e755fb8    nvlddmkm+0xd5fb8
[11] IRP_MJ_LOCK_CONTROL                fffff8048e755fb8    nvlddmkm+0xd5fb8
[12] IRP_MJ_CLEANUP                     00000000   
[13] IRP_MJ_CREATE_MAILSLOT             00000000   
[14] IRP_MJ_QUERY_SECURITY              00000000   
[15] IRP_MJ_SET_SECURITY                00000000   
[16] IRP_MJ_POWER                       00000000   
[17] IRP_MJ_SYSTEM_CONTROL              00000000   
[18] IRP_MJ_DEVICE_CHANGE               00000000   
[19] IRP_MJ_QUERY_QUOTA                 00000000   
[1a] IRP_MJ_SET_QUOTA                   00000000   
[1b] IRP_MJ_PNP                         00000000  << No dispatch routine to handle the IRP


Device Object stacks:

!devstack ffffa381e8195b20 :
  !DevObj           !DrvObj            !DevExt           ObjectName
> ffffa381e8195b20 ffffa381e8195b20: Could not read device object or _DEVICE_OBJECT not found
ffffa381e8195c70  InfoMask field not found for _OBJECT_HEADER at ffffa381e8195af0


Could not read DeviceObjectExtension from DeviceObject 0xffffa381e8195b20

ffffa381e8194b20: Could not read device object
Error processing device objects.  Processed 1 device objects before error.

Notice that there is no handler routine for the PnP IRP? The driver doesn't know how to handle this I/O request, which in turn leads to our bugcheck. You may be thinking that this is rather odd: why doesn't the driver have a dispatch routine for every form of IRP? However, it is perfectly normal and reasonable for drivers to be unable to process every form of IRP, especially in the context of this driver. The surprise removal IRP is designed for PnP devices which may unexpectedly be removed or inserted at any time by the user. Take a USB keyboard or mouse, for example: I may remove my keyboard at a moment's notice, whereas it is very unlikely that I would remove my graphics card while my system is still running.
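
If you want to pick out the empty entries without reading the whole table by eye, something like the following Python sketch would do it. It only parses text pasted from `!drvobj`, so it reports what the debugger printed and nothing more; the helper name `parse_dispatch_table` is hypothetical and only three sample lines are included to keep it short.

```python
import re

# Three lines pasted from the "Dispatch routines" section of the !drvobj output above.
DRVOBJ_LINES = [
    "[0e] IRP_MJ_DEVICE_CONTROL              fffff8048e755fb8    nvlddmkm+0xd5fb8",
    "[16] IRP_MJ_POWER                       00000000   ",
    "[1b] IRP_MJ_PNP                         00000000  ",
]

# Assumed line layout: [index] name address, as printed by the debugger.
ENTRY = re.compile(r"^\[([0-9a-f]{2})\]\s+(IRP_MJ_\w+)\s+([0-9a-f]+)")


def parse_dispatch_table(lines):
    """Map each IRP major function name to the handler address printed by !drvobj."""
    table = {}
    for line in lines:
        match = ENTRY.match(line)
        if match:
            table[match.group(2)] = int(match.group(3), 16)
    return table


table = parse_dispatch_table(DRVOBJ_LINES)
# An address of zero means the debugger printed no dispatch routine for that entry.
missing = [name for name, address in table.items() if address == 0]
print("Entries without a dispatch routine:", missing)
```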

With what we've learned from the dump file and what the user has mentioned:

Also I noticed that alot of times it crashed it was after I bumped my desk or something. Then my screen would go black, fans start running at 100% and nothing happens until I manually reboot.

I believe that a loose connection could in fact be causing the crash. A bump could quite easily cause the graphics card to become momentarily disconnected, which would cause the PnP manager to issue a surprise removal request since it has noticed that the graphics card is no longer present in the device tree. This is just an assumption at the moment and the thread is still active.

If you're interested in following the development of this bugcheck then the thread is available here - Frequent crashes

References:

Handling an IRP_MN_SURPRISE_REMOVAL Request - Windows drivers
Writing IRP Dispatch Routines - Windows drivers
 
Just guessing...

19 could be "bus 0, device 3, function 1".

Devmanview shows (for my pc):
  • PCI standard host CPU bridge
    PCI\VEN_1022&DEV_1422&SUBSYS_14221849&REV_00\3&11583659&0&00
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,0,0)
    When I check its location (device manager, properties, details, location information), I get: bus 0 device 0 function 0
  • AMD Radeon(TM) R7 Graphics
    PCI\VEN_1002&DEV_130F&SUBSYS_130F1849&REV_00\3&11583659&0&08
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,1,0)
    Location:bus 0 device 1 function 0
  • PCI standard host CPU bridge
    PCI\VEN_1022&DEV_1424&SUBSYS_00000000&REV_00\3&11583659&0&10
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,2,0)
    Location:bus 0 device 2 function 0
  • PCI standard host CPU bridge
    PCI\VEN_1022&DEV_1424&SUBSYS_00000000&REV_00\3&11583659&0&18
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,3,0)
    Location:bus 0 device 3 function 0
  • PCI Express Root Port
    PCI\VEN_1022&DEV_1426&SUBSYS_12341022&REV_00\3&11583659&0&1A
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,3,2)
    Location:bus 0 device 3 function 2
  • PCI standard host CPU bridge
    PCI\VEN_1022&DEV_1424&SUBSYS_00000000&REV_00\3&11583659&0&20
    @System32\drivers\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,4,0)
    Location:bus 0 device 4 function 0

I.e.:
bus 0 device 0 function 0 = (0,0,0) = 00
bus 0 device 0 function 1 = (0,0,1) = 01
bus 0 device 0 function 2 = (0,0,2) = 02
bus 0 device 0 function 3 = (0,0,3) = 03
bus 0 device 0 function 4 = (0,0,4) = 04
bus 0 device 0 function 5 = (0,0,5) = 05
bus 0 device 0 function 6 = (0,0,6) = 06
bus 0 device 0 function 7 = (0,0,7) = 07
bus 0 device 1 function 0 = (0,1,0) = 08
bus 0 device 1 function 1 = (0,1,1) = 09
bus 0 device 1 function 2 = (0,1,2) = 0a
bus 0 device 1 function 3 = (0,1,3) = 0b
bus 0 device 1 function 4 = (0,1,4) = 0c
bus 0 device 1 function 5 = (0,1,5) = 0d
bus 0 device 1 function 6 = (0,1,6) = 0e
bus 0 device 1 function 7 = (0,1,7) = 0f
bus 0 device 2 function 0 = (0,2,0) = 10
And so forth...
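
The pattern in that list is the usual PCI device/function packing: the low 3 bits are the function and the next 5 bits are the device. Here is a small Python sketch of the arithmetic. Whether Arg1 of the bugcheck really is such a value is still just my guess; the function names are made up for illustration.

```python
def decode_devfn(value):
    """Split a packed PCI device/function byte into (device, function)."""
    device = (value >> 3) & 0x1F  # upper 5 bits: device number 0-31
    function = value & 0x07       # lower 3 bits: function number 0-7
    return device, function


def encode_devfn(device, function):
    """Pack a device number and function number back into one byte."""
    return (device << 3) | function


# Suffixes from the DevManView list above, plus 0x19 from the bugcheck.
for suffix in (0x00, 0x08, 0x10, 0x18, 0x1A, 0x20, 0x19):
    device, function = decode_devfn(suffix)
    print(f"{suffix:02x} = device {device}, function {function}")
```

With this encoding, 1A comes out as device 3 function 2, which matches the PCI Express Root Port entry above, and 19 comes out as device 3 function 1.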
 
Hmm possibly, we need to compare it with another bugcheck and see if it has similar parameters.
 
