Server 2012 R2 BSOD

demain1

New member
Joined
Jan 14, 2015
Posts
3
Hi, I have a couple of 2012 R2 Remote Desktop Session Host servers which are experiencing BSODs. So far it has happened 3 times in just over a week. We have 7 RDS servers, and we have seen a similar BSOD on 2 of the 7. I say similar because the .sys files that the crash dumps report as responsible all seem to be DirectX related. A key point is that these are all virtual servers running on top of a 2012 R2 Hyper-V cluster.

I have a basic knowledge of how to use windbg, and from looking at the crash dumps, I can see that they are reporting dxgmms1.sys and DXGKRNL.sys.

The Sysnative upload includes the mini crash dumps from 1 of the servers but I have a full memory dump file also - approx 2GB in size. (Driver Verifier was running at the time of these crashes)

I couldn't save the perfmon report as HTML (it threw an error), so I've attached the whole directory with the XMLs etc. This can be extracted to your local perfmon directory, and you can then view the report from within perfmon.

Any assistance in tracking down a likely cause is much appreciated!

View attachment RDS03_20150115-000001.zip
View attachment 10524

 
1st server:

SYSTEM_SERVICE_EXCEPTION (3b)


This indicates that an exception happened while executing a routine that transitions from non-privileged code to privileged code.


Code:
7: kd> k
Child-SP          RetAddr           Call Site
ffffd000`21996618 fffff800`7f3d97e9 nt!KeBugCheckEx
ffffd000`21996620 fffff800`7f3d90fc nt!KiBugCheckDispatch+0x69
ffffd000`21996760 fffff800`7f3d51ed nt!KiSystemServiceHandler+0x7c
ffffd000`219967a0 fffff800`7f3623a5 nt!RtlpExecuteHandlerForException+0xd
ffffd000`219967d0 fffff800`7f36125f nt!RtlDispatchException+0x1a5
ffffd000`21996ea0 fffff800`7f3d98c2 nt!KiDispatchException+0x61f
ffffd000`21997590 fffff800`7f3d8014 nt!KiExceptionDispatch+0xc2
ffffd000`21997770 fffff801`6806cb7f nt!KiPageFault+0x214 < -- Exception.
ffffd000`21997900 fffff801`680c755f dxgkrnl!DXGAUTOMUTEX::DXGAUTOMUTEX+0x27 // More DirectX stuff, and we go off the rails here.
ffffd000`21997930 fffff801`680c7507 dxgkrnl!OUTPUTDUPL_MGR::QueryActiveContextCount+0x3f // More DirectX stuff.
ffffd000`21997980 fffff801`681608ec dxgkrnl!OutputDuplQueryActiveContextCount+0x9f // More DirectX stuff.
ffffd000`219979b0 fffff800`7f3d94b3 dxgkrnl!DxgkQueryAdapterInfo+0x994 // DirectX kernel query adapter info.
ffffd000`21997b00 00007ffd`879d175a nt!KiSystemServiceCopyEnd+0x13 // Transition to kernel-mode.
0000002b`dd14f218 00000000`00000000 0x00007ffd`879d175a // Some user mode stuff.

Code:
7: kd> .cxr ffffd00021996ed0
rax=0000000000000000 rbx=ffffd00021997950 rcx=ffffe000adef2880
rdx=0000000001047315 rsi=ffffc000893d7880 rdi=0000000000000000
rip=fffff8016806cb7f rsp=ffffd00021997900 rbp=ffffd00021997b80
 r8=ffffe000a2ce0520  r9=0000000000000000 r10=0000000000000801
r11=ffffc00081acb180 r12=0000002bdd14f270 r13=ffffe000a38d7090
r14=0000000000000010 r15=ffffe000a38d7090
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010286
dxgkrnl!DXGAUTOMUTEX::DXGAUTOMUTEX+0x27:
fffff801`6806cb7f 483908          cmp     qword ptr [rax],rcx ds:002b:00000000`00000000=????????????????

The faulting instruction is a cmp: it compares the qword at the address held in the rax register against the rcx register.
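
If you want to double check which register is the memory operand, the opcode bytes from the dump (483908) can be decoded by hand. This is only a minimal sketch that handles this one encoding (REX.W + 39 /r with a plain register-indirect operand); the function name is made up for illustration and it is nowhere near a full disassembler.

```python
# Decode the three opcode bytes shown next to the faulting address: 48 39 08.
# Only the low eight 64-bit registers are needed because REX here is 0x48
# (W bit only, no register-extension bits).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_cmp_rm64_r64(code: bytes) -> str:
    """Decode the REX.W + 39 /r form (cmp r/m64, r64), memory operand [reg] only."""
    rex, opcode, modrm = code
    if rex != 0x48 or opcode != 0x39:
        raise ValueError("only REX.W + 39 /r is handled in this sketch")
    mod = modrm >> 6            # addressing mode
    reg = (modrm >> 3) & 0b111  # register operand
    rm = modrm & 0b111          # register holding the memory address
    if mod != 0b00 or rm in (0b100, 0b101):
        raise ValueError("only plain [reg] memory operands are handled")
    return f"cmp qword ptr [{REG64[rm]}],{REG64[reg]}"


print(decode_cmp_rm64_r64(bytes.fromhex("483908")))
```

The ModRM byte 0x08 splits into mod=00, reg=001 (rcx), rm=000 (rax), so the memory read goes through rax and rcx is only the value compared against.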

Code:
7: kd> !pte ffffe000adef2880
                                           VA ffffe000adef2880
PXE at FFFFF6FB7DBEDE00    PPE at FFFFF6FB7DBC0010    PDE at FFFFF6FB78002B78    PTE at FFFFF6F00056F790
Unable to get PXE FFFFF6FB7DBEDE00

The address is probably invalid, but I cannot tell because this is a small memory dump. Either way, that access is what caused the exception, which in turn threw the bug check. It's a verifier-enabled crash dump:

Code:
  STANDARD FLAGS:
    [X] (0x00000000) Automatic Checks
    [X] (0x00000001) Special pool
    [ ] (0x00000002) Force IRQL checking
    [ ] (0x00000008) Pool tracking
    [ ] (0x00000010) I/O verification
    [ ] (0x00000020) Deadlock detection
    [ ] (0x00000080) DMA checking
    [ ] (0x00000100) Security checks
    [ ] (0x00000800) Miscellaneous checks
    [ ] (0x00020000) DDI compliance checking

However, why so few flags? You should enable more, such as IRQL checking, pool tracking, I/O verification, deadlock detection, DMA checking, security checks, and miscellaneous checks. Basically everything except DDI compliance checking, which isn't really necessary here as I don't think we have a driver making calls at invalid IRQLs, etc. Technically none of them are strictly needed in this situation, except maybe IRQL, DMA, and misc... but you may as well.
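
For reference, the flag values in the listing above can be OR'd together to get a single mask for the suggested set. A minimal sketch, using only the names and values printed in that listing; the dictionary and helper names are just for illustration, and how you apply the mask depends on how you configure Driver Verifier.

```python
# Flag values copied from the verifier listing in the crash dump output.
VERIFIER_FLAGS = {
    "Special pool": 0x00000001,
    "Force IRQL checking": 0x00000002,
    "Pool tracking": 0x00000008,
    "I/O verification": 0x00000010,
    "Deadlock detection": 0x00000020,
    "DMA checking": 0x00000080,
    "Security checks": 0x00000100,
    "Miscellaneous checks": 0x00000800,
    "DDI compliance checking": 0x00020000,
}


def combined_mask(names):
    """OR together the flag values for the given option names."""
    mask = 0
    for name in names:
        mask |= VERIFIER_FLAGS[name]
    return mask


# Everything from the listing except DDI compliance checking.
suggested = [n for n in VERIFIER_FLAGS if n != "DDI compliance checking"]
print(hex(combined_mask(suggested)))
```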



2nd server:

PFN_LIST_CORRUPT (4e)

This indicates that the page frame number (PFN) list is corrupted.

Code:
6: kd> k
Child-SP          RetAddr           Call Site
ffffd000`21899848 fffff803`dd401ae3 nt!KeBugCheckEx
ffffd000`21899850 fffff803`dd2bea22 nt!MiBadRefCount+0x4f
ffffd000`21899890 fffff800`00861080 nt!MmUnlockPages+0x7e2
ffffd000`21899930 fffff800`0086102c dxgmms1!VIDMM_SEGMENT::SafeUnlockPages+0x20
ffffd000`21899960 fffff800`0086e83b dxgmms1!VIDMM_SEGMENT::UnlockAllocationBackingStore+0x64
ffffd000`21899990 fffff800`00865890 dxgmms1!VIDMM_GLOBAL::ProcessDeferredCommand+0x4c3
ffffd000`21899af0 fffff800`008657f4 dxgmms1!VIDMM_GLOBAL::ProcessTerminationCommand+0x44
ffffd000`21899b40 fffff800`00877cc8 dxgmms1!VidSchiSubmitDeviceCommand+0x34
ffffd000`21899b70 fffff800`00877a6d dxgmms1!VidSchiRun_PriorityTable+0x258
ffffd000`21899bc0 fffff803`dd2a02e4 dxgmms1!VidSchiWorkerThread+0x8d
ffffd000`21899c00 fffff803`dd3672c6 nt!PspSystemThreadStartup+0x58
ffffd000`21899c60 00000000`00000000 nt!KiStartSystemThread+0x16

More DirectX stuff, this time in dxgmms1 as opposed to dxgkrnl. We go off the rails in SafeUnlockPages, when MmUnlockPages hits a bad reference count.

Overall, with both of these servers it looks like faulty hardware (RAM if anything, but video is likely as well), but it's impossible to say from a small memory dump alone and with so few verifier flags. Have you guys run a Memtest?
 
Hi, thanks for the assistance so far! Here are links to the kernel dumps. Hoping they will offer some more info on what's happening!

https://onedrive.live.com/redir?resid=44A597E146760C36!1098&authkey=!AIwsoBVUeqBi9nA&ithint=file,rar
https://onedrive.live.com/redir?resid=44A597E146760C36!1099

My logic (which could be flawed) is that the servers are Hyper-V guests running on separate Hyper-V hosts. So for this to be hardware related, both Hyper-V hosts must have bad hardware. I guess it's possible, but I think unlikely. Also, we've only had crashes on the RDS servers on these hosts, not on other guests running on the same hosts.

The graphics driver is the generic Windows Hyper-V one.

If you still want me to run Memtest I can. I guess that will need to be on the host rather than the guest...

Thanks again.
 
I can almost guarantee this is bad hardware, so a Memtest run will help a lot, as RAM is the first likely culprit (on the host machine, of course).

Anyway, regarding the 0x3B from earlier, where I wanted to check the address in the rcx register: it's actually valid:

Code:
7: kd> !pte ffffe000adef2880
                                           VA ffffe000adef2880
PXE at FFFFF6FB7DBEDE00    PPE at FFFFF6FB7DBC0010    PDE at FFFFF6FB78002B78    PTE at FFFFF6F00056F790
contains 0000000001C3D863  contains 0000000001C3C863  contains 00000005765F1863  contains 800000056A163B63
pfn 1c3d      ---DA--KWEV  pfn 1c3c      ---DA--KWEV  pfn 5765f1    ---DA--KWEV  pfn 56a163    CG-DA--KW-V

That said, I cannot say why the instruction failed. The rax register is null because it's a volatile register, so its original contents aren't stored.
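
For anyone following along, the !pte output above can be sanity checked by hand. A rough sketch, assuming the fixed PTE base that this Windows version appears to use (0xFFFFF68000000000 is my assumption, though it does reproduce the "PTE at" address in the output) and the standard x64 PTE bit layout; the function names are made up for illustration.

```python
# Assumption: fixed PTE base for this OS version. It reproduces the address
# printed by !pte above, but it is not taken from the dump itself.
PTE_BASE = 0xFFFFF68000000000


def pte_address(va: int) -> int:
    """Address of the PTE that maps the given virtual address."""
    vpn = (va & 0xFFFFFFFFFFFF) >> 12  # 48-bit virtual page number
    return PTE_BASE + (vpn << 3)       # each PTE is 8 bytes


def decode_pte(value: int) -> dict:
    """Pull the architectural bits and the page frame number out of a PTE."""
    return {
        "valid": bool(value & (1 << 0)),
        "writable": bool(value & (1 << 1)),
        "user": bool(value & (1 << 2)),
        "accessed": bool(value & (1 << 5)),
        "dirty": bool(value & (1 << 6)),
        "global": bool(value & (1 << 8)),
        "no_execute": bool(value & (1 << 63)),
        "pfn": (value >> 12) & ((1 << 40) - 1),
    }


print(hex(pte_address(0xFFFFE000ADEF2880)))
info = decode_pte(0x800000056A163B63)
print(hex(info["pfn"]), info["valid"], info["no_execute"])
```

The valid bit is set and the PFN matches the "pfn 56a163" column, which is why the address in rcx reads as valid.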
 
