1. #1

    Join Date
    Mar 2012
    Posts
    469

    0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Hi all,

    I'd like to take you all on a little adventure with me.

    I'm working with a missionary friend of mine who has been experiencing some BSODs. So far he's only given me a few crash dumps. A couple were inconclusive (despite all of them being taken with Driver Verifier enabled), but one sorta stuck out, which is attached and shown below:



    Code:
    VIDEO_SCHEDULER_INTERNAL_ERROR (119)
    The video scheduler has detected that fatal violation has occurred. This resulted
    in a condition that video scheduler can no longer progress. Any other values after
    parameter 1 must be individually examined according to the subtype.
    Arguments:
    Arg1: 0000000000000001, The driver has reported an invalid fence ID.
    Arg2: 0000000000008be2
    Arg3: 0000000000008c5c
    Arg4: 0000000000008c5b
    
    Debugging Details:
    ------------------
    
    
    CUSTOMER_CRASH_COUNT:  1
    
    DEFAULT_BUCKET_ID:  VERIFIER_ENABLED_VISTA_MINIDUMP
    
    BUGCHECK_STR:  0x119
    
    PROCESS_NAME:  svchost.exe
    
    CURRENT_IRQL:  a
    
    LAST_CONTROL_TRANSFER:  from fffff880044d822f to fffff80002e93d00
    
    STACK_TEXT:  
    fffff880`0a714de8 fffff880`044d822f : 00000000`00000119 00000000`00000001 00000000`00008be2 00000000`00008c5c : nt!KeBugCheckEx
    fffff880`0a714df0 fffff880`04137eb9 : 00000000`00000000 00000000`00008be2 00000000`00000000 00000000`00008c5c : watchdog!WdLogEvent5+0x11b
    fffff880`0a714e40 fffff880`04138125 : fffffa80`09b4f000 fffff880`0a714f70 00000000`000011ac fffff8a0`12a87c10 : dxgmms1!VidSchiVerifyDriverReportedFenceId+0xad
    fffff880`0a714e70 fffff880`04137f76 : 00000000`00008be2 fffff880`0a715001 fffffa80`09b43000 00000000`00000001 : dxgmms1!VidSchDdiNotifyInterruptWorker+0x19d
    fffff880`0a714ec0 fffff880`0403f13f : fffffa80`087a5040 fffff800`02e968a4 fffff800`00000002 fffff800`00000000 : dxgmms1!VidSchDdiNotifyInterrupt+0x9e
    fffff880`0a714ef0 fffff880`00c1ecca : 00000000`00000000 fffffa80`087a3040 00000000`00000000 fffff800`02e966ef : dxgkrnl!DxgNotifyInterruptCB+0x83
    fffff880`0a714f20 00000000`00000000 : fffffa80`087a3040 00000000`00000000 fffff800`02e966ef fffff880`03164180 : atikmpag+0x4cca
    
    
    STACK_COMMAND:  kb
    
    FOLLOWUP_IP: 
    dxgmms1!VidSchiVerifyDriverReportedFenceId+ad
    fffff880`04137eb9 c744244053eeffff mov     dword ptr [rsp+40h],0FFFFEE53h
    
    SYMBOL_STACK_INDEX:  2
    
    SYMBOL_NAME:  dxgmms1!VidSchiVerifyDriverReportedFenceId+ad
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: dxgmms1
    
    IMAGE_NAME:  dxgmms1.sys
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  4ce799c1
    
    FAILURE_BUCKET_ID:  X64_0x119_VRF_dxgmms1!VidSchiVerifyDriverReportedFenceId+ad
    
    BUCKET_ID:  X64_0x119_VRF_dxgmms1!VidSchiVerifyDriverReportedFenceId+ad
    
    Followup: MachineOwner


    Now just so you know, I initially had about as much understanding of this as you probably do while reading it: I had absolutely no clue what fence IDs are. However, I did some looking up and found the following about them: Windows Vista and Later Display Driver Model Operation Flow.

    So I went through it and got a bit of an idea of what a fence ID is. It's apparently a "ticket" number attached to each DMA buffer that gets handed to the GPU for processing. For those unaware, DMA means Direct Memory Access: a way for a device (in this case the GPU) to read and write system memory directly without having to hassle the CPU. Here's the relevant part of the process (steps 14 through 16 of that document). Do you see anything familiar in relation to the call stack listed above in the crash dump?



    14. The DirectX graphics kernel subsystem calls the display miniport driver's DxgkDdiSubmitCommand function to queue the DMA buffer to the GPU execution unit. Each DMA buffer submitted to the GPU contains a fence identifier, which is a number. After the GPU finishes processing the DMA buffer, the GPU generates an interrupt.

    15. The display miniport driver is notified of the interrupt in its DxgkDdiInterruptRoutine function. The display miniport driver should read, from the GPU, the fence identifier of the DMA buffer that just completed.

    16. The display miniport driver should call the DxgkCbNotifyInterrupt function to notify the DirectX graphics kernel subsystem that the DMA buffer completed. The display miniport driver should also call the DxgkCbQueueDpc function to queue a deferred procedure call (DPC).

    So where in this process did the crash occur? As the call stack shows, it's in the "NotifyInterrupt" path at the very end, step 16, where the driver reports that a DMA buffer completed. That notification carries a pointer to a data structure (DXGKARGCB_NOTIFY_INTERRUPT_DATA), and part of the data in that structure is the fence ID.

    Apparently what we have here is this: after the GPU finished processing the DMA buffer, it raised an interrupt, and the graphics driver read back the ID number for that DMA buffer (the fence ID). The graphics driver passed this along as part of its notification to DirectX that the work was done. DirectX took a look at the fence ID and bugged out, thinking, "This fence ID doesn't look right at all. Something ain't right!" So it told Windows to stop everything, because it *appears* as if the GPU got illegal access to memory.
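
    To make that flow a bit more concrete, here's a toy Python model of it. It is not how dxgmms1 actually verifies anything (that code isn't public). The class, the method names and the rule itself (a reported fence ID must be higher than the last completed one and no higher than the last submitted one) are all assumptions, purely for illustration.

```python
class InvalidFenceId(Exception):
    """Stands in for the 0x119 bugcheck in this toy model."""


class ToyScheduler:
    """Hypothetical scheduler that tracks fence IDs (not real dxgmms1 logic)."""

    def __init__(self):
        self.last_submitted = 0   # highest fence ID handed to the "GPU"
        self.last_completed = 0   # highest fence ID the "driver" reported back

    def submit_dma_buffer(self):
        """Step 14: queue a DMA buffer and tag it with the next fence ID."""
        self.last_submitted += 1
        return self.last_submitted

    def notify_interrupt(self, reported_fence_id):
        """Step 16: the driver reports which fence ID just completed."""
        # Assumed rule: completions move forward and stay within what was submitted.
        if not (self.last_completed < reported_fence_id <= self.last_submitted):
            raise InvalidFenceId(
                f"reported {reported_fence_id:#x}, "
                f"last completed {self.last_completed:#x}, "
                f"last submitted {self.last_submitted:#x}"
            )
        self.last_completed = reported_fence_id


sched = ToyScheduler()
first = sched.submit_dma_buffer()
second = sched.submit_dma_buffer()
sched.notify_interrupt(first)      # in order: accepted
sched.notify_interrupt(second)     # in order: accepted
try:
    sched.notify_interrupt(first)  # stale ID reported again: rejected
    rejected = False
except InvalidFenceId:
    rejected = True
print("stale fence ID rejected:", rejected)
```

    The point is only that the scheduler has enough bookkeeping to notice when a reported ID can't be right, whatever the real check looks like.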

    Part of me thinks this isn't so much a graphics driver issue as it is a graphics hardware issue. That's my initial diagnosis, and right now I'm still working with him to gather more info on this to verify what's what. As for my end, right now I'd like to know a few things in case anyone can help me:


    1. Whether anyone else has had similar BSODs that they've resolved, and what the culprits behind them were. Was it typically hardware, and what hardware was it? Was it the drivers?
    2. What the fence ID was. However, I'm unfamiliar with the dt command in Windbg and I'm not sure where to point it or how. For those wondering, this command takes a data structure and displays its layout and contents. Since the fence ID is part of the notification to DirectX about the DMA buffer completion, I should be able to see it inside the notification data structure.
    3. What the fence ID was prior to the DMA buffer completion. If I knew this as well as what it was after the completion (when it bugged out), I could discern whether the DMA buffer access itself was bad or whether the fence ID returned from the GPU got corrupted somehow. Not sure how, or whether it's even possible, to get this info, though.





    This obviously isn't the end of my journey on this. I'll keep posting as I get closer to an answer and get more info from the guy about the situation.



    UPDATE:

    Comments from previous discussion on this:


    Quote Originally Posted by cluberti
    It looks like the crash is in the directx routine that reports the out of order fence returns. There are quite a number of bugs logged on this for Windows 7, and they run the gamut of ATI, Nvidia, and Intel video drivers as root causes. What is actually happening under the covers is that these FenceIDs are being returned out-of-order, and thus the bugcheck (why dx says "that's not right", because there's a proper way to return these). Again, in every case I can find, it was a driver (not hardware) issue, and the external vendor would be tasked with resolving the issues with their driver on customer hardware.

    Unfortunately, the problem happens in the external driver before it hits directx, so I can't tell you why it's happening, but the likelihood it's a hardware issue is probably almost nil if it isn't also bugchecking with a 116. It could be power-related, though, so if the machine is older checking the PSU isn't a bad idea.

    Sorry I can't provide the debug, but the directx drivers aren't public on purpose, and I don't feel comfortable putting any of that out here even amongst this small group given the protection around this source.
    Quote Originally Posted by VirGnarus
    I can see how this could be problematic; if I recall correctly, DMA jobs completing out of order can potentially cause memory corruption. I was figuring it was maybe just a bad fence ID altogether.

    Though I wonder, what are the parameters the watchdog sent to KeBugCheckEx? Obviously the first one is the subtype of error, but are the other three the Fence IDs (like expected/received)?
    Quote Originally Posted by cluberti
    Yes, they are the received fences. I would still recommend testing a different card to be safe, but the driver is still the likely culprit.
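
    Taking cluberti's word that Arg2 through Arg4 are fence IDs, here's a quick bit of arithmetic on the values from the !analyze -v output above. Which argument is "expected" versus "received" isn't stated anywhere in this thread, so the snippet only shows the numbers and the gaps between them.

```python
# Arg2..Arg4 copied from the !analyze -v output above.
args = {"Arg2": 0x8BE2, "Arg3": 0x8C5C, "Arg4": 0x8C5B}

for name, value in args.items():
    print(f"{name}: {value:#06x} = {value}")

# What each argument means is NOT documented in this thread,
# so treat any reading of these gaps as a guess.
arg3_minus_arg4 = args["Arg3"] - args["Arg4"]
arg3_minus_arg2 = args["Arg3"] - args["Arg2"]
print("Arg3 - Arg4 =", arg3_minus_arg4)
print("Arg3 - Arg2 =", arg3_minus_arg2)
```

    Arg3 and Arg4 are one apart and Arg2 sits 122 behind Arg3, which would fit the out-of-order idea, but that reading is a guess on my part.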
    Last edited by Vir Gnarus; 03-16-2012 at 02:49 PM.



  2. #2

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Currently in the middle of an analysis with a 119: http://www.techsupportforum.com/foru...ml#post3824527

    Very fun to analyze, and as always, I learned an unreasonable amount from this thread alone

    No response yet, hopefully the OP does respond... I'd really like to see how it plays out.

  3. #3

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    And bump, going through another one. Rather than DxgNotifyInterrupt, it's DxgNotifyDpc in the stack. What does that mean?

  4. #4
    writhziden's Avatar
    Join Date
    May 2012
    Location
    Colorado
    Posts
    2,327

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Hmm, I find no reference to DxgNotifyDpc through a Google search. I probably need to buy a book on debugging for that one.

    I am posting to let you both know I am also interested in this one.

  5. #5

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Yeah, no luck for me either..

    Thanks Mike, let me know if you find anything yourself.

  6. #6

    Join Date
    Mar 2012
    Posts
    469

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Interrupt handlers (ISRs, Interrupt Service Routines) that deal with device I/O need to finish very quickly or they risk holding up the entire system (because they run at high IRQL). So what usually happens is that the interrupt handler is designed to merely queue a DPC, or Deferred Procedure Call, to defer (hence the name) the responsibility of handling the I/O till later. The DPC itself, once it is next in the DPC queue, then does the actual servicing of the device's I/O. The interrupt is only there to notify the system to prepare for I/O, while it is the DPC that does all the work. Windows Internals 5th Edition explains all this in the I/O System chapter. If you have the 6th edition, you'll have to wait until Part 2 of it comes out.

    So what's going on is that the interrupt has already done its part, and it is now the DPC (the actual I/O) that's doing the work, with DirectX involved (obviously some form of video/audio I/O). You can check the DPC queue for each processor using !dpcs in Windbg. This information, like most, is normally not available in a minidump. If you give it the number of the processor that was running at the time of the crash (you can tell by the Windbg prompt which proc you're in) you may get lucky, but I doubt it.
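
    Here's a rough sketch of that ISR/DPC split in Python. It only models the ordering (quick acknowledgement now, real work later). The function names and the single queue are made up for illustration and have nothing to do with the actual kernel routines.

```python
from collections import deque

dpc_queue = deque()   # stands in for one processor's DPC queue
log = []


def device_dpc(context):
    """The deferred routine: does the real (slow) servicing of the device."""
    log.append(f"DPC: servicing I/O for {context}")


def device_isr(context):
    """The interrupt routine: do the bare minimum, then defer the rest."""
    log.append(f"ISR: acknowledged interrupt for {context}")
    dpc_queue.append((device_dpc, context))


def drain_dpc_queue():
    """Runs later, once the interrupt-level work is out of the way."""
    while dpc_queue:
        routine, context = dpc_queue.popleft()
        routine(context)


device_isr("gpu")
device_isr("nic")
drain_dpc_queue()
print("\n".join(log))
```

    Both interrupts get acknowledged first, and only afterwards do the queued routines run, in the order they were queued.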

    Example:

    Code:
    2: kd> !dpcs
    CPU Type      KDPC       Function
     5: Normal  : 0xfffffa8025ae6c28 0xfffffa600106e8f0 tcpip!TcpPeriodicTimeoutHandler
    
    2: kd> dt !_KDPC                                            < Template for KDPC data structure
    nt!_KDPC
       +0x000 Type             : UChar
       +0x001 Importance       : UChar
       +0x002 Number           : Uint2B
       +0x008 DpcListEntry     : _LIST_ENTRY
       +0x018 DeferredRoutine  : Ptr64     void 
       +0x020 DeferredContext  : Ptr64 Void
       +0x028 SystemArgument1  : Ptr64 Void
       +0x030 SystemArgument2  : Ptr64 Void
       +0x038 DpcData          : Ptr64 Void
    
    2: kd> dt !_KDPC fffffa8025ae6c28
    nt!_KDPC
       +0x000 Type             : 0x13 ''
       +0x001 Importance       : 0x1 ''
       +0x002 Number           : 0x45
       +0x008 DpcListEntry     : _LIST_ENTRY [ 0xfffffa60`01ab8580 - 0xfffffa60`01ab8580 ]
       +0x018 DeferredRoutine  : 0xfffffa60`0106e8f0     void  tcpip!TcpPeriodicTimeoutHandler+0
       +0x020 DeferredContext  : 0x00000000`00000005 Void
       +0x028 SystemArgument1  : 0x00000000`cba8328a Void
       +0x030 SystemArgument2  : 0x00000000`01ccae02 Void
       +0x038 DpcData          : 0xfffffa60`01ab8580 Void
    Understand that the KDPC data structure is opaque: it is an internal structure with very little public documentation, so you kinda have to walk it out, fiddle with it, and figure it out on your own. Also, a driver isn't supposed to modify its fields directly; only the Windows kernel does that (a driver sets one up through kernel routines such as KeInitializeDpc). So if you discover that a driver has tampered with one, or is even attempting to write to it, you know the driver is being unscrupulous (a driver can hold a pointer to it, though, just not edit it). That's not to say it's the case you're dealing with, however.
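
    If you ever need to pick one of these apart outside of Windbg (say, from raw bytes copied out of a dump), the dt output above can be mirrored with ctypes. This is just a sketch: the offsets are taken from the dt listing in this post, so they apply to that particular build only, the _pad field is my own filler for the gap dt doesn't show, and the sample bytes are rebuilt from the values printed above rather than read from a real dump.

```python
import ctypes
import struct


class LIST_ENTRY(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint64), ("Blink", ctypes.c_uint64)]


class KDPC(ctypes.Structure):
    # Layout copied from the dt output in this post; other builds may differ.
    _fields_ = [
        ("Type", ctypes.c_ubyte),              # +0x000
        ("Importance", ctypes.c_ubyte),        # +0x001
        ("Number", ctypes.c_uint16),           # +0x002
        ("_pad", ctypes.c_uint32),             # +0x004 filler (hypothetical name)
        ("DpcListEntry", LIST_ENTRY),          # +0x008
        ("DeferredRoutine", ctypes.c_uint64),  # +0x018
        ("DeferredContext", ctypes.c_uint64),  # +0x020
        ("SystemArgument1", ctypes.c_uint64),  # +0x028
        ("SystemArgument2", ctypes.c_uint64),  # +0x030
        ("DpcData", ctypes.c_uint64),          # +0x038
    ]


# Sample bytes rebuilt from the values shown by "dt ... fffffa8025ae6c28".
raw = struct.pack(
    "=BBHI7Q",
    0x13, 0x1, 0x45, 0,
    0xFFFFFA6001AB8580, 0xFFFFFA6001AB8580,   # DpcListEntry Flink / Blink
    0xFFFFFA600106E8F0,                       # DeferredRoutine
    0x5, 0xCBA8328A, 0x01CCAE02,              # DeferredContext, SystemArgument1/2
    0xFFFFFA6001AB8580,                       # DpcData
)
dpc = KDPC.from_buffer_copy(raw)
print(hex(KDPC.DeferredRoutine.offset), hex(dpc.DeferredRoutine))
```

    Reading it this way is harmless since it only works on a copy of the bytes.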
    Last edited by Vir Gnarus; 08-01-2012 at 09:33 AM.

  7. #7

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Brilliant explanation, thank you.

    Gah, I don't think the OP will respond though. He mentioned he doesn't have time for hardware diagnostics / troubleshooting, so it's likely I won't be able to take in much knowledge from this specific analysis. Also, I tried running a !dpcs command and got the following:

    3: kd> !dpcs
    CPU Type KDPC Function
    Failed to read DPC at 0xfffffa800615b0c8
    Failed to read DPC at 0xfffff88002fd5318
    I'm assuming that is because as you said, it's a minidump, and you cannot access this info with a minidump?

    You also mentioned you can find the processor that was current at the time of the crash for that specific dump from WinDbg. Where do you find that information?

  8. #8

    Join Date
    Mar 2012
    Posts
    469

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    When you load up a crash dump, the processor and thread context that's initially loaded is the one that was active at the time of the crash. You can tell the thread by doing !thread, but the processor is much easier: just look at the Windbg prompt:

    Code:
    Processor 
    \/
    2: kd> !thread
    THREAD fffffa80273ed040  Cid 0004.0a64  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 2
    Not impersonating
    DeviceMap                 fffff88000006150
    Owning Process            fffffa8024c1a040       Image:         System
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      5438           Ticks: 405495019 (73:05:09:22.838)
    Context Switch Count      2              IdealProcessor: 2             
    UserTime                  00:00:00.000
    KernelTime                00:00:00.000
    Win32 Start Address rpcxdr!RxWorkThread (0xfffffa6008394f88)
    Stack Init fffffa60089b4db0 Current fffffa60089b4a10
    Base fffffa60089b5000 Limit fffffa60089af000 Call 0
    Priority 12 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffffa60`089b4a58 fffff800`01e63661 : 00000000`00000050 fffffa60`08c4d718 00000000`00000008 fffffa60`089b4b50 : nt!KeBugCheckEx
    fffffa60`089b4a60 fffff800`01e53219 : 00000000`00000008 fffffa60`019d8180 fffffa80`273ed000 fffffa80`2b379010 : nt!MmAccessFault+0x1371
    fffffa60`089b4b50 fffffa60`08c4d718 : fffffa60`0839537b fffffa80`273ed040 00000000`00000080 fffffa60`0839e350 : nt!KiPageFault+0x119 (TrapFrame @ fffffa60`089b4b50)
    fffffa60`089b4ce8 fffffa60`0839537b : fffffa80`273ed040 00000000`00000080 fffffa60`0839e350 fffffa80`2b379010 : <Unloaded_TmPreFlt.sys>+0x4718
    fffffa60`089b4cf0 fffff800`020788b3 : 00000000`00000001 00000000`0000000f 00000000`00000000 fffffa80`2b379010 : rpcxdr!RxWorkThread+0x3f3
    fffffa60`089b4d50 fffff800`01e8e7f6 : fffffa60`01966180 fffffa80`273ed040 fffffa60`0196fd40 00000000`00000001 : nt!PspSystemThreadStartup+0x57
    fffffa60`089b4d80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16
    Also, as shown in the Windbg prompt:

    [image: proccontext.jpg]
    Last edited by Vir Gnarus; 08-01-2012 at 10:45 AM.

  9. #9

    Join Date
    Mar 2012
    Posts
    469

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Oh, of course, if you change the processor context using ~ (for example ~0s to switch to processor 0), then the prompt will adjust accordingly, but this is the one that showed up for me when I opened this particular kernel dump. Instead of defaulting to processor 0, it was automatically set to proc 2, which was running at the time of the crash.
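
    If you're going through saved debugger logs rather than a live session, the same trick works on the text. A small sketch, assuming the log keeps the prompt at the start of each command line (the function name is made up):

```python
import re

# Matches prompts such as "2: kd>"; the number before the colon is the processor.
PROMPT_RE = re.compile(r"^\s*(\d+): kd>")


def processor_from_prompt(line):
    """Return the processor number shown in a kernel-debugger prompt, or None."""
    match = PROMPT_RE.match(line)
    return int(match.group(1)) if match else None


print(processor_from_prompt("2: kd> !thread"))
print(processor_from_prompt("3: kd> !dpcs"))
print(processor_from_prompt("kd> !analyze -v"))  # prompt without a processor number
```

    A prompt with no number in front simply gives None here.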

  10. #10

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Got it, thanks :)

  11. #11
    Wrench97's Avatar
    Join Date
    Feb 2012
    Location
    S.E. Pennsylvania
    Posts
    2,561

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Here's another one > http://www.sysnative.com/forums/show...0737#post30737

    Code:
    Debug session time: Sun Oct 14 09:21:02.426 2012 (UTC - 4:00)
    Loading Dump File [C:\Users\Owner\Bsodapps\101412-13431-01.dmp]
    BugCheck 119, {1, 1000060, e3963, e3961}
    Probably caused by : dxgmms1.sys ( dxgmms1!VidSchiVerifyDriverReportedFenceId+ad )
    Bugcheck code 00000119
    Arguments: 
    
    Arg1: 0000000000000001, The driver has reported an invalid fence ID.
    
    Arg2: 0000000001000060
    
    Arg3: 00000000000e3963
    
    Arg4: 00000000000e3961
    
    
    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
    BUGCHECK_STR:  0x119
    PROCESS_NAME:  Wow-64.exe
    MaxSpeed:     3100
    CurrentSpeed: 3100
    BiosVersion = 0506
    BiosReleaseDate = 05/07/2012

  12. #12

    Join Date
    Mar 2012
    Posts
    469

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Yah, I saw that, good catch. Look at Arg2 through Arg4, which are actually the fence IDs. Arg3 is either the fence ID it expected or the fence ID directly prior to the one we're dealing with. Either way, Arg2 looks like one messed-up fence ID: it's wildly out of range next to the other two, so it may well be an overwritten value, perhaps from a stack overflow or some other driver nonsense.
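
    To put numbers on that, here's a small sketch that pulls the values out of the one-line "BugCheck" summary from the dump above and prints the gaps. The regex and function name are my own, and reading the arguments as fence IDs is still the assumption carried over from earlier in this thread.

```python
import re

SUMMARY_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck_summary(line):
    """Split a 'BugCheck CODE, {a, b, c, d}' line into integers (values are hex)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BugCheck summary found")
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck_summary("BugCheck 119, {1, 1000060, e3963, e3961}")
subtype, arg2, arg3, arg4 = params
print(f"code={code:#x} subtype={subtype}")
print(f"Arg2={arg2} Arg3={arg3} Arg4={arg4}")
print("Arg3 - Arg4 =", arg3 - arg4)
print("Arg2 - Arg3 =", arg2 - arg3)
```

    Arg3 and Arg4 are 2 apart, while Arg2 is millions away from both, which is what makes it look like garbage rather than a fence ID that's merely out of order.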

  13. #13
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Sorry to resurrect an old thread, but here's some additional information - TDR changes in Windows 8 (Windows Drivers) and Supplying Fence Identifiers (Windows Drivers)
    Machines Can Think

    Oxygen, Nature's paradox.

  14. #14
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    A fence is an instruction that contains 64 bits of data and an address. The display miniport driver can insert a fence in the direct memory access (DMA) stream that is sent to the graphics processing unit (GPU). When the GPU reads the fence, the GPU writes the fence data at the specified fence address. However, before the GPU can write the fence data to memory, it must ensure that all of the pixels from the primitives that precede the fence instruction are retired and properly written to memory.
    Note The GPU is not required to stall the entire pipeline while it waits for the last pixel from the primitives that precede the fence instruction to retire; the GPU can instead run the primitives that follow the fence instruction.
    Hardware that supports per-GPU-context virtual address space must support the following types of fences:

    • Regular fences are fences that can be inserted in a DMA buffer that is created in user mode. Because the content of a DMA buffer from user mode is not trusted, fences within such a DMA buffer must refer to a virtual address in the GPU context address space and not to a physical address. Access to such a virtual address is bound by the same memory validation mechanism as any other virtual address that the GPU accesses.
    • Privileged fences are fences that can be inserted only in a DMA buffer that is created (and only accessible) in kernel mode. Fences within such a DMA buffer refer to a physical address in memory.
      Note that if the fence target address was accessible in user mode, malicious software could perform a graphics operation over the memory location for the fence and therefore override the content of what the kernel expected to receive.
    Source: DxgkDdiQueryCurrentFence routine (Windows Drivers)
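
    A toy Python model of the quoted description might help picture it: a DMA stream with draw commands and fence instructions, where reaching a fence writes its data to its address. The command format, the addresses and the values are all made up, and this "GPU" runs strictly in order, which is simpler than the real thing (per the note above, a real GPU doesn't have to stall the pipeline).

```python
def run_dma_stream(stream):
    """Toy GPU: executes commands in order and writes fence data at each fence."""
    memory = {}    # fence address -> last fence data written there
    retired = []   # draw commands whose pixels have been written out
    for command in stream:
        if command[0] == "draw":
            retired.append(command[1])
        elif command[0] == "fence":
            _, address, data = command
            # In this in-order model everything before the fence has already retired.
            memory[address] = data
    return memory, retired


stream = [
    ("draw", "triangle A"),
    ("draw", "triangle B"),
    ("fence", 0x1000, 0x8C5B),   # hypothetical fence address and fence data
    ("draw", "triangle C"),
    ("fence", 0x1000, 0x8C5C),
]
memory, retired = run_dma_stream(stream)
print(hex(memory[0x1000]), retired)
```

    Whoever polls the fence address can tell how far the GPU has gotten just by looking at the last value written there.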

    I'll keep doing some research on Fence IDs, and formulate it into a blog post if anyone is still interested or still learning about Fence IDs.

  15. #15
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: 0x119 VIDEO_SCHEDULER_INTERNAL_ERROR & Fence IDs

    Update:

    If parameter 1 of a Stop 0x119 is 0x2 (the driver failed upon the submission of a command buffer), then the other parameters are as follows:

    2) The NTSTATUS error code returned from the failed driver call
    3) A pointer to the DXGKARG_SUBMITCOMMAND structure
    4) A pointer to an internal scheduler data structure
    Reference - DxgkDdiSubmitCommand routine (Windows Drivers)
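
    Pulling the two subtypes seen in this thread together, here's a small lookup sketch. It covers only what has been posted here: subtype 0x1 with its arguments read as fence IDs (per cluberti), and the command-submission failure, which I'm assuming corresponds to 0x2 in Arg1. The function and table names are made up, and anything else is reported as not covered.

```python
# Only the two subtypes that appear in this thread; others are deliberately left out.
SUBTYPES = {
    0x1: "The driver has reported an invalid fence ID.",
    0x2: "The driver failed upon the submission of a command buffer.",
}

PARAM_LABELS = {
    0x1: ["fence ID (per cluberti)"] * 3,
    0x2: [
        "NTSTATUS returned from the failed driver call",
        "pointer to DXGKARG_SUBMITCOMMAND",
        "pointer to internal scheduler data structure",
    ],
}


def describe_0x119(params):
    """Label the four bugcheck parameters using the tables above."""
    subtype, rest = params[0], params[1:4]
    title = SUBTYPES.get(subtype, "subtype not covered in this thread")
    labels = PARAM_LABELS.get(subtype, ["unknown"] * 3)
    lines = [f"Arg1: {subtype:#x}, {title}"]
    for index, (value, label) in enumerate(zip(rest, labels), start=2):
        lines.append(f"Arg{index}: {value:#018x} ({label})")
    return lines


report = describe_0x119([0x1, 0x8BE2, 0x8C5C, 0x8C5B])
print("\n".join(report))
```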
