DPC Watchdog Violation BSOD

mracky · Jan 5, 2019

Hi guys,

Nice website, nice idea. I'm a Linux programmer so I don't understand a lot of the Windows vernacular.
My stats:
· OS - Windows 10 Home
· x64
· Windows 7 was original OS on system.
· Original OS was OEM version; Windows 10 was the free upgrade
· Hardware is 2014; main hard disk replaced 2016
· OS installation was when Microsoft did the Windows 10 free upgrade - 2015?

· CPU - Intel Core i7 X980 3.33 GHz
· Video Card ATI Radeon
· MotherBoard -Pegatron 1PMTB-TK (hp:Truckee)
· Power Supply - HP 460 watts

· System Manufacturer HP
· Exact model number Pavilion Elite HPE-580t
· Desktop

The Sysnative zip file is attached.

I suspect I started receiving the DPC watchdog violation months ago, since I often leave the computer in sleep mode, and when I "wake" it, it is back on the login screen. But I experienced this while working on the machine in November. I looked up what to do and most fix-it sites said to install the SATA AHCI disk driver. My computer uses an Intel SATA RAID driver (even though I do not use RAID). I reinstalled this driver, but the DPC BSOD kept occurring.

In December, it seemed to become more frequent. I ran chkdsk: OK. Then I ran sfc /scannow: no errors.
Next I ran dism.exe and received the error "the wof driver encountered a corruption in the compressed files resource table" at 6%. Realizing I am in over my head, I searched for help and discovered your site.

Thanks for any help you can provide!

Patrick · Jan 5, 2019

Can we get a kernel dump? 0x101 is a PITA to debug without access to a lot of stuff not included in a small dump.

Navigate to C:\Windows and then upload MEMORY.DMP to any file-sharing site you like.

cwsink · Jan 5, 2019

Hi mracky and welcome to Sysnative!

The system event log shows quite a few 0x133 bugchecks, a number of 0x116 bugchecks, and a few others. The most recent 0x116 bugcheck was:

Code:

Event[3325]:
  Log Name: System
  Source: Microsoft-Windows-WER-SystemErrorReporting
  Date: 2018-11-24T14:18:07.337
  Event ID: 1001
  Task: N/A
  Level: Error
  Opcode: N/A
  Keyword: Classic
  User: N/A
  User Name: N/A
  Computer: pony
  Description: 
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000116 (0xffffaf8377511010, 0xfffff80a42cc8efc, 0x0000000000000000, 0x0000000000000002). A dump was saved in: C:\WINDOWS\MEMORY.DMP. Report Id: f34d5866-e3cc-4672-8243-a8ae6bfe0f3c.

Bugcheck code 0x116 means Windows has detected a problem communicating with the display driver in a timely manner, tried to reset the display driver, but failed to do so. If the GPU is having problems it could also explain the 0x133 bugchecks. The other bugcheck codes are typical of memory corruption which might be explained by the display driver being successfully reset. A successful reset of the display driver could be noticeable as a brief flicker of the display and is usually accompanied by a live kernel dump which would appear in C:\Windows\LiveKernelReports or one of its subfolders. The collection app doesn't collect those dumps but does your system have that folder? If so, are there any dumps?
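For anyone who wants to pull the bugcheck code and its four parameters out of a batch of these Event ID 1001 entries, here is a minimal Python sketch. It assumes the description text has the same shape as the entry quoted above. The `KNOWN_CODES` table and the `parse_bugcheck` helper are hypothetical names made up for this illustration, and the table only covers codes mentioned in this thread.

```python
import re

# Description text copied from the Event ID 1001 entry quoted above.
DESCRIPTION = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: "
    "0x00000116 (0xffffaf8377511010, 0xfffff80a42cc8efc, "
    "0x0000000000000000, 0x0000000000000002). "
    "A dump was saved in: C:\\WINDOWS\\MEMORY.DMP."
)

# Hypothetical lookup table, limited to the codes mentioned in this thread.
KNOWN_CODES = {
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
    0x116: "VIDEO_TDR_FAILURE",
    0x133: "DPC_WATCHDOG_VIOLATION",
}

# Matches "The bugcheck was: <code> (<p1>, <p2>, <p3>, <p4>)".
BUGCHECK_RE = re.compile(r"The bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)")


def parse_bugcheck(description):
    """Return (code, [params]) as integers, or None if no bugcheck is found."""
    match = BUGCHECK_RE.search(description)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck(DESCRIPTION)
print(hex(code), KNOWN_CODES.get(code, "unknown"), [hex(p) for p in params])
```

Running the same function over every 1001 entry in an exported System log would give a quick count of how many 0x133 versus 0x116 crashes occurred.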

The only dump I see in the zip is a 0x133 bugcheck where the first parameter is 1. Minidumps aren't typically very useful for that bugcheck code and first parameter pair. Full MEMORY.DMP files aren't always, either. Sometimes a useful ETL trace can be extracted from a full MEMORY.DMP but that seems to be a 50/50 shot, in my experience.

Is the GPU a component you replaced in 2016?

mracky · Jan 5, 2019

Hi CW -

C:\Windows\LiveKernelReports and its sub-folders are all empty. I used CCleaner on my machine on 12/20 so perhaps that erased any dump files, although the BSOD happened this morning so I would expect that if the kernel dumped there would be something in there. I know -- a doh! moment. I will disable that cleanup app immediately.

Towards your question, I have not replaced the GPU; it is the original.

cwsink · Jan 5, 2019

If C:\Windows\MEMORY.DMP exists we might be able to get an ETL trace from it which might show which DPC took so long. I have seen 2 minutes be the amount of time for a GPU-associated DPC watchdog timer to run out, though. Can you remove the GPU and use onboard graphics (assuming the computer has that option) to see if the problems continue in that configuration? Or is there another known-good GPU you could try?

mracky · Jan 5, 2019

Hi Patrick

Memory.zip is on dropbox at
Dropbox - Public - Simplify your life

mracky · Jan 5, 2019

Memory.zip is on dropbox at

https://www.dropbox.com/s/k9la954dh7s54yn/MEMORY.zip?dl=0

no onboard graphics...

cwsink · Jan 5, 2019

I was able to extract a trace but it's not readable by Windows Performance Analyzer, unfortunately.

I've not tried this but supposedly you can start a trace using xperf, use the system until it crashes, and then extract the trace from the resulting MEMORY.DMP file. It looks like this problem can take a long time to happen on your system, though, so we'd need to set up a circular buffer unless you know of a way to reliably reproduce the crash. We can try that but I'm not sure it will work and it might be a waste of your time, anyway. I'd be curious, though, if you're game. :)

Short of that, I do suspect the GPU is having problems based on what I see in the event logs and that the LiveKernelReports folder exists on your system. My guess would be TDR errors due to a faulty GPU. However, many things can cause TDRs as explained in this Nvidia post (it applies to AMD cards as well.)

My usual recommendation in this situation is to uninstall the GPU drivers using DDU in safe mode, reboot, and then reinstall the GPU drivers to see if that helps. If it doesn't, I'd then want to try a different GPU. Those are usually the easiest things to try, and they commonly turn out to be the problem. Less commonly, I've also seen TDRs caused by failing PSUs, motherboards running an outdated BIOS, and faulty motherboards.

Perhaps we should see if we can get some live kernel dumps to verify you're getting TDRs, unless the above suggestions are something you can easily try.

Patrick · Jan 5, 2019

Code:

Bugcheck code 00000133
Arguments 00000000`00000001 00000000`00001e00 fffff801`e6c5d378 00000000`00000000

A first parameter of 0x1 tells us that the system spent way too much time at DISPATCH_LEVEL (or above) IRQL. This is tough to debug because the DPCs involve multiple drivers and the crash likely did not happen while the offending code was actually running, whereas a first parameter of 0x0 deals with a single DPC, which really only involves one driver at the exact time of the crash.
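To make the distinction concrete, here is a small Python sketch that turns the debugger's argument line into integers and reports which of the two 0x133 cases applies. The wording in `PARAM1_MEANING` is a paraphrase of how the two cases are generally described, not a quotation from Microsoft, and both helper names are hypothetical.

```python
# Paraphrased meanings of the first 0x133 argument (an assumption about
# wording, not a quotation from the official bug check reference).
PARAM1_MEANING = {
    0: "a single DPC or ISR exceeded its time allotment",
    1: "the system cumulatively spent too long at IRQL DISPATCH_LEVEL or above",
}


def parse_debugger_args(line):
    """Convert WinDbg-style 64-bit values such as 00000000`00000001 to ints."""
    return [int(token.replace("`", ""), 16) for token in line.split()]


def describe_0x133(args):
    """Return a one-line summary based on the first bugcheck argument."""
    first = args[0]
    meaning = PARAM1_MEANING.get(first, "unrecognised first parameter")
    return f"0x133 param1={first:#x}: {meaning}"


# Argument line copied from the dump summary quoted above.
ARG_LINE = "00000000`00000001 00000000`00001e00 fffff801`e6c5d378 00000000`00000000"
ARGS = parse_debugger_args(ARG_LINE)
print(describe_0x133(ARGS))
```

With a first argument of 1 the call stack at crash time is not expected to point at the culprit, which is why tracing is suggested instead.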

The best way to deal with this type of crash is to use some sort of event tracing, with Xperf and WPA being two options. Whichever tool you choose, you'll want to monitor for irregular DPC latency (see which DPCs from which drivers are taking way too long to complete, etc.). We actually have a good tutorial here - How to Diagnose and Fix High DPC Latency Issues with WPA (Windows Vista/7/8)

Edit: Lol, I didn't even see that cw was also discussing event tracing as well. Nice.

mracky · Jan 6, 2019

Downloaded xperf and followed directions from the sysnative how-to:
How to Diagnose and Fix High DPC Latency Issues with WPA (Windows Vista/7/8)

as recommended by Patrick. Results:

C:\WINDOWS\system32>xperf -on DiagEasy -maxbuffers 512 -buffersize 128

C:\WINDOWS\system32>xperf -d %userprofile%\Desktop\trace.etl
Merged Etl: C:\Users\mike\Desktop\trace.etl
The trace you have just captured "C:\Users\mike\Desktop\trace.etl" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.

C:\WINDOWS\system32>xperf %userprofile%\Desktop\trace.etl
xperf: error: C:\Users\mike\Desktop\trace.etl: The parameter is incorrect. (0x80070057).

Microsoft (R) Windows (R) Performance Analyzer Version 10.0.17763
Performance Analyzer Command Line
Copyright (c) 2018 Microsoft Corporation. All rights reserved.

Usage: xperf options ...

xperf -help start for logger start options
xperf -help providers for known tracing flags
xperf -help stackwalk for stack walking options
xperf -help stop for logger stop options
xperf -help merge for merge multiple trace files
xperf -help processing for trace processing options
xperf -help symbols for symbol decoding configuration
xperf -help query for query options
xperf -help mark for mark and mark-flush
xperf -help format for time and timespan formats on the command line
xperf -help profiles for profile options

C:\WINDOWS\system32>dir %userprofile%\Desktop\trace.etl
Volume in drive C is HP
Volume Serial Number is 59DB-D04F

Directory of C:\Users\mike\Desktop

01/06/2019 05:02 PM 42,074,112 trace.etl
1 File(s) 42,074,112 bytes
0 Dir(s) 1,071,517,343,744 bytes free

I had to bump up the buffer size and max buffers in order to capture all events -- without that I get missing-events reports on the command line.
I could not continue the suggested procedure in the how-to since the xperf command failed to bring up a window. Probably version differences?
However the file is there, as seen by the dir results. (I could view the file using WPA but no DPC CPU chart was there.)

cw, I would be willing to set up a circular buffer, if you can walk me through it.

I assume the GPU is on the Radeon board since the motherboard has no graphic capabilities. I have started to shop for a new graphics card but would like to verify that is the problem before pulling the trigger.

Thanks for your help.

Patrick · Jan 6, 2019

I just learned according to MSFT that xperf/view is no longer available as of Windows 8.1. Give WPA a try instead.

cwsink · Jan 7, 2019

@Patrick, my understanding is a less functional console version of wpr is included with Windows but xperf is installed along with WPA when the latest Windows 10 SDK or Assessment and Deployment Kit is used to install the Windows Performance Toolkit. That was true after the release of Windows 10 1809, anyway, but perhaps it's changed since.

@mracky, I'm not sure where things stand on your system as far as xperf right now or which version you have - if any. This is an xperf script which should work so I'd recommend putting the script on your Desktop, right-clicking it, and choosing "Run as administrator". It should launch an elevated command prompt at which point you press any key to start an xperf trace and just leave the command prompt window open while you use the computer as you normally would until a crash happens. The command prompt window should look like this while capturing the trace:

When the computer eventually crashes the trace should be saved in the MEMORY.DMP file and can hopefully be extracted. There may also be a file called kernel.etl at the root of a drive which might be needed but we can look for that if necessary. If you want to stop the trace early for some reason just press any key again and it should generate a file called trace_417.58.etl on your Desktop.

Please try the script and let me know if it looks any different from the above screenshot.

mracky · Jan 7, 2019

cw - looks the same, i'll keep it running while I use the computer. Will be in touch once this bsod happens again. thanks.

cwsink · Jan 7, 2019

It should be capturing a trace then so we'll see how it goes. I did notice a mistake in the script but it shouldn't affect capturing the trace. There's a line which tries to delete a file named "trace.etl" from the Desktop which is supposed to match the filename of the file which gets created when the trace is stopped. It shouldn't be a problem unless you have a file called trace.etl on your Desktop already. The last 2 lines of the script should generically be:

Code:

del "%userprofile%\Desktop\trace.etl"
xperf -stop -d "%userprofile%\Desktop\trace.etl"

I've been adding the Nvidia driver version number to the filename for testing I've been doing on my own system.

mracky · Feb 6, 2019

OK it finally tripped again. For a while, I thought that your script itself had altered conditions to create the problem. (Like when a debugger changes the stack so you can't debug a stack bug.)
Here is the link to the dropbox Zipped memory file:

Dropbox - MEMORY_02062019_811.zip

There were no \Windows\LiveKernelReports.

cwsink · Feb 6, 2019

Running a trace inevitably has a performance impact which can make a system behave differently than it otherwise would, unfortunately. It's why I was trying to use specific configuration parameters in the script to minimize that potential impact.

Oddly, the latest dump has a first parameter of 0x0 which normally means the callstack should show us which driver was responsible for the long running DPC. It's not showing up in the automated analysis generated callstack but looking at the raw callstack I see a few third party drivers. The one most interesting to me is e1y62x64.sys which has a timestamp that makes me nervous on a Windows 10 system:

Code:

9: kd> lmDvm e1y62x64
Browse full module list
start             end                 module name
fffff80b`d80e0000 fffff80b`d8129000   e1y62x64   (no symbols)           
    Loaded symbol image file: e1y62x64.sys
    Image path: \SystemRoot\system32\DRIVERS\e1y62x64.sys
    Image name: e1y62x64.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Jun 12 18:16:42 2009 (4A32FDFA)
    CheckSum:         00049736
    ImageSize:        00049000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

That appears to be an Intel Ethernet driver according to this page. Following the download link to Intel seems to just display a page directing people to check the product support site for their computer. However, there's another driver loading named e1i63x64.sys which is also an Intel Ethernet driver according to this page which directs you to an Intel site from which downloads are available. So, I'm not sure where things stand as far as your computer and Ethernet drivers. Is there something unusual about how your network adapters are configured? Multiple Ethernet adapters? Anyway, that driver has a more reasonable timestamp:

Code:

fffff80b`d81d0000 fffff80b`d8256000   e1i63x64   (pdb symbols)          c:\my\sym\e1\e1i63x64.pdb\49686BA8D99E4229B174F2D817A2EFED1\e1i63x64.pdb
    Loaded symbol image file: e1i63x64.sys
    Image path: \SystemRoot\System32\drivers\e1i63x64.sys
    Image name: e1i63x64.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Mar  4 13:46:29 2016 (56DA0235)
    CheckSum:         00086238
    ImageSize:        00086000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
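The hex value in parentheses after each Timestamp line can be checked independently. Here is a minimal Python sketch, assuming the value is the usual PE header TimeDateStamp (seconds since 1970-01-01 UTC). On some newer binaries that field holds a build hash instead of a real time, so treat the result as a hint only. The helper name and the use of the Windows 10 release date as a yardstick are choices made for this illustration.

```python
from datetime import datetime, timezone


def pe_timestamp_to_utc(hex_value):
    """Interpret a hex TimeDateStamp as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


# Hex values copied from the two lmDvm listings in this post.
DRIVERS = {"e1y62x64.sys": "4A32FDFA", "e1i63x64.sys": "56DA0235"}

# Windows 10 general availability; used here only as a rough yardstick.
WIN10_RELEASE = datetime(2015, 7, 29, tzinfo=timezone.utc)

for name, stamp in DRIVERS.items():
    built = pe_timestamp_to_utc(stamp)
    flag = "predates Windows 10" if built < WIN10_RELEASE else "ok"
    print(f"{name}: {built:%Y-%m-%d %H:%M:%S} UTC ({flag})")
```

The debugger prints these times in the analyst's local time zone, so the clock time (and occasionally the date) can differ from the UTC value printed here.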

The other drivers are Avast components and an Intel storage driver. There have been numerous occasions in the recent past where Avast seemed to be causing problems on some systems. My usual recommendation is to uninstall it and just use Windows Defender but I'd suggest at least making sure you've applied any available product updates if you want to keep it, if you haven't already.

The Intel driver is a component of Intel Rapid Storage Technology according to this page. That's another one I suggest people uninstall unless they used it to configure a RAID array. It also has been causing problems on some systems in the recent past.

As far as the trace, was a file named kernel.etl generated on one of your drives? Mine shows up at the root of C: typically. I'd need that to try and merge it with the remnants of the trace in the dump to hopefully get a useful trace. I'm not sure it will have been generated, though. Sometimes it does and sometimes it doesn't on my own system.

mracky · Feb 6, 2019

2009 timestamp? egads!! Yes, I have two ethernet cards. I have not used the inserted card for two years since I reconfigured my LAN. I'm going to pull it, so that driver will be useless.
=====================
Avast huh?
My event log has this entry before the crash:
Log Name: System
Source: Service Control Manager
Date: 2/6/2019 8:13:39 AM
Event ID: 7009
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: pony
Description:
A timeout was reached (30000 milliseconds) while waiting for the 30000!s! Update Service (avast) service to connect.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
<EventID Qualifiers="49152">7009</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime="2019-02-06T16:13:39.463728800Z" />
<EventRecordID>18484</EventRecordID>
<Correlation />
<Execution ProcessID="964" ThreadID="5792" />
<Channel>System</Channel>
<Computer>pony</Computer>
<Security />
</System>
<EventData>
<Data Name="param1">30000</Data>
<Data Name="param2">%1!s! Update Service (avast)</Data>
<Binary>610076006100730074000000</Binary>
</EventData>
</Event>
==========================================
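As an aside, the `<Binary>` payload in the event XML above can be decoded to see what it refers to. This is a minimal Python sketch that assumes the payload is UTF-16LE text with a trailing NUL, which is a guess based on the alternating zero bytes; the helper name is hypothetical.

```python
# Hex string copied from the <Binary> element of the event above.
BINARY_HEX = "610076006100730074000000"


def decode_event_binary(hex_text):
    """Decode an event <Binary> payload, assuming it is UTF-16LE text."""
    raw = bytes.fromhex(hex_text)
    # Strip the trailing NUL terminator, if present.
    return raw.decode("utf-16-le").rstrip("\x00")


print(decode_event_binary(BINARY_HEX))
```

The decoded string appears to be the short service name that the Service Control Manager was waiting on.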
Will probably rely on Windows Defender as you suggest.
I do not have a RAID. According to what I've read in forums, the SATA RAID Controller is legit for non-RAID configurations. I will see what happens when I disable it.

I have put the kernel.etl up on dropbox:
Dropbox - kernel.zip

Thanks.

cwsink · Feb 7, 2019

Please let us know what you find out. I'm going to see about merging the traces to see if it can provide useful information. I've not actually done it before so it might take a bit of trial and error. I'll reply with results when I'm done.

mracky · Feb 7, 2019

Uninstalled Avast.
Pulled extra ethernet card. The 2009 driver is for the onboard ethernet port. Considering putting the card back in, and disabling that older driver.
Got rid of Intel SATA RAID controller.

New BSOD this morning while I was using it. This time it was a TDR failure (referencing atikmpag.sys). A LiveKernelReport dmp was generated. I zipped up both Memory.dmp and watchdog.dmp. kernel.etl in C:\ is 0 bytes.
Dropbox - WATCHDOG-20190207-0707.zip

cwsink · Feb 7, 2019

Actually, if this AMD page is to be believed and the msinfo32 information is accurate, it doesn't look like your GPU is supported on Windows 10. I do remember a system with a Radeon 4700 card having stability problems which were only resolved by replacing the GPU. It could be you're in the same situation with a Radeon 4800, unfortunately.
