VIDEO_TDR_FAILURE with NVIDIA driver

HapaxOromenon

I was watching a YouTube video when the audio suddenly froze and started stuttering; a second later, I got a BSOD with VIDEO_TDR_FAILURE in nvlddmkm.sys, i.e. the driver for my NVIDIA GPU. This was a bit surprising as this computer also has Intel integrated graphics, and it was my understanding that the integrated graphics would be used unless I were to specifically right-click on a program and choose "Run with graphics processor".

When I ran jcgriff's BSOD collection tool, it showed "file not found" when copying minidumps, and the KernelDumpList.txt file also showed no dumps, despite the fact that there is a dump file in C:\Windows\Minidump. Later during execution, Kaspersky blocked cscript.exe for suspicious behaviour, and then terminated the tool itself for the same reason. I would prefer not to disable Kaspersky at this time, so I cannot provide the tool output. But I have attached the dump file (zipped up as the forum doesn't seem to allow files with the .dmp extension), as well as the System and Application event logs (zipped up for the same reason), and the output of DriverStore Explorer. My specs are also below:

Dell Precision 3530 laptop; i5-8300H; 8 GB RAM; NVIDIA Quadro P600; 128 GB SK-Hynix SSD; 1 TB Seagate HDD; Windows 10 Pro (OEM) x64, 18363.535. This computer is about 6 months old.
 

Attachments

  • 122219-11984-01.zip (726.1 KB)
  • Drivers 1.png (61.8 KB)
  • Drivers 2.png (55.4 KB)
  • Drivers 3.png (32.2 KB)
  • EventLogs.zip (6.2 MB)

Rich (BB code):
4: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff803`87c00000 fffff803`89291000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Dec  6 17:28:35 2019 (5DEB0043)
    CheckSum:         01634AED
    ImageSize:        01691000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
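
For anyone reading the lmvm output above, here is a minimal sketch of how two of its fields can be checked by hand. It assumes the hex value in parentheses after Timestamp is the PE header's build time in seconds since 1970-01-01 UTC (some toolchains store a hash there instead), and that the debugger printed the human-readable date in the local time zone of the machine doing the analysis. The function names are made up for illustration.

```python
from datetime import datetime, timezone


def pe_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Interpret a hex PE TimeDateStamp as seconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


def module_size(start: str, end: str) -> int:
    """Byte size of a module from lm-style addresses such as fffff803`87c00000."""
    def to_int(addr: str) -> int:
        # WinDbg separates the high and low 32 bits with a backtick.
        return int(addr.replace("`", ""), 16)

    return to_int(end) - to_int(start)


built = pe_timestamp_to_utc("5DEB0043")
size = module_size("fffff803`87c00000", "fffff803`89291000")

print(built.isoformat())  # 2019-12-07T01:28:35+00:00
print(hex(size))          # 0x1691000
```

The UTC result is eight hours ahead of the "Fri Dec  6 17:28:35 2019" shown in the listing, which is consistent with a local-time display, and the computed size agrees with the ImageSize field.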

Since this is an OEM machine, I would suggest rolling back to the OEM-supplied driver and seeing whether that improves stability.

Download page - Support for Precision 3530 | Drivers & Downloads | Dell US

When I ran jcgriff's BSOD collection tool, it showed "file not found" when copying minidumps, and the KernelDumpList.txt file also showed no dumps, despite the fact that there is a dump file in C:\Windows\Minidump.

Interesting, I think that's the first time I've seen that issue reported.

Later during execution, Kaspersky blocked cscript.exe for suspicious behaviour, and then terminated the tool itself for the same reason. I would prefer not to disable Kaspersky at this time, so I cannot provide the tool output.

I wonder if Kaspersky is causing issues with the tool being able to find the dump files? @jcgriff2 have you had any similar user reports?
 
Since this is an OEM machine, I would suggest rolling back to the OEM-supplied driver and seeing whether that improves stability.

Thanks for replying, but the driver provided by Dell appears to be a full 2 major versions behind - 24.21.13.9875 vs. current 26.21.14.4166. Given that NVIDIA drivers have had some security vulnerabilities in the past (see e.g. NVIDIA Fixes Security Flaws in GPU Driver, GeForce Experience and NVIDIA Patches High Severity Flaws in Windows GPU Display Driver), I would prefer at least to wait to see what others (such as @jcgriff2) think before installing such an outdated driver.
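
To put the two version strings side by side, here is a small sketch. It assumes the commonly described convention that NVIDIA's public release number is formed from the last digit of the third field plus the four digits of the fourth field; that mapping is not stated anywhere in this thread, and the function names are made up for illustration.

```python
def parse_version(text: str) -> tuple:
    """Split a dotted Windows driver version into integers for ordering."""
    return tuple(int(part) for part in text.split("."))


def nvidia_release(text: str) -> str:
    """Assumed convention: 24.21.13.9875 -> '3' + '9875' -> 398.75."""
    parts = text.split(".")
    digits = parts[2][-1] + parts[3].zfill(4)
    return f"{digits[:3]}.{digits[3:]}"


dell = "24.21.13.9875"
current = "26.21.14.4166"

print(parse_version(dell) < parse_version(current))   # True
print(nvidia_release(dell), nvidia_release(current))  # 398.75 441.66
```

Under that assumption, the gap is between public releases 398.75 and 441.66. That may be a more useful comparison than the leading "24" versus "26", since the first fields do not appear to follow NVIDIA's own release numbering.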
 
I wonder if Kaspersky is causing issues with the tool being able to find the dump files? @jcgriff2 have you had any similar user reports?

Yes, Kaspersky is blocking all of our tools and we are aware of it.

They have stayed silent to our attempts to report the issue to them in order to get it resolved.

Also, the fact that the driver is behind does not mean it's worse.
 
You will not be affected by all of these, so no, it should not matter to you at all in terms of performance or security. The primary thing is to get it working and to determine whether the driver is acting up or the card itself.
 
You will not be affected by all of these, so no, it should not matter to you at all in terms of performance or security. The primary thing is to get it working and to determine whether the driver is acting up or the card itself.

I agree, best to get the system stable and see whether the issue is with a particular driver update or the graphics card.

The security bulletin from Nvidia seems to provide a little more clarification on which cards are affected - Security Bulletin: NVIDIA GPU Display Driver - November 2019 | NVIDIA

Looking at some of the documentation on CVSS scoring, it appears that the scoring is designed to assess the severity of the vulnerability rather than the risk of the user being affected by it.

The CVSS Specification Document has been updated to emphasize and clarify the fact that CVSS is designed to measure the severity of a vulnerability and should not be used alone to assess risk.

Source - CVSS v3.1 User Guide
 
Note that an OEM implementation of a video chipset may mean that some vulnerabilities in the public unified driver do not affect that implementation. For example, on Surface Book and Book2 the Optimus graphics implementation means the Nvidia drivers are customized and do not implement the entire stack, which also means that the public driver can cause issues the inbox driver does not. The same goes for Intel or onboard AMD graphics: when an implementation isn't a reference design, use only the OEM's driver to avoid adding issues where none exist. Again, a Surface example is apropos here: on a Surface, the Intel or Nvidia GPU drivers know how to read the calibration information for the panel and adjust their output accordingly. Direct-from-the-vendor public drivers from their respective sites *do not* read that calibration, and thus aren't as accurate and may also cause increased TDR failures and bugchecks.

Always check with the system manufacturer about exposure to vulnerabilities, and about when they will provide an updated version if the system is affected. Also ask what (if anything) may happen if you use a vendor-supplied driver instead of the one the system manufacturer provides.
 
I agree, best to get the system stable and see whether the issue is with a particular driver update or the graphics card.

Well, I have run for a few days with Driver Verifier and got no BSODs; also ran the Unigine Heaven benchmark (UNIGINE Benchmarks) on max settings for some hours and it was uneventful. I can try the built-in Dell diagnostic for the GPU, but probably it won't give any different result.

Note that an OEM implementation of a video chipset may mean that some vulnerabilities in the public unified driver do not affect that implementation. For example, on Surface Book and Book2 the Optimus graphics implementation means the Nvidia drivers are customized and do not implement the entire stack, which also means that the public driver can cause issues the inbox driver does not. The same goes for Intel or onboard AMD graphics: when an implementation isn't a reference design, use only the OEM's driver to avoid adding issues where none exist. Again, a Surface example is apropos here: on a Surface, the Intel or Nvidia GPU drivers know how to read the calibration information for the panel and adjust their output accordingly. Direct-from-the-vendor public drivers from their respective sites *do not* read that calibration, and thus aren't as accurate and may also cause increased TDR failures and bugchecks.

Always check with the system manufacturer about exposure to vulnerabilities, and about when they will provide an updated version if the system is affected. Also ask what (if anything) may happen if you use a vendor-supplied driver instead of the one the system manufacturer provides.

In this case, if you compare Dell's driver with the public release of version 24.21.13.9875 (from e.g. Download NVIDIA Quadro P600 Graphics Driver 24.21.13.9875 for Windows 10), it appears that the .sys and .cat files are exactly the same, i.e. it isn't a customized version.
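
A comparison like the one described above can be sketched as follows: hash every .sys and .cat file in one extracted driver package and check it against the file at the same relative path in the other. This is only an illustration, not necessarily how the comparison in this thread was done; the function names and the folder names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large driver binaries are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_packages(dir_a: Path, dir_b: Path, suffixes=(".sys", ".cat")) -> dict:
    """Map each matching file under dir_a to True (identical), False (differs)
    or None (no file at the same relative path under dir_b)."""
    results = {}
    for file_a in sorted(dir_a.rglob("*")):
        if file_a.is_file() and file_a.suffix.lower() in suffixes:
            relative = file_a.relative_to(dir_a)
            file_b = dir_b / relative
            if file_b.is_file():
                results[relative.as_posix()] = sha256_of(file_a) == sha256_of(file_b)
            else:
                results[relative.as_posix()] = None
    return results
```

Calling `compare_packages(Path("dell_extracted"), Path("nvidia_extracted"))` on two unpacked packages returns one entry per driver or catalog file. If every value is True, the binaries are byte-for-byte identical, although this says nothing about firmware-level dependencies outside the package.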
 
That's good, although there's still the GOP and UEFI to think about - that Dell model may or may not have a dependency, and just looking at the driver isn't sufficient in all cases. Plus, there's the possibility that the model is just a reference design in a box, and very much not custom, in which case vendor and OEM drivers will be VERY similar (and using the vendor driver is as safe as using the absolute latest Intel GPU driver on a NUC).

I can't speak for Dell, but Windows and Intel are making this difficult to do due to what I described above. Just FYI.

Also, if enabling Verifier with default options "fixes" a problem, you can be almost certain the driver (and not the hardware) is at fault. Short of attaching a debugger to see the debug spew after enabling it, a good starting assumption is that the cause is either a timing issue or heap corruption. Those two generally either aren't caught (heap corruption, as opposed to overruns or underruns, isn't validated by default) or get "solved" because the additional cycles Verifier takes keep the timing issue from occurring.

Good luck, I do hope you solve your driver issue.
 