Obscure BSOD over the past few months

Rainman65 · Oct 7, 2019

Hey Everyone,

I've been trying to solve this obscure BSOD issue for a number of months now and can't seem to get to the bottom of it. Randomly, while idle either overnight or while I'm at work, the machine will BSOD and reboot with a bugcheck code of 126 (0x7E). Originally this was a hard hang; further down the line of troubleshooting it became a reboot.

It started, before I replaced the mobo+CPU+PSU, one day when I plugged an external 2.5" USB drive into one of the front header USB 3.0 ports. I was copying files to it overnight and the next morning the machine had hard hung, so I had to reboot it. I didn't think too much of it at the time. Over the coming months it started doing it while idle once a day, or once every few days, maybe once a week or once a month, and of course it never created any dump files, which is part of the reason I can't find what the issue is. It is worth noting that I bought an RTX 2080 about 2-3 months before this started happening. My machine is custom water cooled by myself; I have monitored temps and they have never been a problem.

What I tried at the time:

1.) Rolled back to a known good working image of Windows I had; this is the same image that had been working for a few months.
2.) Re-installed Windows from scratch (I used to be on LTSC 1809).
3.) Checked for updated drivers.
4.) Ran Driver Verifier, which found that Logitech drivers were causing a hard hang shortly after logging in; removed them but it didn't make a difference.
5.) Ran memtest86+ on each of the DIMMs, no issues.
6.) Ran FurMark and Prime95 overnight, no issues.

Some of the details are a bit hazy as it's been ages and I spent hours and hours trying different things. Eventually the machine started rebooting instead of hard hanging, and just one of the times it rebooted it created a minidump file, which pointed to the nvlddmkm.sys driver; I had possibly updated/reverted the NVIDIA drivers by this point. While dusting out the PC I found that the slightest nudge on the RAM sticks hard hung the machine, so I put some pressure on the DIMMs (4 in total) to keep them seated firmly in place. After a few days or a few weeks it rebooted again, so that wasn't it. There was also the fact that the watercooled 2080, and the TITAN X before it, had heavy GPU sag in the PCI-E slot, and occasionally taking out the 24-pin ATX connector and the DIMMs caused quite some bend in the motherboard. Ultimately it got to the point where the machine would not even POST. I stripped down the PC, with the motherboard out on its own on the workbench, a known good PSU and a known good video card, and still no POST, so I wrote that mobo off as I did not have another CPU to test with.

Anyway, I bought a new mobo+CPU (the PSU was one of the first things I had replaced a while back when I started experiencing this issue) and did a new build of Windows 10 1809 with all drivers up to date. Everything was clean and working, with no problems for a few months. Friday evening I came home and plugged a USB camera into one of the front USB 2.0 header ports, and overnight at 2:46 AM it crashed and rebooted with the same bugcheck code 126. This time it did write a minidump file, which points to the nvlddmkm.sys driver. However, I also noticed that it created .dmp files in another location that showed a driver called "USBHUB3" crashing. I then used AppCrashView, which was interesting: at the very second the event log reported the bugcheck (2:46:53) there were 7 events, starting with the USBHUB3 driver causing a crash. I've now moved that USB camera to a USB hub I have connected (I did try changing hubs months back, which did not help). I have not had any random reboots yet, but it's getting to the point where I really just want to drill down into what exactly is causing this. If it is a faulty RTX 2080 I'd like to find out because the RTX 2080 has a USB Type-C connector on it and 3 of the USB devices in Device Manager are registered under NVIDIA and using the USBHUB3 driver. Also, the event log says where the .dmp, .xml etc. files for the USBHUB3 crash are, but they never exist; it doesn't actually seem to create the files.

Apologies that some of the details are a bit hazy; I've been trying to remember everything I've done over the past number of months. I've attached a link to the minidump file it created for nvlddmkm.sys as well as a previous USBHUB3 .dmp file it did manage to create. Any help would be massively appreciated as I'm very much lost at this point. I have tried to look at both .dmp files using WinDbg but I'm not experienced enough to understand what the assembly instructions are doing or what else is going on. There is also a MEMORY.DMP if it helps, but it's 1.2GB so I will upload it upon request.

Minidump/msinfo32 below:
MEGA

If you need any more info let me know.

Thanks
Rainman65

x BlueRobot · Oct 9, 2019

I apologise that no one has been able to get to your thread, however, I'll have a look at the files tonight for you.

Rainman65 · Oct 9, 2019

No worries at all x BlueRobot, and thanks.

I did run the bsodcollection app and have got a link to the results below:
MEGA

Regarding USB, below is a list of all USB devices I can remember right now connected to the machine:
USB Hub (USB 2.0) > Keyboard / Xbox Wireless Adapter / USB IR Camera
Front Header USB 2.0 > Logitech Wireless USB Receiver
Rear USB 2.0 > Audient USB Audio Interface / External 2.5" 1TB SSD
Rear USB 3.0 > 3 x Oculus Sensors / Oculus Rift USB

x BlueRobot · Oct 9, 2019

Code:

1: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffffa383`23ff8f18 fffff806`583dabe4 nt!KeBugCheckEx
01 ffffa383`23ff8f20 fffff806`5839c399 nt!PspSystemThreadStartup$filt$0+0x44
02 ffffa383`23ff8f60 fffff806`583ca04f nt!_C_specific_handler+0xa9
03 ffffa383`23ff8fd0 fffff806`582c3555 nt!RtlpExecuteHandlerForException+0xf
04 ffffa383`23ff9000 fffff806`582c7aee nt!RtlDispatchException+0x4a5
05 ffffa383`23ff9750 fffff806`583d321d nt!KiDispatchException+0x16e
06 ffffa383`23ff9e00 fffff806`583cf405 nt!KiExceptionDispatch+0x11d
07 ffffa383`23ff9fe0 fffff806`73734bc2 nt!KiPageFault+0x445 << We crash here!
08 ffffa383`23ffa170 ffffd009`9d654c58 nvlddmkm+0x9b4bc2
09 ffffa383`23ffa178 ffffd009`00000000 0xffffd009`9d654c58
0a ffffa383`23ffa180 00000000`00000000 0xffffd009`00000000

Code:

1: kd> .exr 0xffffa38323ff9f38
ExceptionAddress: fffff80673734bc2 (nvlddmkm+0x00000000009b4bc2)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

Code:

1: kd> .cxr 0xffffa38323ff9780
rax=0000000000000000 rbx=000000280000001d rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000001 rdi=ffffd0099d647000
rip=fffff80673734bc2 rsp=ffffa38323ffa170 rbp=ffffd0099d654c58
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000001
r11=fffff780000003b0 r12=ffffd0099a523938 r13=ffffd0099a51b000
r14=0000000000000000 r15=ffffd0099d489001
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050286
nvlddmkm+0x9b4bc2:
fffff806`73734bc2 488b01          mov     rax,qword ptr [rcx] ds:002b:00000000`00000000=????????????????

It seems like an illegal page fault occurred because of a null pointer. This is a very common issue and a very common bugcheck. Unfortunately, it appears that your graphics card driver may be the culprit here, although, this can be due to RAM and other hardware quirks.
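
For anyone following along, the numbers in the dump can be decoded by hand. Below is a minimal Python sketch, assuming the documented layout of an access-violation exception record (the first parameter is the kind of access, the second is the address that was touched). The helper name `describe_access_violation` is made up for illustration and is not part of any debugger tooling.

```python
from typing import Sequence

# Bugcheck 126 from the event log is the decimal form of 0x7E
# (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED).
BUGCHECK_DECIMAL = 126

# For ExceptionCode 0xC0000005 (access violation), Parameter[0] encodes
# the kind of access and Parameter[1] is the address that was touched.
ACCESS_TYPES = {0: "read from", 1: "write to", 8: "execute at"}


def describe_access_violation(exception_code: int, params: Sequence[int]) -> str:
    """Hypothetical helper: turn the .exr fields into a one-line summary."""
    if exception_code != 0xC0000005:
        return f"not an access violation (code {exception_code:#010x})"
    access = ACCESS_TYPES.get(params[0], "access")
    return f"Attempt to {access} address {params[1]:016x}"


if __name__ == "__main__":
    print(f"bugcheck {BUGCHECK_DECIMAL} = {BUGCHECK_DECIMAL:#04x}")
    # Values copied from the .exr output above.
    print(describe_access_violation(0xC0000005, [0x0, 0x0]))
```

With both parameters at zero this reproduces the debugger's own summary line: a read from address 0, which fits the `mov rax,qword ptr [rcx]` instruction executing with `rcx` equal to zero.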

Code:

1: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff806`72d80000 fffff806`7433a000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_c7bdd6222811a2ee\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Sep 27 00:25:12 2019 (5D8D48D8)
    CheckSum:         0155803A
    ImageSize:        015BA000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
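
The hex value in parentheses after `Timestamp:` can be checked independently. This is a small sketch, assuming the value is the usual link timestamp stored as seconds since 1970-01-01 UTC; the function name `link_time_utc` is illustrative only.

```python
from datetime import datetime, timezone

# Hex value shown in parentheses after "Timestamp:" in the lmvm output.
RAW_TIMESTAMP = 0x5D8D48D8


def link_time_utc(raw: int) -> datetime:
    """Interpret the raw value as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(raw, tz=timezone.utc)


if __name__ == "__main__":
    print(link_time_utc(RAW_TIMESTAMP).isoformat())
```

In UTC this comes out as 2019-09-26 23:25:12, one hour behind the time printed above. That would be consistent with the debugger showing the time in the local zone of a machine at UTC+1, so it is the same driver build either way.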

Rainman65 said:
[...]
if it is a faulty RTX 2080 I'd like to find out because the RTX 2080 has a USB Type-C connector on it and 3 of the USB devices in Device Manager are registered under NVIDIA and using the USBHUB3 driver
[...]

Interesting, that driver may possibly be getting blamed since it's part of the driver stack for other devices.

There also appears to be a driver update released on the 01/08/2019, could you please see if you can download and install it? I did a manual search.

Drivers | GeForce

In regards to Driver Verifier, did you follow these instructions? Driver Verifier - BSOD related - Windows 10, 8.1, 8, 7 + Vista

Rainman65 · Oct 10, 2019

Thanks for the response x BlueRobot, much appreciated. Both times the crashes started after I plugged something into the front header USB ports, and then overnight it crashed. Since last Saturday it has not done it, but then I have also been rebooting and trying different things. At this point I'm simply going to rebuild my system on a clean Win 10 1903 and see how that goes; if it happens again I may replace the RAM first, as that will be cheaper than replacing the video card.

x BlueRobot · Oct 10, 2019

I don't think this is necessarily a hardware issue at this point; it's a possibility, but it could easily be a software issue. What is typically connected to the machine when it crashes? Could you please try running the system with just the keyboard and mouse connected for a few days? I've seen Xbox drivers cause several BSODs in the past.

Rainman65 · Oct 10, 2019

Yeah it's a bit odd. I use everything that's connected to it, and sometimes the problem might only happen once in a month; just recently it was fine for 2 months and then crashed once.

When it crashed on Saturday morning the following was connected:
Front Header USB 2.0 > Logitech Wireless Mouse Receiver / USB IR Camera
Rear USB 2.0 > USB 2.0 Hub (Externally powered) > Xbox Wireless Adapter / Keyboard
Rear USB 2.0 > 2.5" 1TB SSD
Rear USB 2.0 > Audient iD USB Audio Interface
Rear USB 3.0 > 3 x Oculus Rift Sensors / Rift USB

When it first started doing this months ago the setup was the same as above, just with a 512GB 2.5" external SSD in place of the USB IR Camera (I was copying files to it overnight and it hard hung the machine).

x BlueRobot · Oct 10, 2019

Try just using the keyboard and mouse for a few days. I suspect that it may be an issue with either one of the USB ports or a driver associated with one of the USB-connected devices. Have you checked for BIOS updates as well?

Rainman65 · Oct 10, 2019

Hey, originally I was on a recent BIOS when it did crash, BIOS v1902 which was a July 2019 BIOS, I've since updated to v2002 which is a September 2019 BIOS.

I forgot to answer your question regarding Driver Verifier. I did not follow the guide you linked to but a different one a while back. I did run verifier against USBHUB3.sys, nvlddmkm.sys and the Intel Ethernet driver .sys for about 3-4 hours, which found no faults. I selected all "Test types" except for "Randomized low resources simulation" and "DDI compliance checking".

Many months back I ran verifier with those same options but against all drivers, and found that the Logitech keyboard driver installed as part of the Logitech Mouse software would cause a hard hang as soon as you logged in to the machine. After that I think it was SlySoft AnyDVD which caused a hang. After removing those 2, verifier did not hang the machine, but later on I did still experience random restarts with the same bugcheck code.

I do see what you're saying about it easily being a software issue; months ago when it started, during troubleshooting, it went from hard hanging to rebooting instead. I can try using just keyboard and mouse for a few days, although it can often run fine for weeks to months now before it decides to fault.

x BlueRobot · Oct 10, 2019

Could you please run Driver Verifier against all the third-party drivers with DDI compliance checking enabled as well as the other options? The reason is that DDI compliance checking will check for synchronisation issues involving locks, which can be a common cause of hangs.

Rainman65 · Oct 10, 2019

Interesting, will try that this evening before I begin rebuilding the machine and let you know the results.

x BlueRobot · Oct 10, 2019

Thanks, especially since Driver Verifier found issues before.

Rainman65 · Oct 11, 2019

Hey, so I did run verifier against all 3rd party drivers and it did BSOD the machine on the Asus Essence STX II sound card driver; below is a link to the minidump file it created. However, after seeing it happen again I do remember it doing that before with that STXII.sys driver, even without DDI compliance checking on. I've rebuilt my machine now on a clean Win 10 1903 and will see how it goes.

Since I had to wait for 2 BSODs before it started auto recovery and I could boot into Safe Mode and disable verifier, I proceeded instead to just start the rebuild.

MEGA

MrPepka · Oct 11, 2019

Update:
Asus Essence STX II driver (Essence STX II 7.1 Driver & Tools | Sound cards | ASUS Polska)

Rainman65 · Oct 11, 2019

MrPepka said:
Update:
Asus Essence STX II driver (Essence STX II 7.1 Driver & Tools | Sound cards | ASUS Polska)

Yup, that's the driver I've been running for years now. It was released in 2015, so maybe some changes that Microsoft has made in Windows since then might be affecting functionality. I've never had a bluescreen during normal use which said it was that driver; the card's never crashed, it's been stable.

MrPepka · Oct 11, 2019

Hmm, this driver's timestamp is from ... 2014. Apparently, Asus tested it at home for a year before publishing it on the site. You can also use some modified drivers for this card like uni xonar, and they will help

Rainman65 · Oct 11, 2019

MrPepka said:
Hmm, this driver's timestamp is from ... 2014. Apparently, Asus tested it at home for a year before publishing it on the site. You can also use some modified drivers for this card like uni xonar, and they will help

Yup, defo from 2014 so it's quite old. I have come across the Xonar uni drivers you mentioned as well but was always a bit wary about using them. Under normal usage I've never had any issues with this card and no bluescreen has ever been attributed to it, but I get that it could be doing something which might trigger something else and not get logged.

Rainman65 · Oct 12, 2019

So, after a rebuild it's done it again. It happened this morning at 2:52 AM while idle. I will test each RAM stick over the next few days. I have included a link to the minidump file it created, but it's pointing to the same driver.

MEGA

x BlueRobot · Oct 12, 2019

Code:

19: kd> .cxr 0xffff988a22266780
rax=0000000000000000 rbx=000000280000001d rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000001 rdi=ffff870f4e624000
rip=fffff8065a3e4bc2 rsp=ffff988a22267170 rbp=ffff870f4e631c58
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000001
r11=fffff780000003b0 r12=ffff870f4bf23938 r13=ffff870f4bf1b000
r14=0000000000000000 r15=ffff870f4ce55001
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050286
nvlddmkm+0x9b4bc2:
fffff806`5a3e4bc2 488b01          mov     rax,qword ptr [rcx] ds:002b:00000000`00000000=????????????????

It appears to be the exact same error as before and the same driver.

Code:

19: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff806`59a30000 fffff806`5afea000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_c7bdd6222811a2ee\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Sep 27 00:25:12 2019 (5D8D48D8)
    CheckSum:         0155803A
    ImageSize:        015BA000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
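
The raw `rip` values in the two dumps differ because the driver was loaded at a different base address each time. One way to confirm that both crashes hit the same instruction is to subtract the module start from the faulting address. A minimal sketch using only the values shown in this thread; the labels and function names are illustrative.

```python
# (module start from lmvm, faulting rip from .cxr) for each dump in this thread.
CRASHES = {
    "first dump": (0xFFFFF80672D80000, 0xFFFFF80673734BC2),
    "second dump": (0xFFFFF80659A30000, 0xFFFFF8065A3E4BC2),
}


def module_offset(module_start: int, address: int) -> int:
    """Offset of an address from the start of the module that contains it."""
    return address - module_start


def same_fault_site(crashes: dict) -> bool:
    """True when every crash faulted at the same module-relative offset."""
    offsets = {module_offset(start, rip) for start, rip in crashes.values()}
    return len(offsets) == 1


if __name__ == "__main__":
    for label, (start, rip) in CRASHES.items():
        print(f"{label}: nvlddmkm+{module_offset(start, rip):#x}")
    print("same fault site:", same_fault_site(CRASHES))
```

Both dumps resolve to `nvlddmkm+0x9b4bc2`, and the driver timestamp is identical, so the clean rebuild reproduced the same fault in the same driver build rather than a new one.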

You mentioned in an earlier post that you were going to begin rebuilding the system, have you managed to do this yet?

Rainman65 · Oct 12, 2019

Yup, as I mentioned in the above post I have rebuilt the system. I did this on Thursday evening; overnight it was fine, Friday it was fine, and in the AM hours of this morning it did it again. I'm losing the plot at this point and really not sure what it is. This is the newest NVIDIA driver and I have tried older ones.
