Questions regarding recent black screen crash.

Status
Not open for further replies.

TheAnalysiser

Member
Joined
Jun 17, 2023
Posts
5
Hello.

While I was playing a video game, my computer experienced a complete system freeze that quickly transitioned into a black screen that more slowly transitioned to a successful reboot sequence. I'm trying to determine what caused it, and why things happened the way they did (more on that later). I've already implemented a fix, and it does seem to have solved the issue for now.

Below is a detailed timeline of events and troubleshooting information. If you would like to see the forum Q/A template I filled out, scroll down to the bottom.

1. I wake my system from an extended period of sleep. Because the system had been asleep for about 50 minutes, I had to go through a quick boot sequence (the P.O.S.T process, I believe) and could only wake it via the power button. I knew the system had been asleep, however, because on Windows 11 the Start menu always opens whenever I wake from sleep. From what I know, extended sleep sessions wake like this for me because my SSD is set by default to turn off automatically after 20 minutes of inactivity. I did research this and found that it occurs for other owners of my system's model line (Lenovo Legion) as well. Hibernation is completely off for my system, and short sleep sessions work as you would expect.

2. I check Reliability Center (it's an old habit of mine) and do not see any recent errors or warnings.

3. I open the Minecraft launcher, and start playing Minecraft: Java Edition. This is the first time I've ever played any video game after an extended sleep session. I first run a solo practice session on a multiplayer server for about 5 minutes. Nothing out of the ordinary ever occurs.

4. I then decide to play a PvP match on the same server. After about 15 seconds of playing, the screen suddenly freezes. The freeze goes on for about 2 seconds, and then the screen goes black for around 10-12 seconds. No other visual or any auditory artifacts occur leading up to and during these freezes. Also, my computer fans remain quiet (they were for the entire session) and my RGB lights on my keyboard and mouse remain on (until the reboot, then they flicker once like normal). During the black screen, I do try to open task manager but it does nothing.

5. Finally, the system automatically reboots and I'm successfully able to return to my desktop. Everything is functioning perfectly, so I check reliability center. I see two new errors, one about a "Hardware Error" and the other just being about how the system did not shut down properly. Here's what I remember being listed in the first error:

Problem: Hardware error.

Description: A problem with your hardware caused Windows to stop working correctly.

Problem Signature:

1. Problem Event Name: LiveKernelEvent.

2. Code: 141.

I messed up and accidentally deleted this event log, so I unfortunately don't know what the rest of the log actually listed. I do believe it's directly connected to the .dmp I find later on though.
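
For reference, the code from the event can be checked against the display-related entries in Microsoft's bug check reference. The sketch below is only a small lookup table, assuming the "Code" field in Reliability Monitor is the hexadecimal bug check code (an assumption here, since the original event log is gone). The helper name `describe_live_kernel_code` is hypothetical.

```python
# A few display-related bug check codes from Microsoft's bug check reference.
DISPLAY_BUG_CHECKS = {
    0x116: "VIDEO_TDR_FAILURE",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}


def describe_live_kernel_code(code_text):
    """Hypothetical helper: read a Reliability Monitor code as hexadecimal."""
    code = int(code_text, 16)
    name = DISPLAY_BUG_CHECKS.get(code, "unknown (not in this small table)")
    return f"0x{code:X}: {name}"


print(describe_live_kernel_code("141"))
print(describe_live_kernel_code("117"))
```

If that assumption holds, code 141 and bug check 0x117 are two different entries, so the code is worth confirming against the dump itself rather than from memory.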

More investigation:

Next, I thoroughly checked both Event Viewer and Device Manager for ANY additional information regarding the crash. I found absolutely nothing useful, and I am really confident in this.

I also looked for a generated .dmp file, and I did find one in the "C:\WINDOWS\LiveKernelReports" directory. This is where things got interesting. When I opened the .dmp file in WinDbg on another device (running Windows 10 x64), it listed the following live dump: "VIDEO_TDR_TIMEOUT_DETECTED." The thing is, this implies that the recovery that comes with the TDR process was successful, which can't be true because my system force-rebooted instead of returning me to the desktop and displaying the following message: "Display driver stopped responding and has recovered."

I'm hoping someone knows why/how it applies (or doesn't) to what occurred to me. This is also what I meant in my second "answer goal."
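
For anyone repeating the search for live dumps, here is a minimal sketch of listing the .dmp files under that directory, newest first. The function name `find_live_dumps` is hypothetical, and the script assumes Python is available and that the folder is readable (it may need an elevated prompt).

```python
from pathlib import Path


def find_live_dumps(root):
    """Return .dmp files under root (searched recursively), newest first."""
    dumps = [p for p in Path(root).rglob("*.dmp") if p.is_file()]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Directory named in the post; subfolders are searched as well.
    for dump in find_live_dumps(r"C:\WINDOWS\LiveKernelReports"):
        print(dump, dump.stat().st_size, "bytes")
```

Sorting by modification time makes it easier to match a dump to the time of the crash.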

As for my first "answer goal," which was finding the underlying cause, I've ruled out most of the potential causes listed in the following Microsoft article:

Bug Check 0x117 VIDEO_TDR_TIMEOUT_DETECTED - Windows drivers

Here are my ruled out causes:

Hardware issues that impact the ability of the video card to operate properly, including:

1. Over-clocked components, such as the motherboard: I don't overclock.

2. Incorrect component compatibility and settings: I have never touched these, and I have had the same settings since day 1 without issue until now.

3. Insufficient system cooling: Fans were working (system wouldn't have made it past the P.O.S.T process if they weren't) but were not stressed/struggling, I was playing a light video game, I have no history of overheating, my fan filters are clean, and I was not experiencing any noticeable thermal throttling. However, I never actually checked my system temperatures or touched near any potential physical "hot-zones" on my laptop, so maybe... but pretty unlikely.

4. Insufficient system power: Definitely not, I was plugged in and playing something light.

5. Defective parts: For a one time occurrence, this also is really unlikely, especially when everything performs exceptionally at the moment.

6. Visual effects, or too many programs running in the background may be slowing your PC down so that the video card can not respond as necessary: No other programs besides Minecraft, OneDrive, and Lenovo Fn keys were open, and again, it's a lightweight video game.

The one cause I did not end up ruling out was this:

"You may need to install the latest updates for your display driver, so that it properly supports the TDR process."

I didn't rule it out because hardware issues are generally significantly less common than software ones, graphics drivers can in fact cause TDR issues, and I had no way of actually disproving it, so I decided to pursue ways to resolve this particular cause.

Before doing that, I ran SFC and DISM commands to check for and repair any system corruption. Neither reported anything wrong, however.
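
As an illustration of that step, the sketch below runs both tools in sequence and collects their exit codes. The post does not say which switches were used, so `sfc /scannow` and `DISM /Online /Cleanup-Image /RestoreHealth` are assumptions (both are standard invocations), and `run_repair_checks` is a hypothetical name. It needs an elevated prompt on Windows.

```python
import subprocess
import sys

# Assumed switches: the post names SFC and DISM but not the exact commands.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_repair_checks(runner=subprocess.run):
    """Run each command in order and return (command, exit code) pairs."""
    results = []
    for cmd in REPAIR_COMMANDS:
        completed = runner(cmd, check=False)
        results.append((" ".join(cmd), completed.returncode))
    return results


if __name__ == "__main__" and sys.platform == "win32":
    for command, code in run_repair_checks():
        print(f"{command} -> exit code {code}")
```

The `runner` parameter exists only so the function can be exercised without actually invoking the tools.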

Anyway, what I did was manually uninstall all Nvidia software via Control Panel's "Add/Remove Programs" menu, made sure they were uninstalled and that I was currently using Microsoft's Basic Display Adapter, and then installed up-to-date Nvidia drivers, making sure to use the installer's built-in "Clean Install" option.

While broken graphics drivers may have been the cause, I'm not actually sure if the crash hasn't reoccurred because of my efforts at "resolving" the potential "cause." So, I would appreciate someone providing some insight on this topic as well.

Now, while I will include a generated .txt output of the "!analyze -v" command run with the .dmp in WinDbg in this thread, I won't be providing the Sysnative file collection app's output files for the time being, nor will I include the actual .dmp file. If you would like me to include .txt outputs of other commands run with the .dmp in WinDbg, feel free to ask, and if anyone actually finds a certain output file from the collection app necessary, there's no problem asking for it as well.
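
One way such a .txt output could be produced without the WinDbg GUI is with the console kernel debugger from the same Debugging Tools package. This is a sketch, assuming `kd.exe` is on the PATH; the function names and any file paths passed in are hypothetical.

```python
import subprocess


def build_analyze_command(dump_path, debugger="kd.exe"):
    """Build a non-interactive debugger command line for one dump file."""
    # -z opens a dump file; -c runs the listed commands at startup; q quits.
    return [debugger, "-z", dump_path, "-c", "!analyze -v; q"]


def save_analysis(dump_path, out_path, debugger="kd.exe"):
    """Run the debugger and write everything it prints to out_path."""
    cmd = build_analyze_command(dump_path, debugger)
    with open(out_path, "wb") as out:
        done = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
    return done.returncode
```

Further debugger commands can be added to the `-c` string, separated by semicolons, before the final `q`.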

Continuation of Q/A:

Laptop or Desktop? Laptop.

Exact model number? Legion 5 15ACH6H.

OS? Windows 11 x64.

What was original installed OS on system? Windows 11 x64.

Is the OS an OEM version or full retail version? OEM.

Age of system? 6 months.

Age of OS installation? 6 months.

Have you re-installed the OS? No.

(Merged questions) System specifications?

Legion,Legion 5 15ACH6H,Model:82JU00MYUS

Note: I have a 500GB SSD and 16GB of RAM.

Is driver verifier enabled or disabled? Disabled.

What security software are you using? Windows Defender.

Are you using proxy, vpn, ipfilters or similar software? Yes.

Are you using Disk Image tools? No.

Are you currently under/overclocking? Yes, boost is disabled for my CPU using Power Plan settings.

Are there overclocking software installed on your system? No.

Other information:

Old Nvidia Driver Version: 528.02

New Nvidia Driver Version: 536.23
 


It seems that you had a VIDEO_TDR_TIMEOUT_DETECTED bugcheck. This results when TDR (Timeout Detection and Recovery) detects a graphics hang and recovers by resetting the adapter; in addition, a Live Kernel Event dump is saved for Microsoft purposes.

There are two causes of this 0x117 bugcheck: a bad driver or a bad card. From what you've written, it seems that you eventually updated the driver and that has solved the problem. FWIW, it's wisest to use DDU to uninstall all traces of earlier graphics drivers before installing a new version.
 
Sorry, but this doesn't really get me any closer to finding answers. I already know what the bugcheck is, and what can cause it. It's all in the Microsoft article I linked, and I read it. However, like I said in my original post, the bugcheck implies video recovery was successful, but it wasn't, because my system rebooted. That is not what's supposed to happen with TDR recovery, so I'm looking for answers as to why the system lists this anyway.

I guess when it comes to finding the underlying cause of the crash, it can only be generalized like the ones listed in the Microsoft article and in your reply, so I'm starting to think this isn't really possible to 100% answer anyway. But if anything was going to help, it wouldn't be repeating information.
 

Feel free to look for the information on a different forum.

Alternatively you can be polite to our volunteer staff, who are trying to help respond to your questions.
 
A lot of time was spent writing out this post, but it would not have been nearly as hard if I wasn't trying to eliminate possible factors and accurately document everything I knew to save as much time for volunteer staff as possible. By providing old information, all of that effort goes to waste, because I already took the time to find and provide that information. It wastes time for everyone involved, which is why I object to it.
 
I won't be providing the Sysnative file collection app's output files for the time being nor will I include the actual .dmp file
I would like to know why not? This is what we need for more insight.
 
TBH I skip-read your original post because (like most of us) I'm a volunteer with lots of other things that I want to get on with. Most people won't read a post that long in any case, so it's not really helping you that much.

The second point I'd make is that we have no idea of the level of expertise of the people we help, generally it's not high, but most of them have a firm idea on what their problem is - and they're usually wrong. We ask for the Sysnative File Collection output because it gives us all the information we need so that we can make a reasoned diagnosis ourselves - as a collective.

I'm sorry if you thought that my attempt to help you was worthless, but it won't keep me up at nights...
 
I would like to know why not? This is what we need for more insight.
I don't believe giving out that amount of system information is healthy (this applies to both the app and the complete .dmp file itself), and I also disagree with your point. When it comes to the app, I've already looked through anything that could have given direct info on the event, such as Event Viewer logs, and found nothing. Other files from the collection app only provide system info regarding installed software and hardware or network information, which would be useful in troubleshooting other BSODs, but my crash dump references nothing that would need additional context I haven't already provided. Maybe small things like driver/Windows component versions, but I can provide that manually. And outside of Nvidia drivers, I don't think you'd be able to directly link anything software-related to the crash when neither the crash dump nor the Windows logs reference anything specific enough as a potential cause other than the display driver.
TBH I skip-read your original post because (like most of us) I'm a volunteer with lots of other things that I want to get on with. Most people won't read a post that long in any case, so it's not really helping you that much.

The second point I'd make is that we have no idea of the level of expertise of the people we help, generally it's not high, but most of them have a firm idea on what their problem is - and they're usually wrong. We ask for the Sysnative File Collection output because it gives us all the information we need so that we can make a reasoned diagnosis ourselves - as a collective.

I'm sorry if you thought that my attempt to help you was worthless, but it won't keep me up at nights...
Sorry, you make good points. I shouldn't expect good answers with these factors in mind.

Anyway, I think I know what may have been the cause. The long-term sleep behavior I mentioned was actually from a power plan setting that automatically hibernates the system after a period of time in sleep. Because this is the only real outlier in the situation, I think hibernation somehow messed up my GPU's power state. In other words, the GPU was "sleepy" and did not work hard enough, among other possible issues, leading to the error. Of course, I could be wrong on this, but it does seem like a compelling theory.

But as for why the .dmp file implies a GPU reset was successful, when clearly it was not and my system force-rebooted? That's still a mystery. Answering that will probably require someone with experience with the Windows TDR process, who could theorize what made Windows give this misleading live dump instead of something more accurate like "VIDEO_TDR_FAILURE."
 
If you're not willing to provide us the information that was requested in the posting instructions, then there is not much we can do to answer your questions.

We have no idea of your level of expertise when it comes to debugging a dump file and it costs both of us a lot more time and energy to ask back and forth if you could run a command, ask for the output to analyze and repeat.

You say you looked through it all, can't find an answer, then ask us for answers without providing the initially requested data. It makes me wonder what you expect us to do. I do understand that you don't want unnecessary data online, but by declining us the data we requested we can't help without giving generic suggestions you already tried or we know are useless in advance.
 
I'm closing this thread until a more fruitful discussion can happen. If you would like us to reopen this thread, then please provide the requested information as per the posting instructions.
 