So, you're interested in learning to solve BSODs? A satisfying goal, and there's good job security as there's an endless supply of BSOD threads.
To be a good BSOD analyst, you don't need deep technical knowledge of how Windows works (though it doesn't hurt!). You do need a good "technician's knowledge" of computers, as there's so much more to it than "what driver was blamed?". As often as not, hardware is the cause, and you should be proficient in that regard. Instructing OPs how to swap out RAM, change memory voltages, and spot PSU problems is SO much easier when you are familiar with the processes already.
Good surface knowledge of Windows is essential. What if that driver won't install right? What if Windows won't boot right? What if you suspect malware is the cause...do you know how to spot other signs of it? What if the OP wants to do a repair install but his DVD is giving him an error message? You could just farm stuff out, but it's better if you're capable of handling it all yourself.
Perhaps even more important is a desire to get to the bottom of the case, no matter what it is. Good BSOD analysts don't feel the need to stick to the "rules" of the game. They exercise complete liberty to post whatever they want in the thread, no matter how unorthodox it might be. Feel like turning the OP into a guinea pig? Go for it! Try new things, learn what doesn't work, and remember what did work for next time. And when you see a thread someone else has solved, spend the 30 seconds to find out what symptoms the OP was having, and what the solution was.
Ready to proceed?
Start by installing Windbg from the Windows SDK:
Debugging Tools and Symbols: Getting Started
Once installed, associate .dmp files with Windbg by entering the following in a command prompt:
Code:
"C:\Program Files (x86)\Debugging Tools for Windows (x64)\Debuggers\x64\windbg.exe" -IA
If Windbg is installed in a different location, change the command accordingly. Just a heads-up: the -IA switch is case sensitive. That confused the heck out of me when I first tried it, as most commands are not case sensitive.
When done, open a copy of Windbg, go to File > Symbol file path, and copy/paste:
Code:
SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
You can replace C:\symbols with any other path where you'd like the symbol cache to be stored. If you have a low-capacity SSD, be warned that the folder can grow to a couple of GB.
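If it helps to see how that string is put together: it's three fields separated by asterisks (the SRV keyword, the local cache folder, and the symbol server URL). Here's a minimal Python sketch of that layout. Note that build_symbol_path is a made-up helper name for illustration, not anything that ships with Windbg, and D:\symcache is just an example folder.

```python
# Hypothetical helper for illustration only; Windbg itself just wants the final string.
DEFAULT_SERVER = "http://msdl.microsoft.com/download/symbols"

def build_symbol_path(cache_dir, server=DEFAULT_SERVER):
    """Join the three fields of a symbol-server path with asterisks."""
    return "*".join(["SRV", cache_dir, server])

# Example: keep the cache on a different drive (D:\symcache is an assumed folder).
print(build_symbol_path(r"D:\symcache"))
```

Whatever folder you pick, only the middle field changes; the SRV keyword and the server URL stay the same.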
After that, you can just double-click a .dmp file and it will open in Windbg. If a driver or program is the cause of the BSODs, it will usually show up in the Probably Caused By line.
Code:
Probably caused by: e1c62x64.sys
You can look up the drivers it blames here:
Driver Reference Table
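If you end up saving the Windbg output to text files, pulling out the blamed module can be scripted. The following is only a rough sketch under a few assumptions: you've copied the analysis output into a string yourself, the line looks like the example above (possibly with extra text after the module name), and blamed_modules is a hypothetical helper name of my own choosing.

```python
import re

# Matches a line such as "Probably caused by: e1c62x64.sys".
# Spaces around the colon are tolerated; anything after the module name is ignored.
BLAME_RE = re.compile(r"^Probably caused by[ \t]*:[ \t]*(\S+)", re.MULTILINE)

def blamed_modules(analysis_text):
    """Return every module named on a 'Probably caused by' line, lowercased."""
    return [name.lower() for name in BLAME_RE.findall(analysis_text)]

# Made-up sample text standing in for saved Windbg output.
sample = "some other output\nProbably caused by: e1c62x64.sys\nmore output\n"
print(blamed_modules(sample))
```

Lowercasing the names makes it easier to compare results from several dumps, since the same driver may be printed with different capitalization.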
A couple other tips:
If a Windows/system driver is blamed, it's not the real problem. Use your powers of reasoning: if tcpip.sys is blamed, perhaps the network adapter drivers are at fault?
You can use Driver Verifier to try to get 3rd-party drivers blamed:
Driver Verifier - BSOD related - Windows 10, 8.1, 8, 7 & Vista - Sysnative Forums
If Verifier_Enabled dumps continue to point to system drivers, hardware is most likely the cause. The most common cause is RAM, though CPU, motherboard, PSU, video card, hard drive, and sometimes some funky ones (monitor, USB devices) can also cause problems. I wrote up some tutorials on the diagnostics we use often:
https://www.sysnative.com/forums/hardware-tutorials/3909-test-ram-with-memtest86.html
https://www.sysnative.com/forums/hardware-tutorials/3908-prime95-hardware-stress-testing.html
To get a list of the running drivers on the system at the time of the crash, run from Windbg:
Spend some time looking up those drivers on the Driver Reference Table until you can quickly glance down the list and pick out the 3rd-party ones. The Windows drivers are rarely of any consequence, but you should still know what they do. One word of warning, however: don't fall into the same pitfall all too many people do, which is putting too much emphasis on the date of the driver. It is true that older drivers can have compatibility problems and should be updated, but few things that I see BSOD analysts doing irritate me more than lists of drivers to update. If a 3rd-party driver is the cause, 95% of the time it will be blamed directly.
I'd be a fool not to at least mention the !analyze -v command. Try running that on a dump and see what kind of information it reveals.
PROCESS_NAME shows which process was running at the time of the crash; usually not enough to make any conclusions, but when taken from many dumps from the same system, may reveal some circumstantial evidence.
FAILURE_BUCKET_ID and BUCKET_ID can sometimes reveal culprit drivers that are not blamed in the Probably Caused By line.
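Since these fields are most useful when compared across many dumps from the same system, here's a rough Python sketch of tallying them. It assumes you've saved the !analyze -v output of each dump as text and that each field sits on its own "NAME: value" line; tally_fields is a hypothetical name, and the sample strings in the usage note are made up.

```python
import re
from collections import Counter

# One "FIELD: value" line per match; only the three fields discussed here are collected.
FIELD_RE = re.compile(
    r"^(PROCESS_NAME|FAILURE_BUCKET_ID|BUCKET_ID)[ \t]*:[ \t]*(.+)$",
    re.MULTILINE,
)

def tally_fields(analysis_texts):
    """Map each field name to a Counter of the values seen across all dumps."""
    tallies = {}
    for text in analysis_texts:
        for field, value in FIELD_RE.findall(text):
            # strip() drops trailing spaces and any stray carriage return.
            tallies.setdefault(field, Counter())[value.strip()] += 1
    return tallies
```

For example, passing in three saved outputs where two list "PROCESS_NAME:  System" and one lists "PROCESS_NAME:  chrome.exe" gives a PROCESS_NAME counter of System: 2 and chrome.exe: 1. A value that keeps recurring is circumstantial evidence, not proof, so treat the tally as a hint about where to look next.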
And one last set of commands I rarely see any other BSOD analysts on the volunteer forums using: the !sysinfo commands.
!sysinfo machineid shows information about the motherboard and OEM.
!sysinfo smbios reveals a wealth of information about the motherboard configuration. Want to know what size DIMMs are installed in which slots, and what speed they're running at? Give it a whirl! Or run the generic !sysinfo command for a list of supported arguments and try them out.
Finally, we ask for a full BSOD report for a reason; dumps alone are often inadequate, and the problem can often be solved faster when you have access to other information. Digging deeper into the jcgriff2 report is beyond the scope of this "getting started" guide, but I encourage you to poke into it on your own.
- MSINFO32 is good for getting hardware information and a list of installed programs. Plus a bunch of other things.
- $systeminfo.txt overlaps with MSINFO32 a fair bit, which is nice when MSINFO32 is corrupted or missing. It also contains a list of installed Windows Updates, and the date the OS was installed.
- Event Logs are priceless for BSOD analysts, especially the System one ($evtx_sys_dump.txt). Tip: do a Find for keyword "Error". When no dumps are available, this becomes your #1 resource.
- $sys_list.txt and driverq_v.txt are both good for finding information on drivers, such as which ones are loading, what their dates are, and where they are located.
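The "Find for keyword" tip for the event log text can also be scripted when the file is too big to scroll through comfortably. This is just a sketch with some assumptions: find_keyword_lines is a hypothetical helper, and the file's text encoding is a guess (UTF-8 by default here; pass a different encoding if the output looks garbled).

```python
def find_keyword_lines(path, keyword="Error", encoding="utf-8"):
    """Return (line_number, line) pairs for lines containing the keyword, ignoring case."""
    needle = keyword.lower()
    hits = []
    # errors="replace" keeps the scan going even if a few bytes don't decode cleanly.
    with open(path, encoding=encoding, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                hits.append((number, line.rstrip("\r\n")))
    return hits
```

You'd call it with the path to the extracted event log text file, for example find_keyword_lines("$evtx_sys_dump.txt"), and then read around the reported line numbers for context.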
Get to know what information you have access to. Once you do, you will no longer be content to simply use the dumps. I once resigned from a Moderator position and left another forum; there were other reasons, but a major one was that they didn't see the point in asking for the other info, and weren't on board with my attempts to get some instructions stickied.
That's the basic idea of what we do. As you go along, you'll have dozens (if not more!) of questions; by all means, post them below, or start a new thread in the BSOD Analysis forum.
Good luck!