quick sanity check on CacheManager BSOD

keithmsn

I've never analyzed a .DMP file before. Installed windbg, ran the analysis, this is what I see:

WinDBG CACHEMANAGER BSOD ANALYSIS - Pastebin.com

With hopefully the smoking gun line:

*** WARNING: Check Image - Checksum mismatch - Dump: 0x208038, File: 0x208034 - C:\ProgramData\Dbg\sym\BTHport.sys\2AFD096C202000\BTHport.sys

The blue screens started soon after I updated/installed some Bluetooth drivers about a month ago, a connection I didn't make until I saw this line. The trigger for the blue screen was something I thought was pretty crazy.... running a Python script that I wrote that doesn't do very much. I could trigger the blue screen about 50% of the time just by repeatedly executing the script; it would blue screen on the second or third run. It makes me wonder whether some Bluetooth support within Python (is it related to/called RFCOMM?) was triggering a Bluetooth driver, even though my script was simply accessing/processing some small binary files.

I've since uninstalled everything bluetooth I could find in device manager, updated my bluetooth driver (RZ616?) with the latest ones I could find, and then disabled bluetooth. I can't trigger the BSOD any longer. The blue screens weren't happening randomly which was nice for once. :)

I'll run through the whole data collection thing if necessary, but sure seems like I've already fixed it. The coincidence of BTHport.sys showing up in the windbg analysis, then me reinstalling bluetooth drivers, and then no longer able to reproduce it would be too much.

Appreciate the quick look.
Thanks
 
If you really want help then please follow the BSOD Posting Instructions.

Based only on what I've seen so far I would suggest a RAM test to start with....
  1. Download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough). Do this on a different PC if you can, because you can't fully trust yours at the moment.
  2. Then boot that USB drive on your PC, Memtest86 will start running as soon as it boots.
  3. If no errors have been found after the four iterations of the 13 different tests that the free version does, then restart Memtest86 and do another four iterations. Even a single bit error is a failure.
 
hey @ubuysa thank you for your help.

Running a custom Python script I wrote often induces a CACHE_MANAGER blue screen. The script doesn't do very much --- it opens a file, parses some fields, and writes some files out. It doesn't use a bunch of RAM or CPU. I've tried many things --- updating Windows, updating my graphics drivers, and updating my Bluetooth drivers, since I thought that was related. My RAM was running at 6400 MHz. When I ran memtest recently, it was failing. I moved the RAM down to 5800 and it is stable, running a full memtest with no errors.

I built this desktop machine. Here's the build list: https://pcpartpicker.com/list/ZXXNRK
RAM is in A2+B2.

Windows 11 x64. This was the only OS installed on this machine. This is retail copy of the OS.

Machine is about 1 year old. OS was installed at the time of build.

Driver verifier is now enabled.

No security software beyond Windows 11 defaults.

I'm not overclocking beyond the RAM. AMD Ryzen Master allows you to overclock, but I am not using anything beyond the defaults.

http://speccy.piriform.com/results/wzFNG6o4vKw2QmJn1jMYsIW
 

A few words about memory:

The RAM, which was on the mobo QVL, is rated for 6400. I enabled the XMP profile, which auto set it to 6400. When I first built the machine I ran a variety of memory tests, stability tests, PRIME95, OCCT, and so on. Everything was stable, ran fine, etc. I've used the machine for almost a full year in some demanding situations with tons of apps loaded, all cpu loads, a AAA title or two. Zero blue screens for the year.

When I blue screened the first time a month or so ago, I ran a memtest, and at 6400, the test failed immediately. So something has changed. I moved it to 5800 and re-ran my tests, multiple times. Passed just fine.

This message on gigabyte's site is not lost on me:

* When running EXPO/XMP at DDR5-5200 or higher, the system's stability may vary by AMD processor and memory module's margin of capabilities.

and I mostly get it -- YMMV if you're running faster. But I think I did a pretty good job of establishing that the configuration was stable.....

I set the memory to the JEDEC default of 4800.... rebooted..... and blue screened within the first 10 minutes.

G.SKILL support is offering a return under warranty. But I don't want to go swapping out hardware if this is a driver/windows/software problem.
 
This is a problem I've seen before. If you look at the detailed spec for your Ryzen 9 7950X you'll see that the fastest DDR5 RAM it supports is rated at 5200MHz (MT/s). That's the maximum speed that AMD guarantee the CPU will accept, it's the warrantied speed. Most CPUs will accept RAM transfer rates higher than this, but there is no guarantee and in your case it would seem that 6400 MHz (MT/s) is beyond the capabilities of that particular CPU.

That it's stable at 5800 MHz is a bonus, because even that is faster than the CPU is warrantied for. It's actually all working as designed. Just don't go beyond 5800MHz; if you do start to get issues at that speed then drop it to 5200MHz. Any issue at 5200MHz is a genuine problem.
 
Yeah it's good that I've got it stable at 5800.... but at 4800 I'm still blue screening. So it will be helpful to figure out why.

Thanks
 
Ah OK.

I just looked at the two dumps from 8th Sept; your RAM was running at 5800MHz in both dumps. If you've had BSODs at 4800MHz, can you please run the Sysnative file collector app again and upload a new output file?

In the two 8th Sept dumps, both fail due to a CACHE_MANAGER bugcheck and in both dumps Acronis True Image drivers were referenced in the lead-up to the bugchecks. The three drivers referenced were...
  • snapman.sys
  • fltsrv.sys
  • file_protector.sys
The versions you have installed all date from either April or June this year so they're unlikely to be out of date. If you can I would suggest completely uninstalling Acronis True Image and rebooting. Then see whether the BSODs cease.
 
Thanks @ubuysa for the continued help.

I uninstalled True Image and I'm just waiting it out..... no more blue screens by running my script.

Something occurred to me today as a pet theory, and I'd love to know your thoughts on plausibility.

I am writing an open source tool that restores files from a proprietary backup software written in the 1990s. The backup set includes a file catalog, which defines the directory hierarchy. One of the benefits my tool has over just emulating the original is that it can extract files with no catalog entries at all. I recently added catalog entry processing, however, and since corrupt catalogs can contain garbage directory names, my software could end up attempting to create directories with those bad names.

so hear me out....

I wasn't super concerned because the OS has good protection against illegal file names, and just throws an error.

BUT.... what if acronis true image and their file protection (malware detection, antivirus, etc) drivers that intercept OS calls can't handle the binary-infused names.... and blue screen as a result.

Literally the only time I've blue screened is when I run my custom python script..... and it's ALMOST deterministic. I can almost crash-on-demand.
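
For what it's worth, one way to stop the garbage names from ever reaching the OS (or any filter driver sitting in front of it) would be to scrub each catalog name before creating it. This is only a minimal sketch: `sanitize_component` and the `_unnamed` fallback are hypothetical names, not part of my tool, and treating the raw catalog bytes as ASCII is an assumption. It wouldn't prove or disprove the Acronis theory; it would just keep binary junk out of the path.

```python
import re

# Characters Windows rejects in file/directory names, ASCII control codes,
# and U+FFFD (what undecodable bytes become in the decode step below).
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f\ufffd]')

# Device names Windows reserves, with or without an extension.
_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def sanitize_component(raw: bytes, fallback: str = "_unnamed") -> str:
    """Turn one raw catalog name into a conservative Windows-safe name."""
    # Assumption: names are ASCII; any other byte decodes to U+FFFD.
    text = raw.decode("ascii", errors="replace")
    # Replace every illegal or undecodable character with an underscore.
    text = _ILLEGAL.sub("_", text)
    # Windows silently drops trailing spaces and dots, so strip them here.
    text = text.rstrip(" .")
    if not text:
        return fallback
    # Prefix reserved device names so they become ordinary names.
    if text.split(".")[0].upper() in _RESERVED:
        text = "_" + text
    # Keep the component within the common 255-character limit.
    return text[:255]
```

Each directory or file name from the catalog would go through this one component at a time (never the whole path at once, since `\` gets replaced) before being handed to something like `os.makedirs`.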
 
Well.....if your script no longer BSODs without Acronis, I think your theory there may be a good one.
 
