[SOLVED] Integrity Violations Found That Cannot Be Repaired (SFC Errors on 47 Servers)

soj4trade
Well-known member
Joined: Jul 14, 2022
Posts: 133
Location: US East
I stand before you, humbled and defeated. In my 13 years as a sysadmin, unequivocally, this is the most challenging issue I've ever had to deal with...

I have 47 servers with file system corruptions. Mainly, these corruptions manifest as an inability to add/remove server roles, and/or an inability to install Windows CUs; they just throw an error. Unfortunately, there's a mixed bag of affected OSes: Server 2019 Standard, Server 2016 Standard, Server 2012 R2 Standard, and Server 2012 Standard. I've been troubleshooting intermittently for nearly one year, and it's been two years since I first realized I had a problem.

Here's a rundown of what's going on...
  • "SFC /SCANNOW" completes with the message "Windows Resource Protection found corrupt files but was unable to fix some of them."
  • "DISM /Online /Cleanup-Image /CheckHealth" completes with the message "No component store corruption detected."
  • "DISM /Online /Cleanup-Image /ScanHealth" completes with the message "No component store corruption detected."
  • "DISM /Online /Cleanup-Image /RestoreHealth" completes with the message "The source files could not be downloaded."
From this point, I began attempting repairs with the "Source" option. I targeted the .WIM file (with index specified) in a vanilla .ISO (from Microsoft Volume Licensing Service Center). I was puzzled when even this returned the error "The source files could not be downloaded.":
  • "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:F:\Sources\Install.WIM:2 /LimitAccess" completes with the message "The source files could not be downloaded."
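For anyone who wants to script this same attempt across many servers, here is a minimal Python sketch of how that command line is put together. The function name `restorehealth_command` and its parameters are hypothetical; only the DISM switches come from the command quoted above. It builds the argument list and does not run anything.

```python
def restorehealth_command(wim_path, index, limit_access=True):
    """Build the DISM argument list for a /RestoreHealth run against a WIM source.

    wim_path and index are whatever applies to your media; this only
    assembles the arguments and never executes DISM.
    """
    if index < 1:
        raise ValueError("WIM image indexes start at 1")
    args = [
        "DISM", "/Online", "/Cleanup-Image", "/RestoreHealth",
        # Source syntax is WIM:<path to .wim>:<image index>
        f"/Source:WIM:{wim_path}:{index}",
    ]
    if limit_access:
        # /LimitAccess keeps DISM from falling back to Windows Update
        args.append("/LimitAccess")
    return args


if __name__ == "__main__":
    print(" ".join(restorehealth_command(r"F:\Sources\Install.WIM", 2)))
```

The returned list could be handed to `subprocess.run` from an elevated session, but building the command is not the hard part: as the rest of this post shows, the command only succeeds when the source actually contains the missing components.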
At this point, I paid Microsoft $500 (three separate times, totaling $1,500) to help me troubleshoot. We reviewed DISM.log and CBS.log. This was the first time I was able to see how many corruptions actually existed, and what they are. The total number of corruptions on each server ranges from 10 to 1,500. All three times, Microsoft told me that I needed to repair Windows by running an in-place upgrade. They bailed.

This is when I resorted to an in-place upgrade (a repair install), per Microsoft's suggestion. I simply mounted the .ISO and re-installed Windows. This was excruciatingly painful. Sure, in Windows 10 land it's fine, but on a server? Forget about it.

Exchange blows up. SharePoint blows up. Essentials (role) blows up. ADCS blows up. QuickBooks blows up. Hyper-V NIC teams blow up, SQL Server blows up, backup VSS writers blow up — I could go on. Needless to say, we have to dedicate an entire week of "maintenance" per client to repair their servers. Clients don't like this, and I'm aging at an accelerated rate from the resulting stress. It's like blowing up a house and rebuilding it, just to fix a slanted foundation. I can't continue doing this. If I have to run in-place upgrades on 47 more servers, I think I might quit my job instead...

So, I kept troubleshooting on my own.
  • I looked at the build number of a random corrupted server running Server 2019 Standard.
  • I looked up the corresponding monthly CU for this build number.
  • I created a Windows Server 2019 "dummy" guest VM in Hyper-V and patched it up to that CU (I didn't fully patch it).
  • I booted this "dummy" VM into Windows PE, and used DISKPART to assign the primary OS volume as S:.
  • I ran "DISM /Capture-Image /ImageFile:S:\REPAIR-IMAGE-SERVER2019.WIM /CaptureDir:S:\ /Name:"Windows Server 2019 Standard Repair Image"". This completed successfully. I now have a .WIM file for a system with a build number that is identical to one of the corrupted 2019 servers.
  • I copied the aforementioned .WIM file to C:\Temp on the target corrupted server.
  • I ran "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:C:\Temp\REPAIR-IMAGE-SERVER2019.WIM:1 /LimitAccess". This completed with the message "The source files could not be downloaded."
Argh. At this point, I reviewed CBS.log and confirmed which "components" are corrupted. I searched C:\Windows\WinSxS for a corrupted component referenced in CBS.log. Ah-hah! It doesn't exist!! Next, I checked my repair .WIM for that same corrupted component in C:\Windows\WinSxS. It also doesn't exist on the repair .WIM. This is why it's saying "the source files could not be downloaded" — because the repair .WIM is missing the files that the corrupt system is looking for! So, I kept going...
  • I fully patched the "dummy" guest VM to the latest CU. That's 2022-07 as of the time I'm writing this.
  • I looked for that same corrupted component in C:\Windows\WinSxS. Ah-hah! It's there now! Incredible. I am really making progress here.
  • I checked all of the other corrupted components referenced in CBS.log. They all exist on my fully patched repair .WIM.
  • I re-ran "DISM /Capture-Image /ImageFile:S:\REPAIR-IMAGE-SERVER2019.WIM /CaptureDir:S:\ /Name:"Windows Server 2019 Standard Repair Image"", and moved that .WIM to the corrupted server.
  • I re-ran "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:C:\Temp\REPAIR-IMAGE-SERVER2019.WIM:1 /LimitAccess". This completed with the message "The operation completed successfully."
  • I reviewed the previously missing components in C:\Windows\WinSxS, and they now exist!
  • I re-ran "DISM /Online /Cleanup-Image /RestoreHealth" and it completed successfully.
  • I re-ran "SFC /SCANNOW" and it confirmed that no integrity violations exist. "Marvelous", I thought. "I've done it."
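Checking by hand whether each component named in CBS.log exists under WinSxS gets tedious, so here is a rough Python sketch of that check. It assumes you have already copied the corrupted component folder names out of CBS.log into a list, and that the repair source's WinSxS folder is reachable as a normal directory (for example, on the "dummy" VM itself). The function name and the sample folder names are hypothetical.

```python
import os
import tempfile


def split_by_presence(winsxs_dir, component_names):
    """Return (present, missing) lists of component folder names.

    Names are compared case-insensitively, as Windows paths are.
    """
    existing = {
        entry.name.lower()
        for entry in os.scandir(winsxs_dir)
        if entry.is_dir()
    }
    present, missing = [], []
    for name in component_names:
        (present if name.lower() in existing else missing).append(name)
    return present, missing


if __name__ == "__main__":
    # Stand-in for a WinSxS folder; the folder names are made up.
    with tempfile.TemporaryDirectory() as winsxs:
        os.mkdir(os.path.join(winsxs, "amd64_example-component-a_1.0.0.0"))
        wanted = [
            "amd64_example-component-a_1.0.0.0",
            "amd64_example-component-b_1.0.0.0",
        ]
        present, missing = split_by_presence(winsxs, wanted)
        print("present:", present)
        print("missing:", missing)
```

Anything that comes back in the missing list is a component the repair source cannot supply, which matches the "source files could not be downloaded" result I got from the first .WIM.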
So, I rebooted, and then installed the latest CU (2022-07), which was previously failing due to corruptions. It took all night. When I got in the next day, Windows was hung at the "Applying updates... 100%" screen. I let it sit the entire day. It rebooted itself several times, but never made any progress. Eventually, I killed the "TrustedInstaller" task from a remote shell. This ended the process and I was able to log into Windows. DISM and SFC both still report no issues, and the 2022-07 CU shows as installed. The build number is current. Seems too good to be true. So:
  • I ran "DISM /Online /Cleanup-Image /AnalyzeComponentStore", and it returned the message "16 reclaimable packages exist", and it recommended that I reboot. I did. It proceeded to boot-loop again, hung at "Applying updates... 100%".
  • I ran "DISM /Online /Cleanup-Image /AnalyzeComponentStore" again, and it said that the files could not be found, and it recommended that I reboot again. I did. It proceeded to boot-loop again, hung at "Applying updates... 100%".
  • I wasn't able to recover from this. Eventually I ran "DISM /Online /Cleanup-Image /RevertPendingActions" and it stopped boot-looping. However, the 2022-07 CU doesn't show as installed, even though the "winver" build number is the 2022-07 build.
  • I manually installed the 2022-07 CU from the Microsoft Catalog. It's been stuck for about 12 hours now.
Mind you, this is just one server... There are 46 left to go.

And that's where I'm at, folks. I've been seeing a ton of helpful information on this forum specifically, but I see that it's all customized for the OP. So, I finally decided to become an OP. :)

I'd deeply appreciate any assistance that anyone can lend me!!


My deepest apologies... here's my most recent CBS.log file. I'm more than happy to get you another if needed.
 

Thank you! Also, as a heads-up to anyone else who might find themselves in a similar situation: to resolve the boot-looping issue described above, I had to use the "DISM /Online /Cleanup-Image /StartComponentCleanup /ResetBase" command. This seems to have done the trick on that one server, at least.
 
