I stand before you, humbled and defeated. In my 13 years as a sysadmin, unequivocally, this is the most challenging issue I've ever had to deal with...
I have 47 servers with file system corruptions. Mainly, these corruptions manifest as an inability to add/remove server roles, and/or an inability to install Windows CUs; they just throw an error. Unfortunately, there's a mixed bag of affected OSes: Server 2019 Standard, Server 2016 Standard, Server 2012 R2 Standard, and Server 2012 Standard. I've been troubleshooting intermittently for nearly one year, and it's been two years since I first realized I had a problem.
Here's a rundown of what's going on...
- "SFC /SCANNOW" completes with the message "Windows Resource Protection found corrupt files but was unable to fix some of them."
- "DISM /Online /Cleanup-Image /CheckHealth" completes with the message "No component store corruption detected."
- "DISM /Online /Cleanup-Image /ScanHealth" completes with the message "No component store corruption detected."
- "DISM /Online /Cleanup-Image /RestoreHealth" completes with the message "The source files could not be downloaded."
- "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:F:\Sources\Install.WIM:2 /LimitAccess" completes with the message "The source files could not be downloaded."
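For anyone retracing these steps: SFC records the files it couldn't fix in C:\Windows\Logs\CBS\CBS.log, on lines tagged "[SR]". Here is a minimal Python sketch for pulling those lines out. It assumes the failing entries contain the phrase "Cannot repair" (the usual wording, but check it against your own log), and the function name is only illustrative.

```python
from pathlib import Path
from typing import List


def sfc_unrepaired_lines(log_text: str) -> List[str]:
    """Return the [SR] log lines in which SFC reports a file it could not repair.

    Assumption: such lines contain both the "[SR]" tag and the phrase
    "Cannot repair" (matched case-insensitively).
    """
    return [
        line.strip()
        for line in log_text.splitlines()
        if "[SR]" in line and "cannot repair" in line.lower()
    ]


if __name__ == "__main__":
    # Default CBS.log location on Windows; adjust if you work from a copy.
    log_path = Path(r"C:\Windows\Logs\CBS\CBS.log")
    if log_path.is_file():
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for entry in sfc_unrepaired_lines(text):
            print(entry)
```

The output is only a starting point: it tells you which files SFC gave up on, not why the repair source was rejected.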
This is when I resorted to an in-place upgrade (a repair install), per Microsoft's suggestion. I simply mounted the .ISO and re-installed Windows. This was excruciatingly painful. Sure, in Windows 10 land it's fine, but on a server? Forget about it.
Exchange blows up. SharePoint blows up. Essentials (role) blows up. ADCS blows up. QuickBooks blows up. Hyper-V NIC teams blow up, SQL Server blows up, backup VSS writers blow up — I could go on. Needless to say, we have to dedicate an entire week of "maintenance" per client to repair their servers. Clients don't like this, and I'm aging at an accelerated rate from the resulting stress. It's like blowing up a house and rebuilding it, just to fix a slanted foundation. I can't continue doing this. If I have to run in-place upgrades on 47 more servers, I think I might quit my job instead...
So, I kept troubleshooting on my own.
- I looked at the build number of a random corrupted server running Server 2019 Standard.
- I looked up the corresponding monthly CU for this build number.
- I created a Windows Server 2019 "dummy" guest VM in Hyper-V and patched it only up to the specified CU (I didn't fully patch it).
- I booted this "dummy" VM into Windows PE, and used DISKPART to assign the primary OS volume as S:.
- I ran "DISM /Capture-Image /ImageFile:S:\REPAIR-IMAGE-SERVER2019.WIM /CaptureDir:S:\ /Name:"Windows Server 2019 Standard Repair Image"". This completed successfully. I now have a .WIM file for a system with a build number that is identical to one of the corrupted 2019 servers.
- I copied the aforementioned .WIM file to C:\Temp on the target corrupted server.
- I ran "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:C:\Temp\REPAIR-IMAGE-SERVER2019.WIM:1 /LimitAccess". This completed with the message "The source files could not be downloaded."
- I fully patched the "dummy" guest VM to the latest CU. That's 2022-07 as of the time I'm writing this.
- I looked for that same corrupted component in C:\Windows\WinSxS. Ah-hah! It's there now! Incredible. I am really making progress here.
- I checked all of the other corrupted components referenced in CBS.log. They all exist on my fully patched repair .WIM.
- I re-ran "DISM /Capture-Image /ImageFile:S:\REPAIR-IMAGE-SERVER2019.WIM /CaptureDir:S:\ /Name:"Windows Server 2019 Standard Repair Image"", and moved that .WIM to the corrupted server.
- I re-ran "DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:C:\Temp\REPAIR-IMAGE-SERVER2019.WIM:1 /LimitAccess". This completed with the message "The operation completed successfully."
- I reviewed the previously missing components in C:\Windows\WinSxS, and they now exist!
- I re-ran "DISM /Online /Cleanup-Image /RestoreHealth" and it completed successfully.
- I re-ran "SFC /SCANNOW" and it confirmed that no integrity violations exist. "Marvelous", I thought. "I've done it."
- I ran "DISM /Online /Cleanup-Image /AnalyzeComponentStore", and it returned the message "16 reclaimable packages exist", and it recommended that I reboot. I did. It proceeded to boot-loop, hung at "Applying updates... 100%".
- I ran "DISM /Online /Cleanup-Image /AnalyzeComponentStore" again, and it said that the files could not be found, and it recommended that I reboot again. I did. It proceeded to boot-loop again, hung at "Applying updates... 100%".
- I wasn't able to recover from this. Eventually I ran "DISM /Online /Cleanup-Image /RevertPendingActions" and it stopped boot-looping. However, the 2022-07 CU doesn't show as installed, even though "winver" reports the 2022-07 build number.
- I manually installed the 2022-07 CU from the Microsoft Catalog. It's been stuck for about 12 hours now.
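The manual check from the steps above (confirming that every component CBS.log flags as corrupt actually exists in the repair image before running /RestoreHealth against it) could be scripted. This is a rough Python sketch, not a tested procedure. It assumes the repair .WIM has been mounted read-only to a hypothetical folder such as C:\Mount (for example with "DISM /Mount-Image"), and that the component folder names were copied by hand out of CBS.log. The function name and paths are placeholders.

```python
from pathlib import Path
from typing import Iterable, List


def missing_components(component_names: Iterable[str], winsxs_dir: str) -> List[str]:
    """Return the component folder names that do NOT exist under winsxs_dir.

    An empty result suggests the repair source carries every component
    that was flagged as corrupt on the target server.
    """
    root = Path(winsxs_dir)
    return [name for name in component_names if not (root / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical mount point for the repair image; change to suit.
    repair_winsxs = r"C:\Mount\Windows\WinSxS"

    # Paste the component folder names reported in CBS.log here.
    flagged: List[str] = []

    for name in missing_components(flagged, repair_winsxs):
        print("Not in repair image:", name)
```

If the list comes back non-empty, the image is probably patched to the wrong level for that server, which matches what happened with the first .WIM I captured.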
And that's where I'm at, folks. I've been seeing a ton of helpful information on this forum specifically, but I see that it's all customized for the OP. So, I finally decided to become an OP. :)
I'd deeply appreciate any assistance that anyone can lend me!!
edited to remove an accidental emoji