Not to beat a dead horse, but I've been reading about WHEA.
It turns out that PSHED.dll is a central part of it, and that (IMO) it functions similarly to hal.dll: PSHED isolates the hardware from the error reporting mechanisms, just as hal.dll isolates the hardware from the operating system.
Some further info (from the document "Windows Hardware Error Architecture", dated 23 May 2006):
The Low Level Hardware Error Handlers (LLHEH) may exist in the kernel or the HAL (and possibly in other places as well).
The LLHEH notifies WHEA of hardware errors.
3 types of errors (very generic descriptions):
- Corrected Errors (fixed before Windows gets notified and then Windows generates an ETW event)
- Uncorrected but Recoverable Errors (fixed by Windows then it generates an ETW event)
- Uncorrected Fatal Errors (Windows generates an ETW event and then bugchecks the system)
(I wonder if STOP 0x117 errors are Corrected or Uncorrected but Recoverable? I suspect that they're Corrected because I seem to recall reading that the driver resets itself when it sees that it's needed.)
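To keep the three categories straight, here's a rough Python sketch of the flow as I read it from the document. The names (`Severity`, `handle_error`) and the step strings are made up for illustration. This is not the real WHEA API, just the ordering of events described in the list above.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical labels for the three generic error types listed above.
    CORRECTED = "corrected"
    RECOVERABLE = "uncorrected but recoverable"
    FATAL = "uncorrected fatal"


def handle_error(severity):
    """Return the ordered steps Windows takes for a given error type."""
    if severity is Severity.CORRECTED:
        # Already fixed before Windows is notified; Windows only logs it.
        return ["log ETW event"]
    if severity is Severity.RECOVERABLE:
        # Windows fixes the problem first, then logs it.
        return ["attempt recovery", "log ETW event"]
    if severity is Severity.FATAL:
        # Windows logs the event, then bugchecks (BSOD).
        return ["log ETW event", "bugcheck"]
    raise ValueError(f"unknown severity: {severity!r}")
```

The point of the sketch is just that only the last category ever ends in a bugcheck; the other two should only leave an ETW event behind.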
WHEA gives 3rd party vendors the opportunity to add additional info before the ETW event is generated
Both the LLHEH and Windows draw on the services of the Platform-Specific Hardware Error Driver (PSHED)
Platform vendors can add PSHED plug-ins. A PSHED plug-in is a "special-purpose Windows device driver" that establishes an additional call-back for PSHED. The purpose of the plug-in is to augment or override the behavior of PSHED.
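As a way of picturing "augment or override", here's a toy Python model of the call-back idea. Everything in it (`Pshed`, `register_plugin`, `retrieve_error_info`, the two example plug-ins) is hypothetical and only mirrors the description above. The real plug-ins are kernel-mode drivers, and I'm not claiming this is how their interface actually looks.

```python
class Pshed:
    """Toy stand-in for PSHED: default behaviour plus registered call-backs."""

    def __init__(self):
        self._plugins = []

    def register_plugin(self, callback):
        # A plug-in establishes an additional call-back for PSHED.
        self._plugins.append(callback)

    def default_error_info(self, source):
        # What PSHED would report with no plug-ins installed (made-up fields).
        return {"source": source, "vendor_data": None}

    def retrieve_error_info(self, source):
        record = self.default_error_info(source)
        for callback in self._plugins:
            # Each plug-in gets a copy; returning None means "no change".
            result = callback(dict(record))
            if result is not None:
                record = result
        return record


def augmenting_plugin(record):
    # Augment: keep PSHED's record and add vendor-specific info to it.
    record["vendor_data"] = "extra register dump"
    return record


def overriding_plugin(record):
    # Override: discard PSHED's record and supply a replacement.
    return {"source": "vendor-defined", "vendor_data": "replaced"}
```

If something like this is what happens on a real system, then a misbehaving call-back would change what PSHED reports without PSHED.dll itself being at fault.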
Both PSHED and the plug-ins are able to interface directly with the device's firmware.
Referring to BSOD crashes:
The PSHED provides an interface through which the operating system can store and retrieve a hardware error record so that the error information is preserved during the system restart
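Here's how I picture that store-and-retrieve interface, as a small Python sketch. A JSON file stands in for whatever non-volatile storage the platform really uses, and the names (`ErrorRecordStore`, `simulate_crash_and_restart`) are my own inventions, not anything from the document.

```python
import json
import os
import tempfile


class ErrorRecordStore:
    """Toy persistent store: a file plays the role of non-volatile storage."""

    def __init__(self, path):
        self.path = path

    def write(self, record):
        # Called before the bugcheck so the record outlives the crash.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(record, f)

    def read(self):
        # Called after the restart; returns None if nothing was saved.
        if not os.path.exists(self.path):
            return None
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def clear(self):
        # Remove the record once it has been processed.
        if os.path.exists(self.path):
            os.remove(self.path)


def simulate_crash_and_restart(path, record):
    ErrorRecordStore(path).write(record)
    # "Restart": a brand-new object with no in-memory state reads it back.
    return ErrorRecordStore(path).read()
```

In this model, if the write never completes or the stored data gets damaged, the read after the restart comes back empty or wrong.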
So, it appears possible that a couple of things could happen to make PSHED show up "in error" in the memory dump analysis:
- that the error reporting mechanism is corrupted during the crash, making PSHED appear as not valid for the current symbols. In other words, the dump is slightly messed up because PSHED doesn't maintain the error information throughout the reboot.
- that a plug-in for PSHED isn't working properly - making PSHED not valid for the current symbols. In other words, PSHED is corrupted by the plug-in.
Does anyone know how to locate these plug-ins, or how to see whether they are implemented on a system?