What happens when you run Prime95 on In-place Large FFTs? You should try it. Small FFTs keeps its data mostly inside the CPU caches, In-place Large FFTs also tests some RAM, and Blend pushes the most traffic between the CPU and RAM. If Small FFTs and In-place Large FFTs seem very stable but Blend does not, a motherboard failure is more likely than a CPU failure. It could still be the CPU, but that is less likely.
I did some reading in the Intel Software Developer's Manual, Volume 3: System Programming Guide, because I was curious about your WHEA error, which is below:
Code:
4: kd> !errrec fffffa800a4be458
===============================================================================
Common Platform Error Record @ fffffa800a4be458
-------------------------------------------------------------------------------
Record Id : 01cdec3edb862ec7
Severity : Fatal (1)
Length : 928
Creator : Microsoft
Notify Type : Machine Check Exception
Timestamp : 1/6/2013 18:51:41 (UTC)
Flags : 0x00000002 PreviousError
===============================================================================
Section 0 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa800a4be4d8
Section @ fffffa800a4be5b0
Offset : 344
Length : 192
Flags : 0x00000001 Primary
Severity : Fatal
Proc. Type : x86/x64
Instr. Set : x64
Error Type : BUS error
Operation : Generic
Flags : 0x00
Level : 3
CPU Version : 0x0000000000600f20
Processor ID : 0x0000000000000000
===============================================================================
Section 1 : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor @ fffffa800a4be520
Section @ fffffa800a4be670
Offset : 536
Length : 128
Flags : 0x00000000
Severity : Fatal
Local APIC Id : 0x0000000000000000
CPU Id : 20 0f 60 00 00 08 08 00 - 0b 32 98 3e ff fb 8b 17
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
Proc. Info 0 @ fffffa800a4be670
===============================================================================
Section 2 : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor @ fffffa800a4be568
Section @ fffffa800a4be6f0
Offset : 664
Length : 264
Flags : 0x00000000
Severity : Fatal
Error : BUSLG_OBS_ERR_*_NOTIMEOUT_ERR (Proc 0 Bank 4)
Status : 0xba00001000020c0f
I can already interpret parts of it. It clearly involves the bus at the generic level of the memory hierarchy: the 'G' in 'LG' stands for Generic, in the same field that would otherwise read L1 or L2 for a specific cache level, and here it typically means RAM. It was also not a timeout error. However, the OBS part puzzled me. I found it explained in the Intel manual (this system uses an AMD CPU, but the machine check error codes are largely common to both vendors), under section 15.9.2.5, Bus and Interconnect Errors. This part of the error code describes how the CPU participated in the request that failed.
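To make the mnemonic less opaque, here is a minimal Python sketch that splits the low 16 bits of the Status value from the dump into the subfields of the bus/interconnect compound error code (0000 1PPT RRRR IILL). The bit layout and the mnemonic tables are my own transcription of the SDM's encoding tables, so treat them as an assumption and check them against your copy of the manual. The function name `decode_bus_error` is just illustrative, not a Windows or Intel tool.

```python
# Sketch: decode the MCA error code (low 16 bits of the Status field) of a
# bus/interconnect error into the BUS{LL}_{PP}_{RRRR}_{II}_{T} subfields.
# Tables are transcribed from the Intel SDM encodings (assumption: verify
# against your copy of the manual).

LEVELS = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}                  # LL
PARTICIPATION = {0b00: "SRC", 0b01: "RES", 0b10: "OBS", 0b11: "GENERIC"}  # PP
REQUESTS = {                                                              # RRRR
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH", 0b0111: "EVICT",
    0b1000: "SNOOP",
}
MEM_OR_IO = {0b00: "M", 0b01: "RESERVED", 0b10: "IO", 0b11: "OTHER"}      # II


def decode_bus_error(status: int) -> dict:
    """Split the compound error code 0000 1PPT RRRR IILL into named fields."""
    code = status & 0xFFFF       # the MCA error code lives in bits 15:0
    code &= ~0x1000              # ignore bit 12 (correction report filtering)
    if code >> 11 != 0b00001:    # bits 15:11 read 00001 for a bus error
        raise ValueError(f"not a bus/interconnect error code: {code:#06x}")
    return {
        "LL": LEVELS[code & 0b11],
        "PP": PARTICIPATION[(code >> 9) & 0b11],
        "RRRR": REQUESTS.get((code >> 4) & 0b1111, "UNKNOWN"),
        "II": MEM_OR_IO[(code >> 2) & 0b11],
        "T": "TIMEOUT" if (code >> 8) & 1 else "NOTIMEOUT",
    }


if __name__ == "__main__":
    # Status value copied from the !errrec output above
    print(decode_bus_error(0xBA00001000020C0F))
    # {'LL': 'LG', 'PP': 'OBS', 'RRRR': 'ERR', 'II': 'OTHER', 'T': 'NOTIMEOUT'}
```

For 0xba00001000020c0f the low 16 bits are 0x0c0f, which gives LL = LG, PP = OBS, RRRR = ERR and T = NOTIMEOUT, lining up with BUSLG_OBS_ERR_*_NOTIMEOUT_ERR. The sketch labels the II slot, which Windows printed as '*', with the manual's "other transaction" encoding.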
Typically this part is omitted from the error printout or labeled GENERIC, but when it isn't, it's useful for telling where the bad request came from. The table shows that if the mnemonic is SRC, the local processor originated the request. If it's RES, the processor was responding to one. In our case, however, it's OBS, meaning the processor observed the error as a third party: it was neither the original requester nor the final recipient of the communication, but was involved somewhere in between. We're probably dealing with some kind of hardware I/O that the CPU was merely involved with. That lends more weight to the idea that the CPU itself is not at fault, but some other system component, most likely the motherboard: the error did not originate in the CPU; the CPU simply picked it up and reported it to the system.
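The participation lookup from that table can be sketched the same way. This is a hypothetical helper (`cpu_role` is my own name, not a Windows or Intel API) that reads PP from bits 10:9 of the error code; the descriptions paraphrase the manual's table, so double-check the wording against the SDM.

```python
# Sketch: report which role the local processor played in a bus/interconnect
# error, based on the PP field (bits 10:9) of the MCA error code.
# Descriptions paraphrase the SDM's PP encoding table (assumption).

PP_ROLES = {
    0b00: ("SRC", "local processor originated the request"),
    0b01: ("RES", "local processor responded to the request"),
    0b10: ("OBS", "local processor observed the error as a third party"),
    0b11: ("GENERIC", "participation not specified"),
}


def cpu_role(status: int) -> tuple[str, str]:
    """Return (mnemonic, description) for the PP field of a bus error."""
    code = status & 0xFFFF & ~0x1000  # error code, bit 12 (filter) ignored
    if code >> 11 != 1:               # bits 15:11 == 00001 marks a bus error
        raise ValueError(f"not a bus/interconnect error code: {code:#06x}")
    return PP_ROLES[(code >> 9) & 0b11]


if __name__ == "__main__":
    mnemonic, meaning = cpu_role(0xBA00001000020C0F)
    print(f"{mnemonic}: {meaning}")
    # OBS: local processor observed the error as a third party
```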