WHEA error for a MCA fault

Capt.Jack Sparrow

BSOD Kernel Dump Expert
I was analyzing a 0x124 bugcheck:

Code:
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa800cfe8028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000fa000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000400405, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


BUGCHECK_STR:  0x124_GenuineIntel

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  f

STACK_TEXT:  
fffff880`0391db58 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx


STACK_COMMAND:  kb

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: hardware

IMAGE_NAME:  hardware

DEBUG_FLR_IMAGE_TIMESTAMP:  0

FAILURE_BUCKET_ID:  X64_0x124_GenuineIntel_PROCESSOR_MAE

BUCKET_ID:  X64_0x124_GenuineIntel_PROCESSOR_MAE

Followup: MachineOwner

I looked at the WHEA error record:

Code:
6: kd> !errrec fffffa800cfe8028
===============================================================================
Common Platform Error Record @ fffffa800cfe8028
-------------------------------------------------------------------------------
Record Id     : 01cd6ebe45bc0523
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 7/31/2012 2:07:01
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa800cfe80a8
Section       @ fffffa800cfe8180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Micro-Architectural Error
Flags         : 0x00
CPU Version   : 0x00000000000106a5
Processor ID  : 0x0000000000000006

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa800cfe80f0
Section       @ fffffa800cfe8240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000006
CPU Id        : a5 06 01 00 00 08 10 06 - bd e3 98 00 ff fb eb bf
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa800cfe8240

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa800cfe8138
Section       @ fffffa800cfe82c0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : Internal unclassified (Proc 6 Bank 5)
  Status      : 0xfa00000000400405

I'm not an expert, but I was following http://blogs.msdn.com/b/ntdebugging...nterpreting-a-whea-error-for-a-mca-fault.aspx and got stuck at the !smt command: it does not return the PRCB for the processor. So my question is: is that information simply not available in a minidump? I know there is not a lot we can do about hardware faults, but I'm just curious.

View attachment 073012-8595-01.zip
 
Minidumps preserve very little of the PRCB, if any of it. That's one of the big reasons why 0x101 bugchecks are impossible to debug from a minidump: the per-processor contexts are not saved, which includes the currently running thread on each processor, the PRCBs (and therefore the PCRs), etc. The WHEA error record structure can hold so much information about the situation, including processor info, because it is generated before the crash dump is. It has already grabbed what it wanted from the PRCB and elsewhere to fill in its structure by the time the system writes the crash dump and bluescreens.
 
What you can do is parse the MCi_STATUS value (explained in the last portion of the article you linked), as that can sometimes give better hints than what WHEA provides, though WHEA is often sufficient to figure out and detail the error for you. In this case it's an internal unclassified error, which means it's not publicly documented (which is ironic given it's "unclassified"). However, yes, it's clear you're dealing with an internal CPU fault and not one that could be caused by bus issues or any external factor other than heat and voltage. Do make dead sure motherboard software isn't interfering, though; motherboard utilities sometimes generate CPU faults of this sort because of bugs in their CPU drivers.
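If you want to try parsing the status value yourself, here is a rough Python sketch. It joins Arg3 and Arg4 from the bugcheck into the 64-bit MCi_STATUS value and pulls out the architectural fields. The bit positions and the simple error code ranges are taken from the machine-check chapter of the Intel SDM as I understand it; the function name and the returned dictionary are made up for this example, and compound error codes are deliberately not decoded.

```python
# Sketch only: decode the architectural fields of an IA32_MCi_STATUS value.
# decode_mci_status and its result layout are hypothetical names for this post.

FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # overflow: a previous error was still logged in this bank
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
}

# Simple (non-compound) MCA error codes.
SIMPLE_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
    0x0400: "Internal timer error",
}


def decode_mci_status(high32, low32):
    """Combine bugcheck Arg3 (high) and Arg4 (low) and decode the result."""
    status = (high32 << 32) | low32
    mca_code = status & 0xFFFF            # bits 15:0, MCA error code
    model_code = (status >> 16) & 0xFFFF  # bits 31:16, model-specific code

    if mca_code in SIMPLE_CODES:
        description = SIMPLE_CODES[mca_code]
    elif 0x0400 < mca_code <= 0x07FF:
        # Pattern 0000 01xx xxxx xxxx with at least one x bit set.
        description = "Internal unclassified"
    else:
        description = "Compound or other error code (see the Intel SDM)"

    return {
        "status": status,
        "flags": [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1],
        "mca_code": mca_code,
        "model_code": model_code,
        "description": description,
    }


if __name__ == "__main__":
    # Arg3 and Arg4 from the bugcheck in this thread.
    result = decode_mci_status(0xFA000000, 0x00400405)
    print(hex(result["status"]), result["flags"], result["description"])
```

For the values in this dump, that gives VAL, OVER, UC, EN, MISCV and PCC set, ADDRV clear, and an MCA error code of 0x0405, which falls in the internal unclassified range and agrees with what !errrec reported.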

Other than that, no, there's really nothing else you can retrieve from this.
 
