  1. #21
    satrow's Avatar
    Join Date
    Apr 2012
    Location
    Cymru
    Posts
    748
    • specs System Specs
      • Motherboard:
        ASRock Z77E-ITX
      • CPU:
        E3-1230 V2 3.3GHz
      • Memory:
        16GB G.Skill DDR3 2400
      • Graphics:
        Asus GTX1060
      • Sound Card:
        Onboard
      • Hard Drives:
        3x250GB SSDs, 2x 2.5 1TB HDD JBOD
      • Power Supply:
        Seasonic 360W Gold
      • Case:
        BitFenix Prodigy Black
      • Cooling:
        Be Quiet Shadow Rock Topflow + 2x case fans
      • Display:
        Dell U2412M 1920x1200 x2 (sometimes x3)
      • Operating System:
        W7 x64 Pro

    Re: PCI-E WHEA errors (0x124)

    All Dump Files.rar
    Quote Originally Posted by Vir Gnarus View Post
    Can you attach them to your post directly? They are small enough that zipping them should work. I cannot access them due to firewall restrictions against that site.
    Attached.



  2. #22

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    Thanks, satrow. I am looking through them. So far I haven't found anything definitive, but I'm leaning toward CPU/mobo. I'll explain more later as I gather more info from these.

    Time to do some brute force hardware tests:

    RAM: Memtest86+ - 7+ passes
    CPU: Prime95 - Torture Test; Large FFTs; overnight (9+ hours)
    GPU: MemtestCL - Run twice (if any of the tests work on your GPU; owners of ATI cards will need to install the ATI APP SDK, as MemtestCL requires OpenCL)
    Drives: SeaTools - All basic tests except "Fix All" and the advanced ones.

    The ones you want to run first are CPU and GPU. All of these (excluding MemtestCL) are included in the UBCD if you prefer a Live CD environment (which is the best environment to test hardware in). Note that Prime95 currently does not work on the UBCD. Also, please provide us with temps/voltages using HWInfo with the "Sensors only" option checked. Log two 30-minute sessions: one at idle, and one under high load.

  3. #23

    Re: PCI-E WHEA errors (0x124)

    Alright, two days ago I got fed up with it all and decided to take the whole computer apart. I bought a de-dusting can so that I could de-dust every cm^2 of the machine. I put everything back together and it hasn't crashed in two days. I will note that when I pulled out the graphics card, there was an incredible amount of dust inside the actual PCI-E slot. There was also a ton of dust in the same area where a red light would activate on my Radeon 6980. I looked with a magnifying glass and the red light was marked D1000. I looked for documentation on what the D1000 red LED is supposed to signify and found nothing.

    But yes, it hasn't crashed in the two days since the de-dusting.


    I'm going to get some McDonald's now.

  4. #24

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    Did the red LED continue to turn on after you cleaned everything up? I tried to look up official documentation on it myself, but nothing comes up. Everything on Google is from people reporting that the light came on when their graphics card was having issues.

    It's good to know that things are stable now after dusting. I would advise you to stay vigilant, however, since the cleaning may have only alleviated the issue rather than resolved it. The last time I diagnosed a PCI-E WHEA error for someone, they thought dusting had fixed the problem too, until the crashes crept back, though not nearly as frequently as before the cleanup. Of course, if everything works great after this, then wonderful. I just want you to be aware of this as you continue to use the system.

    I'll forgo looking at the crashdumps any further. If it was dust after all, then no amount of debugging with WinDBG will help one discover that.

  5. #25

    Re: PCI-E WHEA errors (0x124)

    Sorry, I completely forgot to update you on this, Vir Gnarus. My bad.

    No, the D1000 LED on the GPU did not activate after I had cleaned everything out. (And I mean not just blowing it out, but taking the tip of a paper towel and carefully getting rid of all the dust.) Going on the fourth straight day now with no crashes thus far.

  6. #26

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    Sounds like a winner. It gave me the chance to dive into the PCI-E bus details, so thanks for that. It's unfortunate that the dumps didn't lead to any solid answers, but given that this was caused by dust, there was no way they could have exposed the cause. I should've recommended cleaning things up; that was an oversight on my part, and I apologize.

    I'm glad to hear everything's working fine now. I'll retain this for future reference for others wishing to diagnose PCI/PCI-E predicaments.

  7. #27

    Re: PCI-E WHEA errors (0x124)

    Hi all,

    I've been reading this thread and it has been helpful.

    I have a similar problem but don't know how to solve it, even though I think I know the source of the problem.

    It started after I reinstalled Windows 7 on my new SSD. The biggest problem is that I didn't have a Windows 7 installer.

    So now I have two Windows 7 installations, one on my old HDD and one on my new SSD. The old one works fine, with an occasional crash that doesn't produce a bluescreen (I'm guessing it's heat).

    The new one, however, is very annoying: almost every time I plug a hard disk into a particular USB port, the OS crashes with a minidump as follows:

    Code:
    Microsoft (R) Windows Debugger Version 6.2.8400.4218 AMD64
    Copyright (c) Microsoft Corporation. All rights reserved.
    
    
    Loading Dump File [C:\Windows\Minidump\072812-15038-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available
    
    Symbol search path is: SRV*C:\Program Files (x86)\Windows Kits\8.0\Symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: 
    Windows 7 Kernel Version 7601 (Service Pack 1) MP (8 procs) Free x64
    Product: WinNt, suite: TerminalServer SingleUserTS
    Built by: 7601.17835.amd64fre.win7sp1_gdr.120503-2030
    Machine Name:
    Kernel base = 0xfffff800`02e0b000 PsLoadedModuleList = 0xfffff800`0304f670
    Debug session time: Sat Jul 28 08:56:08.271 2012 (UTC + 8:00)
    System Uptime: 0 days 0:10:29.239
    Loading Kernel Symbols
    .
    SYMSRV:  c:\program files (x86)\windows  kits\8.0\symbols*http://msdl.microsoft.com/download/symbols  needs a downstream store
    ..............................................................
    ................................................................
    .......................
    Loading User Symbols
    Loading unloaded module list
    .......
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    Use !analyze -v to get detailed debugging information.
    
    BugCheck 124, {4, fffffa800eb65038, 0, 0}
    
    *** WARNING: Unable to verify timestamp for win32k.sys
    *** ERROR: Module load completed but symbols could not be loaded for win32k.sys
    Probably caused by : GenuineIntel
    
    Followup: MachineOwner
    ---------
    
    7: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon.
    Arguments:
    Arg1: 0000000000000004, PCI Express Error
    Arg2: fffffa800eb65038, Address of the WHEA_ERROR_RECORD structure.
    Arg3: 0000000000000000
    Arg4: 0000000000000000
    
    Debugging Details:
    ------------------
    
    
    BUGCHECK_STR:  0x124_GenuineIntel
    
    CUSTOMER_CRASH_COUNT:  1
    
    DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT
    
    PROCESS_NAME:  System
    
    CURRENT_IRQL:  7
    
    STACK_TEXT:  
    fffff880`031aea78 fffff800`03405a3b : 00000000`00000124 00000000`00000004 fffffa80`0eb65038 00000000`00000000 : nt!KeBugCheckEx
    fffff880`031aea80 fffff800`02f97b03 : 00000000`00000001  fffffa80`0e5680c0 00000000`00000000 fffffa80`0e567b60 :  hal!HalBugCheckSystem+0x1e3
    fffff880`031aeac0 fffff880`00f2fbcf : fffffa80`00000750  fffffa80`0e5680c0 00000000`00000001 fffffa80`0eb64ab0 :  nt!WheaReportHwError+0x263
    fffff880`031aeb20 fffff880`00f2f5f6 : 00000000`00000000  fffff880`031aec70 fffffa80`0ecfed80 fffffa80`0f7721a0 :  pci!ExpressRootPortAerInterruptRoutine+0x27f
    fffff880`031aeb80 fffff800`02e8601c : fffff880`03186180  00000000`ffffffff fffffa80`0ecfed80 fffff880`04198501 :  pci!ExpressRootPortInterruptRoutine+0x36
    fffff880`031aebf0 fffff800`02e81ea2 : fffff880`03186180  fffff880`00000002 00000000`00000002 fffff800`00000000 :  nt!KiInterruptDispatch+0x16c
    fffff880`031aed80 00000000`00000000 : 00000000`00000000  00000000`00000000 00000000`00000000 00000000`00000000 :  nt!KiIdleLoop+0x32
    
    
    STACK_COMMAND:  kb
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: GenuineIntel
    
    IMAGE_NAME:  GenuineIntel
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  0
    
    FAILURE_BUCKET_ID:  X64_0x124_GenuineIntel_PCIEXPRESS
    
    BUCKET_ID:  X64_0x124_GenuineIntel_PCIEXPRESS
    
    Followup: MachineOwner
    ---------
    
    7: kd> !errrec fffffa800eb65038
    ===============================================================================
    Common Platform Error Record @ fffffa800eb65038
    -------------------------------------------------------------------------------
    Record Id     : 01cd6c5a4fe290f5
    Severity      : Fatal (1)
    Length        : 672
    Creator       : Microsoft
    Notify Type   : PCI Express Error
    Timestamp     : 7/28/2012 0:56:08 (UTC)
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : PCI Express
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa800eb650b8
    Section       @ fffffa800eb65148
    Offset        : 272
    Length        : 208
    Flags         : 0x00000001 Primary
    Severity      : Fatal
    
    Port Type     : Root Port
    Version       : 1.1
    Command/Status: 0x4010/0x0506
    Device Id     :
      VenId:DevId : 8086:340c
      Class code  : 030400
      Function No : 0x00
      Device No   : 0x05
      Segment     : 0x0000
      Primary Bus : 0x00
      Second. Bus : 0x00
      Slot        : 0x0000
    Dev. Serial # : 0000000000000000
    Express Capability Information @ fffffa800eb6517c
      Device Caps : 00008021 Role-Based Error Reporting: 1
      Device Ctl  : 0107 ur FE NF CE
      Dev Status  : 0005 ur FE nf CE
       Root Ctl   : 0008 fs nfs cs
    
    AER Information @ fffffa800eb651b8
      Uncorrectable Error Status    : 00040000 ur ecrc MTLP rof uc ca cto fcp ptlp sd dlp und
      Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
      Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
      Correctable Error Status      : 00000040 adv rtto rnro dllp TLP re
      Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
      Caps & Control                : 00000012 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
      Header Log                    : 3f000000 04000030 00000000 00000000
      Root Error Command            : 00000000 fen nfen cen
      Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
      Correctable Error Source ID   : 00,00,00
      Correctable Error Source ID   : 00,00,00
    
    ===============================================================================
    Section 1     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa800eb65100
    Section       @ fffffa800eb65218
    Offset        : 480
    Length        : 192
    Flags         : 0x00000000
    Severity      : Informational
    
    Proc. Type    : x86/x64
    Instr. Set    : x64
    CPU Version   : 0x00000000000106a5
    Processor ID  : 0x0000000000000007
    The source is definitely the USB port, but I just don't know how to fix it. Can anybody give me some pointers? Thanks in advance.

  8. #28

    Re: PCI-E WHEA errors (0x124)

    Vir Gnarus can correct me if I'm wrong, as he is a far better kernel researcher than I am. However, from reviewing your memory dump, it appears that the issue is coming from a faulty PCI-E bus port. Then again, I might be incorrect.

  9. #29

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    Is this one of those SSDs that's a PCI/PCI-E card? Or do you have your SSD plugged into a drive controller card? If either, that could explain what's going on here. Unfortunately I cannot yet tell from the WHEA info what exactly caused the issue, but I do know what the issue is.

    This is caused by a malformed TLP (the MTLP flag in the Uncorrectable Error Status line), meaning a packet of data sent through the PCI/PCI-E bus got altered in transit. T3lnet and I discovered this can be attributed to physical issues such as a dirty PCI/PCI-E port or a card that's not inserted properly. Make sure neither is the case. In addition, look online for any firmware updates for your SSD and/or drive controller card, as well as chipset drivers for your motherboard and a BIOS update. All of these, especially the SSD firmware, are well capable of causing instability when using said drive.

    After this is done you may need to do a repair install of Windows 7 on your SSD or run an SFC scan to ensure Windows files were not corrupted by your drive's potential instability.

    There is also, of course, the possibility we're dealing with a bad SSD drive that may need replacing.
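
For anyone following along, the MTLP flag can be read straight from the `Uncorrectable Error Status` value in the `!errrec` output posted earlier in the thread. Below is a minimal Python sketch, assuming the standard PCI Express AER bit positions. The names `UNCORRECTABLE_BITS` and `decode_uncorrectable` are made up for illustration and are not part of WinDbg.

```python
from typing import Dict, List

# Bit positions in the PCI Express AER Uncorrectable Error Status,
# Mask and Severity registers (abbreviations match the !errrec output).
UNCORRECTABLE_BITS: Dict[int, str] = {
    4: "DLP",    # Data Link Protocol Error
    5: "SD",     # Surprise Down Error
    12: "PTLP",  # Poisoned TLP
    13: "FCP",   # Flow Control Protocol Error
    14: "CTO",   # Completion Timeout
    15: "CA",    # Completer Abort
    16: "UC",    # Unexpected Completion
    17: "ROF",   # Receiver Overflow
    18: "MTLP",  # Malformed TLP
    19: "ECRC",  # ECRC Error
    20: "UR",    # Unsupported Request Error
}


def decode_uncorrectable(value: int) -> List[str]:
    """Return the abbreviations of all flags set in a register value."""
    return [
        abbr
        for bit, abbr in sorted(UNCORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Hex values copied from the dump posted earlier in the thread.
    for label, raw in (("status", "00040000"), ("severity", "00062010")):
        print(label, decode_uncorrectable(int(raw, 16)))
```

WinDbg already shows the same information by upper-casing the flags that are set, so this is only useful for checking a raw register value by hand.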

  10. #30

    Join Date
    Feb 2012
    Posts
    2,086
    Blog Entries
    7

    Re: PCI-E WHEA errors (0x124)

    Just an observation, but when STOP 0x124 first came out (in Vista), the majority of the errors had Arg1 = 4.
    It wasn't until much, much later (probably a year or two) that the Arg1 = 1 errors became more common.

    Back then they were suggestive of a video problem, so you might want to check your video stuff also.

  11. #31

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    Just a guess here, but that was probably because PCI-E was still in its early stages when Vista came out, and driver developers hadn't yet ironed out everything to work properly with the new standard, so more bugs were bound to happen, especially in video drivers. Video driver developers will tell you that their drivers are designed to fix issues in their hardware's architecture, not to enhance it (though fixing often does so).

  12. #32

    Re: PCI-E WHEA errors (0x124)

    Hi guys,

    thanks for the replies.

    FYI,

    I have two OSes (both Windows 7) on the computer:
    (1) on a normal HDD (ST31000528AS - 1TB) running on SATA 3 Gb/s (this OS was installed by default by Dell)
    (2) on an SSD (OCZ-AGILITY) running on SATA 6 Gb/s (this OS is the one I recently installed myself)

    I have been using this computer for about 1.5 years now with OS (1) and haven't encountered any BSODs (although once or twice a week I get crashes, not BSODs but plain restarts, which I assume are due to heat because I rarely turn my PC off).

    After I installed the SSD (not PCI), the old OS still works fine. But on OS (2) a BSOD comes up every time I plug something into a USB 3.0 slot.

    I was puzzled as to why OS (1) is fine while OS (2) had problems. I compared the drivers, and it seems my new OS had an older version.
    So I did some net searching and finally found a driver for my NEC (now Renesas) USB controller, which sits on a PCI-E card. After installing it, the USB port can now be used without a BSOD most of the time. It still sometimes crashes with a BSOD, not on plug-in but while running some programs or exploring the disk.

    The USB port runs fine on the old OS (1) even though it's running an older driver.

    Does anyone still think it's the video card's fault or the SSD's fault?
    How do I double-check/confirm whether the SSD or the PCI card is faulty? (I'd cry if it's the SSD, since it's new and expensive.)

    I noticed that the PCI card was a bit dusty, so I've cleaned it up and reseated it. But I'm still wondering why OS (1) works fine while the new OS (2), using the latest drivers, is not working so well.

    Since the BSODs are not frequent anymore (hard to reproduce), I'll just use my computer as it is, report back on whether things have improved, and check here for any other suggestions on how to fix it.

    Again thanks for the replies.

  13. #33

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    How did you install the OS on the SSD? Have you made any adjustments to the BIOS concerning your drive (such as switching from SATA IDE to AHCI) since then? Also, did you apply the SSD firmware updates before or after installing the OS? The thing is, you need to set up the SSD properly first (install firmware updates, set BIOS settings, etc.), and then install the OS. Having the OS installed before or during changes to the SSD or BIOS will affect the stability of the OS. There's also the possibility that the OS got corrupted during installation. Overall, you'll want to reinstall the OS on the SSD after you've completed all the changes to it.

    One cannot tell very well whether an SSD is bad, because there aren't any reliable diagnostic tests for them yet. Check OCZ's website and see if they have one; otherwise there's really nothing you can do aside from swapping drives. I should also let you know that OCZ has a reputation for having faster SSDs than most competitors, but at the price of reduced reliability; their drives have a tendency to bug out more than others. Hopefully you still have a warranty to put to use.

    Btw, you want to make sure that the slots themselves are clean, not so much the cards (though the connector should be clean too). The point is to make sure the connection between the slot and the card is not impeded by any residue.

  14. #34

    Re: PCI-E WHEA errors (0x124)

    It's worth noting that a lot of motherboards hang off certain controllers (USB3, SATA) on the PCI-E bus internally, so bad drives and bad (or misbehaving) USB devices on those buses can cause issues. I've seen this on a lot of motherboards running Vista or 7 on an SSD where it worked fine with a spinner, and XP had no issues either (mostly because it made no distinction between SSD and mechanical, whereas Vista and higher at least make an attempt to behave differently). Not saying it's the issue here, but it can be, esp. on OEM boards.
    MCTS Windows Internals, MCITP Server 2008 EA, MCTS MDT/BDD, MCSE/MCSA Server 2003, Server 2012, Windows 8

  15. #35

    Re: PCI-E WHEA errors (0x124)

    Hello all,

    This thread really has a wealth of information. I stumbled onto it while digging into my problem, which is very similar.
    I have an HP laptop that is BSOD-ing (STOP 0x124), and the Event Viewer is full of repeated WHEA messages about a corrected hardware error, etc.
    The laptop is fairly new, runs 64-bit Win7, and has had its motherboard and graphics card replaced, without good results.

    I am not sure how to interpret the following:

    Device Id :
    VenId:DevId : 8086:d138
    Class code : 030400

    PCIdatabase.com does not recognize the d138 device ID. I consulted the PCI Express base specification document, but could not decipher class code 030400.

    Perhaps you can help me shed some more light on my WHEA listing.

    Code:
    ===============================================================================
    Common Platform Error Record @ fffffa8007b41038
    -------------------------------------------------------------------------------
    Record Id     : 01cd6fb01727dcad
    Severity      : Fatal (1)
    Length        : 672
    Creator       : Microsoft
    Notify Type   : PCI Express Error
    Timestamp     : 8/1/2012 9:28:17
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : PCI Express
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa8007b410b8
    Section       @ fffffa8007b41148
    Offset        : 272
    Length        : 208
    Flags         : 0x00000001 Primary
    Severity      : Recoverable
    
    Port Type     : Root Port
    Version       : 1.1
    Command/Status: 0x0010/0x0407
    Device Id     :
      VenId:DevId : 8086:d138
      Class code  : 030400
      Function No : 0x00
      Device No   : 0x03
      Segment     : 0x0000
      Primary Bus : 0x00
      Second. Bus : 0x00
      Slot        : 0x0000
    Dev. Serial # : 0000000000000000
    Express Capability Information @ fffffa8007b4117c
      Device Caps : 00008021 Role-Based Error Reporting: 1
      Device Ctl  : 0107 ur FE NF CE
      Dev Status  : 0003 ur fe NF CE
       Root Ctl   : 0008 fs nfs cs
    
    AER Information @ fffffa8007b411b8
      Uncorrectable Error Status    : 00014000 ur ecrc mtlp rof UC ca CTO fcp ptlp sd dlp und
      Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
      Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
      Correctable Error Status      : 00002000 ADV rtto rnro dllp tlp re
      Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
      Caps & Control                : 0000000e ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
      Header Log                    : 4a000001 01000004 00180000 00000000
      Root Error Command            : 00000000 fen nfen cen
      Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
      Correctable Error Source ID   : 00,00,00
      Correctable Error Source ID   : 00,00,00
    
    ===============================================================================
    Section 1     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa8007b41100
    Section       @ fffffa8007b41218
    Offset        : 480
    Length        : 192
    Flags         : 0x00000000
    Severity      : Informational
    
    Proc. Type    : x86/x64
    Instr. Set    : x64
    CPU Version   : 0x00000000000106e5
    Processor ID  : 0x0000000000000001

  16. #36

    Join Date
    Mar 2012
    Posts
    469

    Re: PCI-E WHEA errors (0x124)

    @Cluberti:

    Thanks, Carl. I did notice the SATA class codes in the PCI-E classification, but I didn't think much of them because I figured they applied to add-in SATA controller cards and not to onboard controllers. In retrospect, I now understand certain cases I've come across where these WHEA errors were fixed by fiddling with the HD. Again, thanks!

    @rijeka051:

    Hi mate, the PCI database is often helpful, but it isn't the most complete source, since it depends primarily on voluntary contributions. I'm afraid identifying the IDs won't help much either way, though: from my previous research, those fields only identify the root port that reported the issue, not the device the problem originated from (if there even was one). If the WHEA error record said that an actual device reported the error, rather than a bridge or root port, then it would be helpful. I assume our only option is to look at the Header Log under the AER information and try to interpret it to figure out which device sent the packet.
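
    As a rough illustration of reading the Header Log, here is a minimal Python sketch that splits the four logged DWORDs into TLP header fields. It rests on assumptions that are not confirmed anywhere in this thread: that the debugger prints the DWORDs in order with header byte 0 in the top byte of the first DWORD, and that the logged TLP is a completion. The function names are made up for the example.

```python
# Sketch only: split a logged PCIe TLP header into fields.
# Assumptions (not confirmed by the dump): DWORDs are printed in order,
# header byte 0 sits in the top byte of the first DWORD, and the TLP
# is a completion (type 01010).

def bdf(routing_id):
    """Format a 16-bit bus/device/function ID as bus:dev.fn."""
    return "%02x:%02x.%x" % (routing_id >> 8, (routing_id >> 3) & 0x1F, routing_id & 0x7)

def decode_completion_header(dwords):
    """Decode the first three DWORDs of a completion TLP header."""
    dw0, dw1, dw2 = dwords[0], dwords[1], dwords[2]
    fmt = (dw0 >> 29) & 0x7          # 010 = 3DW header with data
    tlp_type = (dw0 >> 24) & 0x1F    # 01010 = completion
    if tlp_type != 0b01010:
        raise ValueError("not a completion TLP")
    return {
        "kind": "CplD" if fmt == 0b010 else "Cpl",
        "length_dw": dw0 & 0x3FF,
        "completer": bdf(dw1 >> 16),
        "status": (dw1 >> 13) & 0x7,
        "byte_count": dw1 & 0xFFF,
        "requester": bdf(dw2 >> 16),
        "tag": (dw2 >> 8) & 0xFF,
    }

# Header Log values copied from the posted error record.
header_log = [0x4A000001, 0x01000004, 0x00180000, 0x00000000]
fields = decode_completion_header(header_log)
for name, value in fields.items():
    print(name, value)
```

    If those assumptions hold, the logged packet reads as a completion with data from 01:00.0 addressed to requester 00:03.0, and 00:03.0 matches the Primary Bus and Device No of the reporting root port in the record. Treat that as a lead to check, not a diagnosis.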

    As for the errors, there are two: UC and CTO are capitalized (flagged) in Uncorrectable Error Status. Since this is from a root port, we'll look at the PCI_EXPRESS_ROOT_PORT_AER_CAPABILITY structure. The two turn out to be Unexpected Completion and Completion Timeout, and I would venture to guess that the latter caused the former. If a transaction timed out, that probably means a packet either went the wrong way and never reached its proper destination, or was lost or discarded.
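
    The capitalized abbreviations can be cross-checked against the raw hex value. A minimal Python sketch, assuming the standard AER bit layout for the Uncorrectable Error Status and Severity registers (the table and function name are written for this example, not taken from any debugger):

```python
# Sketch only: map set bits of an AER uncorrectable-error register value
# to the abbreviations used in the dump. Bit positions assume the standard
# AER Uncorrectable Error Status layout.
UNCORRECTABLE_BITS = {
    0: ("und", "Undefined"),
    4: ("dlp", "Data Link Protocol Error"),
    5: ("sd", "Surprise Down"),
    12: ("ptlp", "Poisoned TLP"),
    13: ("fcp", "Flow Control Protocol Error"),
    14: ("cto", "Completion Timeout"),
    15: ("ca", "Completer Abort"),
    16: ("uc", "Unexpected Completion"),
    17: ("rof", "Receiver Overflow"),
    18: ("mtlp", "Malformed TLP"),
    19: ("ecrc", "ECRC Error"),
    20: ("ur", "Unsupported Request"),
}

def decode_uncorrectable(mask):
    """Return (abbreviation, description) for each set bit, lowest bit first."""
    return [UNCORRECTABLE_BITS[bit]
            for bit in sorted(UNCORRECTABLE_BITS)
            if mask & (1 << bit)]

# Values copied from the posted error record.
for label, value in (("status", 0x00014000), ("severity", 0x00062010)):
    names = ", ".join(desc for _, desc in decode_uncorrectable(value))
    print("%s 0x%08x: %s" % (label, value, names))
```

    For the status value 00014000 this picks out bits 14 and 16, Completion Timeout and Unexpected Completion, which agrees with the CTO and UC flags in the listing.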

    As for translating this into potential causes, I'm not sure. I would figure anything that could cause a packet to be lost unexpectedly in transit (PSU instability, motherboard failure, overheating, dust/foreign material) would be the initial suspect. As Cluberti has explained, this could also be related to USB or SATA, since those controllers are commonly connected through the PCI-E bus on OEM mobos.

    I'd like to dig into it further, but I'll have to scrutinize it later.

  17. #37
    Wrench97's Avatar
    Join Date
    Feb 2012
    Location
    S.E. Pennsylvania
    Posts
    2,586

    Re: PCI-E WHEA errors (0x124)

    It's not only on OEM boards that the ports can be connected through the PCIe bus, but on most if not all retail boards as well.

  18. #38

    Re: PCI-E WHEA errors (0x124)

    @VirGnarus:
    I also should let you know that OCZ has a reputation for having faster SSDs than most competitors, but at the price of reduced reliability; their drives have a tendency to bug out more so than others
    Haha, guess I didn't do enough research to find that out. I think I did all the required steps to install the OS on the SSD as you stated. I guess I'll go about making sure that my SSD is not faulty.

    @Cluberti:
    Thanks for pointing out that Windows Vista and higher behave differently on a spinner vs. an SSD, and that motherboards hang certain controllers off the PCI-E bus. Mine is OEM. Oh well, I guess I'll have to live with it; it'll be a couple of years till I get a new rig.

    Thanks guys for being very helpful :)

  19. #39
    writhziden's Avatar
    Join Date
    May 2012
    Location
    Colorado
    Posts
    2,328
    • specs System Specs
      • Manufacturer:
        Sony
      • Model Number:
        VPCF232FX/B
      • Motherboard:
        Sony Corporation VAIO
      • CPU:
      • Memory:
        8.00 GB Crucial CT2KIT51264BF1339 DDR3 1333
      • Graphics:
      • Sound Card:
        Realtek High Definition Audio/nVidia High Definition Audio
      • Hard Drives:
        TOSHIBA MK5061GSY 500 GB (465 GB actual)
      • Case:
        Laptop black matte case with backlit keyboard
      • Cooling:
        Air cooling via fan and heat exchanger heatsink
      • Display:
        Laptop display
      • Operating System:
        Windows 7 Home Premium 64 Bit

    Re: PCI-E WHEA errors (0x124)

    stucko, have you tried a power cycle with the SSD? I have seen OCZ SSD firmware updates and the like cause problems in the BIOS/SSD interface that can result in 0x124 and 0x7A crashes. Resetting the BIOS/SSD connection can resolve the problem, and the steps to do so involve power cycling the SSD. I just gave these steps to another OCZ SSD user, and a couple of months back I gave them to an OCZ user whose 0x124 crash was resolved by the power cycle.

    SSD Troubleshooting:
    Try doing a power cycle of the SSD. The following steps should be carried out and take ~1 hour to complete.
    1. Power off the system.
    2. Remove all power supplies (AC adapter, then battery for a laptop; AC power for a desktop).
    3. Hold down the power button for 30 seconds to close the circuit and drain all components of power.
    4. Reconnect all power supplies (battery, then AC adapter for a laptop; AC power for a desktop).
    5. Turn on the system and enter the BIOS (see your manual for the steps to enter the BIOS)
    6. Let the computer remain in the BIOS for 20 minutes.
    7. Follow steps 1-3 and physically remove the SSD from the system by disconnecting the cables for a desktop or disconnecting the drive from the junction for a laptop.
    8. Leave the drive disconnected for 30 seconds to let all power drain from it.
    9. Replace the drive connection(s) and then do steps 4-8 again.
    10. Repeat steps 1-4.
    11. Start your computer normally and run Windows.

    The above steps come from: Why did my SSD "disappear" from my system? - Crucial Community

    While that article may not cover your drive, the power cycle procedure should be the same for all SSDs. See how the system responds after the SSD power cycle.

  20. #40
    Wrench97's Avatar
    Join Date
    Feb 2012
    Location
    S.E. Pennsylvania
    Posts
    2,586

    Re: PCI-E WHEA errors (0x124)

    Not about PCIe 0x124s, but an interesting read from Intel on 0x124 MCEs: http://download.intel.com/design/int...ERS/324077.pdf
