[SOLVED] Been getting MCE and a few clock interrupt BSOD's

wlysix · May 24, 2014

My problems started quite a while back. At first I suspected my CPU cooler; getting it replaced massively reduced the blue screens, but I still receive them. Strangely, the computer used to give a blue screen every day, but now it's become very random. Sometimes I can do light computing for days (surfing, documents, watching videos) and nothing happens. Other times a random event (opening a gif, a video, etc.) might set off a blue screen. However, the computer always goes directly to a "machine check exception" error when I run a game.

I ran memtest for around 8 hours which didn't produce any memory error. Also diagnosed HDD and didn't find any errors in it.

· OS: Windows 7 64-bit Professional
· I've always had the same OS, recently reinstalled since I started getting many blue screens
· It's a custom configuration made in June 2010
· Age of system (hardware): 2010
· I have re-installed the OS

· Intel i7 930 @ 2.8 GHz stock
· MSI GeForce GTX 275 currently (was using a Zotac GeForce GTX 670 AMP but sent it for RMA after it caused overheating issues)
· Motherboard: Gigabyte X58-UD7 Rev 1.0
· CoolerMaster Real Power Pro 1000W

· System Manufacturer: Custom Configuration

Note about Perfmon report:
My computer gives a "clock interrupt not received on secondary processor" error, so I can never complete the Windows Experience test, which might be the reason for the poor ratings. I also have Kaspersky Internet Security (antivirus + firewall) installed.

Secondary note:
I stopped receiving the "clock interrupt not received on secondary processor" error after swapping my faulty graphics card for my older one. I still get the machine check exception errors when running 3D games. The "clock interrupt" error came back recently when I tried to get the Windows Experience Index score. Very recently the computer has also been locking up for a few seconds and then resuming normally.

I also noticed in the Machine Check Exceptions that FFFFF8800(*)c70 address is always present (* being different addresses).

With the BIOS at optimized defaults the machine was unstable, so I had to set the RAM XMP profile for it to stabilize.

I suspect I might have a permanently damaged CPU or motherboard but would like a professional opinion on it. Thanks.

Patrick · May 24, 2014

Hi,

I would need a kernel dump to analyze the 0x101s at a deeper level. However, we have a lot of MACHINE_CHECK_EXCEPTION (9c) bug checks.

This bug check indicates that a fatal machine check exception has occurred.

Code:

3: kd> k
Child-SP          RetAddr           Call Site
fffff880`037dbc38 fffff800`03602818 nt!KeBugCheckEx
fffff880`037dbc40 fffff800`03601f57 hal!HalpMcaReportError+0x164
fffff880`037dbd90 fffff800`035f5e88 hal!HalpMceHandlerWithRendezvous+0x9f
fffff880`037dbdc0 fffff800`0307f4ac hal!HalHandleMcheck+0x40
fffff880`037dbdf0 fffff800`0307f313 nt!KxMcheckAbort+0x6c
fffff880`037dbf30 fffff880`075159c2 nt!KiMcheckAbort+0x153
fffff880`037fbc98 00000000`00000000 intelppm!C1Halt+0x2

In every single one of your crashes, the system bug checks as soon as the processor executes the halt instruction. At this point, the CPU is supposed to remain stopped and/or idle, not doing anything at all. As soon as the CPU receives an interrupt that needs to be serviced (remember, operating systems and hardware work based on interrupts that are fired), it will wake and service said interrupt(s).

Given you are crashing with an 0x9C every single time this halt transition occurs, my theories are as follows:

1. Your CPU is faulty in that it simply cannot transition to a halt state properly.

2. Your CPU is faulty in that whilst one core is asleep, another core (or cores) is waiting for it to wake up and service the interrupt, but it doesn't. I cannot confirm this one without a kernel dump, but I digress.

Anyway, with this said, I'm 99% sure your CPU is faulty.
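
If you'd like to check the "every crash lands in C1Halt" observation across several dumps yourself, here's a rough Python sketch. It assumes you have pasted the text of the `k` command into a string; the function names (`call_sites`, `interrupted_in`) are my own for illustration and are not part of any debugger API.

```python
import re

# `k` output copied from the stack above (colour tags removed).
K_OUTPUT = """\
Child-SP          RetAddr           Call Site
fffff880`037dbc38 fffff800`03602818 nt!KeBugCheckEx
fffff880`037dbc40 fffff800`03601f57 hal!HalpMcaReportError+0x164
fffff880`037dbd90 fffff800`035f5e88 hal!HalpMceHandlerWithRendezvous+0x9f
fffff880`037dbdc0 fffff800`0307f4ac hal!HalHandleMcheck+0x40
fffff880`037dbdf0 fffff800`0307f313 nt!KxMcheckAbort+0x6c
fffff880`037dbf30 fffff880`075159c2 nt!KiMcheckAbort+0x153
fffff880`037fbc98 00000000`00000000 intelppm!C1Halt+0x2
"""

# A frame line is: Child-SP, RetAddr, then the call site text.
FRAME_RE = re.compile(
    r"^[0-9a-f]{8}`[0-9a-f]{8}\s+[0-9a-f]{8}`[0-9a-f]{8}\s+(\S.*)$",
    re.IGNORECASE,
)


def call_sites(k_output):
    """Return the call sites from newest frame to oldest, skipping the header."""
    sites = []
    for line in k_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            sites.append(match.group(1))
    return sites


def interrupted_in(k_output, symbol="intelppm!C1Halt"):
    """True if the oldest frame (the code that was interrupted) is `symbol`."""
    sites = call_sites(k_output)
    return bool(sites) and sites[-1].split("+")[0] == symbol


if __name__ == "__main__":
    print(call_sites(K_OUTPUT)[-1])
    print(interrupted_in(K_OUTPUT))
```

The oldest frame is the interesting one here because the machine check handlers above it only describe how the crash was reported, not what the processor was doing when it happened.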

wlysix · May 24, 2014

Hi,
Thanks for the detailed analysis, Patrick! I zipped up the memory.dmp and it got compressed from 600 to 200mb but I think the manage attachment dialog box has glitched out on me. I would have replied sooner if not for the attachment which has been stuck like that for the past hour and a half. Which other site would you recommend for uploading it?

I know you said it is 99% a CPU issue but I just want to make it official before I dump the cpu.

Thanks again

Patrick · May 24, 2014

Mediafire, please.

Regards,

Patrick

wlysix · May 24, 2014

Hi,
Sorry it took so long. My upload speed is slow. Here is the link: https://www.mediafire.com/?ge3b7oaycd7g4ve

I think this dmp might be the one from this morning, when I tried running the Windows Experience Index and got the "clock interrupt not received on the secondary processor" BSOD.

Thanks

Patrick · May 24, 2014

Thanks very much!

Unfortunately, the dump is completely corrupt. Either, as I noted above, your processor simply refused to wake up for the dump process, so no registers, call stack, etc. were properly dumped, or dumps aren't configured properly. Please check the following:

1. Windows key + Pause key. This should bring up System. Click Advanced System Settings on the left > Advanced > Performance > Settings > Advanced > Ensure there's a check-mark for 'Automatically manage paging file size for all drives'.

2. Windows key + Pause key. This should bring up System. Click Advanced System Settings on the left > Advanced > Startup and Recovery > Settings > System Failure > ensure there is a check mark next to 'Write an event to the system log'.

3. Double check that the WERS is ENABLED:

Start > Search > type services.msc > Under the name tab, find Windows Error Reporting Service > If the status of the service is not Started then right click it and select Start. Also ensure that under Startup Type it is set to Automatic rather than Manual. You can do this by right clicking it, selecting properties, and under General selecting startup type to 'Automatic', and then click Apply.

If you cannot get into normal mode to do any of this, please do this via Safe Mode.
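
If you'd rather check part of this from a script, here is a minimal Python sketch. It assumes the crash dump settings live under the CrashControl registry key with the values CrashDumpEnabled and LogEvent; the helper names (`describe_crash_control`, `read_crash_control`) are mine for illustration. It does not check the page file or the Windows Error Reporting Service, so the steps above still apply.

```python
import sys

# Registry key holding the Startup and Recovery dump settings (under HKLM).
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled value on Windows 7.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}


def describe_crash_control(crash_dump_enabled, log_event):
    """Turn the two raw DWORD values into a readable summary."""
    return {
        "dump_type": DUMP_TYPES.get(
            crash_dump_enabled, "unknown (%d)" % crash_dump_enabled
        ),
        "writes_event_log": bool(log_event),
        # A kernel or complete dump contains kernel memory for deeper analysis.
        "has_kernel_memory": crash_dump_enabled in (1, 2),
    }


def read_crash_control():
    """Read the live settings. Windows only, hence the local import."""
    import winreg

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        dump, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        log, _ = winreg.QueryValueEx(key, "LogEvent")
    return describe_crash_control(dump, log)


if __name__ == "__main__" and sys.platform == "win32":
    print(read_crash_control())
```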

Regards,

Patrick

wlysix · May 25, 2014

Hi Patrick,
Sorry about the corrupt dmp file. Anyway, the first two steps were already set correctly but the third was not. I fixed that and started a 3D game to induce an MCE. Here is the dmp file: https://www.mediafire.com/?ge3b7oaycd7g4ve

It was all done in normal mode, there was no need to go into safe mode as otherwise for normal and less demanding tasks the computer is fine.

Thanks

Edit:
Just after posting this and going back to regular browsing, the computer hung up on me and gave me a "clock interrupt" bluescreen which I found to be very strange.

Patrick · May 25, 2014

Unfortunately, it's an 0x9C and there's really no debugging to be done here. I was hoping for an 0x101, but I digress.

Honestly, I'd be willing to bet money that this is a faulty CPU. That's how sure I am. If you really want to wait for an 0x101 though, I can do that.

Regards,

Patrick

wlysix · May 25, 2014

Hi,
As soon as I read the thread a few hours back, I got a 0x101 error and got another one while uploading it. Guess I'm posting it because I wanted a confirmation from you about whether it's only the processor at fault or it's both the motherboard and processor.

Here are both:
MEMORY-Clock Interrupt
MEMORY CI 101

Thanks again

Patrick · May 25, 2014

Ah, great! Let's begin.

CLOCK_WATCHDOG_TIMEOUT (101)

This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.

Code:

BugCheck 101, {19, 0, fffff880038f7180, 6}

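The first parameter, 19, is the timeout interval in clock ticks. WinDbg prints bug check parameters in hex, so this is 0x19, or 25 ticks in decimal.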

fffff880038f7180 is the PRCB address of the hung processor, let's keep this address in mind.
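
For anyone following along, here's a small Python sketch that pulls the parameters out of a "BugCheck" line like the one above and labels the ones discussed in this post for a 0x101. The function names (`parse_bugcheck`, `describe_0x101`) are my own, and the sample line is the one above with the colour tags removed. All values are treated as hex, which is how the debugger prints them.

```python
import re

BUGCHECK_LINE = "BugCheck 101, {19, 0, fffff880038f7180, 6}"


def parse_bugcheck(line):
    """Return (code, [arg1, arg2, arg3, arg4]) from a 'BugCheck' line."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if match is None:
        raise ValueError("not a BugCheck line: %r" % line)
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def describe_0x101(args):
    """Label the CLOCK_WATCHDOG_TIMEOUT parameters discussed in this thread."""
    return {
        "timeout_ticks": args[0],     # 1st parameter: timeout in clock ticks
        "prcb_address": args[2],      # 3rd parameter: PRCB of the hung processor
        "processor_index": args[3],   # 4th parameter: index of that processor
    }


if __name__ == "__main__":
    code, args = parse_bugcheck(BUGCHECK_LINE)
    info = describe_0x101(args)
    print(hex(code), info["timeout_ticks"], hex(info["prcb_address"]),
          info["processor_index"])
```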

Code:

0: kd> !prcb 6
PRCB for Processor 6 at fffff880038f7180:
Current IRQL -- 0
Threads--  Current fffff880039020c0 Next fffffa8005955750 Idle fffff880039020c0
Processor Index 6 Number (0, 6) GroupSetMember 40
Interrupt Count -- 003627cd
Times -- Dpc    0000001d Interrupt 00000025 
         Kernel 000bc5cc User      0000754d

For reference, I did not run !prcb on each processor individually. That would have been very tedious. Instead, you can use !running -it. The "i" argument causes it to display idle processors too, and "t" displays the stack trace for the thread running on each processor. If we run that extension, it shows this is an 8-processor box (8 logical processors).

Hint: At times, the 4th parameter of the bug check will show you the responsible processor. For example, in your *101 here, it was correct as the 4th parameter was 6.

Hint #2: You can also generally tell the number of processors on the box by checking the failure bucket string, e.g. FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE
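
Both hints are easy to script. The sketch below reads the processor count out of a FAILURE_BUCKET_ID string and, as a cross-check, turns the "GroupSetMember 40" value from the !prcb output into a processor index. That second part assumes GroupSetMember is a one-bit affinity mask printed in hex, which is my reading of the output rather than something stated in it. Function names are hypothetical.

```python
import re


def processors_from_bucket(bucket_id):
    """Extract N from a '..._N_PROC...' bucket string, or None if absent."""
    match = re.search(r"_(\d+)_PROC(?:_|$)", bucket_id)
    return int(match.group(1)) if match else None


def processor_from_group_mask(mask):
    """Convert a single-bit mask (e.g. 0x40) into a bit index (e.g. 6)."""
    if mask <= 0 or mask & (mask - 1):
        raise ValueError("expected exactly one bit set, got %#x" % mask)
    return mask.bit_length() - 1


if __name__ == "__main__":
    bucket = "X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE"
    print(processors_from_bucket(bucket))
    print(processor_from_group_mask(0x40))
```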

As this address matches the 3rd parameter of the bug check, processor #6 is the responsible processor. With the information we have so far, we know that processor #6 went 0x19 (25 decimal) clock ticks without responding, so the system crashed. Before we go further, what is a clock tick? The clock interrupt is a periodic timer interrupt that every processor must service; it drives timekeeping and scheduling and keeps the processors in step. Each processor is expected to respond on every tick, and when one fails to respond for the whole timeout interval, you crash.
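
To make the idea concrete, here's a toy Python model of a clock watchdog. To be clear, this is not how the kernel implements it; it only illustrates "one processor stops answering ticks, and after the timeout the system gives up". The function name and the shape of the input are my own assumptions.

```python
def first_timeout(num_processors, responses, timeout_ticks):
    """Return (processor, tick) for the first processor that misses
    `timeout_ticks` consecutive ticks, or None if nobody times out.

    `responses` is a list with one entry per tick; each entry is the set of
    processor indexes that answered that tick.
    """
    missed = [0] * num_processors
    for tick, responded in enumerate(responses, start=1):
        for cpu in range(num_processors):
            if cpu in responded:
                missed[cpu] = 0          # answered: reset its counter
            else:
                missed[cpu] += 1         # silent: count the missed tick
                if missed[cpu] >= timeout_ticks:
                    return cpu, tick
    return None


if __name__ == "__main__":
    everyone = set(range(8))
    # Processor 6 answers the first three ticks, then goes silent.
    ticks = [everyone] * 3 + [everyone - {6}] * 30
    print(first_timeout(8, ticks, timeout_ticks=0x19))
```

With a timeout of 0x19 ticks, the model flags processor 6 on tick 28, i.e. 25 ticks after its last response.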

Let's now look at the stacks of the different processors to see what the threads were involved in:

Code:

0: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff800`00b9c088 fffff800`030e0a4a : 00000000`00000101 00000000`00000019 00000000`00000000 fffff880`038f7180 : nt!KeBugCheckEx
fffff800`00b9c090 fffff800`030936f7 : 0000057f`00000000 fffff800`00000006 00000000`00002710 fffffa80`077ad8f0 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff800`00b9c120 fffff800`03603895 : fffff800`03629460 fffff800`00b9c2d0 fffff800`03629460 fffffa80`00000000 : nt!KeUpdateSystemTime+0x377
fffff800`00b9c220 fffff800`03086113 : 00000000`1ac652cf fffff800`00b9c2d0 fffffa80`06f21128 fffff800`00b9cc58 : hal!HalpHpetClockInterrupt+0x8d
fffff800`00b9c250 fffff800`03070c50 : 00000000`00000000 00000000`000000ff 00000000`00000000 00000000`00000801 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`00b9c250)
fffff800`00b9c3e0 fffff800`03092d19 : 00000000`00000002 00000000`00000000 fffffa80`04f34c58 fffff800`0307fdfe : nt!KxWaitForSpinLockAndAcquire+0x10
fffff800`00b9c410 fffff880`089a1f66 : fffffa80`076f1900 fffff800`00b9cc58 fffff800`00b9c4c0 fffffa80`06f201a0 : nt!KeAcquireSpinLockRaiseToDpc+0x89
fffff800`00b9c460 fffff880`089a315d : fffffa80`071c51a0 fffff880`089b1f11 fffffa80`095512a0 fffffa80`06f20050 : USBPORT!USBPORT_AcquireEpListLock+0x2e
fffff800`00b9c490 fffff880`089bc83f : fffffa80`095512a0 fffffa80`06f201a0 fffffa80`06f20050 fffffa80`08d3d800 : USBPORT!USBPORT_ReferenceEndpoint+0x29
fffff800`00b9c4e0 fffff880`089c5454 : fffffa80`08d3d800 fffffa80`08d3d800 00000000`00000000 fffffa80`08d3d800 : USBPORT!USBPORT_Ev_Rh_IntrEp_Invalidate+0xf3
fffff800`00b9c540 fffff800`0309485c : fffff800`00b9c600 00000000`00000000 00000000`00000001 fffff800`00b9c600 : USBPORT!USBPORT_AsyncTimerDpc+0xb8
fffff800`00b9c570 fffff800`030946f6 : fffffa80`08d3d820 00000000`000c3b64 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`00b9c5e0 fffff800`030945de : 0000001d`1debc6da fffff800`00b9cc58 00000000`000c3b64 fffff800`03207f08 : nt!KiProcessExpiredTimerList+0xc6
fffff800`00b9cc30 fffff800`030943c7 : 00000007`f1271bc1 00000007`000c3b64 00000007`f1271b5d 00000000`00000064 : nt!KiTimerExpiration+0x1be
fffff800`00b9ccd0 fffff800`030818ca : fffff800`03204e80 fffff800`03212cc0 00000000`00000000 fffff880`0899edb0 : nt!KiRetireDpcList+0x277
fffff800`00b9cd80 00000000`00000000 : fffff800`00b9d000 fffff800`00b97000 fffff800`00b9cd40 00000000`00000000 : nt!KiIdleLoop+0x5a

There it is! Let's move forward:

Code:

0: kd> .trap fffff800`00b9c250
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000002 rbx=0000000000000000 rcx=0000000000000001
rdx=000000004f444648 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80003070c50 rsp=fffff80000b9c3e0 rbp=fffffa8006f21128
 r8=000000004f444648  r9=0000000000000000 r10=fffff80003205801
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
nt!KxWaitForSpinLockAndAcquire+0x10:
fffff800`03070c50 ffc3            inc     ebx

Here we can find the stored registers and the stack at the time of the interrupt.

This is where we're going to do some instruction disassembling:

Code:

0: kd> u @rip
nt!KxWaitForSpinLockAndAcquire+0x10:
fffff800`03070c50 ffc3            inc     ebx
fffff800`03070c52 851d70072500    test    dword ptr [nt!HvlLongSpinCountMask (fffff800`032c13c8)],ebx
fffff800`03070c58 0f848f4dffff    je      nt! ?? ::FNODOBFM::`string'+0x5de0 (fffff800`030659ed)
fffff800`03070c5e f390            pause
fffff800`03070c60 488b0f          mov     rcx,qword ptr [rdi]
fffff800`03070c63 4885c9          test    rcx,rcx
fffff800`03070c66 75e8            jne     nt!KxWaitForSpinLockAndAcquire+0x10 (fffff800`03070c50)
fffff800`03070c68 f0480fba2f00    lock bts qword ptr [rdi],0

Disassembling the first few instructions reveals a jne (jump if not equal, i.e. if the zero flag is clear) that branches back up to KxWaitForSpinLockAndAcquire+0x10. It appears that at the time of the bug check, the thread was executing pause (a spin-wait hint that briefly delays the CPU) in a loop, waiting for the spinlock to be released. Why? Let's keep going to find out!
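
If the assembly is hard to follow, here's the same loop sketched in Python, with each line matched to the instruction it stands for. It only models the instructions visible in the disassembly above. The mask value and the iteration cap are made up for the example (the real mask lives in nt!HvlLongSpinCountMask and its value isn't shown here), and `spin_until_free` is a hypothetical name.

```python
def spin_until_free(read_lock_word, long_spin_mask=0xFFFF, max_iterations=1000000):
    """Spin until the lock word reads zero; return how many spins it took.

    `read_lock_word` is a callable standing in for `mov rcx, [rdi]`.
    """
    spins = 0
    while True:
        spins += 1                              # inc ebx
        if (spins & long_spin_mask) == 0:       # test [HvlLongSpinCountMask], ebx / je
            pass                                # long-spin path, not modelled here
        # pause                                 # spin-wait hint, no Python equivalent
        if read_lock_word() == 0:               # mov rcx,[rdi] / test rcx,rcx
            return spins                        # zero: fall through to lock bts
        # jne: lock still held, go round again
        if spins >= max_iterations:             # safety cap, not in the real code
            raise TimeoutError("lock never released")


if __name__ == "__main__":
    # The lock is held for four reads, then released.
    values = iter([1, 1, 1, 1, 0])
    print(spin_until_free(lambda: next(values)))
```

The real loop has no safety cap: if the lock holder never releases it, the waiting processor spins until something else (here, the clock watchdog) brings the system down.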

So, what's the summary so far? The thread on Processor #0 is the one that raised the bug check itself: it was spinning on a lock when a clock interrupt arrived, and the clock interrupt handler then triggered the CLOCK_WATCHDOG_TIMEOUT bug check.

Let's take a look into Processor #1's call stack like we did Processor #0:

Code:

1: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0371bc98 fffff800`03092709 : 00000000`00299ead fffffa80`071f13f8 fffffa80`06f77a02 fffffa80`06f77a18 : intelppm!C1Halt+0x2
fffff880`0371bca0 fffff800`0308189c : fffff880`009e9180 fffff880`00000000 00000000`00000000 fffff880`0899edb0 : nt!PoIdle+0x52a
fffff880`0371bd80 00000000`00000000 : fffff880`0371c000 fffff880`03716000 fffff880`0371bd40 00000000`00000000 : nt!KiIdleLoop+0x2c

Processor #1 was idle/asleep.

Let's check Processor #2:

Code:

2: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0e5f14e0 fffff800`030a9251 : 00000000`00000000 00000000`00000008 00000000`00000001 fffff880`02025f00 : nt!KeFlushMultipleRangeTb+0x260
fffff880`0e5f15b0 fffff800`030abc98 : 00000000`00000008 fffff880`0e5f1700 fffff8a0`0eba6000 00000000`00000001 : nt!MiFlushTbAsNeeded+0x1d1
fffff880`0e5f16c0 fffff800`031baf86 : 00000000`00008000 fffffa80`04e77000 00000000`00000009 fffff800`0308ca8a : nt!MiAllocatePagedPoolPages+0x4cc
fffff880`0e5f17e0 fffff800`030a99b0 : 00000000`00008000 fffffa80`04e77000 00000000`00000009 20206553`0307f5f2 : nt!MiAllocatePoolPages+0x906
fffff880`0e5f1920 fffff800`031be43e : 00000000`00000000 fffff880`07bf2090 00000000`00000000 00000000`00008000 : nt!ExpAllocateBigPool+0xb0
fffff880`0e5f1a10 fffff800`0309cf56 : 00000000`00000000 00000000`00000009 fffff8a0`025ab060 fffff800`03375c5f : nt!ExAllocatePoolWithTag+0x82e
fffff880`0e5f1b00 fffff800`032f5f86 : 00000000`00000000 00000000`00008000 00000000`00000000 00000000`00000001 : nt!ExAllocatePoolWithQuotaTag+0x56
fffff880`0e5f1b50 fffff800`0334db94 : fffff8a0`0b80f800 fffff800`00008000 fffff880`0e5f1c01 fffff800`03557da0 : nt!PiControlGetInterfaceDeviceList+0x92
fffff880`0e5f1bd0 fffff800`03088e53 : fffffa80`092773b0 00000000`014aede0 fffff880`0e5f1ca0 00000000`014aee68 : nt!NtPlugPlayControl+0x100
fffff880`0e5f1c20 00000000`7778230a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0e5f1c20)
00000000`014aeda8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7778230a

It looks like Processor #2 is waiting to flush the translation lookaside buffer (TLB). Why? One or more TLB entries likely became invalid due to whatever's going wrong here, so the next step is to flush them.

Let's check Processor #3:

Code:

3: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`037fbc98 fffff800`03092709 : 00000000`00299ead fffffa80`071f1738 00000000`ffffffed 00001fc8`b0289b80 : intelppm!C1Halt+0x2
fffff880`037fbca0 fffff800`0308189c : fffff880`037d3180 fffff880`00000000 00000000`00000000 fffff880`00dd9f78 : nt!PoIdle+0x52a
fffff880`037fbd80 00000000`00000000 : fffff880`037fc000 fffff880`037f6000 fffff880`037fbd40 00000000`00000000 : nt!KiIdleLoop+0x2c

Processor #3 was idle/asleep.

Let's check Processor #4:

Code:

4: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`009fcc98 fffff800`03092709 : 00000000`00299ead fffffa80`072031a8 00000000`ffffffed 00001fc8`ca2157d7 : intelppm!C1Halt+0x2
fffff880`009fcca0 fffff800`0308189c : fffff880`009b1180 fffff880`00000000 00000000`00000000 fffff800`03141430 : nt!PoIdle+0x52a
fffff880`009fcd80 00000000`00000000 : fffff880`009fd000 fffff880`009f7000 fffff880`009fcd40 00000000`00000000 : nt!KiIdleLoop+0x2c

Processor #4 was idle/asleep.

Let's check Processor #5:

Code:

5: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0d481670 fffff800`03345bdf : 00000000`00000000 fffff880`0d481ca0 00000000`00000000 00000000`00000000 : nt!KeFlushProcessWriteBuffers+0x65
fffff880`0d4816e0 fffff800`03395416 : 00000000`032a0090 fffff800`00010400 fffff880`0d481870 00000000`00000000 : nt!ExpGetProcessInformation+0x7f
fffff880`0d481830 fffff800`03395e6d : 00000000`032a0090 fffff960`001555e3 00000000`00000005 00000000`0018f828 : nt!ExpQuerySystemInformation+0xfb4
fffff880`0d481be0 fffff800`03088e53 : fffffa80`08ab3340 00000000`7efdb000 00000000`00000020 00000000`00000000 : nt!NtQuerySystemInformation+0x4d
fffff880`0d481c20 00000000`7778161a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0d481c20)
00000000`0008e218 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7778161a

Processor #5 appears to be waiting to flush the write queue of each processor that is running a thread of the current process. The reason it cannot do this quite yet is evidently due to the fact that we have two processors waiting so far for Processor #0 to do its job, but it's not.

Let's now take a look at the problematic processor (#6):

Code:

6: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0

Code:

6: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up di pl nz na pe nc
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000
00000000`00000000 ??              ???

We have a zeroed stack and zeroed registers, so this will be problematic. This usually occurs on the problem processor because the IRQL is too high, or because the processor was too hung at the time of the crash to report its information, etc.

With this said, we're going to need to dump the raw stack:

Code:

6: kd> !pcr
KPCR for Processor 6 at fffff880038f7000:
    Major 1 Minor 1
    NtTib.ExceptionList: fffff88003902640
        NtTib.StackBase: fffff880038fc040
       NtTib.StackLimit: 000000000038f0f8
     NtTib.SubSystemTib: fffff880038f7000
          NtTib.Version: 00000000038f7180
      NtTib.UserPointer: fffff880038f77f0
          NtTib.SelfTib: 000000007efad000

                SelfPcr: 0000000000000000
                   Prcb: fffff880038f7180
                   Irql: 0000000000000000
                    IRR: 0000000000000000
                    IDR: 0000000000000000
          InterruptMode: 0000000000000000
                    IDT: 0000000000000000
                    GDT: 0000000000000000
                    TSS: 0000000000000000

          CurrentThread: fffff880039020c0
             NextThread: fffffa8005955750
             IdleThread: fffff880039020c0

              DpcQueue:

Code:

6: kd> !thread fffff880039020c0
THREAD fffff880039020c0  Cid 0000.0000  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 6
Not impersonating
DeviceMap                 fffff8a000008ca0
Owning Process            fffff80003213180       Image:         Idle
Attached Process          fffffa8004f03040       Image:         System
Wait Start TickCount      0              Ticks: 802058 (0:03:28:32.184)
Context Switch Count      7064479        IdealProcessor: 6             
UserTime                  00:00:00.000
KernelTime                03:18:57.804
Win32 Start Address nt!KiIdleLoop (0xfffff80003081870)
Stack Init fffff8800391fdb0 Current fffff8800391fd40
Base fffff88003920000 Limit fffff8800391a000 Call 0
Priority 16 BasePriority 0 UnusualBoost 0 ForegroundBoost 0 IoPriority 0 PagePriority 0
Child-SP          RetAddr           : Args to Child                                                           : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0

Code:

fffff880`0391ec68  fffff880`045982ac klim6+0x22ac
fffff880`0391eca0  fffffa80`080ec800
fffff880`0391eca8  fffff880`01a024d4 ndis!NdisFSendNetBufferLists+0x64
fffff880`0391ecd0  fffffa80`092792a0
fffff880`0391ecd8  fffff880`01f68800 tcpip!Ipv6Global
fffff880`0391ece0  fffffa80`08d9f520
fffff880`0391ece8  fffff880`04571199 pacer!PcFilterSendNetBufferLists+0x29
fffff880`0391ed10  fffffa80`08b121a0
fffff880`0391ed18  fffff880`01a02419 ndis!ndisSendNBLToFilter+0x69
fffff880`0391ed20  00000000`00000028
fffff880`0391ed70  00000000`00000000
fffff880`0391ed78  fffff880`01abe5d5 ndis!NdisSendNetBufferLists+0x85
fffff880`0391ed80  00000000`00000000
fffff880`0391ede0  fffffa80`08b121a0
fffff880`0391ede8  fffff880`01a02419 ndis!ndisSendNBLToFilter+0x69
fffff880`0391edf0  00000000`0000d140
fffff880`0391edf8  00000000`00000003
fffff880`0391ee00  fffff800`0323f180 nt!MiSystemPteInfo
fffff880`0391ee08  00000000`00010600
fffff880`0391ee10  00000000`0000d140
fffff880`0391ee18  fffff800`030a46d7 nt!MmMapLockedPagesSpecifyCache+0x50c
fffff880`0391ee40  00000000`00000000
fffff880`0391ee48  fffff880`01abe5d5 ndis!NdisSendNetBufferLists+0x85

In the raw stack, we can see a lot of network activity, topped off by a call into klim6.sys, which is the Kaspersky Lab Intermediate Network driver. Kaspersky may very well be causing NETBIOS conflicts and holding a lock, preventing this processor from doing its work and holding up the rest. However, this could be a false lead: raw stacks contain stale entries, so the driver may simply have happened to be on the stack.
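
When a raw stack is long, it helps to tally which modules show up in it. Here's a short Python sketch that does that for the symbol-bearing lines of the dump above, and also checks that an address sits between the thread's stack Limit and Base from the !thread output. The function names are my own, and a tally like this only shows what is present on the stack, not who is at fault.

```python
import re
from collections import Counter

# Symbol-bearing lines from the raw stack above (colour tags removed).
RAW_STACK = """\
fffff880`0391ec68  fffff880`045982ac klim6+0x22ac
fffff880`0391eca8  fffff880`01a024d4 ndis!NdisFSendNetBufferLists+0x64
fffff880`0391ecd8  fffff880`01f68800 tcpip!Ipv6Global
fffff880`0391ece8  fffff880`04571199 pacer!PcFilterSendNetBufferLists+0x29
fffff880`0391ed18  fffff880`01a02419 ndis!ndisSendNBLToFilter+0x69
fffff880`0391ed78  fffff880`01abe5d5 ndis!NdisSendNetBufferLists+0x85
fffff880`0391ee18  fffff800`030a46d7 nt!MmMapLockedPagesSpecifyCache+0x50c
"""

STACK_LIMIT = 0xFFFFF8800391A000  # "Limit" from the !thread output
STACK_BASE = 0xFFFFF88003920000   # "Base" from the !thread output

# Stack address, stored value, then "module!symbol" or "module+offset".
SYMBOL_RE = re.compile(
    r"^([0-9a-f]{8}`[0-9a-f]{8})\s+[0-9a-f]{8}`[0-9a-f]{8}\s+([A-Za-z_]\w*)[!+]",
    re.IGNORECASE,
)


def modules_on_stack(raw):
    """Count how many raw stack entries resolve into each module."""
    counts = Counter()
    for line in raw.splitlines():
        match = SYMBOL_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def in_thread_stack(address_text, limit=STACK_LIMIT, base=STACK_BASE):
    """True if an address like fffff880`0391ec68 lies within [limit, base)."""
    address = int(address_text.replace("`", ""), 16)
    return limit <= address < base


if __name__ == "__main__":
    print(modules_on_stack(RAW_STACK).most_common())
    print(in_thread_stack("fffff880`0391ec68"))
```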

As for Processor #7, it also had a zeroed stack, which is not good. When two logical processors on the same CPU come back zeroed, it's usually bad news. Here is its raw stack:

Code:

*** ERROR: Symbol file could not be found.  Defaulted to export symbols for nvlddmkm.sys - 
fffff880`0398d968  fffff880`07acc6c1 nvlddmkm+0xc76c1
fffff880`0398d970  00000000`00000000
fffff880`0398d978  fffff880`089513fd dxgmms1!VidSchiProcessIsrVSync+0xc9
fffff880`0398da00  fffff880`0398dc80
fffff880`0398da08  fffff880`08951083 dxgmms1!VidSchDdiNotifyInterruptWorker+0xef
fffff880`0398da50  fffff880`0398dc80
fffff880`0398da58  fffff880`08950f82 dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff880`0398da80  fffff880`0398dc80
fffff880`0398da88  fffff880`0885813f dxgkrnl!DxgNotifyInterruptCB+0x83
fffff880`0398da90  fffffa80`06f26480
fffff880`0398da98  fffff880`0895769d dxgmms1!VidSchiUpdateCurrentIsrFrameTime+0x95
fffff880`0398daa0  00000000`02060000
fffff880`0398daa8  fffffa80`07075e28
fffff880`0398dab0  00000000`00000000
fffff880`0398dab8  fffff880`07acc6c1 nvlddmkm+0xc76c1

We have various DirectX Kernel/MMS calls, topping off with an nVidia video driver call.
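
To pull the whole picture together, here's a small Python sketch that takes the code each processor was in (as read from the stacks above; for Processor #0 that's the interrupted frame from the trap frame, and None stands for a zeroed context) and sorts the processors into rough states. The state labels and the matching rules are my own shorthand for this dump, not a general-purpose classifier.

```python
# What each processor was executing, taken from the analysis above.
TOP_FRAMES = {
    0: "nt!KxWaitForSpinLockAndAcquire",
    1: "intelppm!C1Halt",
    2: "nt!KeFlushMultipleRangeTb",
    3: "intelppm!C1Halt",
    4: "intelppm!C1Halt",
    5: "nt!KeFlushProcessWriteBuffers",
    6: None,
    7: None,
}


def classify(top_frame):
    """Map a top frame to a rough, human-readable state."""
    if top_frame is None:
        return "zeroed context"
    if top_frame == "intelppm!C1Halt":
        return "idle"
    if "SpinLock" in top_frame:
        return "spinning on a lock"
    if "Flush" in top_frame:
        return "waiting on other processors"
    return "running"


def group_by_state(top_frames):
    """Return {state: [processor indexes]} with indexes in ascending order."""
    groups = {}
    for cpu in sorted(top_frames):
        groups.setdefault(classify(top_frames[cpu]), []).append(cpu)
    return groups


if __name__ == "__main__":
    for state, cpus in group_by_state(TOP_FRAMES).items():
        print(state, cpus)
```

Seen this way, three processors are idle, three are waiting on something, and the two with no usable context (#6 and #7) are the ones to be suspicious of.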

1. Remove and replace Kaspersky with Microsoft Security Essentials for temporary troubleshooting purposes as it may very likely be causing NETBIOS conflicts:

Kaspersky removal - Service articles

MSE - Microsoft Security Essentials - Microsoft Windows

2. Ensure you have the latest video card drivers. If you are already on the latest video card drivers, uninstall and install a version or a few versions behind the latest to ensure it's not a latest driver only issue. If you have already experimented with the latest video card driver and many previous versions, please give the beta driver for your card a try.

3. If the above fails, please uninstall your video card drivers, and then physically remove the video card and use either integrated graphics, or a secondary video card (if available).

4. If the above fails, it's still a faulty processor to me given the two zeroed stacks.

Regards,

Patrick

wlysix · May 26, 2014

Thanks a lot for such detailed analysis and clearing all doubts! Since it is only the processor at fault, would you recommend I search for another 1366 socket processor or buy a new motherboard and processor?

Patrick · May 26, 2014

I presume since you jumped right to #4, you've tried the other steps before?

How new is the processor, is it still under warranty? If so, Intel should replace it. If not, it's up to you. You can get a new CPU on its own for your socket, or choose this time to upgrade.

Regards,

Patrick

wlysix · May 26, 2014

Yes, it was a no-go as you suspected. Uninstalling Kaspersky made no difference, and I borrowed a graphics card from a friend and tried a few drivers, which led to the same result. Unfortunately, I purchased the processor in June 2010 and the 3-year warranty is up.

I'm not able to edit the thread but can you please mark it as solved? Thanks once again for all your help, Patrick.

Patrick · May 26, 2014

Solved!

Sorry to hear it was the processor, but again, it was my pleasure.

Regards,

Patrick
