First off, let me start by saying that in the middle of debugging your kernel-dump, I accidentally hit F5 and refreshed the page.
...then I saw this, the oasis in the desert.
Moving on, thanks very much for the kernel-dump. This is actually one of the most difficult debugging sessions I have done (if not the most difficult), as it was not a 'standard' *101, so this was a phenomenal experience for me. With *101 bug checks on the forum, in most cases, it honestly isn't this hard. The call stacks on basic user systems are to the point and straightforward, without requiring much intensive debugging. Even so, this isn't the type of bug check where it's enough to say 'it was caused by this'; you have to explain. I mean, technically you could, but that's no good for the user.
Right, so as per usual, the attached DMP file is of the CLOCK_WATCHDOG_TIMEOUT (101) bugcheck.
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
BugCheck 101, {31, 0, fffff88002fd3180, 3}
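The 1st parameter, 31, is the clock interrupt timeout interval in nominal clock ticks. The debugger prints bugcheck arguments in hex, so that's 0x31 = 49 ticks.
fffff88002fd3180 is the PRCB address of the hung processor. Let's keep this address in mind.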
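If it helps to see the four bugcheck arguments decoded in one place, here is a minimal Python sketch. The function name and dictionary keys are my own labels for illustration, not anything the debugger exposes; the only thing it relies on is that the arguments are printed in hex.

```python
# Decode the four arguments of a CLOCK_WATCHDOG_TIMEOUT (0x101) bugcheck.
# The debugger prints bugcheck arguments in hexadecimal, so parse them base 16.
# Function and key names are hypothetical labels chosen for this example.

def decode_bugcheck_101(args):
    """args: the four argument strings exactly as printed by the debugger."""
    values = [int(a, 16) for a in args]
    return {
        "timeout_ticks": values[0],    # timeout interval, in nominal clock ticks
        "arg2": values[1],             # 0 in this dump
        "prcb_address": values[2],     # PRCB of the unresponsive processor
        "processor_index": values[3],  # index of the unresponsive processor
    }

decoded = decode_bugcheck_101(["31", "0", "fffff88002fd3180", "3"])
print(decoded["timeout_ticks"])      # 49
print(hex(decoded["prcb_address"]))  # 0xfffff88002fd3180
print(decoded["processor_index"])    # 3
```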
Code:
0: kd> !prcb 3
PRCB for Processor 3 at fffff88002fd3180:
Current IRQL -- 0
Threads-- Current fffffa80077b29a0 Next fffffa8007368b50 Idle fffff88002fddfc0
Processor Index 3 Number (0, 3) GroupSetMember 8
Interrupt Count -- 0083298b
Times -- Dpc 0000142d Interrupt 00000746
Kernel 00049f9a User 0000ce12
For reference, I did not do !prcb 0 through 3. That would have been very tedious. Instead, you can run the !running -it command. The "i" argument causes it to display idle procs too, and "t" displays the stack trace for the thread running on each proc. Running that specific command shows you're on a 4 core box.
Hint: At times, the 4th parameter of the bug check will show you the responsible processor. For example, in your *101 here, it was correct as the 4th parameter was 3.
Hint #2: You can also generally tell the amount of cores on the box by checking the bugcheck_string - BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_4_PROC
As the PRCB address of processor 3 matches the 3rd parameter of the bugcheck, processor #3 is the responsible processor. With the information we have thus far, we know that processor #3 went 0x31 (49) clock ticks without responding, therefore the system crashed. Before we go further, what is a clock tick? The clock interrupt is a periodic timer interrupt that the kernel uses to update the system time and keep per-processor time accounting going; one clock tick is one period of that interrupt. Every processor is expected to keep processing clock ticks, and when one stops doing so for longer than the timeout, you crash.
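To make the watchdog idea a bit more concrete, here is a toy Python model. To be clear, this is not how Windows implements it; it only illustrates the "every processor has to keep checking in, and one that misses too many ticks in a row gets blamed" logic. The function name, the log format, and the sample log are all made up for the example; only the 0x31 timeout comes from this dump.

```python
# Toy model of a clock watchdog (NOT the real Windows implementation).
TIMEOUT_TICKS = 0x31  # 1st bugcheck argument from this dump (49 ticks)

def find_hung_processor(tick_log, num_processors, timeout_ticks=TIMEOUT_TICKS):
    """tick_log: one set per clock tick, holding the indexes of the processors
    that processed that tick. Returns the first processor that misses
    `timeout_ticks` ticks in a row, or None if nobody does."""
    missed = [0] * num_processors
    for acknowledged in tick_log:
        for cpu in range(num_processors):
            if cpu in acknowledged:
                missed[cpu] = 0          # processor checked in, reset its counter
            else:
                missed[cpu] += 1         # processor missed this tick
                if missed[cpu] >= timeout_ticks:
                    return cpu
    return None

# Hypothetical log: all four processors respond for 10 ticks,
# then processor 3 goes silent for the next 60.
log = [{0, 1, 2, 3}] * 10 + [{0, 1, 2}] * 60
print(find_hung_processor(log, num_processors=4))  # 3
```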
If we look specifically at processor #3, we can see it did...well... nothing:
Code:
3: kd> kv
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
If we check the IRQL of the problem processor:
Code:
3: kd> !irql 3
Debugger saved IRQL for processor 0x3 -- 0 (LOW_LEVEL)
We can also see the IRQL in the !prcb output seen earlier.
Let's also check the rest:
Proc 0:
Code:
0: kd> !prcb 0
PRCB for Processor 0 at fffff80002df6e80:
Current IRQL -- 13
Threads-- Current fffffa8003747040 Next 0000000000000000 Idle fffff80002e04cc0
Processor Index 0 Number (0, 0) GroupSetMember 1
Interrupt Count -- 004e37d8
Times -- Dpc 00000031 Interrupt 0000006b
Kernel 0004f290 User 00007dbd
Proc 1:
Code:
0: kd> !prcb 1
PRCB for Processor 1 at fffff880009e7180:
Current IRQL -- 0
Threads-- Current fffffa8007654330 Next fffffa8006863060 Idle fffff880009f1fc0
Processor Index 1 Number (0, 1) GroupSetMember 2
Interrupt Count -- 00538b04
Times -- Dpc 00000024 Interrupt 00000088
Kernel 0004c3fc User 0000abfe
Proc 2:
Code:
0: kd> !prcb 2
PRCB for Processor 2 at fffff88002f63180:
Current IRQL -- 0
Threads-- Current fffffa80037b7660 Next 0000000000000000 Idle fffff88002f6dfc0
Processor Index 2 Number (0, 2) GroupSetMember 4
Interrupt Count -- 0052489a
Times -- Dpc 00000139 Interrupt 000000de
Kernel 0004d305 User 00009ced
So, from this, we can see that Processors 1, 2, and 3 were at IRQL 0. IRQL 0 = PASSIVE_LEVEL, which is where user threads and most kernel-mode operations take place. Any interrupt can occur at PASSIVE_LEVEL. User-mode code executes at PASSIVE_LEVEL.
However, we can see that Processor 0 was at IRQL 13. IRQL 13 = CLOCK_LEVEL for x64 processors. On an x86 processor, CLOCK_LEVEL is 28.
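For anyone who wants the IRQL numbers as a quick lookup, here is a small Python sketch. The table is deliberately partial: it only covers the levels discussed here plus APC_LEVEL and DISPATCH_LEVEL, and the helper name is my own.

```python
# Partial IRQL lookup table, limited to the levels relevant to this post.
# Anything not listed falls through to a generic label.
IRQL_NAMES = {
    "x64": {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL", 13: "CLOCK_LEVEL"},
    "x86": {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL", 28: "CLOCK_LEVEL"},
}

def irql_name(level, arch="x64"):
    """Return the symbolic name for an IRQL on the given architecture."""
    return IRQL_NAMES[arch].get(level, f"IRQL {level} (not in this partial table)")

# IRQL values taken from the !prcb output for each processor in this dump.
for cpu, level in {0: 13, 1: 0, 2: 0, 3: 0}.items():
    print(f"Processor {cpu}: {irql_name(level)}")
```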
Let's now look at the stacks of the different processors to see what the threads were involved in:
We can use knL and go through a grueling method of obtaining the trap frame, but we don't like having to put in more work, so let's use kv instead on Proc 0:
Code:
0: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`03184a48 fffff800`02cd2a4a : 00000000`00000101 00000000`00000031 00000000`00000000 fffff880`02fd3180 : nt!KeBugCheckEx
fffff880`03184a50 fffff800`02c856f7 : fffff880`00000000 fffff800`00000003 00000000`00002710 fffffa80`0371e2e0 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff880`03184ae0 fffff800`031f5895 : fffff800`0321b460 fffff880`03184c90 fffff800`0321b460 fffffa80`00000000 : nt!KeUpdateSystemTime+0x377
fffff880`03184be0 fffff800`02c78113 : 00000000`8ebf83fc fffff800`02df6e80 00000000`02710000 00000000`00000000 : hal!HalpHpetClockInterrupt+0x8d
fffff880`03184c10 fffff800`02c809f0 : fffff800`02df6e80 00000000`00000001 00000000`00000000 fffff880`03184e98 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff880`03184c10)
fffff880`03184da0 fffff800`02cbf95c : 00000000`0000000d fffff800`02df6e80 fffff8a0`002df790 00000000`00200100 : nt!KeFlushMultipleRangeTb+0x260
fffff880`03184e70 fffff880`0126869c : fffffa80`03747040 0000000b`4c696fe4 00000000`0000000d fffff800`02df6e80 : nt!MmSetAddressRangeModified+0x2b0
fffff880`03184f70 fffff880`01313dfb : fffff8a0`002df790 0000000b`4c696f63 00000000`00000000 00000000`00000000 : Ntfs!LfsFlushLfcb+0x5d8
fffff880`031850f0 fffff880`01315f10 : fffff8a0`002851f0 0000000b`4c696f63 fffff880`03185550 fffff880`03185550 : Ntfs!LfsFlushToLsnPriv+0x143
fffff880`03185180 fffff880`01264274 : fffff8a0`002851f0 0000000b`4c696f63 0000000b`4c696f63 fffff8a0`0272bb40 : Ntfs!LfsFlushToLsn+0xa0
fffff880`031851b0 fffff880`01264e73 : fffff880`03185390 fffffa80`07a5d7a0 fffff880`03185500 00000000`00001000 : Ntfs!NtfsCommonWrite+0x2d63
fffff880`03185360 fffff880`010debcf : fffffa80`07a5db40 fffffa80`07a5d7a0 fffffa80`044b9d30 00000000`00000000 : Ntfs!NtfsFsdWrite+0x1c3
fffff880`031855e0 fffff880`010dd6df : fffffa80`043f08e0 fffffa80`071c8f20 fffffa80`043f0800 fffffa80`07a5d7a0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`03185670 fffff800`02c60ecf : fffffa80`07a5d7a0 fffff880`03185c58 fffff880`03185880 fffff800`02df6e80 : fltmgr!FltpDispatch+0xcf
fffff880`031856d0 fffff800`02cbedbb : 00000000`00000000 fffff880`03185c58 fffffa80`01198020 00000000`00000000 : nt!IoSynchronousPageWrite+0x24f
fffff880`03185750 fffff800`02cbd2f8 : fffff8a0`029f76c8 fffff8a0`029f76d0 fffffa80`0722cf90 fffffa80`0722cf90 : nt!MiFlushSectionInternal+0xb7b
fffff880`03185990 fffff800`02cbc7d9 : 00000000`00056e3d 00000000`00000000 00000000`00001000 00000000`00000000 : nt!MmFlushSection+0xa4
fffff880`03185a50 fffff800`02cc0136 : fffffa80`07046f68 00000000`00000001 fffffa80`00000001 00000000`00001000 : nt!CcFlushCache+0x5e9
fffff880`03185b50 fffff800`02cc0af8 : fffff880`00000000 fffff880`03185c58 fffffa80`06a71220 fffff800`02e82918 : nt!CcWriteBehind+0x1c6
fffff880`03185c00 fffff800`02c85261 : fffffa80`03740530 fffff800`02f72101 fffff800`02e82920 00000000`00000002 : nt!CcWorkerThread+0x1c8
fffff880`03185cb0 fffff800`02f182ea : 00000000`00000000 fffffa80`03747040 00000000`00000080 fffffa80`036ec040 : nt!ExpWorkerThread+0x111
fffff880`03185d40 fffff800`02c6c8e6 : fffff880`02f63180 fffffa80`03747040 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03185d80 00000000`00000000 : fffff880`03186000 fffff880`03180000 fffff880`031859e0 00000000`00000000 : nt!KxStartSystemThread+0x16
There it is, the trap frame (TrapFrame @ fffff880`03184c10 on the nt!KiInterruptDispatchNoLock frame)! Let's move forward:
Code:
0: kd> .trap fffff880`03184c10
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000001 rbx=0000000000000000 rcx=fffff88003184de0
rdx=fffff88003184ea0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002c809f0 rsp=fffff88003184da0 rbp=0000000000000001
r8=fffff88003184ea0 r9=fffffffffffffffb r10=0000000000000008
r11=fffff80002c8a760 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
nt!KeFlushMultipleRangeTb+0x260:
fffff800`02c809f0 8b8780200000 mov eax,dword ptr [rdi+2080h] ds:00000000`00002080=????????
Code:
0: kd> knL
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 fffff880`03184da0 fffff800`02cbf95c nt!KeFlushMultipleRangeTb+0x260
01 fffff880`03184e70 fffff880`0126869c nt!MmSetAddressRangeModified+0x2b0
02 fffff880`03184f70 fffff880`01313dfb Ntfs!LfsFlushLfcb+0x5d8
03 fffff880`031850f0 fffff880`01315f10 Ntfs!LfsFlushToLsnPriv+0x143
04 fffff880`03185180 fffff880`01264274 Ntfs!LfsFlushToLsn+0xa0
05 fffff880`031851b0 fffff880`01264e73 Ntfs!NtfsCommonWrite+0x2d63
06 fffff880`03185360 fffff880`010debcf Ntfs!NtfsFsdWrite+0x1c3
07 fffff880`031855e0 fffff880`010dd6df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
08 fffff880`03185670 fffff800`02c60ecf fltmgr!FltpDispatch+0xcf
09 fffff880`031856d0 fffff800`02cbedbb nt!IoSynchronousPageWrite+0x24f
0a fffff880`03185750 fffff800`02cbd2f8 nt!MiFlushSectionInternal+0xb7b
0b fffff880`03185990 fffff800`02cbc7d9 nt!MmFlushSection+0xa4
0c fffff880`03185a50 fffff800`02cc0136 nt!CcFlushCache+0x5e9
0d fffff880`03185b50 fffff800`02cc0af8 nt!CcWriteBehind+0x1c6
0e fffff880`03185c00 fffff800`02c85261 nt!CcWorkerThread+0x1c8
0f fffff880`03185cb0 fffff800`02f182ea nt!ExpWorkerThread+0x111
10 fffff880`03185d40 fffff800`02c6c8e6 nt!PspSystemThreadStartup+0x5a
11 fffff880`03185d80 00000000`00000000 nt!KxStartSystemThread+0x16
^^ Here we can find the stored registers and the stack at the time of the interrupt.
This is where we're going to do some instruction disassembling:
Code:
0: kd> u @rip
nt!KeFlushMultipleRangeTb+0x260:
fffff800`02c809f0 8b8780200000 mov eax,dword ptr [rdi+2080h]
fffff800`02c809f6 85c0 test eax,eax
fffff800`02c809f8 75e6 jne nt!KeFlushMultipleRangeTb+0x250 (fffff800`02c809e0)
fffff800`02c809fa e955ffffff jmp nt!KeFlushMultipleRangeTb+0x1c4 (fffff800`02c80954)
fffff800`02c809ff 41f6c304 test r11b,4
fffff800`02c80a03 0f85d93c0500 jne nt! ?? ::FNODOBFM::`string'+0xaaab (fffff800`02cd46e2)
fffff800`02c80a09 41f6c306 test r11b,6
fffff800`02c80a0d 0f85a73d0500 jne nt! ?? ::FNODOBFM::`string'+0xab8f (fffff800`02cd47ba)
We can see there is a jump (jmp) that goes back up into the nt!KeFlushMultipleRangeTb function.
Let's go ahead and take the jmp target and the address of the instruction after the jmp as the bounds to disassemble the loop:
Code:
0: kd> u fffff800`02c80954 fffff800`02c809ff
nt!KeFlushMultipleRangeTb+0x1c4:
fffff800`02c80954 410fb6c4 movzx eax,r12b
fffff800`02c80958 440f22c0 mov cr8,rax
fffff800`02c8095c 4c8bbc24a0000000 mov r15,qword ptr [rsp+0A0h]
fffff800`02c80964 4c8bb424a8000000 mov r14,qword ptr [rsp+0A8h]
fffff800`02c8096c 4c8ba424d8000000 mov r12,qword ptr [rsp+0D8h]
fffff800`02c80974 488bac24d0000000 mov rbp,qword ptr [rsp+0D0h]
fffff800`02c8097c 4c8bac24e0000000 mov r13,qword ptr [rsp+0E0h]
fffff800`02c80984 4881c4b0000000 add rsp,0B0h
fffff800`02c8098b 5f pop rdi
fffff800`02c8098c 5e pop rsi
fffff800`02c8098d 5b pop rbx
fffff800`02c8098e c3 ret
fffff800`02c8098f 8b0577262300 mov eax,dword ptr [nt!KeNumberProcessors (fffff800`02eb300c)]
fffff800`02c80995 3bc5 cmp eax,ebp
fffff800`02c80997 76a5 jbe nt!KeFlushMultipleRangeTb+0x1ae (fffff800`02c8093e)
fffff800`02c80999 4c8bd3 mov r10,rbx
fffff800`02c8099c 4c8bce mov r9,rsi
fffff800`02c8099f 4d8bc2 mov r8,r10
fffff800`02c809a2 8bd5 mov edx,ebp
fffff800`02c809a4 488bcf mov rcx,rdi
fffff800`02c809a7 48c744242805000000 mov qword ptr [rsp+28h],5
fffff800`02c809b0 4c896c2420 mov qword ptr [rsp+20h],r13
fffff800`02c809b5 e8b69b0100 call nt!KiIpiSendRequest (fffff800`02c9a570)
fffff800`02c809ba 4c8d1d9f9d0000 lea r11,[nt!KiFlushRangeWorker (fffff800`02c8a760)]
fffff800`02c809c1 4d85db test r11,r11
fffff800`02c809c4 740a je nt!KeFlushMultipleRangeTb+0x240 (fffff800`02c809d0)
fffff800`02c809c6 488d4c2440 lea rcx,[rsp+40h]
fffff800`02c809cb e8909d0000 call nt!KiFlushRangeWorker (fffff800`02c8a760)
fffff800`02c809d0 8b8780200000 mov eax,dword ptr [rdi+2080h]
fffff800`02c809d6 85c0 test eax,eax
fffff800`02c809d8 0f8476ffffff je nt!KeFlushMultipleRangeTb+0x1c4 (fffff800`02c80954)
fffff800`02c809de 6690 xchg ax,ax
fffff800`02c809e0 ffc3 inc ebx
fffff800`02c809e2 851de0292300 test dword ptr [nt!HvlLongSpinCountMask (fffff800`02eb33c8)],ebx
fffff800`02c809e8 0f84da3c0500 je nt! ?? ::FNODOBFM::`string'+0xaa91 (fffff800`02cd46c8)
fffff800`02c809ee f390 pause
fffff800`02c809f0 8b8780200000 mov eax,dword ptr [rdi+2080h]
fffff800`02c809f6 85c0 test eax,eax
fffff800`02c809f8 75e6 jne nt!KeFlushMultipleRangeTb+0x250 (fffff800`02c809e0)
fffff800`02c809fa e955ffffff jmp nt!KeFlushMultipleRangeTb+0x1c4 (fffff800`02c80954)
fffff800`02c809ff 41f6c304 test r11b,4
Unfortunately, this is about as far as my x64 disassembly skills go with this specific processor's call stack. The only thing I can see is that the jne at nt!KeFlushMultipleRangeTb+0x268 jumps back to nt!KeFlushMultipleRangeTb+0x250 to stay in a loop for as long as the value read from [rdi+2080h] is non-zero.
So, what's the summary so far? Processor #0 was running the thread that raised the bugcheck itself, and it must have been interrupted by a clock interrupt in order to trigger the CLOCK_WATCHDOG_TIMEOUT bugcheck.
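For readers who don't read assembly, the loop between +0x250 and +0x268 in the disassembly above can be approximated in Python like this. It's a rough sketch only: `read_flag` stands in for the `mov eax,dword ptr [rdi+2080h]` load (what that value actually represents isn't established here), `on_long_spin` stands in for the branch taken when the spin counter hits nt!HvlLongSpinCountMask (I don't know what that code does), and `max_iterations` is a safety cap I added so the example terminates. The real loop has no such cap, which is exactly why a flag that never clears means spinning forever.

```python
def spin_until_clear(read_flag, long_spin_mask, on_long_spin, max_iterations):
    """Spin until read_flag() returns 0.
    Returns the number of spins taken, or None if the flag never cleared
    within max_iterations (hypothetical cap, not present in the real loop)."""
    spins = 0
    while spins < max_iterations:
        spins += 1                            # inc ebx
        if (spins & long_spin_mask) == 0:     # test [HvlLongSpinCountMask],ebx / je ...
            on_long_spin(spins)               # stand-in for the long-spin branch
        # pause                               # spin-wait hint to the CPU
        if read_flag() == 0:                  # mov eax,[rdi+2080h] / test eax,eax
            return spins                      # jne not taken -> jmp +0x1c4 (function exit)
    return None

# Hypothetical flag that clears on the 5th read.
reads = iter([1, 1, 1, 1, 0])
print(spin_until_clear(lambda: next(reads), 0x3, lambda n: None, 100))  # 5

# Hypothetical flag that never clears: the loop would spin forever without the cap.
print(spin_until_clear(lambda: 1, 0x3, lambda n: None, 1000))  # None
```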
Let's take a look into Processor #1's call stack like we did Processor #0:
Code:
1: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`05e77210 fffff800`02c9b251 : 00000000`00000000 00000000`00000005 00000000`00000001 00000000`00224608 : nt!KeFlushMultipleRangeTb+0x266
fffff880`05e772e0 fffff800`02dacb5e : 00000000`00000005 fffff880`05e77440 fffff6fd`400423e0 fffffa80`0516c000 : nt!MiFlushTbAsNeeded+0x1d1
fffff880`05e773f0 fffff800`02c9b9b0 : 00000000`00004640 fffff800`02e0b580 00000000`00000000 00000000`00000000 : nt!MiAllocatePoolPages+0x4de
fffff880`05e77530 fffff800`02db043e : 00000000`00000000 fffff880`00000000 00000000`00000000 00000000`00004640 : nt!ExpAllocateBigPool+0xb0
fffff880`05e77620 fffff800`02c5bd9a : fffffa80`04198508 00000000`00000000 00000000`c2646641 00000000`000043b6 : nt!ExAllocatePoolWithTag+0x82e
fffff880`05e77710 fffff880`02c8d750 : fffffa80`04429a80 00000000`00000000 00000000`000043b6 00000000`00000000 : nt!ExAllocatePoolWithTagPriority+0x4a
fffff880`05e777a0 fffff880`02c9d932 : fffffa80`041986a8 fffff880`05e77ca0 fffffa80`04195010 fffffa80`041986a8 : afd!AfdGetBufferSlow+0xc0
fffff880`05e777e0 fffff800`02f983a7 : fffffa80`076ca600 fffff880`05e77ca0 fffffa80`076ca600 fffffa80`076ca600 : afd! ?? ::GFJBLGFE::`string'+0xb73f
fffff880`05e77a10 fffff800`02f98c06 : fffff880`05e77bf8 00000000`000015f0 00000000`00000001 00000000`08e01478 : nt!IopXxxControlFile+0x607
fffff880`05e77b40 fffff800`02c7ae53 : fffff880`05e77ca0 fffffa80`04baeae0 fffff880`05e77bf8 fffff880`05e77c00 : nt!NtDeviceIoControlFile+0x56
fffff880`05e77bb0 00000000`74be2e09 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`05e77c20)
00000000`04caea38 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x74be2e09
Code:
1: kd> .trap fffff880`05e77c20
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000074be2e09 rsp=0000000004caea38 rbp=00000000050ceff0
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
0033:00000000`74be2e09 ?? ???
Code:
1: kd> knL
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 00000000`04caea38 00000000`00000000 0x74be2e09
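We can see the stack and a lot of the registers on proc #1 are zeroed. This trap frame is the user-mode side of the system call (note the 0033 code segment and the user-mode rip), so it doesn't tell us anything about what the processor was doing in the kernel.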
Code:
1: kd> u @rip
00000000`74be2e09 ?? ???
^ Memory access error in 'u @rip'
No dice on Processor #1.
Let's check Processor #2:
Code:
2: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`03316840 fffff800`02c80af2 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`000015fc : nt!KiIpiSendRequestEx+0x98
fffff880`03316880 fffff800`02c65705 : 00000000`00000002 26500000`975c8025 fffff880`03316ba0 00000000`00000265 : nt!KeFlushMultipleRangeTb+0x362
fffff880`03316950 fffff800`02cfcf25 : fffffa80`03d6f3f8 fffff880`00000001 00000000`00000001 fffff880`03316bb0 : nt!MiAgeWorkingSet+0x37b
fffff880`03316b00 fffff800`02c65b06 : 00000000`000015b0 00000000`00000000 fffffa80`00000000 00000000`00000003 : nt! ?? ::FNODOBFM::`string'+0x4c7f6
fffff880`03316b80 fffff800`02c65fb3 : 00000000`00000008 fffff880`03316c10 00000000`00000001 fffffa80`00000000 : nt!MmWorkingSetManager+0x6e
fffff880`03316bd0 fffff800`02f182ea : fffffa80`037b7660 00000000`00000080 fffffa80`036ec040 00000000`00000001 : nt!KeBalanceSetManager+0x1c3
fffff880`03316d40 fffff800`02c6c8e6 : fffff880`02f63180 fffffa80`037b7660 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03316d80 00000000`00000000 : fffff880`03317000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16
Processor #2 appears to be trying to send an IPI using nt!KiIpiSendRequestEx. Interestingly enough, we can see that it is caught in the same function that Processor #0 is in (nt!KeFlushMultipleRangeTb). Again, this may be caused by a loop.
Code:
2: kd> !irql
Debugger saved IRQL for processor 0x2 -- 0 (LOW_LEVEL)
Either it's running at 0, or the IRQL, despite saying 'saved', really didn't get saved. Windows Internals notes this is a possibility.
Code:
2: kd> u @rip
nt!KiIpiSendRequestEx+0x98:
fffff800`02c8aa08 8b8780200000 mov eax,dword ptr [rdi+2080h]
fffff800`02c8aa0e 85c0 test eax,eax
fffff800`02c8aa10 749e je nt!KiIpiSendRequestEx+0x40 (fffff800`02c8a9b0)
fffff800`02c8aa12 ffc3 inc ebx
fffff800`02c8aa14 851dae892200 test dword ptr [nt!HvlLongSpinCountMask (fffff800`02eb33c8)],ebx
fffff800`02c8aa1a 0f8455cffcff je nt! ?? ::FNODOBFM::`string'+0x5d50 (fffff800`02c57975)
fffff800`02c8aa20 f390 pause
fffff800`02c8aa22 ebe4 jmp nt!KiIpiSendRequestEx+0x98 (fffff800`02c8aa08) <--- back to @rip
We can see here that it executes a pause instruction, and then tries again (most likely waiting for something to set a flag). With that said, it appears to be looping.
Let's now take a look at the problematic proc (#3):
Code:
3: kd> kv
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
Code:
3: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
00000000`00000000 ?? ???
Remember this? All zeroed out. As far as I know, this happens when either the bugcheck thread is unable to interrupt the target processor to gather the context (usually due to running at a high IRQL), OR because the processor itself is hung, therefore very little was preserved.
Code:
3: kd> !pcr
KPCR for Processor 3 at fffff88002fd3000:
Major 1 Minor 1
NtTib.ExceptionList: fffff88002fde540
NtTib.StackBase: fffff88002fd7f40
NtTib.StackLimit: 0000000002cef3b8
NtTib.SubSystemTib: fffff88002fd3000
NtTib.Version: 0000000002fd3180
NtTib.UserPointer: fffff88002fd37f0
NtTib.SelfTib: 000007fffffae000
SelfPcr: 0000000000000000
Prcb: fffff88002fd3180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffffa80077b29a0
NextThread: fffffa8007368b50
IdleThread: fffff88002fddfc0
DpcQueue: 0xfffff88002fd79e8 0xfffff80002d9ca90 [Normal] nt!PpmPerfAction
Code:
3: kd> !thread
THREAD fffffa80077b29a0 Cid 16e4.1404 Teb: 000007fffffae000 Win32Thread: fffff900c1b66c20 RUNNING on processor 3
Not impersonating
DeviceMap fffff8a000008ca0
Owning Process fffffa8003d6f060 Image: SearchFilterHost.exe
Attached Process N/A Image: N/A
Wait Start TickCount 355857 Ticks: 573 (0:00:00:08.938)
Context Switch Count 1079 IdealProcessor: 3 LargeStack
UserTime 00:00:00.109
KernelTime 00:00:00.046
Win32 Start Address 0x00000000770cfbf0
Stack Init fffff88008b36db0 Current fffff88008b367c0
Base fffff88008b37000 Limit fffff88008b2f000 Call 0
Priority 4 BasePriority 4 UnusualBoost 0 ForegroundBoost 0 IoPriority 0 PagePriority 1
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
At this point, you'd go ahead and run 'dps fffff88008b2f000 fffff88008b37000' to dump the raw stack, which you'd sift through to find the first symbol after the 0xffffffff entries which has a value above it that points to a location within this stack. On an x64 processor, this is unbelievably difficult (and the raw stack is also huge). I will tell you, however, that it contained quite a few file system calls (fltmgr and Ntfs, as we saw much earlier).
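If you save the dps output to a text file, the sifting described above can be partly automated. Below is a minimal Python sketch under a couple of assumptions: each dps line is "address value [symbol]", and "a value above it" means the slot on the previous line. The stack bounds are the Base and Limit from the !thread output; the sample lines are made up purely to exercise the code (the symbol names are borrowed from the Processor #0 stack, the slot addresses and values are not from the dump).

```python
STACK_LIMIT = 0xfffff88008b2f000  # "Limit" from the !thread output
STACK_BASE = 0xfffff88008b37000   # "Base" from the !thread output

def parse_dps_line(line):
    """Split a dps line into (address, value, symbol or None).
    Returns None for lines that don't look like dps output."""
    parts = line.split(None, 2)
    if len(parts) < 2:
        return None
    try:
        address = int(parts[0].replace("`", ""), 16)
        value = int(parts[1].replace("`", ""), 16)
    except ValueError:
        return None
    symbol = parts[2].strip() if len(parts) == 3 else None
    return address, value, symbol

def interesting_entries(lines):
    """Yield (address, symbol) for slots that resolve to a symbol and whose
    preceding slot holds a pointer into this thread's stack."""
    previous_value = None
    for line in lines:
        parsed = parse_dps_line(line)
        if parsed is None:
            previous_value = None
            continue
        address, value, symbol = parsed
        if symbol and previous_value is not None and STACK_LIMIT <= previous_value < STACK_BASE:
            yield address, symbol
        previous_value = value

# Hypothetical sample lines, NOT taken from the actual dump.
sample = [
    "fffff880`08b36a08  ffffffff`ffffffff",
    "fffff880`08b36a10  fffff880`08b36b00",
    "fffff880`08b36a18  fffff880`010dd6df fltmgr!FltpDispatch+0xcf",
    "fffff880`08b36a20  00000000`00000000",
    "fffff880`08b36a28  fffff880`01264e73 Ntfs!NtfsFsdWrite+0x1c3",
]
for address, symbol in interesting_entries(sample):
    print(hex(address), symbol)
```

This only narrows down candidates. You'd still have to eyeball the hits, since stale return addresses from earlier calls are left lying around in the raw stack.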
With all of this said:
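- Processor #0 was spinning in nt!KeFlushMultipleRangeTb when the clock interrupt came in. It took the interrupt at CLOCK_LEVEL (13) and raised the bugcheck against Processor #3.
- Processor #1 was at nt!KeFlushMultipleRangeTb+0x266, which sits inside the same spin loop that Processor #0 was interrupted in, so it was most likely waiting on the same kind of flag. It also appears to possibly be hung.
- Processor #2 was spinning in nt!KiIpiSendRequestEx, waiting for a flag to be set (likely an IPI flag), i.e. waiting for Processors #0, #1, and/or #3 to respond to its IPI.
- Processor #3 has been tagged as being the cause by the bugcheck. In addition to this, its registers and call stack are zeroed (either it was permanently hung or it ran at a high IRQL).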
At face value, this looks like hardware. However, let's be sure of this:
1. After checking the modules list, on the software side of things, everything looks fairly clean and there isn't much 3rd party software loaded. However, there is something that I would recommend removing in THIS specific situation. Uninstall any and all Asus software you have, as it's in most cases problematic bloatware. Just to give you an example, I see Asus PC Probe.
Before doing so, you may want to create a Restore Point: Windows 7 - START | type create | select "Create a Restore Point"
2. Update to the beta nVidia chipset driver from Asus' website for your motherboard (M3N72-D).
3. If you're still crashing after the above, I believe we may actually have a faulty processor.
-- If we reach step #3, before skipping straight to a faulty processor, I would like to run some HDD diagnostics because of the file system routines seen in the stacks. They may be entirely unrelated; however, you always want to be sure.
Regards,
Patrick