
Thread: Class 101 for 0x101 Bugchecks

  1. #1
    Moderator
    BSOD Kernel Dump Expert

    Join Date
    Mar 2012
    Posts
    465

    Class 101 for 0x101 Bugchecks



    NOTICE: 0x101 BUGCHECKS CANNOT BE ANALYZED USING MINIDUMPS!!! YOU WILL NEED AT LEAST A KERNEL DUMP!


    Hi guys, I figured I'd run you through my personal experience delving into a 0x101 bugcheck, also known as CLOCK_WATCHDOG_TIMEOUT. This will cover:

    - Interrupts & IRQLs
    - Simple disassembly
    - Looking at PRCB and PCR
    - Observing the context of multiple processors

    Warning: At the time of writing this, I'm only 95% sure of my observations, so take this with a grain of salt. This is one of the more thorough investigations I've done lately on TSF, and I'm still rusty when it comes to interrupts, so I wouldn't be surprised if some of my approach or understanding is erroneous. Keep that in mind when reading this; any corrections are welcome. I don't want to end up being the blind leading the blind. Always refer to the book Windows Internals if you want a solid understanding of this stuff. Otherwise, what you see here is just my exploration into it, and it serves as a guide for navigating a crashdump and collecting information more effectively (it's hard doing all this without someone who knows better pointing out your faults :) ).

    The example I'd like to use is the case mentioned in this thread. Note that all my work was done on the kernel dump the client uploaded to a 3rd-party filesharing site. Since I doubt everyone is eager, or has the resources or time, to scrutinize a whole kernel dump, I will be more descriptive in my presentation to compensate.

    So why did I use a kernel dump? Because minidumps do not retain the data required to analyze 0x101 bugchecks. I repeat: you cannot analyze 0x101 bugchecks from minidumps! Minidumps only save the processor context for the processor core that was awake and able to successfully run KeBugCheckEx and dump the crashdump. Why is this a problem? You'll understand once you start reading.

    As a primer to get myself headed in the right direction, I used this article from the NTDebugging blog. It was extremely beneficial in giving me my bearings and showing me what data I should be most concerned about when analyzing this type of bugcheck.



    _____



    We start off by viewing exactly what a 0x101 bugcheck is. Let's run the initial !analyze -v on it.

    Code:
    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    CLOCK_WATCHDOG_TIMEOUT (101)
    An expected clock interrupt was not received on a secondary processor in an
    MP system within the allocated interval. This indicates that the specified
    processor is hung and not processing interrupts.
    Arguments:
    Arg1: 0000000000000060, Clock interrupt time out interval in nominal clock ticks.
    Arg2: 0000000000000000, 0.
    Arg3: fffff880009ea180, The PRCB address of the hung processor.
    Arg4: 0000000000000001, 0.
    
    ...

    I'm sure the explanation may not be understood by some here, so I'll go through an explanation of the explanation. :)

    - An expected clock interrupt was not received on a secondary processor -
    To understand this, one must understand what interrupts are. As the name implies, an interrupt is an operation that interrupts what a processor was doing at the time. This is often crucial because sometimes a processor may be working on something relatively unimportant and something more time-sensitive and of higher priority needs to get fulfilled real quick, such as hardware I/O.

    Say a processor core is running an operation that involves drawing a window on your screen for a user application like Skype. Then, all of a sudden, the core receives a request (interrupt) to do some disk I/O to move stuff from paging file to memory. The core obviously isn't going to neglect the request and continue drawing the window. It sees the higher priority and puts everything on standby while it gets done with the request (interrupt). Once that gets done, it will move back to the lesser priority stuff it was doing.

    Note:

    As you can understand, having everything stop to take care of an interrupt sounds like it'd hang the PC. It can, and when interrupts pile up faster than they can be handled it is often referred to as an interrupt storm. The reason it doesn't happen all the time is that an interrupt routine shouldn't be designed to fully satisfy the request Windows received from the hardware. If it did, nothing would get done quickly. Instead, the interrupt routine constructs and issues a DPC (Deferred Procedure Call) that gets queued up and serviced later in an orderly fashion. That way, the interrupt finishes VERY quickly (it has to), and the actual workload of dealing with the hardware's request for service (the actual I/O work) is done in a fashion that won't hold up everything else in the system.
    In a fast-paced, multi-threaded and highly complex environment like Windows on newer hardware, it's easy to see that handling interrupts is more than just looking at a table of priorities and seeing which one needs checking off first (such a table kinda does exist - called IRQLs - which we'll get to shortly). There are algorithms and whatnot involved in an attempt to provide the snappiest and smoothest computer operation for the user without everything becoming imbalanced and chaotic and swirling into a crash. But aside from all this complex methodology, the general idea is that some stuff has higher priority than others and needs faster and/or longer attention than the rest, and all of this is done to make sure that everything gets taken care of decently and in order.
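
    If it helps to see the ISR/DPC split as code, here's a rough Python sketch. It's purely a toy model of my own: the names isr, drain_dpcs and dpc_queue are made up for illustration and have nothing to do with how Windows actually implements this.

```python
from collections import deque

# Toy model only: every name here is hypothetical, not a Windows API.
dpc_queue = deque()
log = []

def isr(device):
    """Interrupt service routine: do the bare minimum, then defer the rest."""
    log.append(f"ISR: acknowledged {device}")
    dpc_queue.append(device)  # queue a DPC instead of doing the I/O work now

def drain_dpcs():
    """Run later, at lower priority, in the order the DPCs were queued."""
    while dpc_queue:
        device = dpc_queue.popleft()
        log.append(f"DPC: finished work for {device}")

isr("disk")
isr("network")
drain_dpcs()
print("\n".join(log))
```

    Both interrupts get acknowledged right away, and the heavier work only happens once the queue is drained afterwards.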

    Now that I've given a general idea of what interrupts are and what they're for: a clock interrupt is a periodic interrupt raised by a hardware timer (in this dump the callstack goes through hal!HalpRtcClockInterrupt), which Windows uses to update the system time and keep timekeeping on the processors in sync. If you read the NTDebugging blog article I used as a primer, you'll see that the clock interrupt gets handed out to all the cores and all of them have to report in. When one doesn't report at all within a designated time frame, this crash happens. The article also mentions that the clock interrupt in x86 systems has an IRQL of 28. So what's an IRQL?

    A good explanation of IRQLs is given in this blog entry. IRQL stands for Interrupt Request Level, and it is essentially the priority Windows gives to an interrupt. The higher the number, the higher the priority. All typical activity running on a processor core is at level 0, which is called Passive or LOW_LEVEL (note that this isn't even an interrupt; an interrupt is anything above IRQL 0). When an interrupt kicks in, the IRQL is raised to the level specified by the interrupt, the interrupt is serviced, and when it's done the IRQL drops back down. If any other interrupts popped up in the meantime at the same or lower IRQLs, the core will service those afterwards in an orderly fashion, working down the list until it's back at IRQL 0. So while every interrupt has its own IRQL, a processor core also has its own IRQL "state" if you will. It will service anything with the same IRQL as the state that it's in, and then will lower itself to service lower priority IRQLs until it's all back to 0 again. There's a bit more to it than that, but that's the gist of it.
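
    To make the "highest IRQL first" idea concrete, here's a small Python sketch. It is a simplification I made up for this post (the function name service_order and the device levels are hypothetical), and it ignores preemption and everything else the real kernel does.

```python
import heapq

def service_order(pending):
    """Return interrupt names in the order a single core would service them.

    pending is a list of (irql, name) pairs that are all waiting at once.
    Toy rule: highest IRQL first; ties are served in arrival order.
    """
    heap = [(-irql, seq, name) for seq, (irql, name) in enumerate(pending)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

# IRQL numbers use the x64 scale; the two device levels are illustrative.
pending = [(5, "device A"), (13, "clock"), (5, "device B"), (2, "dispatch/DPC")]
print(service_order(pending))
```

    The clock interrupt jumps the queue, the two devices at the same level keep their arrival order, and the DPC-level work comes last.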

    - in an MP system -
    A multi-processor system.

    - within the allocated interval -
    As mentioned before, it didn't respond in a designated time frame.

    - This indicates that the specified processor is hung and not processing interrupts -
    This can be a bit misleading. While yes, it does mean the processor is hung, it doesn't actually mean that it won't service any interrupts. In many cases it could be stuck servicing a higher IRQL interrupt but simply won't deal with anything lower than it.
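
    Here's how I picture the watchdog check, as a Python sketch. To be clear, this is my own guess at the general shape of it, not the kernel's real algorithm: the function name find_hung_processor and the tick counts are made up, and only the 0x60 threshold comes from this dump.

```python
def find_hung_processor(ticks_seen, timeout_ticks):
    """Return the index of the first processor that has fallen timeout_ticks
    or more behind the processor doing the checking (index 0 here), else None.
    """
    reference = ticks_seen[0]
    for cpu, ticks in enumerate(ticks_seen):
        if reference - ticks >= timeout_ticks:
            return cpu
    return None

TIMEOUT = 0x60  # Arg1 of the bugcheck, i.e. 96 ticks

print(find_hung_processor([1000, 1000], TIMEOUT))  # both cores keeping up
print(find_hung_processor([1000, 904], TIMEOUT))   # core 1 is 96 ticks behind
```

    In the second call, processor 1 has missed enough clock ticks to trip the threshold, which is the situation the bugcheck describes.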


    Now that we've got that description taken care of, on to business. Ultimately, there are steps we should carry out to figure this out. In essence it should be, in order:

    - what happened,
    - how it happened, and lastly,
    - why it happened.


    WHAT
    _____

    First, what happened? There was a CLOCK_WATCHDOG_TIMEOUT. Yes, we know this. How long was the timeout? 0x60 clock ticks, the bugcheck says (the arguments are printed in hex, so that's 96 in decimal). Now, exactly which processor buggered out? Well, there are a few places we can look, all of which are in the !analyze -v output:

    Code:
    CLOCK_WATCHDOG_TIMEOUT (101)
    An expected clock interrupt was not received on a secondary processor in an
    MP system within the allocated interval. This indicates that the specified
    processor is hung and not processing interrupts.
    Arguments:
    Arg1: 0000000000000060, Clock interrupt time out interval in nominal clock ticks.
    Arg2: 0000000000000000, 0.
    Arg3: fffff880009ea180, The PRCB address of the hung processor.
    Arg4: 0000000000000001, 0.
    
    Debugging Details:
    ------------------
    
    
    BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_2_PROC
    
    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
    
    PROCESS_NAME:  System
    
    CURRENT_IRQL:  d
    
    STACK_TEXT:  
    fffff800`0402f938 fffff800`02a39463 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff880`009ea180 : nt!KeBugCheckEx
    fffff800`0402f940 fffff800`02a93c87 : fffff880`00000000 fffff800`00000000 00000000`00000000 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x4d8e
    fffff800`0402f9d0 fffff800`0300b090 : 00000000`00000000 fffff800`0402fb80 fffff800`00000000 00000000`00000000 : nt!KeUpdateSystemTime+0x377
    fffff800`0402fad0 fffff800`02a87ab3 : 00000000`00000000 fffff800`03026460 fffffa80`041baf00 00000000`00000001 : hal!HalpRtcClockInterrupt+0x130
    fffff800`0402fb00 fffff880`03f991f2 : fffff800`02a990ca 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 : nt!KiInterruptDispatchNoLock+0x163
    fffff800`0402fc98 fffff800`02a990ca : 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 00000000`00000001 : amdk8!C1Halt+0x2
    fffff800`0402fca0 fffff800`02a93d5c : fffff800`02c05e80 fffff800`00000000 00000000`00000000 fffff880`03e76480 : nt!PoIdle+0x53a
    fffff800`0402fd80 00000000`00000000 : fffff800`04030000 fffff800`0402a000 fffff800`0402fd40 00000000`00000000 : nt!KiIdleLoop+0x2c
    
    
    STACK_COMMAND:  kb
    
    SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: Unknown_Module
    
    IMAGE_NAME:  Unknown_Image
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  0
    
    FAILURE_BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
    
    BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
    
    Followup: MachineOwner
    ---------
    The places you can find the answer are Arg3, Arg4, BUGCHECK_STR and the two bucket ID lines. Here's a rundown of each:

    Code:
    Arg3: fffff880009ea180, The PRCB address of the hung processor.
    The PRCB - or PRocessor Control Block - contains data regarding the state of the processor (note that we've been talking about logical processors this whole time, not physical). It is an extension of the PCR - Processor Control Region - though it does contain the bulk of the data concerning that particular processor. You can do a dt command for nt!_KPRCB or nt!_KPCR to get details on each structure. You can also use the Windbg extensions !prcb and !pcr to get such details on the PRCB and PCR, respectively. Just type in the number of the processor you want and it'll give you the details. Let's try !prcb, since that's what the bugcheck is referring to. Do it for each existing processor (in this case, 2 exist):

    Code:
    0: kd> !prcb 0
    PRCB for Processor 0 at fffff80002c05e80:
    Current IRQL -- 13
    Threads--  Current fffff80002c13c40 Next 0000000000000000 Idle fffff80002c13c40
    Processor Index 0 Number (0, 0) GroupSetMember 1
    Interrupt Count -- 0131a6aa
    Times -- Dpc    00000368 Interrupt 0000036d 
             Kernel 000b11d9 User      0003e82a 
    
    0: kd> !prcb 1
    PRCB for Processor 1 at fffff880009ea180:
    Current IRQL -- 0
    Threads--  Current fffff880009f4f40 Next fffffa8005a95b60 Idle fffff880009f4f40
    Processor Index 1 Number (0, 1) GroupSetMember 2
    Interrupt Count -- 0176aaf8
    Times -- Dpc    00002cc2 Interrupt 00001e01 
             Kernel 000d4b6e User      0001ac92
    See the address for the PRCB for Processor 1? That matches the address mentioned in this third argument for the bugcheck code, so that's the irresponsible processor right there.
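
    If you have a lot of processors and don't feel like eyeballing addresses, the matching can be scripted. Here's a small Python sketch that takes the "PRCB for Processor" lines from the output above and compares them with Arg3. The function name hung_processor is made up, and I'm assuming the header line always has the format shown here.

```python
import re

# Header lines copied from the !prcb output above.
PRCB_OUTPUT = """\
PRCB for Processor 0 at fffff80002c05e80:
PRCB for Processor 1 at fffff880009ea180:
"""

def hung_processor(prcb_text, arg3):
    """Return the processor number whose PRCB address equals Arg3, or None."""
    target = int(arg3.replace("`", ""), 16)
    for number, address in re.findall(
            r"PRCB for Processor (\d+) at ([0-9a-fA-F`]+):", prcb_text):
        if int(address.replace("`", ""), 16) == target:
            return int(number)
    return None

print(hung_processor(PRCB_OUTPUT, "fffff880009ea180"))
```

    Addresses are compared as numbers, so it doesn't matter whether they are written with or without the backtick separator.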

    Note:

    - Processor numbers start at 0, so the first processor will be called "Processor 0". There are a couple exceptions, as we'll see later, but that's usually the case.
    - The information provided by these commands is basic. If you want all the nitty gritty on the processor, you'll want to dump the contents of the PRCB structure using the dt command. Use the PRCB address mentioned as the address for the nt!_KPRCB type to be applied to. In this case, the syntax for the dt command will be:

    dt nt!_KPRCB fffff880009ea180
    Code:
    Arg4: 0000000000000001, 0.
    This is a bit of a hidden surprise. It's an undocumented argument for this bugcheck, but it apparently reveals the processor number right there. Again, processor numbers start at 0, so this would be the second processor.
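
    Keep in mind that the debugger prints all four arguments in hex. Here's a quick Python sketch that pulls them out of the !analyze -v text and converts them, so you don't misread 0x60 as 60. The helper name parse_args is my own.

```python
import re

# Argument lines copied from the !analyze -v output above.
ANALYZE_ARGS = """\
Arg1: 0000000000000060, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009ea180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.
"""

def parse_args(text):
    """Map 'Arg1'..'Arg4' to integers; the debugger prints them in hex."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(Arg\d): ([0-9a-fA-F]+),", text)}

args = parse_args(ANALYZE_ARGS)
print(args["Arg1"])        # timeout interval in ticks, as a decimal number
print(hex(args["Arg3"]))   # PRCB address of the hung processor
print(args["Arg4"])        # processor index, going by the observation above
```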

    Code:
    BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_2_PROC
    
    FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
    
    BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_2_PROC_ANALYSIS_INCONCLUSIVE
    Bucket IDs are just what Microsoft uses to get an idea of the cause of a crash when reports are submitted. In many cases they have been helpful in discovering clues about the cause, but in this case it's just the usual fare. As you can see, there's a discrepancy between the number of the processor provided here and the one in the rest of the data. This is one of those few and far between exceptions to the rule. It's just to clarify for individuals looking at it exactly which processor it is, without having to remember that the numbering starts at 0. So 2 would just mean "second processor".


    HOW
    _____

    Ok, so far we've explained what just happened: Processor 1 (second processor) reached 0x60 (96) clock ticks without responding to the clock interrupt, and so the system BSOD'd. Now, how did this take place? To determine this, we need to look at what was going on during this time before Windows intervened and crashed the system.

    Our first step in this venture involves checking the IRQL of each processor before the system decided to crash. During the crash process, the IRQL is altered so nothing else gets in its way, so it is pointless to see the latest existing IRQL. Fortunately, Windows developers are aware of this and made it so that the IRQL of the processor before the crash process kicked in was saved. You can grab this saved IRQL using the !irql command, followed by the number of the processor.

    Code:
    0: kd> !irql 0
    Debugger saved IRQL for Processor 0x0 -- 13
    0: kd> !irql 1
    Debugger saved IRQL for Processor 0x1 -- 0 (LOW_LEVEL)
    As you can see, the IRQL of the first processor is 13 (which is CLOCK for x64 processors) and the second is at 0. So we can see that only Processor 0 was at CLOCK level. As mentioned previously, the NTDebugging blog gives the clock interrupt for x86 processors as IRQL 28, but since this is an x64 environment, things change around a bit (there are fewer IRQLs) and it is 13 instead. This also matches the CURRENT_IRQL: d line in the !analyze -v output, since d in hex is 13.
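
    Since the same IRQL shows up as 13 in one place and d in another, here's a little Python sketch for normalizing the two. The table only holds the levels mentioned in this post plus a few well-known ones, so treat it as a partial list. The names IRQL_NAMES and describe_irql are mine.

```python
# Partial table: only levels mentioned in this post plus a few well-known ones.
IRQL_NAMES = {
    "x64": {0: "PASSIVE/LOW", 1: "APC", 2: "DISPATCH", 13: "CLOCK", 15: "HIGH"},
    "x86": {0: "PASSIVE/LOW", 1: "APC", 2: "DISPATCH", 28: "CLOCK", 31: "HIGH"},
}

def describe_irql(value, arch="x64", base=10):
    """Turn an IRQL printed by the debugger into 'number (NAME)'."""
    level = int(str(value), base)
    name = IRQL_NAMES[arch].get(level, "device or other level")
    return f"{level} ({name})"

print(describe_irql("13"))              # as shown by !irql
print(describe_irql("d", base=16))      # as shown by CURRENT_IRQL in !analyze -v
print(describe_irql("28", arch="x86"))  # the x86 value from the NTDebugging blog
```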

    Note: For information about interrupts, consult the Windows Internals book, pages 85-114 (5th edition)
    Note: compare the IRQL reported by !irql with what you see from !pcr and !prcb. Notice something? The IRQL for Processor 0 is listed as 0 in the PCR, but 13 in the PRCB and in the !irql output. That's because the PCR holds the current IRQL, while the PRCB stores the one in effect just before control passed to the bugcheck code. Details are in Windows Internals page 95 (5th edition). You may also consult the Windbg help documentation for !irql.
    Now that we have the IRQL, let's garner some more information. We'll look at the callstack present in each processor to see what was the latest activity taking place on them. You can use a variation of the k command to do this. I use kv as it's most verbose. Let's start with Processor 0:

    Code:
    0: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff800`0402f938 fffff800`02a39463 : 00000000`00000101 00000000`00000060 00000000`00000000 fffff880`009ea180 : nt!KeBugCheckEx
    fffff800`0402f940 fffff800`02a93c87 : fffff880`00000000 fffff800`00000001 00000000`00002626 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x4d8e
    fffff800`0402f9d0 fffff800`0300b090 : 00000000`00000000 fffff800`0402fb80 fffff800`03026460 00000000`00000000 : nt!KeUpdateSystemTime+0x377
    fffff800`0402fad0 fffff800`02a87ab3 : 00000000`00000000 fffff800`03026460 fffffa80`041baf00 00000000`00000001 : hal!HalpRtcClockInterrupt+0x130
    fffff800`0402fb00 fffff880`03f991f2 : fffff800`02a990ca 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`0402fb00)
    fffff800`0402fc98 fffff800`02a990ca : 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 00000000`00000001 : amdk8!C1Halt+0x2
    fffff800`0402fca0 fffff800`02a93d5c : fffff800`02c05e80 fffff800`00000000 00000000`00000000 fffff880`03e76480 : nt!PoIdle+0x53a
    fffff800`0402fd80 00000000`00000000 : fffff800`04030000 fffff800`0402a000 fffff800`0402fd40 00000000`00000000 : nt!KiIdleLoop+0x2c
    Now let's see latest thread running on Processor 1. In order to do so, we should change the processor context of Windbg. Currently Windbg is displaying all the data on what's taking place on Processor 0. You can tell because of the number visible in the Windbg prompt:

    Code:
    0: kd>
    The number displays which processor is current. You can switch to the other processor by typing a tilde (~) followed by the number of the processor (the Windbg documentation gives the form as ~1s, with a trailing s). So let's do that and dump the callstack afterwards:

    Code:
    0: kd> ~1
    1: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff880`02f1bc98 fffff800`02a990ca : 00000000`00369e99 fffffa80`04ec1ab8 fffff880`009f4f40 00000000`00000001 : amdk8!C1Halt+0x2
    fffff880`02f1bca0 fffff800`02a93d5c : fffff880`009ea180 fffff880`00000000 00000000`00000000 fffff880`03e76480 : nt!PoIdle+0x53a
    fffff880`02f1bd80 00000000`00000000 : fffff880`02f1c000 fffff880`02f16000 fffff880`02f1bd40 00000000`00000000 : nt!KiIdleLoop+0x2c
    Ok, so we got the callstack for each existing thread at the time of the crash for both our processors. Now we have to interpret what's going on with each. Here's some stuff we can conjure up from these two callstacks:


    1. Both started with the IdleLoop routine, which is essentially the start of the System Idle Process you see in Task Manager. So they both were sitting and waiting to do something.
    2. Processor 0 then received an interrupt. This interrupt happened to be a clock interrupt.
    3. The clock interrupt then involved updating the system time. This is something that is replicated across all processors so that all the processors update their own timers and things are kept track of.
    4. Processor 0 was the one that performed the bugcheck.



    So from what we can tell by this, Processor 0 took care of a systematic clock interrupt that keeps track of the timing of things, while Processor 1 apparently didn't even bother with it and just stayed dormant. This is further expressed by the IRQL for Processor 0 being 13 (clock interrupt) and the IRQL for Processor 1 being 0. This is getting suspicious. But for now, let's move on.


    WHY
    _____

    So far we've gained enough information to where we can see that a clock interrupt (as expected) popped up and tried to do its routine thing of keeping the system timer up-to-date. Processor 0 had no problem handling it, and did so just fine. However, we can see that when it came time for Processor 1 to do the same, nothing happened. The interrupt did not pop up on the callstack, nor did the IRQL change. Things continued to lie dormant, and Processor 1 was sound asleep. Why did this happen?

    The initial and immediate suspect right here is the 3rd-party driver and its routine present in both callstacks:

    Code:
    amdk8!C1Halt+0x2
    This is present as part of the whole Idle Loop (System Idle Process) that goes on when there's a wait on doing stuff. Instead of the default routine that takes place here (I think it's hal!HalProcessorIdle), it has been replaced by the one provided by amdk8.sys. Since this has been a consistent issue with the victim, it seems rather evident that we're dealing with a possibly buggy driver. However, we will need to dig even further to verify this suspicion. We will actually need to disassemble the code here to see what the problem is. So let's go ahead and do that.

    There are a number of ways you can disassemble code in Windbg. The most common (and relatively easiest) is to use the Disassembly window provided by Windbg. Go to View followed by Disassembly to bring it up. Then in the box labeled Offset, type in the address you want to start at. We can easily discover this by looking at the callstack for the thread on Processor 0 (not 1). What we want is the return address in the frame (each row in the callstack is called a frame) above the amdk8!C1Halt frame. A return address is where execution resumes when the routine in that frame executes its return instruction after finishing its work: it returns into the frame below it, at the address specified. We can use this address to look at what's inside amdk8!C1Halt.

    Code:
    0: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff800`0402f938 fffff800`02a39463 : 00000000`00000101 00000000`00000060 00000000`00000000 fffff880`009ea180 : nt!KeBugCheckEx
    fffff800`0402f940 fffff800`02a93c87 : fffff880`00000000  fffff800`00000001 00000000`00002626 00000000`00000000 : nt! ??  ::FNODOBFM::`string'+0x4d8e
    fffff800`0402f9d0 fffff800`0300b090 : 00000000`00000000  fffff800`0402fb80 fffff800`03026460 00000000`00000000 :  nt!KeUpdateSystemTime+0x377
    fffff800`0402fad0 fffff800`02a87ab3 : 00000000`00000000  fffff800`03026460 fffffa80`041baf00 00000000`00000001 :  hal!HalpRtcClockInterrupt+0x130
    fffff800`0402fb00 fffff880`03f991f2 : fffff800`02a990ca  00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 :  nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`0402fb00) < the return address we want (it's above the amdk8!C1Halt+0x2 frame)
    fffff800`0402fc98 fffff800`02a990ca : 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 00000000`00000001 : amdk8!C1Halt+0x2 < the frame
    fffff800`0402fca0 fffff800`02a93d5c : fffff800`02c05e80 fffff800`00000000 00000000`00000000 fffff880`03e76480 : nt!PoIdle+0x53a
    fffff800`0402fd80 00000000`00000000 : fffff800`04030000  fffff800`0402a000 fffff800`0402fd40 00000000`00000000 :  nt!KiIdleLoop+0x2c
    Why did I decide that we take a look at the callstack for Processor 0 instead of 1 for the answer? Because as you can tell, the most recent (topmost) frame in the callstack for Processor 1 is amdk8!C1Halt itself. We need the return address from a frame above amdk8!C1Halt in order for it to help us. If we grabbed the return address from the frame with amdk8!C1Halt in it, it would give us the instruction address that amdk8!C1Halt would return to once it finished. This address points into the function below it, specifically nt!PoIdle+0x53a. So we'd be looking at the code for nt!PoIdle and not amdk8!C1Halt.
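
    The "grab the RetAddr from the frame above" rule can be written down as a short Python sketch. The frame lines are copied from the Processor 0 callstack. The helper name return_address_into is made up, and I'm assuming each frame line has the three " : "-separated parts that kv prints here.

```python
# Four frames copied from the Processor 0 kv output above.
KV_FRAMES = """\
fffff800`0402fad0 fffff800`02a87ab3 : 00000000`00000000 fffff800`03026460 fffffa80`041baf00 00000000`00000001 : hal!HalpRtcClockInterrupt+0x130
fffff800`0402fb00 fffff880`03f991f2 : fffff800`02a990ca 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff800`0402fb00)
fffff800`0402fc98 fffff800`02a990ca : 00000000`00369e99 fffffa80`04eb44e8 fffff800`02c13c40 00000000`00000001 : amdk8!C1Halt+0x2
fffff800`0402fca0 fffff800`02a93d5c : fffff800`02c05e80 fffff800`00000000 00000000`00000000 fffff880`03e76480 : nt!PoIdle+0x53a
"""

def return_address_into(frames_text, function):
    """Find the RetAddr stored in the frame directly above `function`.

    That address is where the upper frame resumes, so it lies inside `function`.
    Returns None when `function` is the topmost frame or is not present.
    """
    previous_ret = None
    for line in frames_text.splitlines():
        parts = line.split(" : ")
        if len(parts) != 3:
            continue  # not a frame line
        _child_sp, ret_addr = parts[0].split()
        call_site = parts[2].strip()
        if call_site.startswith(function):
            return previous_ret
        previous_ret = ret_addr
    return None

print(return_address_into(KV_FRAMES, "amdk8!C1Halt"))
```

    Feed it only the Processor 1 frames, where amdk8!C1Halt is on top, and it returns None, which is exactly why we had to use Processor 0's callstack.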

    If you're wondering if there's an easier and more reliable answer, there is. You can simply take the symbol name for the routine that's mentioned in the callstack and use that instead of the return address for the Offset requested in the Disassembly window. So if we wanna look exactly at amdk8!C1Halt+0x2, we'll just slap that exactly as the offset and it'll figure it out from there. This also works for the u command and other variations mentioned next. In addition, it will also work the same for routines listed in the callstack that lack symbols. So if this is so much easier, why explain the alternative? Because it'll teach you some things about what's going on in a callstack and code flow.

    Another common way to get details on this is to use the uf command followed by the address of where you want to start (which we got from that return address). In this case, it will show the entire function (in this case, amdk8!C1Halt) and that's it. We really don't need to see anything more (like what the Disassembly window displays) so this option is conveniently concise.

    You can also do u and variants thereof, but I'd rather have the Windbg help documentation explain it for ya.

    Let's try the uf command:

    Code:
    1: kd> uf fffff880`03f991f2
    amdk8!C1Halt:
    fffff880`03f991f0 fb              sti
    fffff880`03f991f1 f4              hlt
    fffff880`03f991f2 c3              ret
    Notice that even though we gave it the address fffff880`03f991f2, the listing starts at fffff880`03f991f0. That's because uf simply presents the entire function that contains whatever address we gave it. So no matter which address inside the function we give it, it'll display the whole thing from start to finish (more or less). Now, what can we make of this output? We see three instructions: sti (set interrupt flag), hlt (halt) and ret (return). Do a google and you'll find that this is also present in the generic hal!HalProcessorIdle routine that amdk8!C1Halt replaced.
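
    The middle column in that listing is the raw machine code, and each of these three instructions happens to be a single byte. Here's a tiny Python sketch that maps the bytes back to mnemonics. It only knows these three opcodes, so it's a lookup table rather than a real disassembler, and the name decode is mine.

```python
# Single-byte x86/x64 opcodes that appear in the listing above.
OPCODES = {0xFB: "sti", 0xF4: "hlt", 0xC3: "ret"}

def decode(code_bytes):
    """Decode a byte string made up only of the one-byte opcodes in OPCODES."""
    return [OPCODES[b] for b in code_bytes]

print(decode(bytes.fromhex("fbf4c3")))
```

    It also shows why the return address lands at +0x2: sti sits at offset 0, hlt at offset 1, and execution resumes at the ret at offset 2 once the halted core is woken by an interrupt.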

    Honestly, with research you will find there's no code difference or really anything that could bug this code out. It's standard fare, and there's nothing suspect about it. This has also been confirmed by the person who has been victimized by these crashes: they updated their chipset drivers that cover this specific driver, and no change has been made at all to the code (nor has the problem been resolved). So we're left with a rather innocuous little bit of code.

    But wait now, what does this code actually do? Well, the first instruction is sti, or the set interrupt flag instruction. It sets the processor to be ready to process any interrupts that come its way. The next instruction, hlt, halts the processor until the next interrupt pops up. Then there's the standard ret or return. So, wait a minute, if it was set to accept any interrupts and wait until it gets one, why hasn't the clock interrupt popped up on Processor 1's callstack? Wasn't this all done to get it to wait and be ready to grab the next available interrupt? Why didn't it even bother starting to deal with the clock interrupt?

    We can get even more info on this by understanding how the sti instruction works. It does so by setting a single flag bit in the processor's flags register (EFLAGS/RFLAGS) called the interrupt flag (if). A flag is only ever 0 or 1: sti changes it from 0 to 1, or just leaves it at 1 if it's already set. Once this flag is set, the processor should accept any maskable interrupts coming its way. So, the last thing we can check here is the interrupt flag. Maybe, perhaps, the sti instruction set the interrupt flag to 1 and something else later changed it back to 0! Let's take a look. Remember to be in the processor context of Processor 1 when you do this. Type the r command (register) followed by the name of the register you want, which in this case is if.

    Code:
    1: kd> r if
    if=1
    So there it is. The interrupt flag at the time was in fact 1. It should've handled an interrupt to it, but it didn't. Nothing here reveals that it should've stopped Processor 1 from accepting the clock interrupt which would've prevented this crash. Instead, it continued to sleep, and crashed it went.
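
    For the curious, the if value is just bit 9 of the flags register. Here's a short Python sketch that pulls it out of a raw flags value by hand. The two sample values are made up for illustration and are not taken from this dump, and the names IF_BIT and interrupt_flag are mine.

```python
IF_BIT = 9  # position of the interrupt flag in EFLAGS/RFLAGS

def interrupt_flag(efl):
    """Return 1 if maskable interrupts are enabled in this flags value, else 0."""
    return (efl >> IF_BIT) & 1

# Both values below are hypothetical examples, not from the dump.
print(interrupt_flag(0x246))  # bit 9 (0x200) set
print(interrupt_flag(0x046))  # bit 9 clear
```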




    What you see here is the example of what very well appears to be a perfectly good condition, but somehow something still went wrong. It means there had to be a hardware issue involved, and since everything else says "everything's cool" we can only point blame at the CPU simply not responding to the interrupt despite being given the green light to do so.

    And sure enough, that appears to have been the case. At the time of this writing, the client has installed a new replacement CPU and has had no issues since.

    Thanks for reading this, and I hope it has entertained and enlightened you on some stuff, as well as given you questions that hopefully I or someone else can answer. Again, if you suspect any errors (typos, misconceptions, etc.), don't hesitate to pass comment to me. Have fun, guys!


    Note:

    One thing I want to add concerning this particular 0x101 bugcheck is for people to observe the context for Processor 1 (the processor core that hung). Notice that the entire context is intact: the registers, the callstack of the latest thread, etc., are all present and preserved. If Processor 1 was hung, how could this be? Just so you know, while the entire processor context is stored in memory (which can later be dumped like everything else into a crashdump during a crash), the processor itself is still responsible for filling in that context. So how did everything get preserved if the processor involved was stuck?

    The only way I see it could happen is that Processor 1 did wake up during the crash to report everything in its context for the crashdump. What this could mean is that the Processor 1 core wasn't hung permanently, but rather took too long to become aware that there was a clock interrupt that needed servicing (or any interrupt for that matter) and took its sweet time to wake up. It did eventually wake up for the crash, but it failed to do so promptly for the clock interrupt. It would mean the processor itself is out of sync, which is a hardware malfunction (hence why the answer was replacing the CPU).

    You will see 0x101 bugchecks in which, for whatever reason (hardware or driver), a processor core or two was not responsive during the crashdump (and so really was hung), and so not everything for its context was preserved (a clear sign is that if you attempt to dump the callstack using kv you'll get only one frame, which is empty, and all registers will be zeroed). This is explained in the article I used as a primer that I linked to at the beginning of this little 'tutorial'. In that case the processor contexts weren't saved all the way, so the only way for the person to get a callstack was to find the last thread that ran on the processor, dump the raw stack and then reconstruct it from there using a special variation of the kv command. Unfortunately, because the example in that primer was on an x86 (32-bit) system, that approach only applies to 32-bit systems. I have yet to figure out how to do the same for x64 systems.

    Food for thought.
    Last edited by Vir Gnarus; 07-17-2012 at 09:17 AM.

  2. #2
    Administrator
    Windows Update Expert
    Developer
    Join Date
    Mar 2012
    Location
    District 12
    Posts
    5,262

    Re: Class 101 for 0x101 Bugchecks

    Thanks a lot for this very informative post.

    I have a user with this kernel dump: http://www.mediafire.com/?ubbabtx5tpwty3a

    It looks almost identical to the one you showed here, and I wondered whether you could think of anything other than hardware? I guess I will ask to update chipset drivers, and see if that does it. A bit desperate, I know.

    Also, the OP claims that the computer works fine in Safe Mode, but crashes in normal mode. Finally, I notice that this computer seems to have 8 cores. That seems like quite a few. Do you think this may even be a multi-processor machine, perhaps even a small server? I will ask the OP.

    Thanks a lot for any insight you may be able to offer.

    Code:
    Microsoft (R) Windows Debugger Version 6.2.8229.0 AMD64
    Copyright (c) Microsoft Corporation. All rights reserved.
    
    Loading Dump File [D:\MEMORY (2).DMP]
    Kernel Summary Dump File: Only kernel address space is available
    Symbol search path is: SRV*D:\Symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: 
    Windows Server 2008/Windows Vista Kernel Version 6002 (Service Pack 2) MP (8 procs) Free x64
    Product: WinNt, suite: TerminalServer SingleUserTS Personal
    Built by: 6002.18607.amd64fre.vistasp2_gdr.120402-0336
    Machine Name:
    Kernel base = 0xfffff800`03003000 PsLoadedModuleList = 0xfffff800`031c7dd0
    Debug session time: Sun Jun 24 14:10:20.781 2012 (UTC + 1:00)
    System Uptime: 0 days 0:01:39.562
    Loading Kernel Symbols
    ...............................................................
    ..................................................
    Loading User Symbols
    Loading unloaded module list
    .....
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    Use !analyze -v to get detailed debugging information.
    BugCheck 101, {18, 0, fffffa60019d8180, 3}
    *** ERROR: Module load completed but symbols could not be loaded for intelppm.sys
    Probably caused by : Unknown_Image ( ANALYSIS_INCONCLUSIVE )
    Followup: MachineOwner
    ---------
    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    CLOCK_WATCHDOG_TIMEOUT (101)
    An expected clock interrupt was not received on a secondary processor in an
    MP system within the allocated interval. This indicates that the specified
    processor is hung and not processing interrupts.
    Arguments:
    Arg1: 0000000000000018, Clock interrupt time out interval in nominal clock ticks.
    Arg2: 0000000000000000, 0.
    Arg3: fffffa60019d8180, The PRCB address of the hung processor.
    Arg4: 0000000000000003, 0.
    Debugging Details:
    ------------------
    
    BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_8_PROC
    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
    PROCESS_NAME:  System
    CURRENT_IRQL:  d
    STACK_TEXT:  
    fffff800`04416a98 fffff800`030193a0 : 00000000`00000101 00000000`00000018 00000000`00000000 fffffa60`019d8180 : nt!KeBugCheckEx
    fffff800`04416aa0 fffff800`030543aa : 00000000`00000000 fffff800`04416bc0 fffffa80`08765330 fffff800`03548320 : nt! ?? ::FNODOBFM::`string'+0x2de4
    fffff800`04416ae0 fffff800`0352b8af : 00000000`00000000 fffff800`04416bc0 fffff800`03548320 fffffa80`08d91170 : nt!KeUpdateSystemTime+0xea
    fffff800`04416b10 fffff800`03053b6d : 00000000`00000000 fffff800`03548320 00000000`00000000 fffffa60`0390b6d6 : hal!HalpRtcClockInterrupt+0x127
    fffff800`04416b40 fffffa60`00d407a2 : fffffa60`00d3f685 fffff800`04410000 00000000`00000000 00000000`00000001 : nt!KiInterruptDispatchNoLock+0x14d
    fffff800`04416cd8 fffffa60`00d3f685 : fffff800`04410000 00000000`00000000 00000000`00000001 00000000`0000000c : intelppm+0x37a2
    fffff800`04416ce0 fffff800`0305f173 : 0000003d`d5f3b80e 00000000`00000000 fffffa80`00000001 fffff800`03179a80 : intelppm+0x2685
    fffff800`04416d10 fffff800`0305ee91 : fffff800`03176680 fffff800`00000000 00000000`0f088bae 00000000`00000000 : nt!PoIdle+0x183
    fffff800`04416d80 fffff800`0322e860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x21
    fffff800`04416db0 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4
    fffff800`044100b0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffff800
    
    STACK_COMMAND:  kb
    SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE
    FOLLOWUP_NAME:  MachineOwner
    MODULE_NAME: Unknown_Module
    IMAGE_NAME:  Unknown_Image
    DEBUG_FLR_IMAGE_TIMESTAMP:  0
    FAILURE_BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE
    BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_8_PROC_ANALYSIS_INCONCLUSIVE
    Followup: MachineOwner
    ---------
    
    0: kd> !prcb 0
    PRCB for Processor 0 at fffff80003176680:
    Current IRQL -- 13
    Threads--  Current fffff8000317bb80 Next 0000000000000000 Idle fffff8000317bb80
    Number 0 SetMember 1
    Interrupt Count -- 0001471f
    Times -- Dpc    000000bc Interrupt 00000018 
             Kernel 000018ab User      00000000 
    
    0: kd> !prcb 1
    PRCB for Processor 1 at fffffa60005ec180:
    Current IRQL -- 0
    Threads--  Current fffffa60005f5d40 Next 0000000000000000 Idle fffffa60005f5d40
    Number 1 SetMember 2
    Interrupt Count -- 0000c511
    Times -- Dpc    00000000 Interrupt 00000000 
             Kernel 0000189a User      00000000 
    
    0: kd> !prcb 2
    PRCB for Processor 2 at fffffa6001966180:
    Current IRQL -- 0
    Threads--  Current fffffa600196fd40 Next fffffa80054d7bb0 Idle fffffa600196fd40
    Number 2 SetMember 4
    Interrupt Count -- 0000bc7e
    Times -- Dpc    00000015 Interrupt 00000000 
             Kernel 000012ec User      00000000 
    
    0: kd> !prcb 3
    PRCB for Processor 3 at fffffa60019d8180:
    Current IRQL -- 0
    Threads--  Current fffffa8009df3bb0 Next 0000000000000000 Idle fffffa60019e1d40
    Number 3 SetMember 8
    Interrupt Count -- 0000bf6c
    Times -- Dpc    00000001 Interrupt 00000002 
             Kernel 000012d2 User      00000000 
    
    0: kd> !prcb 4
    PRCB for Processor 4 at fffffa6001a43180:
    Current IRQL -- 0
    Threads--  Current fffffa80054e6210 Next 0000000000000000 Idle fffffa6001a4cd40
    Number 4 SetMember 10
    Interrupt Count -- 0000939e
    Times -- Dpc    00000000 Interrupt 00000000 
             Kernel 00001897 User      00000000 
    
    0: kd> !prcb 5
    PRCB for Processor 5 at fffffa6001ab5180:
    Current IRQL -- 0
    Threads--  Current fffffa6001abed40 Next 0000000000000000 Idle fffffa6001abed40
    Number 5 SetMember 20
    Interrupt Count -- 000091dc
    Times -- Dpc    00000000 Interrupt 00000010 
             Kernel 00001895 User      00000000 
    
    0: kd> !prcb 6
    PRCB for Processor 6 at fffffa6001b27180:
    Current IRQL -- 0
    Threads--  Current fffffa80054eebb0 Next 0000000000000000 Idle fffffa6001b30d40
    Number 6 SetMember 40
    Interrupt Count -- 0000bdf3
    Times -- Dpc    00000001 Interrupt 00000004 
             Kernel 00001155 User      00000000 
    
    0: kd> !prcb 7
    PRCB for Processor 7 at fffffa6001b99180:
    Current IRQL -- 0
    Threads--  Current fffffa80069e6bb0 Next 0000000000000000 Idle fffffa6001ba2d40
    Number 7 SetMember 80
    Interrupt Count -- 0000bff6
    Times -- Dpc    00000000 Interrupt 00000001 
             Kernel 0000114e User      00000000 
    
    0: kd> !prcb 8
    Cannot get PRCB address
    
    
    0: kd> !irql 0
    Debugger saved IRQL for processor 0x0 -- 13
    0: kd> !irql 1
    Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
    0: kd> !irql 2
    Debugger saved IRQL for processor 0x2 -- 0 (LOW_LEVEL)
    0: kd> !irql 3
    Debugger saved IRQL for processor 0x3 -- 0 (LOW_LEVEL)
    0: kd> !irql 4
    Debugger saved IRQL for processor 0x4 -- 0 (LOW_LEVEL)
    0: kd> !irql 5
    Debugger saved IRQL for processor 0x5 -- 0 (LOW_LEVEL)
    0: kd> !irql 6
    Debugger saved IRQL for processor 0x6 -- 0 (LOW_LEVEL)
    0: kd> !irql 7
    Debugger saved IRQL for processor 0x7 -- 0 (LOW_LEVEL)
    
    
    0: kd> ~0
    0: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff800`04416a98 fffff800`030193a0 : 00000000`00000101 00000000`00000018 00000000`00000000 fffffa60`019d8180 : nt!KeBugCheckEx
    fffff800`04416aa0 fffff800`030543aa : 00000000`00000000 fffff800`04416bc0 fffffa80`08765330 fffff800`03548320 : nt! ?? ::FNODOBFM::`string'+0x2de4
    fffff800`04416ae0 fffff800`0352b8af : 00000000`00000000 fffff800`04416bc0 fffff800`03548320 fffffa80`08d91170 : nt!KeUpdateSystemTime+0xea
    fffff800`04416b10 fffff800`03053b6d : 00000000`00000000 fffff800`03548320 00000000`00000000 fffffa60`0390b6d6 : hal!HalpRtcClockInterrupt+0x127
    fffff800`04416b40 fffffa60`00d407a2 : fffffa60`00d3f685 fffff800`04410000 00000000`00000000 00000000`00000001 : nt!KiInterruptDispatchNoLock+0x14d (TrapFrame @ fffff800`04416b40)
    fffff800`04416cd8 fffffa60`00d3f685 : fffff800`04410000 00000000`00000000 00000000`00000001 00000000`0000000c : intelppm+0x37a2
    fffff800`04416ce0 fffff800`0305f173 : 0000003d`d5f3b80e 00000000`00000000 fffffa80`00000001 fffff800`03179a80 : intelppm+0x2685
    fffff800`04416d10 fffff800`0305ee91 : fffff800`03176680 fffff800`00000000 00000000`0f088bae 00000000`00000000 : nt!PoIdle+0x183
    fffff800`04416d80 fffff800`0322e860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x21
    fffff800`04416db0 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4
    fffff800`044100b0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffff800
    
    0: kd> ~1
    1: kd> kv
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffffa60`0191bcd8 fffffa60`00d3f685 : fffffa80`054d7720 fffffa60`005f5d40 fffffa60`00000001 fffffa60`0191bd50 : intelppm+0x37a2
    fffffa60`0191bce0 fffff800`0305f173 : 00000000`00000001 fffffa80`054d7818 fffffa80`054d7720 fffffa60`005f5d40 : intelppm+0x2685
    fffffa60`0191bd10 fffff800`0305ee91 : fffffa60`005ec180 fffffa60`00000000 00000000`0f096483 00000000`00000000 : nt!PoIdle+0x183
    fffffa60`0191bd80 fffff800`0322e860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x21
    fffffa60`0191bdb0 00000000`fffffa60 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4
    fffffa60`005efd00 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffffa60
    
    
    1: kd> uf fffffa60`00d407a2
    intelppm+0x37a0:
    fffffa60`00d407a0 fb              sti
    fffffa60`00d407a1 f4              hlt
    fffffa60`00d407a2 c3              ret
    
    1: kd> uf fffffa60`00d3f685
    intelppm+0x267c:
    fffffa60`00d3f67c 4883ec28        sub     rsp,28h
    fffffa60`00d3f680 e81b110000      call    intelppm+0x37a0 (fffffa60`00d407a0)
    fffffa60`00d3f685 33c0            xor     eax,eax
    fffffa60`00d3f687 4883c428        add     rsp,28h
    fffffa60`00d3f68b c3              ret
    
    
    1: kd> ~3
    3: kd> r if
    if=1
    
    
    3: kd> !thread
    THREAD fffffa8009df3bb0  Cid 0234.0238  Teb: 000007fffffdd000 Win32Thread: fffff900c0004d50 RUNNING on processor 3
    Not impersonating
    DeviceMap                 fffff880000073d0
    Owning Process            fffffa8009e03040       Image:         csrss.exe
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      5915           Ticks: 457 (0:00:00:07.140)
    Context Switch Count      140            IdealProcessor: 3                 LargeStack
    UserTime                  00:00:00.000
    KernelTime                00:00:00.468
    Win32 Start Address 0x0000000049d6153c
    Stack Init fffffa600569cdb0 Current fffffa600569b360
    Base fffffa600569d000 Limit fffffa6005695000 Call 0
    Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for nvlddmkm.sys - 
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffffa60`0569b6c8 fffff800`03527699 : 00000000`00000010 00000000`00000246 fffffa60`0569b6f0 00000000`00000018 : hal!HalpPciReadMmConfigUlong+0x7
    fffffa60`0569b6d0 fffff800`035274aa : 00000000`00000000 fffffa60`0569b800 00000000`00000040 fffff800`0351b000 : hal!HalpPCIPerformConfigAccess+0x55
    fffffa60`0569b700 fffff800`035272ef : fffffa60`0569b800 00000000`00000000 00000000`00000000 fffffa60`0569b8d0 : hal!HalpPCIConfigHoldingConfigLock+0x17a
    fffffa60`0569b750 fffff800`035270d8 : 00000000`00000000 fffffa60`0569b8d0 fffffa60`0569b800 00000000`00000040 : hal!HalpPCIConfig+0x87
    fffffa60`0569b790 fffff800`03526d1c : 00000000`00000000 00000000`00000000 00000000`00000040 fffff800`0353aa80 : hal!HalpReadPCIConfig+0x60
    fffffa60`0569b7d0 fffff800`03528190 : 00000000`00000002 fffff800`03526d9a 00000000`00000000 00000000`0000000a : hal!HalpGetPCIData+0x89
    fffffa60`0569b8a0 fffffa60`02c17c44 : 00000000`00000000 00000000`00000000 00000000`00000028 fffffa60`0569b9d0 : hal!HalGetBusDataByOffset+0x9c
    fffffa60`0569b990 fffffa60`02cdc48e : 00000000`00000000 00000000`0000ffff 00000000`00000007 00000000`00000000 : nvlddmkm+0x208c44
    fffffa60`0569b9d0 fffffa60`02cdffa4 : fffffa80`ffff8086 fffffa80`08a72870 fffffa60`03537888 fffffa80`08a72c41 : nvlddmkm+0x2cd48e
    fffffa60`0569ba50 fffffa60`02ce0344 : fffffa80`09f40300 fffffa80`09f4d000 fffffa80`08a72870 fffffa80`08757610 : nvlddmkm+0x2d0fa4
    fffffa60`0569bab0 fffffa60`02cd2867 : fffffa80`08a705e3 fffffa80`08a72870 fffffa80`08a710de fffffa80`08a72870 : nvlddmkm+0x2d1344
    fffffa60`0569bb40 fffffa60`02c0231e : fffffa80`09f4d000 fffffa80`08a72870 fffffa80`08a72870 fffffa80`09e5e010 : nvlddmkm+0x2c3867
    fffffa60`0569bb70 fffffa60`02d7f2db : fffffa80`09e5e010 fffffa80`09e5e010 fffffa80`08a72870 00000000`00000012 : nvlddmkm+0x1f331e
    fffffa60`0569bbb0 fffffa60`02cc860c : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000012 : nvlddmkm+0x3702db
    fffffa60`0569bbf0 fffffa60`02ccd5c8 : 00000000`00000000 00000000`00000012 00000000`00000000 fffffa80`09f63d30 : nvlddmkm+0x2b960c
    fffffa60`0569bc20 fffffa60`02c383e5 : 00000000`00000000 fffffa80`09f4d000 00000000`00000000 fffffa80`095ca000 : nvlddmkm+0x2be5c8
    fffffa60`0569bcb0 fffffa60`02c0a5cd : fffffa80`09f4d000 fffffa80`09f4d000 00000000`00000001 00000000`00000001 : nvlddmkm+0x2293e5
    fffffa60`0569bce0 fffffa60`02c0a73b : fffffa60`00000000 00000000`d0000000 00000000`00000000 00000000`00000000 : nvlddmkm+0x1fb5cd
    fffffa60`0569bdb0 fffffa60`02b2ae91 : 00000000`00000000 00000000`00000000 00000000`d0000000 00000000`00000000 : nvlddmkm+0x1fb73b
    fffffa60`0569be70 fffffa60`02b2b3d0 : fffffa80`08d3d000 fffffa60`02b2ae0f 00000000`00000001 00000000`00000000 : nvlddmkm+0x11be91
    fffffa60`0569bf20 fffffa60`02ae8292 : fffffa80`0017f71e fffffa80`08d3d000 fffffa80`09ea2240 fffffa80`09ea2240 : nvlddmkm+0x11c3d0
    fffffa60`0569bf60 fffffa60`02a6473a : fffffa80`08d3d000 fffffa60`00000001 fffffa80`08d3d000 00000000`00000000 : nvlddmkm+0xd9292
    fffffa60`0569c020 fffffa60`03749ca9 : fffffa80`08d3d000 fffffa80`08d3d000 fffffa60`0569c990 fffffa60`0569c8d0 : nvlddmkm+0x5573a
    fffffa60`0569c560 fffffa60`03753389 : fffffa60`03749c27 fffffa80`08d3d000 fffffa60`0569c990 fffffa80`08b2d72c : nvlddmkm!nvDumpConfig+0x23f999
    fffffa60`0569c600 fffffa60`03756d25 : fffffa80`08b2d72c fffffa80`08d3d000 fffffa60`0569c990 00000000`00000000 : nvlddmkm!nvDumpConfig+0x249079
    fffffa60`0569c7f0 fffffa60`03882b46 : fffffa80`08d3d000 fffffa80`08b2d72c fffffa80`08b2d728 fffff800`030df8b8 : nvlddmkm!nvDumpConfig+0x24ca15
    fffffa60`0569c830 fffffa60`0388073a : 00000000`40020056 00000000`00000000 fffffa80`08b2d040 00000000`00000000 : dxgkrnl!DpiDxgkDdiStartDevice+0x62
    fffffa60`0569c880 fffffa60`03880baa : fffffa80`00000000 00000000`00000000 00000000`00000000 fffffa80`08b3fd80 : dxgkrnl!DpiFdoStartAdapter+0x382
    fffffa60`0569c9e0 fffffa60`0387b66f : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa60`0569cca0 : dxgkrnl!DpiFdoStartAdapterThread+0x17a
    fffffa60`0569ca70 fffffa60`038f71be : fffffa60`00000000 00000000`00000000 00000000`00000000 00000000`00292d00 : dxgkrnl!DpiSessionCreateCallback+0x1b
    fffffa60`0569caa0 fffffa60`038f70f6 : 00000000`00000000 00000000`00000054 fffffa80`09e03040 00000000`00000000 : watchdog!SMgrSessionOpen+0x42
    fffffa60`0569cae0 fffff960`00043ecb : fffffa80`09e03040 00000000`000007ff fffffa60`0569cb48 00000000`00000000 : watchdog!SMgrNotifySessionChange+0x22
    fffffa60`0569cb20 fffff960`00046a9c : fffffa80`0000067b fffffa60`0569cca0 fffffa80`054d5080 00000000`000007ff : win32k!InitializeGreCSRSS+0x23
    fffffa60`0569cbe0 fffff800`0305a573 : fffffa80`09df3bb0 000007fe`fd7d8a20 fffffa80`09f3f630 00000000`0018f808 : win32k!NtUserInitialize+0x13c
    fffffa60`0569cc20 000007fe`fd72cd9a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`0569cc20)
    00000000`0018f768 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7fe`fd72cd9a
    
    
    3: kd> r
    rax=00000000ffffffff rbx=0000000000000028 rcx=ffffffffffd18000
    rdx=fffffa600569b818 rsi=fffffa600569b818 rdi=0000000000000018
    rip=fffff80003533b47 rsp=fffffa600569b6c8 rbp=ffffffffffd18000
     r8=0000000000000018  r9=0000000000000018 r10=0000000000000000
    r11=0000000000000000 r12=fffff8000353a980 r13=0000000000000003
    r14=fffffa600569b907 r15=fffff800035424a0
    iopl=0         nv up ei pl zr na po nc
    cs=0010  ss=0018  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246
    hal!HalpPciReadMmConfigUlong+0x7:
    fffff800`03533b47 8902            mov     dword ptr [rdx],eax ds:fffffa60`0569b818=ffffffff

  3. #3
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Wow, this is a great post... thank you for the bump, Richard. In the past, when I've seen 101's... I have always dreaded it because I didn't know how to analyze them, and now I have much to look for :)

  4. #4
    Moderator
    BSOD Kernel Dump Expert

    Join Date
    Mar 2012
    Posts
    465

    Re: Class 101 for 0x101 Bugchecks

    Good job on the approach. Looking at intelppm.sys wasn't really necessary: when you look at the running thread for the faulting proc (proc 3), you can see that intelppm was not involved, but rather nvlddmkm.sys or the PCI-E bus, as the last few frames in the callstack show. In the specific situation I was dealing with in the OP, the AMD chipset driver was responsible, but not in your case. I'll take a look at the kernel dump myself, but from what I see it looks like you'll want to ask the guy to remove the graphics card, clean up any foreign material that may be in the slot, and then reinsert it, as well as update the graphics drivers if they haven't already.
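
    For anyone wondering how "proc 3" falls out of the dump: Arg3 of the bugcheck is the PRCB address of the hung processor, and each !prcb header prints the PRCB address for that processor, so it is just a lookup. A minimal Python sketch of that lookup, with a function name of my own choosing and the header lines copied from the output above:

```python
import re

# Header line printed by !prcb, e.g. "PRCB for Processor 3 at fffffa60019d8180:"
PRCB_LINE = re.compile(r"PRCB for Processor (\d+) at ([0-9a-f]+):")


def processor_for_prcb(prcb_output, arg3):
    """Map the PRCB address from bugcheck Arg3 to a processor number, or None if absent."""
    wanted = int(arg3, 16)
    for number, address in PRCB_LINE.findall(prcb_output):
        if int(address, 16) == wanted:
            return int(number)
    return None


sample = """\
PRCB for Processor 2 at fffffa6001966180:
PRCB for Processor 3 at fffffa60019d8180:
PRCB for Processor 4 at fffffa6001a43180:
"""

print(processor_for_prcb(sample, "fffffa60019d8180"))  # prints 3
```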

    I'm concerned about one thing though: you're actually retrieving a thread with all its info n stuff from the faulting proc, which isn't really supposed to happen if that proc was actually frozen. I would think that the IRQL the proc was at was high enough to mask the clock interrupt but not high enough to mask the bugcheck. But if that was the case, why didn't it save the IRQL (which shows up as 0)? And if it did successfully save it, why on earth would a thread at IRQL 0 stop a clock interrupt?
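
    The masking rule behind that puzzle can be written down in a few lines. This is only a sketch in Python: the level values are the x64 ones as far as I know (13 for the clock agrees with the "Current IRQL -- 13" on processor 0 above), and treating the bugcheck's freeze request as an IPI_LEVEL interrupt is my assumption, not something this dump shows.

```python
# x64 IRQL values (assumed from the WDK headers; CLOCK_LEVEL = 13 matches the dump above).
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2
CLOCK_LEVEL, IPI_LEVEL, HIGH_LEVEL = 13, 14, 15


def is_masked(current_irql, interrupt_irql):
    """An interrupt is held off while the processor runs at or above the interrupt's IRQL."""
    return interrupt_irql <= current_irql


# A processor really sitting at IRQL 0 cannot hold off the clock interrupt.
print(is_masked(PASSIVE_LEVEL, CLOCK_LEVEL))  # False

# A processor stuck at CLOCK_LEVEL masks the clock but would still take an IPI_LEVEL interrupt.
print(is_masked(CLOCK_LEVEL, CLOCK_LEVEL))    # True
print(is_masked(CLOCK_LEVEL, IPI_LEVEL))      # False
```

    So a saved IRQL of 0 on the hung processor is inconsistent with "the clock was simply masked", which is exactly why it looks odd.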

    Perplexing, but I'd like to look into it further. One of the things I'd like to do is check whether anything in the callstack actually made a call to raise the IRQL (KeRaiseIrql). There's a script someone made at codemachine.com that will parse through a module to see if there are any calls it makes to a function name that you give it, which is very convenient. It's not perfect, but it does the job well. There are other ways of approaching this as well, but I'll determine that to the best of my ability when I take a look at it.
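
    To be clear, the following is not the codemachine.com script. It is a minimal Python sketch of the same idea applied to disassembly text you have already saved from u/uf: list every call instruction whose operand mentions a given name. The function name is hypothetical, and a plain text match like this will miss anything the compiler inlined or any call made through a register.

```python
import re

# A u/uf line: address, opcode bytes, mnemonic, operands.
CALL_LINE = re.compile(r"\s*([0-9a-f`]+)\s+[0-9a-f]+\s+call\s+(.*)$")


def find_calls(disassembly, target):
    """Return (address, operand) for each call instruction whose operand mentions target."""
    hits = []
    for line in disassembly.splitlines():
        m = CALL_LINE.match(line)
        if m and target.lower() in m.group(2).lower():
            hits.append((m.group(1), m.group(2).strip()))
    return hits


# Disassembly copied from the uf output earlier in this thread.
sample = """\
intelppm+0x267c:
fffffa60`00d3f67c 4883ec28        sub     rsp,28h
fffffa60`00d3f680 e81b110000      call    intelppm+0x37a0 (fffffa60`00d407a0)
fffffa60`00d3f685 33c0            xor     eax,eax
fffffa60`00d3f687 4883c428        add     rsp,28h
fffffa60`00d3f68b c3              ret
"""

for address, operand in find_calls(sample, "intelppm+0x37a0"):
    print(address, operand)
```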

  5. #5
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Oh, and a question, what exactly is intelppm.sys? I know it's the Intel Processor Driver, but what is this specific driver in charge of?

  6. #6
    Moderator
    BSOD Kernel Dump Expert

    Join Date
    Mar 2012
    Posts
    465

    Re: Class 101 for 0x101 Bugchecks

    From what I understand, CPU drivers like that are commonly used to handle a lot of ACPI stuff, like the newer power settings and the dynamic overclocking that's built into CPUs nowadays. Nothing about them is really critical, as anything essential is built into the Windows kernel to begin with.

    I have to admit that 0x101 bugchecks are probably one of the nastiest buggers to analyze, because they often end up with a lot of missing data that needs to be reconstructed manually, and/or you have to deal with analyzing interprocessor communications.

  7. #7
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Very annoying indeed. Thanks for the explanation! :)

  8. #8
    BSOD Kernel Dump Analysis Shintaro's Avatar
    Join Date
    Jun 2012
    Location
    Sydney, Australia
    Age
    45
    Posts
    168

    Re: Class 101 for 0x101 Bugchecks

    Vir Gnarus,

    Outstanding post!
    Try to live an ordinary life, in a non-ordinary way.

  9. #9
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Giving this a bump as I require some assistance in learning how to do this :)

    I'm working on a BSOD situation right now in which the user has attached many dumps, all of which are 101 bugchecks. Every single one of them. The user attached one today that mentioned it couldn't find the symbols for the Intel Storage Drivers (iaStor.sys), and after updating them the issue still remains and new dumps are still pointing to iaStor.sys.

    For example, here's where I am having trouble. Here's the dump file:


    Okay, I first notice that the dump says it was timed out for 19 clock ticks.
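
    One thing to watch when reading that number: WinDbg prints the bugcheck arguments in hexadecimal, so if the 19 was read straight from the Arg1 line it is 25 ticks in decimal, and the 18 in the earlier dump is 24. A small Python sketch of the conversion follows. The 15.625 ms per tick is an assumption taken from the "Ticks: 457 (0:00:00:07.140)" line in the !thread output earlier in this thread; other systems may tick at a different rate.

```python
def watchdog_interval(arg1_hex, ms_per_tick=15.625):
    """Convert the hex Arg1 of a 0x101 bugcheck to (ticks, approximate milliseconds)."""
    ticks = int(arg1_hex, 16)
    return ticks, ticks * ms_per_tick


print(watchdog_interval("19"))  # (25, 390.625)
print(watchdog_interval("18"))  # (24, 375.0)
```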

    Next, it mentions running the !prcb command for each processor. If I am reading and understanding this correctly, there are 8 processors in the dump I have shown. So, with that known, I would run !prcb on 1 through 8, right? Well, I am going wrong somewhere, see here.

    Here's what happens when I run a !prcb on processor 0:

    0: kd> !prcb 0
    PRCB for Processor 0 at fffff780ffff0000:
    Current IRQL -- 13
    Threads-- Current fffffa8009fa03a0 Next fffffa800d8c8060 Idle fffff8000320bcc0
    Processor Index 0 Number (0, 0) GroupSetMember 1
    Interrupt Count -- 00123ee0
    Times -- Dpc 000003af Interrupt 000007e1
    Kernel 00008075 User 00000c2a
    Works just fine. Alright, let's move onto processor 1:

    0: kd> !prcb 1
    Cannot get PRCB address
    Nope.

    Let's try processor 2:

    0: kd> !prcb 2
    Cannot get PRCB address
    Nope.

    3?

    0: kd> !prcb 3
    Cannot get PRCB address
    Nada.

    So far, I have only been able to get the 0 processor to show info... am I going wrong somewhere?

    Debugging/Reverse Engineering Blog

    “Be kind whenever possible. It is always possible.”

    - Dalai Lama


  10. #10
    Administrator
    Windows Update Expert
    Developer
    niemiro's Avatar
    Join Date
    Mar 2012
    Location
    District 12
    Posts
    5,262

    Re: Class 101 for 0x101 Bugchecks

    It looks like you are doing things correctly. Are you trying to get this information out of a minidump? I don't think that minidumps retain that information, or enough information to properly diagnose this particular bugcheck. You might have to ask the user to upload something bigger, such as a kernel dump.

    Richard

  11. #11
    BSOD Kernel Dump Analysis Shintaro's Avatar
    Join Date
    Jun 2012
    Location
    Sydney, Australia
    Age
    45
    Posts
    168

    Re: Class 101 for 0x101 Bugchecks

    Minidumps only have the information on the process and CPU that crashed. If you want to be able to switch between CPUs, then you need a kernel or full memory dump as far as I know.

    EDIT: CPUs are numbered from zero. On a 4 core CPU: CPU1 = 0, CPU2 = 1, CPU3 = 2, CPU4 = 3.

    To digress slightly,
    The way that I determine how many CPUs there are is from this line:

    Windows 7 Kernel Version 7601 (Service Pack 1) MP (4 procs) Free x64
    As I understand the above line:
    Win 7 with SP1, Multi-processor ( 4 procs/cpu) Free (not the checked build) x64 (64 bit version)
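
    Putting the two points together (the proc count from the banner and the zero-based numbering), here is a small Python sketch that turns the banner line into the list of processor numbers that !prcb will accept. The function name is made up for illustration, and it assumes the banner contains the "(N procs)" text shown above.

```python
import re


def processor_numbers(banner):
    """Return the zero-based processor numbers implied by an 'MP (N procs)' banner line."""
    m = re.search(r"\((\d+) procs\)", banner)
    if m is None:
        return []  # banner does not carry a proc count in this form
    return list(range(int(m.group(1))))


banner = "Windows 7 Kernel Version 7601 (Service Pack 1) MP (4 procs) Free x64"
print(processor_numbers(banner))  # [0, 1, 2, 3]
```

    For the 8-proc Vista dump earlier in the thread this gives 0 through 7, which is why !prcb 8 answered "Cannot get PRCB address".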

    I am no guru, but if you want to upload a minidump we could have a look.

  12. #12
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Ah, you guys are awesome. That's exactly why then, they are minidumps.

    Thanks! :)

  13. #13
    BSOD Kernel Dump Analysis Shintaro's Avatar
    Join Date
    Jun 2012
    Location
    Sydney, Australia
    Age
    45
    Posts
    168

    Re: Class 101 for 0x101 Bugchecks

    Mate,

    It's all about sharing information and learning.

  14. #14
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Quote Originally Posted by Shintaro View Post
    Mate,

    It's all about sharing information and learning.
    Absolutely.

  15. #15
    Moderator
    BSOD Kernel Dump Expert

    Join Date
    Mar 2012
    Posts
    465

    Re: Class 101 for 0x101 Bugchecks

    I added a line at the top of the article to draw people's attention to the somber truth that minidumps are worthless for these bugchecks.

    Btw, when or if you plan on looking at the kernel dump, a quick way of running through data on all the processors, instead of tediously churning out the thread for each processor, is the !running command:

    Code:
    0: kd> !running -it
    
    System Processors:  (000000000000000f)
      Idle Processors:  (0000000000000000) (0000000000000000) (0000000000000000) (0000000000000000)
    
           Prcbs             Current         (pri) Next            (pri) Idle
      0    fffff80002df3e80  fffffa8006261060 ( 8)                       fffff80002e01cc0  ................
    
    Child-SP          RetAddr           Call Site
    fffff880`08cd6918 fffff800`02cd7f3a nt!KeBugCheckEx
    fffff880`08cd6920 fffff800`02c8ace7 nt! ?? ::FNODOBFM::`string'+0x4e2e
    fffff880`08cd69b0 fffff800`031f4895 nt!KeUpdateSystemTime+0x377
    fffff880`08cd6ab0 fffff800`02c7d713 hal!HalpHpetClockInterrupt+0x8d
    fffff880`08cd6ae0 00000000`7518cffd nt!KiInterruptDispatchNoLock+0x163
    00000000`0008e350 00000000`00000000 0x7518cffd
    
      1    fffff880009ec180  fffff880009f6fc0 ( 0) fffffa800394f540 (22) fffff880009f6fc0  ................
    
    Child-SP          RetAddr           Call Site
    00000000`00000000 00000000`00000000 0x0
    
      2    fffff88002f65180  fffffa80036aead0 (16) fffffa80036ae5e0 (23) fffff88002f6ffc0  ................
    
    Child-SP          RetAddr           Call Site
    fffff880`03316740 fffff800`02c6afe4 nt!KeFlushMultipleRangeTb+0x260
    fffff880`03316810 fffff800`02cf94f5 nt!MiAgeWorkingSet+0x64a
    fffff880`033169c0 fffff800`02c6b116 nt! ?? ::FNODOBFM::`string'+0x4cd46
    fffff880`03316a40 fffff800`02c6b5cb nt!MmWorkingSetManager+0x6e
    fffff880`03316a90 fffff800`02f17e6a nt!KeBalanceSetManager+0x1c3
    fffff880`03316c00 fffff800`02c71f06 nt!PspSystemThreadStartup+0x5a
    fffff880`03316c40 00000000`00000000 nt!KiStartSystemThread+0x16
    
      3    fffff88002fd7180  fffffa80037c02c0 (11) fffffa80065ffb50 (21) fffff88002fe1fc0  ................
    
    Child-SP          RetAddr           Call Site
    fffff880`09462880 fffff800`02cb5acd nt!KxFlushEntireTb+0x93
    fffff880`094628c0 fffff800`02cd9e90 nt!KeFlushTb+0x119
    fffff880`09462940 fffff800`02c8de8d nt! ?? ::FNODOBFM::`string'+0xae02
    fffff880`09462980 fffff800`02c7f2ee nt!MmAccessFault+0xa7d
    fffff880`09462ae0 00000000`6862ce6a nt!KiPageFault+0x16e
    00000000`0e29fba0 00000000`00000000 0x6862ce6a
    The "i" argument causes it to display idle procs too, and "t" displays the stack trace for the thread running on each proc. Of course, to get the IRQLs and some other PRCB/PCR data, you unfortunately still have to type !irql or !prcb for each separate proc. Oh, and the line of periods at the end of each row relates to spinlock queues. Read up on it in the WinDbg help for !running, as well as in the chapter on Processes and Threads in the Windows Internals book.
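
    If you save that output to a text file, the summary rows are easy to pick apart. Below is a minimal Python sketch (the function name and regex are mine) that counts the processors from the System Processors mask and lists the procs whose Current thread is the same as their Idle thread. It assumes the row layout shown above, where the Next column may be blank, and it ignores the stack traces entirely.

```python
import re

# Summary row: number, PRCB, current thread (pri), optional next thread (pri), idle thread, lock column.
ROW = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]{16})\s+([0-9a-f]{16})\s+\(\s*\d+\)\s+"
    r"(?:[0-9a-f]{16}\s+\(\s*\d+\)\s+)?([0-9a-f]{16})\s+\S+\s*$"
)


def summarize(running_text):
    """Return (processor count from the mask, processors running their idle thread)."""
    mask = re.search(r"System Processors:\s+\(([0-9a-f]+)\)", running_text)
    count = bin(int(mask.group(1), 16)).count("1") if mask else None
    idle = [
        int(m.group(1))
        for m in map(ROW.match, running_text.splitlines())
        if m and m.group(3) == m.group(4)  # current thread == idle thread
    ]
    return count, idle


# Header and summary rows copied from the !running -it output above.
sample = """\
System Processors:  (000000000000000f)
  0    fffff80002df3e80  fffffa8006261060 ( 8)                       fffff80002e01cc0  ................
  1    fffff880009ec180  fffff880009f6fc0 ( 0) fffffa800394f540 (22) fffff880009f6fc0  ................
  2    fffff88002f65180  fffffa80036aead0 (16) fffffa80036ae5e0 (23) fffff88002f6ffc0  ................
  3    fffff88002fd7180  fffffa80037c02c0 (11) fffffa80065ffb50 (21) fffff88002fe1fc0  ................
"""

print(summarize(sample))  # (4, [1])
```

    In this example the mask 0xf has four bits set, so four processors, and proc 1 is the one sitting on its idle thread, which is also the proc whose stack trace came back as a single zeroed frame.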

  16. #16
    Moderator
    Microsoft MVP
    BSOD Kernel Dump Expert
    Contributor
    Patrick's Avatar
    Join Date
    Jun 2012
    Location
    Long Island, New York
    Age
    21
    Posts
    3,038

    Re: Class 101 for 0x101 Bugchecks

    Right on, thanks a lot Vir!

  17. #17
    BSOD Kernel Dump Analysis Shintaro's Avatar
    Join Date
    Jun 2012
    Location
    Sydney, Australia
    Age
    45
    Posts
    168

    Re: Class 101 for 0x101 Bugchecks

    Another cool command. But hidden in there is more reading.

  18. #18
    Member Cayden's Avatar
    Join Date
    Jul 2012
    Location
    Toronto Canada
    Age
    20
    Posts
    178
    • specs System Specs
      • Manufacturer:
        Gateway
      • Model Number:
        DX4320-02e
      • Operating System:
        Windows 7 HP

    Re: Class 101 for 0x101 Bugchecks

    Wow this was as informative as it was amusing to read! Great work, I learned things.

  19. #19
    Registered Member
    Join Date
    Mar 2013
    Location
    Wauchope, Australia
    Posts
    3

    Re: Class 101 for 0x101 Bugchecks

    Hi Richard,

    I stumbled upon your post while searching for my BSOD issue, and besides having an Arg1 of 19 in lieu of your client's 18, my details are exactly the same.

    I read that Vir Gnarus suggests "to remove the graphics card, clean up any foreign material that may be in the slot, and then reinsert it, as well as update graphics drivers if they haven't already."

    I'm currently downloading the latest drivers for my GeForce 570, and will clean up any foreign material that may be in the slot when the PC is shut down, but I'm wondering if this rectified the BSOD issue for your client?

    Cheers

    Graham

  20. #20
    Moderator
    BSOD Kernel Dump Expert

    Join Date
    Mar 2012
    Posts
    465

    Re: Class 101 for 0x101 Bugchecks

    Huh? Who? What? <.>;;



