1. #1

    Join Date
    Mar 2012
    Posts
    469

    The Raw Truth - Using Raw Stacks

    Howdy all, just a short tip here that I use frequently to get more nitty gritty on what was going on during a crash. For any crashdump where the last running thread is relevant to the cause, sometimes you may not get any important clues on a potential suspect at first glance. Take this !analyze -v output for instance (crashdump attached to this thread):

    Code:
    2: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
    The IO manager has caught a misbehaving driver.
    Arguments:
    Arg1: 000000000000023e, A driver has marked an IRP pending but didn't return STATUS_PENDING.
    Arg2: fffffa8005aea2c0, The address in the driver's code where the error was detected.
    Arg3: fffff9800a9aaee0, IRP address.
    Arg4: 0000000000000000, Status code.
    
    Debugging Details:
    ------------------
    
    TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2
    
    OVERLAPPED_MODULE: Address regions for 'watchdog' and 'cdrom.sys' overlap
    
    BUGCHECK_STR:  0xc9_23e
    
    DRIVER_VERIFIER_IO_VIOLATION_TYPE:  23e
    
    FAULTING_IP: 
    +0
    fffffa80`05aea2c0 4883ec48        sub     rsp,48h
    
    FOLLOWUP_IP: 
    cdrom!DeviceSendRequestSynchronously+c9
    fffff880`0838be95 488b0da490ffff  mov     rcx,qword ptr [cdrom!WdfDriverGlobals (fffff880`08384f40)]
    
    IRP_ADDRESS:  fffff9800a9aaee0
    
    DEVICE_OBJECT: fffffa8005f0b060
    
    CUSTOMER_CRASH_COUNT:  1
    
    DEFAULT_BUCKET_ID:  VERIFIER_ENABLED_VISTA_MINIDUMP
    
    PROCESS_NAME:  System
    
    CURRENT_IRQL:  2
    
    LOCK_ADDRESS:  fffff800034e1b80 -- (!locks fffff800034e1b80)
    
    Resource @ nt!PiEngineLock (0xfffff800034e1b80)    Available
    
    WARNING: SystemResourcesList->Flink chain invalid. Resource may be corrupted, or already deleted.
    
    
    WARNING: SystemResourcesList->Blink chain invalid. Resource may be corrupted, or already deleted.
    
    1 total locks
    
    PNP_TRIAGE: 
        Lock address  : 0xfffff800034e1b80
        Thread Count  : 0
        Thread address: 0x0000000000000000
        Thread wait   : 0x0
    
    LAST_CONTROL_TRANSFER:  from fffff8000376c3dc to fffff800032e2c40
    
    STACK_TEXT:  
    fffff880`033c39a8 fffff800`0376c3dc : 00000000`000000c9 00000000`0000023e fffffa80`05aea2c0 fffff980`0a9aaee0 : nt!KeBugCheckEx
    fffff880`033c39b0 fffff800`0377647a : fffff800`0376a9f0 fffffa80`05aea2c0 fffff980`0a9aaee0 00000000`00000000 : nt!VerifierBugCheckIfAppropriate+0x3c
    fffff880`033c39f0 fffff800`0377734e : 00000000`0000023e 00000000`00000000 fffff980`0a9aaee0 00000000`ffffffff : nt!ViErrorFinishReport+0xda
    fffff880`033c3a40 fffff800`0378271f : fffffa80`061af820 fffffa80`0617fab0 fffffa80`0617fb68 fffff880`015a8228 : nt!VfErrorReport6+0x6e
    fffff880`033c3b10 fffff800`03782b63 : fffffa80`0617fab0 00000000`00000002 fffffa80`05f0b060 fffff880`015a43f4 : nt!IovpCallDriver2+0x13f
    fffff880`033c3b70 fffff800`03788c2e : fffff980`0a9aaee0 fffff980`0a9aaee0 00000000`00000002 fffffa80`05f0b060 : nt!VfAfterCallDriver+0x353
    fffff880`033c3bc0 fffff880`00d680c2 : fffff880`00de17c0 fffffa80`0614ea20 00000000`00000025 fffffa80`0617fab0 : nt!IovCallDriver+0x57e
    fffff880`033c3c20 fffff880`00d7d59f : fffffa80`063b8da0 00000000`00000000 fffff880`00000000 fffff880`083775d5 : Wdf01000!FxIoTarget::SubmitSync+0x24a
    fffff880`033c3cd0 fffff880`0838be95 : fffffa80`00000020 fffffa80`063b8da0 fffffa80`0614ea20 00000000`00000001 : Wdf01000!imp_WdfRequestSend+0x24b
    fffff880`033c3d20 fffff880`0838c2f4 : 00000000`00000000 00000000`00000024 00000000`00000001 fffff880`033c3ee0 : cdrom!DeviceSendRequestSynchronously+0xc9
    fffff880`033c3d80 fffff880`0838af04 : 0000057f`f9f447e8 fffff880`033c3ee0 fffffa80`060eb830 00000000`00000024 : cdrom!DeviceSendSrbSynchronously+0x3ec
    fffff880`033c3eb0 fffff880`08389825 : fffffa80`00000002 fffffa80`060bbb80 fffff880`083813b8 0000057f`f9f447e8 : cdrom!DeviceCacheDeviceInquiryData+0xbc
    fffff880`033c3f80 fffff880`00dd29d4 : 00000000`00000000 fffff880`00000010 00000000`0000000c fffff880`00ddb620 : cdrom!DeviceEvtSelfManagedIoInit+0x105
    fffff880`033c3fc0 fffff880`00dd294e : fffffa80`0612aeb0 00000000`0000000c fffff880`00ddb620 fffff880`00ddb600 : Wdf01000!FxSelfManagedIoMachine::Init+0x34
    fffff880`033c3ff0 fffff880`00dc6b9b : 00000000`00000000 00000000`00000000 00000000`00000000 fffff880`00ddb620 : Wdf01000!FxSelfManagedIoMachine::ProcessEvent+0x14a
    fffff880`033c4060 fffff880`00dc66eb : 00000000`00000312 00000000`00000000 fffff880`00ddb600 fffff6fc`c00544e0 : Wdf01000!FxPkgPnp::PowerD0StartingStartSelfManagedIo+0x2f
    fffff880`033c4090 fffff880`00dc638e : fffffa80`0614db60 00000000`00000040 fffff880`00ddb3e0 00000000`00000040 : Wdf01000!FxPkgPnp::PowerEnterNewState+0x1db
    fffff880`033c41c0 fffff880`00dc606c : 00000000`00000000 00000000`00000040 00000000`00000501 fffffa80`0614db60 : Wdf01000!FxPkgPnp::PowerProcessEventInner+0x13e
    fffff880`033c4230 fffff880`00dd3b5e : 00000000`00000000 00000000`00000501 00000000`00000000 00000000`00000501 : Wdf01000!FxPkgPnp::PowerProcessEvent+0x1d4
    fffff880`033c42c0 fffff880`00dd3abb : 00000000`00000501 00000000`00000000 00000000`00000500 fffff800`037875ec : Wdf01000!FxPkgPnp::NotPowerPolOwnerStarting+0xe
    fffff880`033c42f0 fffff880`00dcf950 : fffffa80`0614db60 00000000`00000001 00000000`00000000 fffff880`00dd7ff0 : Wdf01000!FxPkgPnp::NotPowerPolicyOwnerEnterNewState+0x1c7
    fffff880`033c4370 fffff880`00dcf372 : 00000000`00000000 00000000`00000001 00000000`00000000 fffffa80`0614ee20 : Wdf01000!FxPkgPnp::PowerPolicyProcessEventInner+0x3f8
    fffff880`033c43e0 fffff880`00dcba45 : 00000000`00000000 fffffa80`0614ee20 00000000`00000108 00000000`00000000 : Wdf01000!FxPkgPnp::PowerPolicyProcessEvent+0x1e2
    fffff880`033c4470 fffff880`00dcb841 : 00000000`00000101 00000000`00000108 00000000`00000108 fffff800`00000006 : Wdf01000!FxPkgPnp::PnpEventHardwareAvailable+0x111
    fffff880`033c44b0 fffff880`00dcb4fe : fffffa80`0614db60 fffff880`033c45e0 00000000`00000004 fffff880`00dda390 : Wdf01000!FxPkgPnp::PnpEnterNewState+0x1a5
    fffff880`033c4520 fffff880`00dcb201 : 00000000`00000000 00000000`00000002 00000000`00000000 fffffa80`0614db60 : Wdf01000!FxPkgPnp::PnpProcessEventInner+0x122
    fffff880`033c4590 fffff880`00dc2d9c : 00000000`00000000 fffffa80`0614db60 00000000`00000002 00000000`00000000 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x1b1
    fffff880`033c4620 fffff880`00dc1dd6 : fffffa80`0614cd30 00000000`00000002 00000000`00000000 fffffa80`0614db60 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x20
    fffff880`033c4650 fffff880`00d91245 : fffff980`0a82ee50 fffff980`0a82ee50 fffffa80`0614ee20 00000000`00000000 : Wdf01000!FxPkgPnp::Dispatch+0x1b2
    fffff880`033c46c0 fffff880`00d9114b : fffffa80`0614ee20 fffff980`0a82ee50 00000000`00000002 fffffa80`0614ee20 : Wdf01000!FxDevice::Dispatch+0xa9
    fffff880`033c46f0 fffff800`03788c16 : fffff980`0a82ee50 00000000`00000002 fffffa80`0614ee20 fffffa80`0614ee20 : Wdf01000!FxDevice::DispatchWithLock+0x93
    fffff880`033c4730 fffff800`0369bd6e : fffff980`0a82ee50 fffffa80`06186830 fffffa80`0614ee20 fffffa80`061b9a40 : nt!IovCallDriver+0x566
    fffff880`033c4790 fffff800`033d387d : fffffa80`05f0b060 fffffa80`06186830 fffff800`033dcfa0 00000000`00000000 : nt!PnpAsynchronousCall+0xce
    fffff880`033c47d0 fffff800`036ab0b6 : fffff800`034e1940 fffffa80`0614a890 fffffa80`06186830 fffffa80`0614aa38 : nt!PnpStartDevice+0x11d
    fffff880`033c4890 fffff800`036ab354 : fffffa80`0614a890 fffffa80`039e002b fffffa80`039ef010 00000000`00000001 : nt!PnpStartDeviceNode+0x156
    fffff880`033c4920 fffff800`036cea86 : fffffa80`0614a890 fffffa80`039ef010 00000000`00000002 00000000`00000000 : nt!PipProcessStartPhase1+0x74
    fffff880`033c4950 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!PipProcessDevNodeTree+0x296
    
    
    STACK_COMMAND:  kb
    
    SYMBOL_STACK_INDEX:  9
    
    SYMBOL_NAME:  cdrom!DeviceSendRequestSynchronously+c9
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: cdrom
    
    IMAGE_NAME:  cdrom.sys
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  4ce79298
    
    FAILURE_BUCKET_ID:  X64_0xc9_23e_VRF_cdrom!DeviceSendRequestSynchronously+c9
    
    BUCKET_ID:  X64_0xc9_23e_VRF_cdrom!DeviceSendRequestSynchronously+c9
    
    Followup: MachineOwner
    ---------
    This occurred when the client turned on DV (Driver Verifier) and had his computer quickly go into reboot loop after having done so. Yet despite the analysis engine's best intentions, it came up with cdrom.sys as the culprit, which is probably not the cause. Nothing in the running thread's callstack unwind showed any other drivers of suspicion, and unfortunately for us, while we can run the address of the pending IRP through !irp extension, we get no valuable details from this minidump (like the device object). So far this seems like we're stuck.

    But wait! We have one trick we can pull, and that's dump the raw stack for this thread. The stack unwind information shows us what the analysis engine was able to produce as being most likely the callstack for this thread. However, there may be references to other functions and whatnot that are not present in this stack unwind. Therefore we'll want to dump the raw stack to show details on all the different variables and registers that were involved in this thread that it decided to preserve on the stack. So, to do that, we start with getting details on the thread first using !thread:

    Code:
    2: kd> !thread
    GetPointerFromAddress: unable to read from fffff80003515000
    THREAD fffffa80039d6680  Cid 0004.0040  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 2
    Not impersonating
    GetUlongFromAddress: unable to read from fffff80003454ba4
    Owning Process            fffffa80039b8740       Image:         System
    Attached Process          N/A            Image:         N/A
    fffff78000000000: Unable to get shared data
    Wait Start TickCount      1064         
    Context Switch Count      4460           IdealProcessor: 2             
    ReadMemory error: Cannot get nt!KeMaximumIncrement value.
    UserTime                  00:00:00.000
    KernelTime                00:00:00.000
    Win32 Start Address nt!ExpWorkerThread (0xfffff800032ecef0)
    Stack Init fffff880033c4db0 Current fffff880033c3840
    Base fffff880033c5000 Limit fffff880033bf000 Call 0
    Priority 13 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff880`033c39a8 fffff800`0376c3dc : 00000000`000000c9 00000000`0000023e fffffa80`05aea2c0 fffff980`0a9aaee0 : nt!KeBugCheckEx
    fffff880`033c39b0 fffff800`0377647a : fffff800`0376a9f0 fffffa80`05aea2c0 fffff980`0a9aaee0 00000000`00000000 : nt!VerifierBugCheckIfAppropriate+0x3c
    fffff880`033c39f0 fffff800`0377734e : 00000000`0000023e 00000000`00000000 fffff980`0a9aaee0 00000000`ffffffff : nt!ViErrorFinishReport+0xda
    fffff880`033c3a40 fffff800`0378271f : fffffa80`061af820 fffffa80`0617fab0 fffffa80`0617fb68 fffff880`015a8228 : nt!VfErrorReport6+0x6e
    fffff880`033c3b10 fffff800`03782b63 : fffffa80`0617fab0 00000000`00000002 fffffa80`05f0b060 fffff880`015a43f4 : nt!IovpCallDriver2+0x13f
    fffff880`033c3b70 fffff800`03788c2e : fffff980`0a9aaee0 fffff980`0a9aaee0 00000000`00000002 fffffa80`05f0b060 : nt!VfAfterCallDriver+0x353
    fffff880`033c3bc0 fffff880`00d680c2 : fffff880`00de17c0 fffffa80`0614ea20 00000000`00000025 fffffa80`0617fab0 : nt!IovCallDriver+0x57e
    fffff880`033c3c20 fffff880`00d7d59f : fffffa80`063b8da0 00000000`00000000 fffff880`00000000 fffff880`083775d5 : Wdf01000!FxIoTarget::SubmitSync+0x24a
    fffff880`033c3cd0 fffff880`0838be95 : fffffa80`00000020 fffffa80`063b8da0 fffffa80`0614ea20 00000000`00000001 : Wdf01000!imp_WdfRequestSend+0x24b
    fffff880`033c3d20 fffff880`0838c2f4 : 00000000`00000000 00000000`00000024 00000000`00000001 fffff880`033c3ee0 : cdrom!DeviceSendRequestSynchronously+0xc9
    fffff880`033c3d80 fffff880`0838af04 : 0000057f`f9f447e8 fffff880`033c3ee0 fffffa80`060eb830 00000000`00000024 : cdrom!DeviceSendSrbSynchronously+0x3ec
    fffff880`033c3eb0 fffff880`08389825 : fffffa80`00000002 fffffa80`060bbb80 fffff880`083813b8 0000057f`f9f447e8 : cdrom!DeviceCacheDeviceInquiryData+0xbc
    fffff880`033c3f80 fffff880`00dd29d4 : 00000000`00000000 fffff880`00000010 00000000`0000000c fffff880`00ddb620 : cdrom!DeviceEvtSelfManagedIoInit+0x105
    fffff880`033c3fc0 fffff880`00dd294e : fffffa80`0612aeb0 00000000`0000000c fffff880`00ddb620 fffff880`00ddb600 : Wdf01000!FxSelfManagedIoMachine::Init+0x34
    fffff880`033c3ff0 fffff880`00dc6b9b : 00000000`00000000 00000000`00000000 00000000`00000000 fffff880`00ddb620 : Wdf01000!FxSelfManagedIoMachine::ProcessEvent+0x14a
    fffff880`033c4060 fffff880`00dc66eb : 00000000`00000312 00000000`00000000 fffff880`00ddb600 fffff6fc`c00544e0 : Wdf01000!FxPkgPnp::PowerD0StartingStartSelfManagedIo+0x2f
    fffff880`033c4090 fffff880`00dc638e : fffffa80`0614db60 00000000`00000040 fffff880`00ddb3e0 00000000`00000040 : Wdf01000!FxPkgPnp::PowerEnterNewState+0x1db
    fffff880`033c41c0 fffff880`00dc606c : 00000000`00000000 00000000`00000040 00000000`00000501 fffffa80`0614db60 : Wdf01000!FxPkgPnp::PowerProcessEventInner+0x13e
    fffff880`033c4230 fffff880`00dd3b5e : 00000000`00000000 00000000`00000501 00000000`00000000 00000000`00000501 : Wdf01000!FxPkgPnp::PowerProcessEvent+0x1d4
    fffff880`033c42c0 fffff880`00dd3abb : 00000000`00000501 00000000`00000000 00000000`00000500 fffff800`037875ec : Wdf01000!FxPkgPnp::NotPowerPolOwnerStarting+0xe
    fffff880`033c42f0 fffff880`00dcf950 : fffffa80`0614db60 00000000`00000001 00000000`00000000 fffff880`00dd7ff0 : Wdf01000!FxPkgPnp::NotPowerPolicyOwnerEnterNewState+0x1c7
    fffff880`033c4370 fffff880`00dcf372 : 00000000`00000000 00000000`00000001 00000000`00000000 fffffa80`0614ee20 : Wdf01000!FxPkgPnp::PowerPolicyProcessEventInner+0x3f8
    fffff880`033c43e0 fffff880`00dcba45 : 00000000`00000000 fffffa80`0614ee20 00000000`00000108 00000000`00000000 : Wdf01000!FxPkgPnp::PowerPolicyProcessEvent+0x1e2
    fffff880`033c4470 fffff880`00dcb841 : 00000000`00000101 00000000`00000108 00000000`00000108 fffff800`00000006 : Wdf01000!FxPkgPnp::PnpEventHardwareAvailable+0x111
    fffff880`033c44b0 fffff880`00dcb4fe : fffffa80`0614db60 fffff880`033c45e0 00000000`00000004 fffff880`00dda390 : Wdf01000!FxPkgPnp::PnpEnterNewState+0x1a5
    fffff880`033c4520 fffff880`00dcb201 : 00000000`00000000 00000000`00000002 00000000`00000000 fffffa80`0614db60 : Wdf01000!FxPkgPnp::PnpProcessEventInner+0x122
    fffff880`033c4590 fffff880`00dc2d9c : 00000000`00000000 fffffa80`0614db60 00000000`00000002 00000000`00000000 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x1b1
    fffff880`033c4620 fffff880`00dc1dd6 : fffffa80`0614cd30 00000000`00000002 00000000`00000000 fffffa80`0614db60 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x20
    fffff880`033c4650 fffff880`00d91245 : fffff980`0a82ee50 fffff980`0a82ee50 fffffa80`0614ee20 00000000`00000000 : Wdf01000!FxPkgPnp::Dispatch+0x1b2
    fffff880`033c46c0 fffff880`00d9114b : fffffa80`0614ee20 fffff980`0a82ee50 00000000`00000002 fffffa80`0614ee20 : Wdf01000!FxDevice::Dispatch+0xa9
    fffff880`033c46f0 fffff800`03788c16 : fffff980`0a82ee50 00000000`00000002 fffffa80`0614ee20 fffffa80`0614ee20 : Wdf01000!FxDevice::DispatchWithLock+0x93
    fffff880`033c4730 fffff800`0369bd6e : fffff980`0a82ee50 fffffa80`06186830 fffffa80`0614ee20 fffffa80`061b9a40 : nt!IovCallDriver+0x566
    fffff880`033c4790 fffff800`033d387d : fffffa80`05f0b060 fffffa80`06186830 fffff800`033dcfa0 00000000`00000000 : nt!PnpAsynchronousCall+0xce
    fffff880`033c47d0 fffff800`036ab0b6 : fffff800`034e1940 fffffa80`0614a890 fffffa80`06186830 fffffa80`0614aa38 : nt!PnpStartDevice+0x11d
    fffff880`033c4890 fffff800`036ab354 : fffffa80`0614a890 fffffa80`039e002b fffffa80`039ef010 00000000`00000001 : nt!PnpStartDeviceNode+0x156
    fffff880`033c4920 fffff800`036cea86 : fffffa80`0614a890 fffffa80`039ef010 00000000`00000002 00000000`00000000 : nt!PipProcessStartPhase1+0x74
    fffff880`033c4950 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!PipProcessDevNodeTree+0x296
    Note that I just had to type !thread and didn't need to specify the address of the thread because it was smart enough to figure that I wanted to use the thread involved in the current context (again, context helps!). Because that context happens to be for the latest thread that was running before the crash, that's the one that we want (in minidumps it's the only context you'll get!). Note that the stack unwind has KeBugCheckEx in it so that means it's our crashing thread.

    Now, while the stack unwind is no different than what we saw from the !analyze -v output, we do have a lot of extra relevant data on the thread. While all of this can be used for various debugging efforts, our focus is centered on one line, which contains the Base and the Limit. This is essentially the very bottom of the stack and the maximum limit that the stack can fill up too. So what we do is take those values and use them as start and end points for a range to show all the data inbetween. You can use various types of d command to dump the data, but the one we want in particular is dps, which is part of a triplet of special d commands that are designed to show symbols. As you might remember in other articles, symbols are those informative function names like what you see in the stack unwind info that gives you a good idea what the purpose of that particular function is. dps and its fellow brothers dds and dqs will parse through the range of data you give it, determine if there's an instruction address in it, and then use the appropriate symbol for that address. It's preferred to use dps as opposed to its other two related commands because unlike them, dps will discover whether you're dealing with x32 or x64 sized memory addresses and will read and output the data accordingly. So anyways, let's do that:

    Code:
    2: kd> dps fffff880033bf000 fffff880033c5000
    fffff880`033bf000  ????????`????????
    fffff880`033bf008  ????????`????????
    fffff880`033bf010  ????????`????????
    fffff880`033bf018  ????????`????????
    fffff880`033bf020  ????????`????????
    fffff880`033bf028  ????????`????????
    fffff880`033bf030  ????????`????????
    
    ...
    
    fffff880`033c4fc0  ????????`????????
    fffff880`033c4fc8  ????????`????????
    fffff880`033c4fd0  ????????`????????
    fffff880`033c4fd8  ????????`????????
    fffff880`033c4fe0  ????????`????????
    fffff880`033c4fe8  ????????`????????
    fffff880`033c4ff0  ????????`????????
    fffff880`033c4ff8  ????????`????????
    fffff880`033c5000  ????????`????????
    The first thing you're probably wondering is why I switched the start and end values by using the Limit for the start and Base for the end. That's because stacks grow backwards (downwards in value) so that is reflected by the Base and Limit values. Since you can't tell from this output I presented since I've clipped a lot of it, there's a good couple hundred lines it posts from this. Fortunately most of the stack in this thread hasn't even been used, so there's a long range of question marks at the beginning and end. That means we have less to work with.

    Now, if you're following along, even scrolling through it you'll immediately discover a rather obtuse line in the midst of it:

    Code:
    fffff880`033c3b28  fffff880`015a8228Unable to load image \SystemRoot\system32\DRIVERS\dtsoftbus01.sys, Win32 error 0n2
    *** WARNING: Unable to verify timestamp for dtsoftbus01.sys
    *** ERROR: Module load completed but symbols could not be loaded for dtsoftbus01.sys
     dtsoftbus01+0x5228
    Ah ha! So we have a suspect here! As you can tell, what happened is that while dps did find this as a valid instruction address, it could not find any symbols for the module (driver) that this is a part of, which ends up having it bicker. That's because this - like many other 3rd-party drivers - do not have publicly available symbols that you loaded. If it weren't for you setting up Windbg to use the Microsoft symbol server or to install all the symbol packages from its website, you'd see the same thing for all the Windows functions, too! That's why it's very important to have symbols and the proper symbols, otherwise a stack unwind will be extremely vague or very misleading.

    As it was figured, this is part of Daemon Tools, and in fact it's dated from Jan 2012, so while it's recent at the time of this writing, it still was causing problems. After the client removed it, they reported no issues. Yay!

    Please understand that this is not a 100% surefire way of finding an exact cause. Every bugcheck is different, and this is just one way of finding important data on the situation that may help you. This was a relatively easy case once Driver Verifier kicked in, and all it took is to use this method to extract data that ended up being most important to solving this. There's something different for every case, so it's imperative that you have a general idea what to look for and start using tricks like this to dump all the data that will assist you in that approach.

    Note   Note
    Tip: if you suspect a thread's stack is corrupt, you can use this to dump the stack, and along with dps you can use dc to dump the stack with ASCII characters. Often times a stack is trashed by mishandling of it and having a function unintentionally dump a bunch of data onto the stack. Sometimes you may have strings of text sitting in that data that will point you to a cause, and the dc command is just the thing to use to sniff them out!


    Update:

    I think it's important to mention that this did not in fact fix the client's problem afterall, and I know exactly why. Daemon Tools driver here was responsible for the crash, yes, but why it crashed is important:

    Code:
    DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
    The IO manager has caught a misbehaving driver.
    Arguments:
    Arg1: 000000000000023e, A driver has marked an IRP pending but didn't return STATUS_PENDING.
    Arg2: fffffa8005aea2c0, The address in the driver's code where the error was detected.
    Arg3: fffff9800a9aaee0, IRP address.
    Arg4: 0000000000000000, Status code.
    If I am correct, I am pretty sure this is related to the Force Pending I/O Requests check that's in Driver Verifier. Often this check is recommended on technical support forums without people being aware of exactly what it does and what it's for. As mentioned in BSOD Method & Tips, this check is a no-no. This is because it's a stress test creating an artificial environment that would very rarely occur in a real-world situation and so it does this to accelerate the testing procedure for this. Many drivers out there actually are not designed to handle pended IRPs like this, and so forcing this behavior on them will cause them to bug out (Windows drivers included!). While Daemon Tools driver does fail to properly handing this, it's not really supposed too, and so it whacks out when it is forced too using the Force Pending I/O Requests check.

    Understand that the 0xC9 crash was really caused by the IO subsystem checking (I/O Verification) option selected in Driver Verifier. However, what really triggered it is Force Pending I/O Requests, because that option will randomly force an IRP to go pending, which is not a common experience when running a PC. An IRP related to Daemon Tools goes pending, Daemon Tools coughs on it because it doesn't know how to handle pending IRPs correctly, and the I/O Verification check in DV notices this mishandling and triggers the bugcheck. When looking at the Windbg help manual for 0xC9 bugchecks, you'll find that the subcode for this, 0x23E, is a Non-fatal error. It's an ugly and potentially risky fault, but it is not capable in and of itself to cause harm to the system, so it is not generally worthy of creating a BSOD. DV only gives a BSOD anyways to inform the driver developer of the bug to help them with the debugging process when bugtesting their code. In fact, if you actually set up a live kernel debugger on the faulting system, the debugger will catch this kind of fault, and you can choose to ignore the fault and continue on with regular PC activity. It's not often you can recover from a BSOD like that!

    Oh, and just to confirm the fact the client had Force Pending I/O Requests on:

    Code:
    2: kd> !verifier
    
    Verify Level fbb ... enabled options are:
        Special pool
        Special irql
        All pool allocations checked on unload
        Io subsystem checking enabled
        Deadlock detection enabled
        DMA checking enabled
        Security checks enabled
        Force pending I/O requests
        IRP Logging
        Miscellaneous checks enabled
    
    Summary of All Verifier Statistics
    
    RaiseIrqls                             0x0
    AcquireSpinLocks                       0x22ba
    Synch Executions                       0x66
    Trims                                  0x1fee
    
    Pool Allocations Attempted             0x6b15
    Pool Allocations Succeeded             0x6b15
    Pool Allocations Succeeded SpecialPool 0x6b15
    Pool Allocations With NO TAG           0x3
    Pool Allocations Failed                0x0
    Resource Allocations Failed Deliberately   0x0
    
    Current paged pool allocations         0x4e for 0002A83C bytes
    Peak paged pool allocations            0x50 for 0002A874 bytes
    Current nonpaged pool allocations      0x34c9 for 00436B1C bytes
    Peak nonpaged pool allocations         0x34ca for 00436CA0 bytes
    
    GetPointerFromAddress: unable to read from fffff80003515100
    Again, note that I/O subsystem checking is on, which actually caused the BSOD, but only because of Force Pending I/O Request's intervention in the first place.

    So the real culprit for the client's problems continued to be at large, and all we did was waste time with this. So remember folks, don't turn on Force Pending I/O Requests! Note that this still does not nullify this example as being a legitimate method for finding suspect drivers. We still found what was involved in this bugcheck, but it was not responsible for the real-world problems the client was dealing with.


    Bonus:

    Just so you know, this driver actually did show up in the stack unwind present in both the !analyze -v and !thread outputs. Take the address of the instruction in the driver as we see in the raw stack, and search for it in the stack unwind output:

    Code:
    fffff880`033c3b28  fffff880`015a8228Unable to load image \SystemRoot\system32\DRIVERS\dtsoftbus01.sys, Win32 error 0n2
    *** WARNING: Unable to verify timestamp for dtsoftbus01.sys
    *** ERROR: Module load completed but symbols could not be loaded for dtsoftbus01.sys
     dtsoftbus01+0x5228
    
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff880`033c39a8 fffff800`0376c3dc : 00000000`000000c9 00000000`0000023e fffffa80`05aea2c0 fffff980`0a9aaee0 : nt!KeBugCheckEx
    fffff880`033c39b0 fffff800`0377647a : fffff800`0376a9f0  fffffa80`05aea2c0 fffff980`0a9aaee0 00000000`00000000 :  nt!VerifierBugCheckIfAppropriate+0x3c
    fffff880`033c39f0 fffff800`0377734e : 00000000`0000023e  00000000`00000000 fffff980`0a9aaee0 00000000`ffffffff :  nt!ViErrorFinishReport+0xda
    fffff880`033c3a40 fffff800`0378271f : fffffa80`061af820  fffffa80`0617fab0 fffffa80`0617fb68 fffff880`015a8228 :  nt!VfErrorReport6+0x6e
    fffff880`033c3b10 fffff800`03782b63 : fffffa80`0617fab0  00000000`00000002 fffffa80`05f0b060 fffff880`015a43f4 :  nt!IovpCallDriver2+0x13f
    fffff880`033c3b70 fffff800`03788c2e : fffff980`0a9aaee0  fffff980`0a9aaee0 00000000`00000002 fffffa80`05f0b060 :  nt!VfAfterCallDriver+0x353
    fffff880`033c3bc0 fffff880`00d680c2 : fffff880`00de17c0  fffffa80`0614ea20 00000000`00000025 fffffa80`0617fab0 :  nt!IovCallDriver+0x57e
    fffff880`033c3c20 fffff880`00d7d59f : fffffa80`063b8da0  00000000`00000000 fffff880`00000000 fffff880`083775d5 :  Wdf01000!FxIoTarget::SubmitSync+0x24a
    fffff880`033c3cd0 fffff880`0838be95 : fffffa80`00000020  fffffa80`063b8da0 fffffa80`0614ea20 00000000`00000001 :  Wdf01000!imp_WdfRequestSend+0x24b
    fffff880`033c3d20 fffff880`0838c2f4 : 00000000`00000000  00000000`00000024 00000000`00000001 fffff880`033c3ee0 :  cdrom!DeviceSendRequestSynchronously+0xc9
    fffff880`033c3d80 fffff880`0838af04 : 0000057f`f9f447e8  fffff880`033c3ee0 fffffa80`060eb830 00000000`00000024 :  cdrom!DeviceSendSrbSynchronously+0x3ec
    fffff880`033c3eb0 fffff880`08389825 : fffffa80`00000002  fffffa80`060bbb80 fffff880`083813b8 0000057f`f9f447e8 :  cdrom!DeviceCacheDeviceInquiryData+0xbc
    fffff880`033c3f80 fffff880`00dd29d4 : 00000000`00000000  fffff880`00000010 00000000`0000000c fffff880`00ddb620 :  cdrom!DeviceEvtSelfManagedIoInit+0x105
    fffff880`033c3fc0 fffff880`00dd294e : fffffa80`0612aeb0  00000000`0000000c fffff880`00ddb620 fffff880`00ddb600 :  Wdf01000!FxSelfManagedIoMachine::Init+0x34
    fffff880`033c3ff0 fffff880`00dc6b9b : 00000000`00000000  00000000`00000000 00000000`00000000 fffff880`00ddb620 :  Wdf01000!FxSelfManagedIoMachine::ProcessEvent+0x14a
    fffff880`033c4060 fffff880`00dc66eb : 00000000`00000312  00000000`00000000 fffff880`00ddb600 fffff6fc`c00544e0 :  Wdf01000!FxPkgPnp::PowerD0StartingStartSelfManagedIo+0x2f
    fffff880`033c4090 fffff880`00dc638e : fffffa80`0614db60  00000000`00000040 fffff880`00ddb3e0 00000000`00000040 :  Wdf01000!FxPkgPnp::PowerEnterNewState+0x1db
    fffff880`033c41c0 fffff880`00dc606c : 00000000`00000000  00000000`00000040 00000000`00000501 fffffa80`0614db60 :  Wdf01000!FxPkgPnp::PowerProcessEventInner+0x13e
    fffff880`033c4230 fffff880`00dd3b5e : 00000000`00000000  00000000`00000501 00000000`00000000 00000000`00000501 :  Wdf01000!FxPkgPnp::PowerProcessEvent+0x1d4
    fffff880`033c42c0 fffff880`00dd3abb : 00000000`00000501  00000000`00000000 00000000`00000500 fffff800`037875ec :  Wdf01000!FxPkgPnp::NotPowerPolOwnerStarting+0xe
    fffff880`033c42f0 fffff880`00dcf950 : fffffa80`0614db60  00000000`00000001 00000000`00000000 fffff880`00dd7ff0 :  Wdf01000!FxPkgPnp::NotPowerPolicyOwnerEnterNewState+0x1c7
    fffff880`033c4370 fffff880`00dcf372 : 00000000`00000000  00000000`00000001 00000000`00000000 fffffa80`0614ee20 :  Wdf01000!FxPkgPnp::PowerPolicyProcessEventInner+0x3f8
    fffff880`033c43e0 fffff880`00dcba45 : 00000000`00000000  fffffa80`0614ee20 00000000`00000108 00000000`00000000 :  Wdf01000!FxPkgPnp::PowerPolicyProcessEvent+0x1e2
    fffff880`033c4470 fffff880`00dcb841 : 00000000`00000101  00000000`00000108 00000000`00000108 fffff800`00000006 :  Wdf01000!FxPkgPnp::PnpEventHardwareAvailable+0x111
    fffff880`033c44b0 fffff880`00dcb4fe : fffffa80`0614db60  fffff880`033c45e0 00000000`00000004 fffff880`00dda390 :  Wdf01000!FxPkgPnp::PnpEnterNewState+0x1a5
    fffff880`033c4520 fffff880`00dcb201 : 00000000`00000000  00000000`00000002 00000000`00000000 fffffa80`0614db60 :  Wdf01000!FxPkgPnp::PnpProcessEventInner+0x122
    fffff880`033c4590 fffff880`00dc2d9c : 00000000`00000000  fffffa80`0614db60 00000000`00000002 00000000`00000000 :  Wdf01000!FxPkgPnp::PnpProcessEvent+0x1b1
    fffff880`033c4620 fffff880`00dc1dd6 : fffffa80`0614cd30  00000000`00000002 00000000`00000000 fffffa80`0614db60 :  Wdf01000!FxPkgPnp::_PnpStartDevice+0x20
    fffff880`033c4650 fffff880`00d91245 : fffff980`0a82ee50  fffff980`0a82ee50 fffffa80`0614ee20 00000000`00000000 :  Wdf01000!FxPkgPnp::Dispatch+0x1b2
    fffff880`033c46c0 fffff880`00d9114b : fffffa80`0614ee20  fffff980`0a82ee50 00000000`00000002 fffffa80`0614ee20 :  Wdf01000!FxDevice::Dispatch+0xa9
    fffff880`033c46f0 fffff800`03788c16 : fffff980`0a82ee50  00000000`00000002 fffffa80`0614ee20 fffffa80`0614ee20 :  Wdf01000!FxDevice::DispatchWithLock+0x93
    fffff880`033c4730 fffff800`0369bd6e : fffff980`0a82ee50  fffffa80`06186830 fffffa80`0614ee20 fffffa80`061b9a40 :  nt!IovCallDriver+0x566
    fffff880`033c4790 fffff800`033d387d : fffffa80`05f0b060  fffffa80`06186830 fffff800`033dcfa0 00000000`00000000 :  nt!PnpAsynchronousCall+0xce
    fffff880`033c47d0 fffff800`036ab0b6 : fffff800`034e1940  fffffa80`0614a890 fffffa80`06186830 fffffa80`0614aa38 :  nt!PnpStartDevice+0x11d
    fffff880`033c4890 fffff800`036ab354 : fffffa80`0614a890  fffffa80`039e002b fffffa80`039ef010 00000000`00000001 :  nt!PnpStartDeviceNode+0x156
    fffff880`033c4920 fffff800`036cea86 : fffffa80`0614a890  fffffa80`039ef010 00000000`00000002 00000000`00000000 :  nt!PipProcessStartPhase1+0x74
    fffff880`033c4950 00000000`00000000 : 00000000`00000000  00000000`00000000 00000000`00000000 00000000`00000000 :  nt!PipProcessDevNodeTree+0x296
    It showed up here in the Args to Child because it's part of what's called the home address space. It is where a function typically (keyword here!) stores the contents of 4 general registers (rcx, rdx, r8, and r9, respectively) before it goes ahead and calls another function, which that function can then use these to work with. If you're a programmer, you'll notice that you can pass parameters to functions or during API calls. The first 4 parameters most commonly are what resides in these 4 registers. However, note that this is just a common use for this portion of the stack, and it is not set in stone (plus, this all only applies to x64 stacks). A function can at any time decide to use the home space for anything other than storing those 4 particular registers, though most commonly this is how it works, especially in kernel mode (drivers and Windows code, not user mode applications). So what we have here is this instruction address to inside dtsoftbus01.sys was quite possibly slapped into the r9 register (since it's the fourth address or parameter here) some point in time and it stored it here for safe keeping. The purpose and reasoning behind the exact placement of this here in the raw stack I'm not entirely sure yet, still learning on it. But that's the general idea.

    You can read up more info on home spaces and stack usage in x64 stacks here and here. Also a comprehensive demonstration and tutorial is here. It's a bit advanced but it helped me a lot on figuring out stack construction on x64 and how to walk it.
    Attached Files Attached Files
    Last edited by Vir Gnarus; 06-14-2012 at 10:01 AM.
    niemiro, Tekno Venus, Cookieman and 7 others say thanks for this.


    • Ad Bot

      advertising
      Beep.

        
       

  2. #2
    niemiro's Avatar
    Join Date
    Mar 2012
    Location
    District 12
    Posts
    7,747

    Re: The Raw Truth - Using Raw Stacks

    This is a wonderful post :)

    I had heard a little bit about stack trashes, but I hadn't really understood them or how to deal with them. This post clears up my confusion, and I thank you greatly for that.

    Just one quick question, if I may. Where does the analysis engine's stack come from? Why doesn't it just dump the raw stack normally? Where does it get the less useful stack from?

    Thanks a lot :)

  3. #3

    Join Date
    Mar 2012
    Posts
    469

    Re: The Raw Truth - Using Raw Stacks

    Being a programmer, I'm sure you have a basic idea of what a stack is. Every thread holds a stack in which to store register contents during for passing to other functions during calls, as well as variables and whatnot. The analysis engine uses stack unwind metadata that's typically slapped into executables during compiling and will use those as reference points. Of course, this is all related to x64-bit code. x32 is much simpler in that it just uses stack base pointers (ebp register) and walks through the raw stack using those as hop points. x64 does not use stack base pointers anymore (or rather, it never really changes during codeflow), so it needs the unwind metadata to figure everything out. Obviously this can still be problematic if that data is missing, or if the pointers in the stack it refers too have been overwritten from a stack trash.

    You can get the full details with a nice walkthrough of it here. Note that it's rather advanced, so be warned. I'm still trying to wrap my brain around it all. Stack construction is one of those important things I need to start learning, since stack trashes can happen quite often or they're missing context data and usually the only thing one can do is manually reconstruct the callstack.
    Last edited by Vir Gnarus; 04-11-2012 at 12:37 PM.
    niemiro and zigzag3143 say thanks for this.

  4. #4

    Join Date
    Feb 2012
    Posts
    2,081
    Blog Entries
    7

    Re: The Raw Truth - Using Raw Stacks

    Great post!!!

    A couple of questions:
    1 - Since the stack grows downwards, will the Limit value always be less than the Base value? (this seems to make sense, but I didn't want to assume!)
    2 - Will the thread that caused the crash always have KeBugCheckEx in the stack? In other words, is there any reason to run !thread on the the other threads (or use !running -ti to dump the stacks from all of the threads)?

    FWIW - I was told by many people that it wasn't significant that the debugger states it cannot find the symbols for a 3rd party driver.
    But, having used it numerous times, I find that it's sometimes successful when other analysis has failed to reveal a cause.
    Dave76 and zigzag3143 say thanks for this.

  5. #5
    AceInfinity's Avatar
    Join Date
    Feb 2012
    Location
    Canada
    Posts
    1,705

    Re: The Raw Truth - Using Raw Stacks

    Very good post... I can understand most of what that link contains with the help of my C# and Assembly background knowledge. Lots of it makes sense now. Now I just wish I had the experience with crash dump analysis. So far maybe a few months :)
    \n\n

    Automation Programmer
    Development Site: aceinfinity.net

  6. #6

    Join Date
    Mar 2012
    Posts
    469

    Re: The Raw Truth - Using Raw Stacks

    Quote Originally Posted by usasma View Post
    Great post!!!

    A couple of questions:
    1 - Since the stack grows downwards, will the Limit value always be less than the Base value? (this seems to make sense, but I didn't want to assume!)
    2 - Will the thread that caused the crash always have KeBugCheckEx in the stack? In other words, is there any reason to run !thread on the the other threads (or use !running -ti to dump the stacks from all of the threads)?

    FWIW - I was told by many people that it wasn't significant that the debugger states it cannot find the symbols for a 3rd party driver.
    But, having used it numerous times, I find that it's sometimes successful when other analysis has failed to reveal a cause.
    1. Yep! Since it starts at a high address and grows to lower addresses, the stack will end at the lowest available address (the Limit).
    2. It sure will! KeBugCheckEx is the function responsible for initiating the memory dump and displaying the bluescreen. It is the last function called during a bugcheck to perform this, where it will take all the data given to it from previous bugcheck functions (like nt!VerifierBugCheckIfAppropriate+0x3c in this example's callstack) and uses this to properly construct the bugcheck message and context data that will be displayed onscreen and for minidumps.

    Now, there are plenty of cases where one needs to look at the context of other threads to see what was going on in them. There are often cases where the thread crashed because it called another thread and that thread sent it bad data. There are also cases like with 0x101 bugchecks where you need to change the context to the running thread on another processor because the crashing thread is never the cause of the processor hang that caused the crash - otherwise it wouldn't of been able to produce the crash! So yeah, there's a good number of cases where one needs to check other threads, like in the same process that hosted the crashed thread, to figure out what was happening. It all is based on the approach of going through the trail of data and seeing how deep you gotta go before you can be sure what you are looking at is enough evidence to convict. It reminds me of the show Monk or Columbo, where the clues keep adding up as the main character goes from place to place following the trail of clues till there's enough to pin it on someone by figuring out exactly how it happened. The same applies here with debugging, just more nerdier. :)


    Oh, and about symbols for 3rd-party drivers, don't worry about them. In fact, it's because Windbg lacks them that often the debugger will bicker about them missing and makes finding them a lot easier. Windbg will attempt to load symbols on demand, which is why when you start !analyze -v or even before it you'll actually often see the error message about missing symbols for a 3rd-party driver that resides in the context of the crashing thread. So sometimes you may not even need to run !analyze -v or dive deeper, the error message gave away the culprit immediately! Of course, this doesn't apply to all cases, but it certainly helped on many!

    I don't know if you remember when I asked in TSF about an issue with symbols I was experiencing in that ones for a certain Windows driver wasn't loading up properly from the symbol server. The result was that while the driver did have its own built-in symbols, they were pretty generic, so all I could get from the function name was that it had to do about processor being put to sleep. It didn't explain what type of sleep (S1, S2, etc.) nor did it explain how it did so, so I was kinda puzzled. I could kinda ascertain what was going on by disassembling and reading the code, but it was proving difficult for me. It wasn't until I received a copy of the symbols from JC that successfully loaded that I could tell by the symbol name what exactly was happening. It helped troubleshooting quite a bit.

    Symbols are technically not necessary to analysis. If one knows enough assembly and all that jazz to figure things out on their own, they can disassemble the code and backtrack through it all and get a mental idea what was taking place. Escalation engineers like those in Microsoft are known to doing just that with 3rd-party drivers they don't have access to public symbols for. Despite having no symbols, they can look at the code and get an idea what it's doing and the purpose behind it, and therefore find exactly where it bugged so they can issue a bug fix and a workaround for it to the developer. In fact, for critical enterprise applications where the server has to be kept alive, they can debug into the system and hot patch the code right there as it's running! It takes a lot of skill to do that though, and it just makes things a whole lot easier regardless just to have symbols available. And this is only talking about public symbols! Having access to private symbols for something is far, far superior, where not only are functions named properly with descriptive names, but all the variables and objects are named, too!

    Any person who debugs often will tell you what's worse than no symbols are rather erroneous symbols. They will often not cause the debugger to complain, and so the person goes debugging on the basis of the symbol names and find out they are entirely misleading. Lot of wasted time and effort!
    Last edited by Vir Gnarus; 04-12-2012 at 11:21 AM.
    Dave76, niemiro and usasma say thanks for this.

  7. #7

    Join Date
    Mar 2012
    Posts
    469

    Re: The Raw Truth - Using Raw Stacks

    Quote Originally Posted by AceInfinity View Post
    Very good post... I can understand most of what that link contains with the help of my C# and Assembly background knowledge. Lots of it makes sense now. Now I just wish I had the experience with crash dump analysis. So far maybe a few months :)
    Yah, some programming knowledge (especially Assembly) is pretty much essential to figure this stuff out. At least knowing what functions and calls are and other basic principles of code flow and how things work together to perform a task goes a long way in understanding a crashdump and figuring out the best approach for analysis.

  8. #8

    Join Date
    Mar 2012
    Posts
    469

    Re: The Raw Truth - Using Raw Stacks

    Bump. Updated the, er, update. Added more refined details to explain how one can know what checks DV had on (you should know this from my BSOD Method & Tips!) as well as what actually caused the BSOD here.

  9. #9

    Join Date
    Mar 2012
    Posts
    469

    Re: The Raw Truth - Using Raw Stacks

    Bump. Added at the veeeeery end an extra link to a tutorial that greatly assisted in my understanding of x64 stack construction. It's a bit advanced - like any NTdebugging blog articles - but it still can be digested, just take it slowly. I've had to read it a couple times before a bulb lit in my head.

  10. #10

    Re: The Raw Truth - Using Raw Stacks

    I learn something new from you everyday :)

  11. #11

    Re: The Raw Truth - Using Raw Stacks

    Just found out a culprit using your method. Thanks
    Shyam Sasindran
    Blog: Captaindbg

Similar Threads

  1. Vista notebook, 0x9F, "Too many Irp stacks to be believed (>30)!!"
    By satrow in forum BSOD Kernel Dump Analysis Debugging Information
    Replies: 10
    Last Post: 06-17-2013, 06:53 AM
  2. Replies: 0
    Last Post: 08-01-2012, 06:16 PM

Log in

Log in