Rotten to the Core - Code Corruption

Vir Gnarus

BSOD Kernel Dump Expert
Joined
Mar 2, 2012
Posts
474
Howdy,

I have a curious item here that I'd like to present to everyone to potentially sharpen their BSOD deductive reasoning. You may remember the short article in the BSOD Method & Tips thread describing !chkimg, but you probably never found a good reason to use it. This is an example where that information serves as a clue on what to look for, letting the analysis proceed further.

A particular client requested assistance with a BSOD they had apparently experienced only once, but because it occurred so early in their PC's life (only a month or so after purchase) they were concerned. So they provided a crashdump (attached to this thread) along with a JCGriff Report, which I dove into. First, obviously, let's run the analysis engine through it:

Code:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 1000008E, {c0000005, 82cc014e, 9be37c94, 0}

Unable to load image \??\D:\X-Play\Audition Dance Battle\GameGuard\dump_wmimmc.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for dump_wmimmc.sys
*** ERROR: Module load completed but symbols could not be loaded for dump_wmimmc.sys
TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2
Probably caused by : dump_wmimmc.sys

Followup: MachineOwner
---------

2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 82cc014e, The address that the exception occurred at
Arg3: 9be37c94, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
nt!NtWriteFile+3
82cc014e ab              stos    dword ptr es:[edi]

TRAP_FRAME:  9be37c94 -- (.trap 0xffffffff9be37c94)
ErrCode = 00000002
eax=87ae6030 ebx=82cc014b ecx=0000018c edx=82cc014b esi=0414fa28 edi=00000474
eip=82cc014e esp=9be37d08 ebp=9be37d34 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
nt!NtWriteFile+0x3:
82cc014e ab              stos    dword ptr es:[edi]   es:0023:00000474=????????
Resetting default scope

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  CODE_CORRUPTION

BUGCHECK_STR:  0x8E

PROCESS_NAME:  avgtray.exe

CURRENT_IRQL:  0

MISALIGNED_IP: 
nt!NtWriteFile+3
82cc014e ab              stos    dword ptr es:[edi]

LAST_CONTROL_TRANSFER:  from 777e7094 to 82cc014e

STACK_TEXT:  
9be37d34 777e7094 badb0d00 0414fa04 00000000 nt!NtWriteFile+0x3
WARNING: Frame IP not in any known module. Following frames may be wrong.
9be37d38 badb0d00 0414fa04 00000000 00000000 0x777e7094
9be37d3c 0414fa04 00000000 00000000 00000000 0xbadb0d00
9be37d40 00000000 00000000 00000000 00000000 0x414fa04


STACK_COMMAND:  kb

CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
    82cc014b-82cc014f  5 bytes - nt!NtWriteFile
    [ 6a 5c 68 80 a3:e9 00 91 ab 12 ]
5 errors : !nt (82cc014b-82cc014f)

MODULE_NAME: dump_wmimmc

IMAGE_NAME:  dump_wmimmc.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4d749b9b

FOLLOWUP_NAME:  MachineOwner

MEMORY_CORRUPTOR:  PATCH_dump_wmimmc

FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_PATCH_dump_wmimmc

BUCKET_ID:  MEMORY_CORRUPTION_PATCH_dump_wmimmc

Followup: MachineOwner
---------


So, as we can tell from all the clues, we're dealing with code corruption here. The Bucket IDs point to it being the case, and there's even that additional !chkimg output done for ya. It also immediately names the module most likely related to the crash, dump_wmimmc, and checking up on it reveals that it's related to GameGuard, an anti-cheating system. So really, there's nothing particularly esoteric about this crash: everything appears to point to GameGuard. However, one needs to do a bit more diving as a sanity check to ensure we're not jumping to conclusions and leading ourselves onto the wrong track. Maybe memory corruption just happened to occur while GameGuard was running. In addition, the crash occurred in the avgtray.exe process, as you can tell from the process name provided. Maybe AVG is at fault here? To get more clarity, we'll need to discern exactly what we're looking at and move on from there.

To start off, let's interpret the !chkimg output. If you remember from BSOD Method & Tips, there are two switches you can give it that display the corrupt data in different ways. The analysis engine defaults to just -d, which shows each corruption in brackets, displaying the bytes it expected to see and the bytes it actually saw (the corruption), separated by a colon. In this particular case, it's one strand of bytes that was corrupted, and it doesn't take much to figure out that these aren't the result of missing bits. If you wish to make sure, you can compare the two using .formats:

Code:
CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
    82cc014b-82cc014f  5 bytes - nt!NtWriteFile
    [ [COLOR=#008000]6a 5c 68 80 a3[/COLOR]:[COLOR=#ff0000]e9 00 91 ab 12[/COLOR] ]
5 errors : !nt (82cc014b-82cc014f)

...

2: kd> .formats [COLOR=#008000]6a5c6880a3[/COLOR]
Evaluate expression:
  Hex:     0000006a`5c6880a3
  Decimal: 456816885923
  Octal:   0000000006513432100243
  Binary:  00000000 00000000 00000000 01101010 01011100 01101000 10000000 10100011
  Chars:   ...j\h..
  Time:    Mon Jan  1 08:41:21.688 1601 (UTC - 4:00)
  Float:   low 2.61775e+017 high 1.48538e-043
  Double:  2.25698e-312
2: kd> .formats [COLOR=#ff0000]e90091ab12[/COLOR]
Evaluate expression:
  Hex:     000000e9`0091ab12
  Decimal: 1000736926482
  Octal:   0000000016440044325422
  Binary:  00000000 00000000 00000000 11101001 00000000 10010001 10101011 00010010
  Chars:   ........
  Time:    Mon Jan  1 23:47:53.692 1601 (UTC - 4:00)
  Float:   low 1.33775e-038 high 3.26503e-043
  Double:  4.9443e-312

As you can tell, they don't really match at all. There are bits that are set and bits that are lost; ultimately, there's no pattern to be discerned, so I would highly doubt we're dealing with a form of memory corruption. CPU corruption isn't likely either, as that commonly produces single-bit errors, not an entire strand of badly formed data. A CPU can fail on an opcode one way or another, but that shouldn't manifest as corruption in the actual image of the module, which is what we're seeing here. So hardware is probably not the explanation for this failure.
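If you'd rather not eyeball the .formats output, the same check can be done mechanically: XOR each expected byte against its corrupted counterpart and count the differing bits. A genuine single-bit memory error flips exactly one bit; here every byte differs in three or more positions. A minimal Python sketch, using the byte values from the !chkimg output above:

```python
# Byte values taken from the !chkimg output above (expected : corrupted)
expected = bytes([0x6a, 0x5c, 0x68, 0x80, 0xa3])
observed = bytes([0xe9, 0x00, 0x91, 0xab, 0x12])

for i, (e, o) in enumerate(zip(expected, observed)):
    diff = e ^ o  # XOR marks every bit position that differs
    print(f"byte {i}: {e:02x} ^ {o:02x} -> {diff:08b} ({bin(diff).count('1')} bits flipped)")
```

Every byte comes back with 3 to 6 flipped bits, which is nothing like the one-bit-per-failure signature of bad RAM.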

Since it's most likely software, we can determine that this is data that inadvertently (or advertently) got slapped on top of kernel code, and obviously it'll be data relevant to that particular software. It can come in the form of raw data (like variables or strings of text) or code. Since we can't really tell whether we're dealing with numerical variables here without some heavy debugging, we'll forgo attempting to extract those from the corrupted data. Instead, let's see if there's a string of text here. We can do so easily by using the -db switch instead of -d to get output similar to the db command:

Code:
2: kd> !chkimg -lo 50 !nt -db
5 errors : !nt (82cc014b-82cc014f)
82cc0140  5e  5b  c9  c2  10  00  90  90  90  90  90 *e9 *00 *91 *ab *12 ^[..............

The corrupted bytes are marked with asterisks. Evidently, we're not dealing with a string of text, as they don't translate into printable ASCII characters. So unfortunately, that's not what we're witnessing here.
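For clarity, here's a minimal Python sketch of the same test db performs: a byte renders as a character only if it falls in the printable ASCII range (roughly 0x20 to 0x7e); anything else shows up as a dot. All five patched bytes fail it:

```python
# The five patched bytes from the !chkimg -db output above
corrupted = bytes([0xe9, 0x00, 0x91, 0xab, 0x12])

# db renders a byte as a character only if it's printable ASCII (0x20-0x7e)
rendered = ''.join(chr(b) if 0x20 <= b <= 0x7e else '.' for b in corrupted)
print(rendered)  # '.....' -- none of the bytes form readable text
```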

So now we'll have to figure out whether this is code we're dealing with. We can determine this by taking the faulting IP mentioned in the !analyze -v output and using it as a starting point to view the code in the Disassembly window. We could use the u commands, but those are unreliable unless you know what you're doing; I can explain how to use them properly in cases like this on request. For now, let's open the window and give it a whirl. Copy the faulting IP and paste it as the Offset value for the Disassembly window, as displayed here:

Code:
FAULTING_IP: 
nt!NtWriteFile+3
[COLOR=#ff0000]82cc014e[/COLOR] ab              stos    dword ptr es:[edi]

[COLOR=#008000][I]Disassembly window output using 82cc014e as the offset:[/I][/COLOR]

No prior disassembly possible
[COLOR=#ff0000]82cc014e[/COLOR] ab              stos    dword ptr es:[edi]   es:0023:00000474=????????
82cc014f 12aa82e851fa    adc     ch,byte ptr [edx-5AE177Eh]
82cc0155 e0ff            loopne  nt!NtWriteFile+0xb (82cc0156)
82cc0157 33f6            xor     esi,esi
82cc0159 8975dc          mov     dword ptr [ebp-24h],esi
82cc015c 8975d0          mov     dword ptr [ebp-30h],esi
82cc015f 8975a4          mov     dword ptr [ebp-5Ch],esi
82cc0162 8975a8          mov     dword ptr [ebp-58h],esi
82cc0165 64a124010000    mov     eax,dword ptr fs:[00000124h]
82cc016b 8945bc          mov     dword ptr [ebp-44h],eax
82cc016e 8a983a010000    mov     bl,byte ptr [eax+13Ah]
82cc0174 885dd4          mov     byte ptr [ebp-2Ch],bl
82cc0177 8d4594          lea     eax,[ebp-6Ch]
82cc017a 50              push    eax
...

You may notice the Previous and Next buttons in the window, which display the previous or next several lines of code, respectively. As you can tell, Previous is greyed out, and there's a line stating "No prior disassembly possible". Why is this? It has to do with offsets. The disassembler tries to determine what the code is, and if it doesn't see anything that looks like valid opcodes prior to the given address, it blanks out the button because there's nothing valid back that way. Understand, however, that compilers can legitimately lay out compiled code at offsets that aren't exactly flush with each other (there's reasoning for that, but it's beyond this article). What we do know is that there is code back that way; we just need to adjust the offset so the disassembler can interpret the bytes properly. If you don't understand what any of that means, don't worry, it'll be explained shortly.

For now, instead of manually altering the address we gave the window as an offset, note that the window is smart enough to adjust it automatically for us. Just click the Next button once, then go back once using the Previous button (the PageUp and PageDown keys work too). The result should be that we're at the same place as before, but we're not, because the window automatically adjusted the offset for us. This isn't displayed in the Offset box at the top, but the address of the first instruction listed shows that we're most certainly not at the same place, but a bit off (in a good way):

Code:
[COLOR=#0000cd]82cc0150[/COLOR] aa              stos    byte ptr es:[edi]
82cc0151 82e851          sub     al,51h
82cc0154 fa              cli
82cc0155 e0ff            loopne  nt!NtWriteFile+0xb (82cc0156)
82cc0157 33f6            xor     esi,esi
82cc0159 8975dc          mov     dword ptr [ebp-24h],esi
82cc015c 8975d0          mov     dword ptr [ebp-30h],esi
82cc015f 8975a4          mov     dword ptr [ebp-5Ch],esi
82cc0162 8975a8          mov     dword ptr [ebp-58h],esi
82cc0165 64a124010000    mov     eax,dword ptr fs:[00000124h]
82cc016b 8945bc          mov     dword ptr [ebp-44h],eax
82cc016e 8a983a010000    mov     bl,byte ptr [eax+13Ah]
82cc0174 885dd4          mov     byte ptr [ebp-2Ch],bl
82cc0177 8d4594          lea     eax,[ebp-6Ch]
...

This is no longer the 82cc014e we gave it before; it has changed to 82cc0150. Notice that all the instructions have changed to accommodate this difference.

If you want the explanation now for what we (or rather, the Disassembly window) just did and why, you'll need to understand what code is all about. I'm sure you know about C++, Java, and all those different types of code. Eventually it all has to be reduced, one way or another, into simple instructions (opcodes) for a CPU to interpret. Each instruction varies in size, but in essence the whole thing ends up being just a bunch of bytes sitting next to each other, which a parser then interprets as code. An example of this would be as follows:

sallysellsseashellsbytheseashore

You are able to naturally interpret this, but it's much easier to add spaces:

sally sells sea shells by the sea shore

What you did naturally in your head was interpret a string of characters as words and space them so that they read as words in a sentence, not as a string of random characters in the English alphabet. The same applies to a computer's interpreter, which treats a bunch of bytes as code by parsing it appropriately.

Now concerning offsets: imagine if I took the same sentence and placed the spaces differently by starting in a different spot. Instead of "sally", let's start at the "ly" in "sally". Since "ly" isn't a valid word, let's add "sells" to it and make it a name with a possessive, "Lysell's" (weird name, I know, but work with me). The result is:

Lysell's sea shells by the sea shore

The meaning of the sentence has changed completely. Sally is replaced by Lysell, and no one's selling any sea shells anymore. Just by offsetting it a couple of letters, starting at the "l" instead of the "s" in "sally", something different takes place. An even better illustration would be a change that altered every single word in the sentence, leaving seemingly random words that appear arbitrary. That's what happens when you give a disassembler an offset other than the actually valid one: the bytes can still form coherent instructions, but the whole passage is garbled. Try running English text through Babelfish into a different language and back to English several times to see what it means to have coherent words in an incoherent phrase. Another example is moving the decimal point in a number, which turns it into a completely different number.
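The offset analogy can even be sketched in code. Below is a toy greedy word-segmenter playing the role of a disassembler (the WORDS dictionary is invented purely for this demo): parse the same character stream from offset 0 and you get the intended sentence; parse from offset 3 and the "instruction" boundaries come out differently:

```python
# Greedy word segmentation of the article's example string, starting at two
# different offsets -- a toy stand-in for a disassembler picking a start address.
# The WORDS set is a hypothetical dictionary invented for this demo.
TEXT = "sallysellsseashellsbytheseashore"
WORDS = {"sally", "sells", "sea", "shells", "by", "the", "shore", "lysells"}

def segment(text, start):
    """Greedily carve text[start:] into the longest known words."""
    out, i = [], start
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in WORDS:
                out.append(text[i:j])
                i = j
                break
        else:
            out.append(text[i])  # unknown fragment -> garbage token
            i += 1
    return out

print(segment(TEXT, 0))  # sally sells sea shells by the sea shore
print(segment(TEXT, 3))  # starts mid-"sally": a different, garbled parse
```

Notice that the bad parse eventually resynchronizes ("sea shells by the sea shore"), much like a real disassembler often falls back into step a few instructions after a wrong starting offset.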

So now that I've explained that, we can see an example of it here in the Disassembly window. The adjustment has been made and we're at a different point of reference than the one we initially gave it. The Previous button should now be available so you can scroll backwards. Let's do that once:

Code:
82cc013d 8bc3            mov     eax,ebx
82cc013f 5f              pop     edi
82cc0140 5e              pop     esi
82cc0141 5b              pop     ebx
82cc0142 c9              leave
82cc0143 c21000          ret     10h
82cc0146 90              nop
82cc0147 90              nop
82cc0148 90              nop
82cc0149 90              nop
82cc014a 90              nop
nt!NtWriteFile:
[COLOR=#FF0000]82cc014b e90091ab12      jmp     dump_wmimmc+0x5250 (95779250)[/COLOR]
82cc0150 aa              stos    byte ptr es:[edi]
82cc0151 82e851          sub     al,51h
82cc0154 fa              cli
82cc0155 e0ff            loopne  nt!NtWriteFile+0xb (82cc0156)
82cc0157 33f6            xor     esi,esi
82cc0159 8975dc          mov     dword ptr [ebp-24h],esi
82cc015c 8975d0          mov     dword ptr [ebp-30h],esi
82cc015f 8975a4          mov     dword ptr [ebp-5Ch],esi
82cc0162 8975a8          mov     dword ptr [ebp-58h],esi
82cc0165 64a124010000    mov     eax,dword ptr fs:[00000124h]
82cc016b 8945bc          mov     dword ptr [ebp-44h],eax
82cc016e 8a983a010000    mov     bl,byte ptr [eax+13Ah]
82cc0174 885dd4          mov     byte ptr [ebp-2Ch],bl

Ah ha, what do we have here? Evidently, at the very start of the nt!NtWriteFile function, an instruction was placed to jump (jmp) into a function at dump_wmimmc+0x5250, which is obviously inside that bloody GameGuard driver. In fact, look at the actual raw opcode bytes (second column). Do they look familiar? Harken back to the strand of corrupted bytes we witnessed in !chkimg:

Code:
CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
    82cc014b-82cc014f  5 bytes - nt!NtWriteFile
    [ [COLOR=#000000]6a 5c 68 80 a3[/COLOR]:[COLOR=#ff0000]e9 00 91 ab 12[/COLOR] ]
5 errors : !nt (82cc014b-82cc014f)

nt!NtWriteFile:
[COLOR=#000000]82cc014b [/COLOR][COLOR=#ff0000]e90091ab12[/COLOR][COLOR=#000000]      jmp     dump_wmimmc+0x5250 (95779250)[/COLOR]

They are identical. This is clearly the corruption !chkimg was referring to. So it was a piece of code after all. I'm not sure whether it was deliberate (given that GameGuard works by hooks, I'd say it might be), but the code in the nt module was altered by GameGuard to add this jump instruction so that GameGuard could eyeball, or do whatever with, whatever file was being written before (hopefully) jumping back to finish the function. I must say it's a particularly dirty trick to hotpatch kernel code in this manner, as this is an extremely invasive way of hooking. So honestly, we can probably lay the blame on GameGuard. While AVG may very well be the one that triggered the crash, that's most likely because it tripped over GameGuard's shenanigans.
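For the curious, the patched bytes decode exactly as the disassembler shows. An e9 opcode is a near jmp with a 32-bit displacement, stored little-endian and measured from the end of the 5-byte instruction. A small Python sketch, using the addresses from the output above, recovers the hook's target:

```python
import struct

# The five patched bytes at nt!NtWriteFile (from !chkimg): e9 = near jmp rel32
patch = bytes([0xe9, 0x00, 0x91, 0xab, 0x12])
addr = 0x82cc014b  # address of the patched instruction (nt!NtWriteFile)

# rel32 is little-endian, signed, and relative to the instruction *after* the jmp
(rel32,) = struct.unpack('<i', patch[1:])
target = (addr + 5 + rel32) & 0xFFFFFFFF

print(hex(target))  # 0x95779250 -> dump_wmimmc+0x5250, as in the disassembly
```

0x82cc0150 + 0x12ab9100 = 0x95779250, and since the module loads at 95774000 (see the lmvm output later in the thread), that's dump_wmimmc+0x5250 exactly.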

Phew, I'm tired and hungry now. Time for my lunch break! I'll pop back here to explain more on the offset thing with another example and maybe even add how to use u commands properly in cases like this. Hope this helps!

Update: Just to let people know, the x64 versions of Vista/Win7 and newer Windows shouldn't experience a case like the one we witnessed here. That's because the x64 versions of Windows run something called PatchGuard, which checks kernel code to see whether it has been manipulated in any manner outside of Microsoft's own patching. If it detects such a change, it causes a BSOD with bugcheck 0x109, which is CRITICAL_STRUCTURE_CORRUPTION.
 

A quick google for "dump_wmimmc" shows lots of problems relating to this piece of work.
My question is slightly different, though.

When Windows boots, it checksums the components involved in crash dump generation. It then takes the disk miniport driver and prepends dump_ to its name (for use in writing the crash dump to the disk should a crash occur).

I presume that this was the case with this driver - that the system checksummed it and prepended dump_ to the wmimmc.sys name.

Question 1 - is there just one miniport driver, or is there a storage "stack" of drivers that get the dump_ prefix? (I've seen several dump_ drivers in some memory dumps, so I suspect it's the latter.)

Question 2 - why would the "dump_wmimmc" copy fail but not wmimmc itself? Hasn't the crash already occurred by the time dump_wmimmc is called to write to the drive?
 
GameGuard is designed to hook into kernel code in order to filter and inspect any activity that might rely on that code to perform cheats in games, such as the nt!NtWriteFile function. It obviously uses various tactics to do this, and given that they were willing to completely overwrite Windows kernel code rather than hook it noninvasively in some other fashion, I would not put it past them to use other tricks, like the "dump_" prefix, to try to coerce Windows into doing their bidding. What it is doing is no different from malware such as a rootkit, except that its intentions are meant to be beneficial (as if to say the ends justify the means).

I would not trust this driver in whatever fashion it presents itself, including with the "dump_" prefix. Consider this: you are right that the "dump_" modules should be copies of existing modules, saved at startup so crashdump generation can proceed even when the original modules for crashdump and drive I/O have been corrupted. However, look at the list of loaded modules and you will see that every other loaded (or rather, unloaded) module with the "dump_" prefix also has an existing original version loaded: dump_nvstor32 has nvstor32 loaded, dump_dumpfve has fvevol loaded, and dump_storpor has storport loaded. The only exception is wmimmc, for which there is no driver with even a closely related name. In essence, it's a fake. I bet that if one took a snapshot of loaded modules at Windows startup (say, by running WinDbg as a local kernel debugger), this driver wouldn't even be in the list. Only when GameGuard is called to start up does it load this module to perform the necessary kernel-mode activity. While it appears to be responsible for crashdump generation, it is falsifying its appearance.
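The pairing check described above is mechanical enough to sketch in Python. The module names below come from this dump's loaded-module list as described in the post; the dumpfve -> fvevol and storpor -> storport pairings are hardcoded here as known renames/truncations:

```python
# Originals and dump_ copies as described in the post above; the alias table
# covers the known rename (fvevol) and name truncation (storport).
loaded = {"nvstor32", "fvevol", "storport"}
dump_modules = ["dump_nvstor32", "dump_dumpfve", "dump_storpor", "dump_wmimmc"]
aliases = {"dumpfve": "fvevol", "storpor": "storport"}

suspects = []
for mod in dump_modules:
    base = mod[len("dump_"):]
    original = aliases.get(base, base)
    if original not in loaded:
        suspects.append(mod)  # no real driver backs this "dump_" copy

print(suspects)  # ['dump_wmimmc'] -- the only dump_ module with no original
```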
 
I didn't even think to look for the wmimmc original! I did a google for it, and the results weren't at all what I'd expect to see - but I blew it off because I saw the dump_ prefix. That'll teach me to assume! Thanks for setting me straight!
 
Yah, you can also tell by using lmvm:

Code:
2: kd> lmvm dump_wmimmc
start    end        module name
95774000 957e4300   dump_wmimmc T (no symbols)           
    Loaded symbol image file: dump_wmimmc.sys
 [COLOR=#ff0000]   Image path: \??\D:\X-Play\Audition Dance Battle\GameGuard\dump_wmimmc.sys[/COLOR]
    Image name: dump_wmimmc.sys
    Timestamp:        Mon Mar 07 03:47:23 2011 (4D749B9B)
    CheckSum:         00078FC1
    ImageSize:        00070300
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

It certainly ain't hiding here, that's for sure! Given that it's not in a standard driver directory (like System32 or its drivers subdirectory), you can tell it's not a driver that's loaded at Windows startup (at least not early on).
 
Bump. I added a little bit of extra info at the end based on what I've learned recently. I might peruse these old articles sometime and update them to reflect anything I've learned lately.
 
Any corruption can be "fixed" back to what WinDbg expects by passing !chkimg the -f switch.
 
