Hi fellas, a bit of a short one this time, but worth mentioning. It's pretty much a copy-and-paste from this
thread, with some extra explanations and modifications. BSODs are attached.
There are times when you struggle to find a pattern, but whatever you do find still confuses you, or you find no pattern at all and are at a loss for clues. This is one case where scrutinizing the registers pays off: a simple pattern like this one can turn up where every other clue is missing.
In this case, the crashes seemed wild at first glance. The faulting stacks showed the error occurring almost anywhere, in nearly any code, at any time, so I initially suspected either hardware or a driver corrupting memory that innocent drivers later tripped over while handling it.
Digging deeper, I noticed that practically all the crashes came from an unhandled exception. The exception was typically an access violation (c0000005), and it was always an attempt to read address 0xffffffffffffffff. As an example (one I'll use throughout this post), here is a snippet of the !analyze -v output from one of the crash dumps:
EXCEPTION_RECORD: fffff800043df638 -- (.exr 0xfffff800043df638)
ExceptionAddress: fffff88004152b3d (dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x000000000000007d)
ExceptionCode: c0000005 (Access violation)
Attempt to read from address ffffffffffffffff
TRAP_FRAME: fffff800043df6e0 -- (.trap 0xfffff800043df6e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ff7ffa800a0a5ed0 rbx=0000000000000000 rcx=fffffa80077b4470
rdx=fffffa800a3cd820 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
r8=000000000000008e r9=fffffa80077b6d58 r10=fffffa80092707d0
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
iopl=0 nv up ei ng nz na pe nc
fffff880`04152b3d f0834018ff lock add dword ptr [rax+18h],0FFFFFFFFh ds:ff7ffa80`0a0a5ee8=????????
Look at the bad address in the exception record, then look at the very bottom, under the register list, where WinDbg shows the instruction that actually tried to read it. The address that instruction was trying to read, ff7ffa80`0a0a5ee8 (after ds:), is nowhere near the obviously bad ffffffffffffffff that the exception record shows. In fact, at a glance this one looks rather legit.
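A quick way to confirm which address the instruction really used is to redo the CPU's arithmetic: the effective address of [rax+18h] is just rax plus 18h, wrapped to 64 bits. The Python sketch below uses the values from this dump (the names RAX, DISP and effective_address are my own, not anything from WinDbg) and shows that the result matches the ds: annotation, not the exception record.

```python
# Values copied from the trap frame and disassembly line in the dump above.
RAX = 0xFF7FFA800A0A5ED0          # rax from the trap frame
DISP = 0x18                       # the 18h displacement in [rax+18h]
EXR_ADDRESS = 0xFFFFFFFFFFFFFFFF  # "Attempt to read from address" in the exception record


def effective_address(base: int, disp: int) -> int:
    """Compute base + disp the way the CPU does, wrapping to 64 bits."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


ea = effective_address(RAX, DISP)
print(f"{ea:016x}")        # ff7ffa800a0a5ee8, the ds: address in the disassembly line
print(ea == EXR_ADDRESS)   # False: the exception record is not reporting this address
```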
At first I was completely baffled, because it always reported reading that bad address even though the instruction wasn't pointing at it. On careful scrutiny, though, I found a discrepancy. Look at the instruction, lock add dword ptr [rax+18h]. It takes the value in the rax register, adds 18h to it, and treats the result as a pointer to the dword it wants to modify (here it adds 0FFFFFFFFh, i.e. atomically decrements it). Now look at rax, which is ff7ffa800a0a5ed0, and compare it to the registers holding similar-looking addresses, like rdx and rcx. Notice anything odd? That's right: for some reason a 7 has crept into the leading digits, making them ff7ffa80 where the others read fffffa80.
Why is it there? Let's take a gander with the .formats command, which shows a number in several other formats, and compare again (look at the second byte of each Binary line):
0: kd> .formats ff7ffa800a0a5ed0;.formats fffffa800a3cd820
Binary: 11111111 01111111 11111010 10000000 00001010 00001010 01011110 11010000
Time: ***** Invalid FILETIME
Float: low 6.66229e-033 high -3.40254e+038
Binary: 11111111 11111111 11111010 10000000 00001010 00111100 11011000 00100000
Time: ***** Invalid FILETIME
Float: low 9.09252e-033 high -1.#QNAN
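The same comparison can be done without eyeballing binary. This is a minimal Python sketch, assuming rax was meant to be a kernel-mode (upper-half) pointer like rdx, so that bits 47 through 63 should all be 1; the helper names are hypothetical. It reproduces the Binary lines from .formats and reports which of those high bits are 0.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
KERNEL_TOP = 0xFFFF800000000000  # bits 47..63 set: shared by every canonical kernel-half address


def as_windbg_binary(value: int) -> str:
    """Format a 64-bit value as eight space-separated bytes, like the Binary line of .formats."""
    bits = f"{value & MASK64:064b}"
    return " ".join(bits[i:i + 8] for i in range(0, 64, 8))


def missing_kernel_prefix_bits(value: int) -> list[int]:
    """Return positions among bits 47..63 that are 0 but would be 1 in a kernel-half pointer."""
    missing = (value & KERNEL_TOP) ^ KERNEL_TOP
    return [bit for bit in range(64) if (missing >> bit) & 1]


rax = 0xFF7FFA800A0A5ED0
rdx = 0xFFFFFA800A3CD820
print(as_windbg_binary(rax))
print(as_windbg_binary(rdx))
print(missing_kernel_prefix_bits(rax))  # [55]
print(missing_kernel_prefix_bits(rdx))  # []
print(f"{rax | (1 << 55):016x}")        # fffffa800a0a5ed0, rax with bit 55 set back to 1
```

Setting bit 55 back gives fffffa800a0a5ed0, which has the same fffffa80 prefix as rcx and rdx.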
This is a classic case of a bit flip: a single bit has been inadvertently flipped (here from 1 to 0) for no apparent reason. If you examine the other crashes, the same thing shows up in nearly all of them (though not always in the same register). If a long run of bits had changed, we could possibly attribute it to a driver loading a bad address into the register, or to other malfunctioning hardware such as the hard drive or, most likely, RAM. But where only a single bit is flipped like this, in my experience the cause has always been the PSU, motherboard or CPU, with the CPU being the most likely culprit.
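The flipped bit would also explain the ffffffffffffffff in the exception record. On x64, a pointer whose bits 47 through 63 are not all equal is non-canonical, and dereferencing one raises a general protection fault rather than a page fault. That fault carries no faulting address, so Windows reports the access violation as a read of ffffffffffffffff. That suggests a quick scan of any trap frame for non-canonical values. The Python sketch below assumes 48-bit virtual addresses (4-level paging), and its helper names are my own. Treat a hit as a hint only, since registers can hold plain data as well as pointers.

```python
import re

# Register lines pasted from the .trap output above.
TRAP_FRAME = """
rax=ff7ffa800a0a5ed0 rbx=0000000000000000 rcx=fffffa80077b4470
rdx=fffffa800a3cd820 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
r8=000000000000008e r9=fffffa80077b6d58 r10=fffffa80092707d0
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
"""


def is_canonical(value: int) -> bool:
    """48-bit x64 rule: bits 47..63 must be all 0s or all 1s."""
    top = value >> 47  # the 17 bits 47..63
    return top == 0 or top == 0x1FFFF


def parse_registers(text: str) -> dict[str, int]:
    """Turn the name=hexvalue pairs of a WinDbg register dump into a dict."""
    return {name: int(val, 16)
            for name, val in re.findall(r"(\w+)=([0-9a-fA-F]{16})", text)}


regs = parse_registers(TRAP_FRAME)
suspects = {name: f"{val:016x}" for name, val in regs.items() if not is_canonical(val)}
print(suspects)  # {'rax': 'ff7ffa800a0a5ed0'}
```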