Hi fellas, a bit of a short one this time, but worth mentioning. It's pretty much a copy-and-paste from
this thread, with some extra explanations and modifications. BSODs are attached.
Sometimes you struggle to find a pattern, and whatever you do find only confuses you further, or you find no pattern at all and are at a loss for clues. This is one instance where scrutinizing the registers pays off: a simple pattern like this one can turn up where all other clues are missing.
In this case, the crashes at first glance seem to exhibit wild behavior. The faulting stacks showed the error occurring in nearly any module at any time, so I initially suspected either a driver corrupting memory that innocent drivers later tripped over, or failing hardware.
Digging deeper, I noticed that practically all the crashes reported an unhandled exception. The exception was typically an access violation (c0000005), and it was always an attempt to read address 0xffffffffffffffff. As an example (that I'll use throughout this post), here is a snippet of the !analyze -v output for one of the crash dumps:
Code:
EXCEPTION_RECORD: fffff800043df638 -- (.exr 0xfffff800043df638)
ExceptionAddress: fffff88004152b3d (dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x000000000000007d)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff
TRAP_FRAME: fffff800043df6e0 -- (.trap 0xfffff800043df6e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ff7ffa800a0a5ed0 rbx=0000000000000000 rcx=fffffa80077b4470
rdx=fffffa800a3cd820 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
r8=000000000000008e r9=fffffa80077b6d58 r10=fffffa80092707d0
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x7d:
fffff880`04152b3d f0834018ff lock add dword ptr [rax+18h],0FFFFFFFFh ds:ff7ffa80`0a0a5ee8=????????
Look at the bad address it's trying to read in the exception record (Parameter[1]), then look at the very bottom under the register list, where it shows the actual instruction doing the read. Notice that the address the instruction actually targets (ff7ffa80`0a0a5ee8, shown after ds:) looks nothing like the obviously bad address in the exception record. In fact, this one looks rather legit.
At first I was baffled by this, because the exception record always reports that bad address even though the instruction isn't pointing to it. Upon careful scrutiny, though, I found a discrepancy. Look at the instruction, which is lock add dword ptr [rax+18h],0FFFFFFFFh. It takes the value stored in the rax register and adds 18h to it, and the result is used as a pointer to the data it wants to modify. Now look at the rax register, which is ff7ffa800a0a5ed0. Compare it to the other registers holding similar-looking addresses, like rdx and rcx. Notice anything odd? That's right, for some reason a 7 has turned up in the upper portion of the address, making it start with FF7FF as opposed to the FFFFF the others start with. Why is it there? Well, let's take a gander using the .formats command to evaluate these numbers in other formats and compare again (note the second byte in Binary):
Code:
2: kd> .formats ff7ffa800a0a5ed0;.formats fffffa800a3cd820
Evaluate expression:
Hex: ff7ffa80`0a0a5ed0
Decimal: -36034844164464944
Octal: 1775777650001202457320
Binary: 11111111 01111111 11111010 10000000 00001010 00001010 01011110 11010000
Chars: .....^.
Time: ***** Invalid FILETIME
Float: low 6.66229e-033 high -3.40254e+038
Double: -1.4035e+306
Evaluate expression:
Hex: fffffa80`0a3cd820
Decimal: -6047142193120
Octal: 1777777650001217154040
Binary: 11111111 11111111 11111010 10000000 00001010 00111100 11011000 00100000
Chars: .....<.
Time: ***** Invalid FILETIME
Float: low 9.09252e-033 high -1.#QNAN
Double: -1.#QNAN
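If you'd rather not eyeball the binary, the same comparison can be done with an XOR. Below is a small Python sketch (not a WinDbg feature; the helper name is made up for illustration) that compares only the upper 16 bits of the two values from the output above. It assumes both registers hold kernel pointers from the same region, so those upper bits ought to match. The lower 48 bits are ignored because they legitimately differ from one pointer to the next.

```python
# Values copied from the trap frame above.
RAX = 0xff7ffa800a0a5ed0  # the suspicious pointer
RDX = 0xfffffa800a3cd820  # a healthy-looking pointer used as the reference


def flipped_high_bits(suspect: int, reference: int, low_bits: int = 48) -> list:
    """Return the positions (0 = least significant bit) at which the two
    values differ, looking only at bits low_bits..63.

    Assumption: both values are pointers into the same address range,
    so everything above low_bits should be identical.
    """
    diff = (suspect ^ reference) >> low_bits
    return [low_bits + i for i in range(64 - low_bits) if (diff >> i) & 1]


print(flipped_high_bits(RAX, RDX))  # [55]
```

A single entry in the result means exactly one of the upper bits is off, which here is bit 55, the 0 sitting in the second byte of the Binary line.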
This is a classic case of a bit flip, where a single bit has inadvertently been flipped (here from 1 to 0) for no apparent reason. If you examine any of the other crashes, this shows up pretty much every time (though not always in the same register). Now, if a long string of bits had changed, we could possibly attribute that to a driver passing a bad address into the register, or to other malfunctioning hardware like the hard drive or RAM (most likely). But in cases like this, with only 1 bit flipped, I've only found it to be caused by PSU, mobo, or CPU problems, with the CPU being the most likely cause.
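As a footnote on why the exception record reports ffffffffffffffff rather than the address the instruction actually used: on x64, a virtual address is only valid ("canonical") when its upper bits are all copies of bit 47, assuming the usual 48-bit addressing. A pointer with a stray 0 up there is non-canonical, so the CPU raises a general protection fault instead of a page fault and doesn't supply the faulting address, and the access violation ends up reported against all F's. Here is a rough Python sketch of that check, plus a brute-force search for single-bit flips that would make the address canonical again. The function names are mine and purely illustrative, and the 48-bit width is an assumption baked into the constant.

```python
CANONICAL_BIT = 47  # assumption: 48-bit virtual addressing


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all 0 or all 1."""
    top = addr >> CANONICAL_BIT                  # the 17 bits 63..47
    all_ones = (1 << (64 - CANONICAL_BIT)) - 1   # 0x1FFFF
    return top == 0 or top == all_ones


def single_bit_repairs(addr: int) -> list:
    """Return (bit, repaired_address) pairs for every single flip among
    bits 47..63 that turns addr into a canonical address."""
    repairs = []
    for bit in range(CANONICAL_BIT, 64):
        candidate = addr ^ (1 << bit)
        if is_canonical(candidate):
            repairs.append((bit, candidate))
    return repairs


rax = 0xff7ffa800a0a5ed0  # rax from the trap frame above
print(is_canonical(rax))  # False
for bit, fixed in single_bit_repairs(rax):
    print(f"flip bit {bit} -> {fixed:016x}")  # flip bit 55 -> fffffa800a0a5ed0
```

If the upper bits are off by more than one bit, the list comes back empty, which is closer to the "long string of bits" case. Keep in mind this only looks at the upper bits, so a flip in the lower 48 bits would go unnoticed by this check.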