Hi fellas, a bit of a short one this time, but worth mentioning. It's pretty much a copy-and-paste from this thread, with some extra explanations and modifications. BSODs are attached.
Sometimes you struggle to find a pattern, and whatever you do find only confuses you further, or you find no pattern at all and run out of clues. This is one instance where scrutinizing the registers pays off, and where a simple pattern like this can be found when all other clues are missing.
In this case, at first glance the system seems to exhibit wild behavior. The faulting stacks showed the error occurring in nearly any component at any time, so I initially suspected either a driver corrupting memory that innocent drivers later tripped over, or failing hardware.
Digging deeper, I noticed that practically all the crashes reported an unhandled exception. The exception was typically an access violation (c0000005), and it was always an attempt to read address 0xffffffffffffffff. As an example (that I'll use throughout this post), here is a snippet of the !analyze -v output for one of the crash dumps:
EXCEPTION_RECORD: fffff800043df638 -- (.exr 0xfffff800043df638)
ExceptionAddress: fffff88004152b3d (dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x000000000000007d)
ExceptionCode: c0000005 (Access violation)
Attempt to read from address ffffffffffffffff
TRAP_FRAME: fffff800043df6e0 -- (.trap 0xfffff800043df6e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ff7ffa800a0a5ed0 rbx=0000000000000000 rcx=fffffa80077b4470
rdx=fffffa800a3cd820 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
r8=000000000000008e r9=fffffa80077b6d58 r10=fffffa80092707d0
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
iopl=0 nv up ei ng nz na pe nc
fffff880`04152b3d f0834018ff lock add dword ptr [rax+18h],0FFFFFFFFh ds:ff7ffa80`0a0a5ee8=????????
Look at the bad address it's trying to read in the exception record, then look at the very bottom under the register list, where it shows the actual instruction trying to read that address. As you can see, the address it was actually trying to read (ff7ffa80`0a0a5ee8, shown after ds:) doesn't look anything like the obviously bad address that the exception record shows. In fact, this one looks rather legit.
At first I was completely baffled by this, because for some reason the exception record always reports that bad address even though the instruction isn't pointing to it. However, upon careful scrutiny, I found a discrepancy. Look at the instruction, which is lock add dword ptr [rax+18h]. It takes the value stored in the rax register, adds 18h to it, and uses the result as a pointer to the data it wants to work on. Now look at the rax register, which is ff7ffa800a0a5ed0. Compare it to the other registers that hold similar-looking addresses, like rdx and rcx. Notice anything odd? That's right, for some reason a 7 has turned up in the leading portion of the address, making it FF7FF as opposed to the others, which start with FFFFF. Why is it there? Well, let's take a gander using the .formats command to evaluate these numbers in other formats and compare again. In binary, the 7 is 0111 where the other registers have 1111, so exactly one bit differs.
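The same comparison can be reproduced outside the debugger. Here is a minimal sketch in Python, assuming the pointer that was supposed to be in rax was fffffa800a0a5ed0. That value is only a guess based on the fffffa80... pattern in rcx and rdx; the dump itself does not state it.

```python
# Value of rax from the trap frame in the dump above.
rax = 0xFF7FFA800A0A5ED0

# ASSUMPTION: the pointer that was meant to be in rax. The dump does not
# state this; it is inferred from the fffffa80... pattern in rcx and rdx.
expected = 0xFFFFFA800A0A5ED0

# XOR leaves a 1 in every bit position where the two values differ.
diff = rax ^ expected
flipped_bits = [i for i in range(64) if (diff >> i) & 1]

print(f"rax      = {rax:064b}")
print(f"expected = {expected:064b}")
print(f"differing bit positions: {flipped_bits}")

# Effective address of [rax+18h], the operand of the faulting instruction,
# truncated to 64 bits the way the CPU would compute it.
effective = (rax + 0x18) & 0xFFFFFFFFFFFFFFFF
print(f"rax+18h  = {effective:016x}")
```

Under that assumption the two values differ only in bit 55, and rax+18h works out to ff7ffa800a0a5ee8, which matches the ds: address shown next to the instruction.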
This is a classic case of a bit flip, a situation in which a single bit has been inadvertently flipped (here from 1 to 0) for no apparent reason. If you examine any of the other crashes, this shows up pretty much every time (though not always in the same register). Now, if a long string of bits had changed, we could possibly attribute that to a driver passing a bad address, or to other hardware malfunctioning, such as the hard drive or (most likely) RAM. But in cases like this with only 1 bit flipped, I've only found it to be caused by PSU, motherboard or CPU problems, with the CPU being the most likely cause.
It seems that the general protection fault handler always raises a c0000005 exception with "Attempt to read from address ffffffffffffffff". Try "int 8" in user mode for an example. But why does an invalid memory access raise a GPF instead of a page fault in the flat paged memory model of Windows?
This is one of those questions where I remember getting the answer somewhere, but I don't remember the answer itself. I believe it has a lot to do with the actual address: Windows embeds safeties to detect attempts to access memory through a null reference, so it may hit a different exception routine than the one hit for an address that is addressable but isn't legit (or is accessed at the wrong IRQL). It just assumes that there's no legitimate reason to access address 0000000000000000 or ffffffffffffffff, and that the access only occurred because of a null pointer.
Sorry to bring up an old thread, but which two addresses did you use with the .formats command? I couldn't see them anywhere in the examples you provided above, or maybe I just missed them completely.
Found out the reason for the general protection fault: on an x64 processor with 48-bit virtual addresses, the 17 highest bits of a pointer have to be all zero or all one (a "canonical" address). The processor raises a GPF instead of a page fault if a pointer in which these bits are mixed is dereferenced. See the x86-64 article on Wikipedia.
Thanks a mil for that, mate. Though I'm not sure it applies to the specific case of a null pointer dereference, which involves either an all-zero or all-F's address, because both of those are canonical addresses. Then again, in this case neither of those is the address that was really accessed. That was ff7ffa800a0a5ee8 (rax, ff7ffa800a0a5ed0, plus 18h), which is indeed noncanonical.
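To make the canonical-address rule from the last two posts concrete, here is a rough Python sketch of the check. The function name is_canonical and its va_bits parameter are made up for illustration, and it assumes 48-bit virtual addressing, so bits 63 down to 47 (17 bits) must all match.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Hypothetical helper: True if a 64-bit address is canonical,
    assuming va_bits-wide virtual addresses (48 by default)."""
    # Keep only bits 63 .. va_bits-1; for 48-bit addressing that is 17 bits.
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    # Canonical means those top bits are all zero or all one.
    return top == 0 or top == all_ones


# Addresses taken from the dump and the discussion above.
for addr in (0x0000000000000000,   # null pointer
             0xFFFFFFFFFFFFFFFF,   # address reported in the exception record
             0xFFFFFA80077B4470,   # rcx, a normal-looking kernel pointer
             0xFF7FFA800A0A5EE8):  # rax+18h, the address actually accessed
    print(f"{addr:016x} canonical: {is_canonical(addr)}")
```

Only the last address fails the check, which fits the explanation that the real access hit a noncanonical address and so raised a GPF rather than a page fault.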