1. #1

    Join Date
    Mar 2012
    Posts
    469

    Bit Flips

    Hi fellas, a bit of a short one this time, but worth mentioning. It's pretty much a copy-and-paste from this thread, with some extra explanations and modifications. BSODs are attached.

    There are times when you struggle to find a pattern, and whatever you do find only confuses you further, or you find no pattern at all and are at a loss for clues. This is one instance where scrutinizing the registers pays off, and where a simple pattern like this one can be found when all other clues are missing.

    In this case, the crashes at first glance seem to exhibit wild behavior. The faulting stacks showed the error occurring in nearly any component at any time, so I initially suspected either hardware or a driver corrupting memory that innocent drivers later tripped over.

    Digging deeper, I noticed that practically all the crashes reported an unhandled exception. It was typically an access violation (c0000005), and it was always an attempt to read address 0xffffffffffffffff. As an example (which I'll use throughout this post), here is a snippet of the !analyze -v output for one of the crashdumps:

    Code:
    EXCEPTION_RECORD:  fffff800043df638 -- (.exr 0xfffff800043df638)
    ExceptionAddress: fffff88004152b3d (dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x000000000000007d)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 0000000000000000
       Parameter[1]: ffffffffffffffff
    Attempt to read from address ffffffffffffffff
    
    TRAP_FRAME:  fffff800043df6e0 -- (.trap 0xfffff800043df6e0)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=ff7ffa800a0a5ed0 rbx=0000000000000000 rcx=fffffa80077b4470
    rdx=fffffa800a3cd820 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
     r8=000000000000008e  r9=fffffa80077b6d58 r10=fffffa80092707d0
    r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei ng nz na pe nc
    dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x7d:
    fffff880`04152b3d f0834018ff      lock add dword ptr [rax+18h],0FFFFFFFFh ds:ff7ffa80`0a0a5ee8=????????

    Look at the bad address in the exception record, then look at the very bottom, under the register list, where it shows the actual instruction doing the read. As you'll notice, the address it was really trying to read (the ds: value, ff7ffa80`0a0a5ee8) looks nothing like the obviously bad address the exception record shows. In fact, this one looks rather legit.
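
    For reference, the address after ds: on the instruction line is simply the effective address of the memory operand: the value in the base register plus the displacement. A minimal sketch of that arithmetic in C (the helper name effective_address is made up for illustration; the values come from the trap frame above):

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a [base + displacement] operand. */
static uint64_t effective_address(uint64_t base, int32_t displacement)
{
    /* The displacement is sign-extended to 64 bits before the add. */
    return base + (uint64_t)(int64_t)displacement;
}
```

    With rax = ff7ffa800a0a5ed0 and a displacement of 18h, this gives ff7ffa800a0a5ee8, which matches the ds: value the debugger shows.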

    At first I was baffled by this, because the exception always reports that bad address even though the instruction isn't pointing to it. Upon careful scrutiny, though, I found a discrepancy. Look at the instruction, which is lock add dword ptr [rax+18h]. It takes the value stored in the rax register, adds 18h to it, and uses the result as a pointer to the data it wants to modify. Now look at the rax register, which is ff7ffa800a0a5ed0. Compare it to the other registers holding similar-looking addresses, like rdx and rcx. Notice anything odd? That's right, for some reason a 7 managed to appear in the leading portion of the address, making it ff7ff as opposed to the fffff in the others. Why is it there? Well, let's take a gander using the .formats command to evaluate these numbers in other formats and compare again (note the second byte in the Binary lines):
    Code:
    0: kd> .formats ff7ffa800a0a5ed0;.formats fffffa800a3cd820
    Evaluate expression:
      Hex:     ff7ffa80`0a0a5ed0
      Decimal: -36034844164464944
      Octal:   1775777650001202457320
      Binary:  11111111 01111111 11111010 10000000 00001010 00001010 01011110 11010000
      Chars:   .....^.
      Time:    ***** Invalid FILETIME
      Float:   low 6.66229e-033 high -3.40254e+038
      Double:  -1.4035e+306
    Evaluate expression:
      Hex:     fffffa80`0a3cd820
      Decimal: -6047142193120
      Octal:   1777777650001217154040
      Binary:  11111111 11111111 11111010 10000000 00001010 00111100 11011000 00100000
      Chars:   .....<. 
      Time:    ***** Invalid FILETIME
      Float:   low 9.09252e-033 high -1.#QNAN
      Double:  -1.#QNAN

    This is a classic case of a bit flip, a situation in which a single bit has inadvertently been flipped (here from 1 to 0) for what seems to be no reason. If you examine any of the other crashes, this shows up pretty much every time (though not always in the same register). Now, if a long string of bits had changed, we could possibly attribute that to a driver passing a bad address to the register, or to other hardware malfunctioning, such as the HD or (most likely) RAM. But in cases like this, where only 1 bit is flipped, I've only found the cause to be PSU, mobo, or CPU problems, with the CPU being the most likely.
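
    The same comparison can be done arithmetically instead of by eye. As a rough sketch in C (the function names are illustrative, and it assumes 48-bit virtual addressing, where bits 63 through 48 of a valid pointer all copy bit 47), the following rebuilds what the pointer ought to look like and reports which high bits differ:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: rebuild the pointer with bits 63..48 copied from
   bit 47, which is what a valid 48-bit virtual address looks like. */
static uint64_t sign_extend_from_bit47(uint64_t addr)
{
    const uint64_t high_mask = 0xFFFF000000000000ULL;
    return (addr & (1ULL << 47)) ? (addr | high_mask) : (addr & ~high_mask);
}

/* Number of bits that differ between the pointer and its rebuilt form. */
static int count_wrong_high_bits(uint64_t addr)
{
    uint64_t diff = addr ^ sign_extend_from_bit47(addr);
    int count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Index (0 = least significant) of the only wrong bit.
   Precondition: exactly one bit is wrong. */
static int single_wrong_bit_index(uint64_t addr)
{
    uint64_t diff = addr ^ sign_extend_from_bit47(addr);
    assert(diff != 0 && (diff & (diff - 1)) == 0);
    int index = 0;
    while ((diff >>= 1) != 0) {
        index++;
    }
    return index;
}
```

    For the rax value above (ff7ffa800a0a5ed0) this reports exactly one wrong bit, bit 55, and a rebuilt value of fffffa800a0a5ed0. For rdx (fffffa800a3cd820) it reports none. It only catches flips in bits 63 through 48; a flip lower down still leaves a valid-looking pointer and has to be spotted by comparing against neighbouring pointers.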
    Last edited by Vir Gnarus; 05-10-2013 at 11:53 AM.
    jcgriff2, x BlueRobot, usasma and 2 others say thanks for this.



  2. #2

    Join Date
    Apr 2013
    Posts
    30

    Re: Bit Flips

    It seems that the general protection fault handler always raises a c0000005 exception with "Attempt to read from address ffffffffffffffff". Try "int 8" in user mode for an example. But why does an invalid memory access raise a GPF instead of a page fault in the flat paged memory model of Windows?
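
    As background on the record itself: for an access violation, Parameter[0] is the access type (0 for a read, 1 for a write, 8 for an execute/DEP violation) and Parameter[1] is the address the system reports. A small sketch in C of turning those two parameters into a summary line like the one in the .exr output (the function name and the wording for the non-read cases are illustrative, not taken from the debugger):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: summarize the two access-violation parameters.
   Returns whatever snprintf returns (the length of the full message). */
static int describe_access_violation(char *out, size_t out_size,
                                     uint64_t access_type, uint64_t address)
{
    const char *verb;
    switch (access_type) {
    case 0:  verb = "read from"; break;  /* matches the dump in post #1 */
    case 1:  verb = "write to";  break;
    case 8:  verb = "execute";   break;
    default: verb = "access";    break;
    }
    return snprintf(out, out_size, "Attempt to %s address %016llx",
                    verb, (unsigned long long)address);
}
```

    Fed the values from the first post (0 and ffffffffffffffff), this produces "Attempt to read from address ffffffffffffffff", so the message by itself says nothing about which address the instruction really touched.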

  3. #3

    Join Date
    Mar 2012
    Posts
    469

    Re: Bit Flips

    This is one of those questions where I remember getting the answer somewhere, but I don't remember the answer itself. I believe it has a lot to do with the actual address: Windows embeds safeties to detect attempts to access memory through a null reference, so it may hit a different exception routine than the one hit for an address that is addressable but isn't legit (or is accessed at the wrong IRQL). It just assumes there's no legitimate reason to access address 0000000000000000 or ffffffffffffffff, and that the access only occurred because of a null pointer.

  4. #4
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: Bit Flips

    Sorry to bring up an old thread, but which two addresses did you use with the .formats command? I couldn't see them anywhere in the examples you provided above, or maybe I just missed them completely.
    Machines Can Think

    Oxygen, Nature's paradox.

  5. #5

    Join Date
    Mar 2012
    Posts
    469

    Re: Bit Flips

    Doh! Thanks for catching that. What actually happened was that I accidentally demonstrated the .formats command using another crashdump the client provided, one that displayed identical symptoms.

    In fact, I also noticed that the crashdumps aren't attached after all! I'll see if I can scrounge them up and make the corrections/additions.

  6. #6
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: Bit Flips

    Thanks :)

  7. #7

    Join Date
    Mar 2012
    Posts
    469

    Re: Bit Flips

    Done.
    x BlueRobot says thanks for this.

  8. #8
    x BlueRobot's Avatar
    Join Date
    May 2013
    Location
    Minkowski Space
    Posts
    1,749

    Re: Bit Flips

    Thank you again

  9. #9

    Join Date
    Apr 2013
    Posts
    30

    Re: Bit Flips

    Found out the reason for the general protection fault: on an x64 processor with 48-bit virtual addresses, the 17 highest bits of a pointer have to be all zeros or all ones. The processor raises a GPF instead of a page fault if a pointer with these bits mixed is dereferenced. See: x86-64 - Wikipedia, the free encyclopedia
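
    That rule is easy to express in code. A minimal sketch in C, assuming 48-bit virtual addresses (the function name is_canonical is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative check, assuming 48-bit virtual addressing: bits 63..47
   (the 17 highest bits) must be all zero or all one. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top17 = addr >> 47;
    return top17 == 0 || top17 == 0x1FFFF;
}
```

    Under this check fffffa800a3cd820, 0000000000000000 and ffffffffffffffff all pass, while ff7ffa800a0a5ed0 fails because one of its top 17 bits is clear and the rest are set.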
    Vir Gnarus says thanks for this.

  10. #10

    Join Date
    Mar 2012
    Posts
    469

    Re: Bit Flips

    Thanks a mil for that, mate. Though, I'm not sure that applies to the specific case of a null pointer dereference, which involves either an all-zeros or an all-F's address, because both of those are canonical addresses. Then again, in this case neither of those addresses is the one that was really accessed; it was ff7ffa800a0a5ed0, which is noncanonical indeed.

  11. #11

    Join Date
    Apr 2013
    Posts
    30

    Re: Bit Flips

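    Code:
    /* Deliberately writes through a noncanonical pointer (the rax value from the dump). */
    int main(void)
    {
        volatile int* a = (volatile int*)0xff7ffa800a0a5ed0ULL;
        *a = 5; /* should fault here */
        return 0;
    }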
    [Attached image: Bit Flips-av-png]
    Vir Gnarus and niemiro say thanks for this.

  12. #12

    Join Date
    Mar 2012
    Posts
    469

    Re: Bit Flips

    Thanks. Have you checked to see if the same occurs during access to a canonical address?
