Multiple BSODs, mainly Driver Overran Stack Buffer

RABBoT (Member; joined Feb 10, 2014; 12 posts)
I need some help pinpointing the cause of my recurring BSOD problem.
First I'll list the specs, but the issue is still there with almost everything removed from the board.

System Information
Operating System: Windows 8.1 Pro 64-bit
Processor: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
Motherboard: ASUS MAXIMUS VI HERO LGA 1150 Intel Z87
Memory: CORSAIR DOMINATOR GT 8GB
GPU: Nvidia 660Ti (x2 in SLI)
SSD: Kingston HyperX 120GB
HDD: Seagate Barracuda 3TB 7200 RPM
Power Supply: Thermaltake Toughpower Grand TPG-1200M 1200W

I am receiving a variety of BSOD errors that all happen within about an hour of startup. The most common are DRIVER_OVERRAN_STACK_BUFFER, MEMORY_MANAGEMENT, KMODE_EXCEPTION_NOT_HANDLED, and WHEA_UNCORRECTABLE_ERROR.

These errors still occur even if I remove both GPUs and run off integrated graphics.
I have also run Memtest86+ for several hours and found no problems with the RAM sticks.

I have run the system several times under Prime95 and get a BSOD within about 20 minutes with any combination of my two 660 Tis in the PCIe slots. If I remove both GPUs and use integrated graphics, it lasts about an hour before the BSOD.

All parts are new except the SSD and the GPUs, which aren't even a year old. From what I've seen so far, I think it must be the motherboard, since the RAM appears to be OK and I know the power supply tested as working.

The Windows install was a fresh installation, and as far as drivers go I have also updated to the latest versions for all the hardware as of 2/6/14.

Any help is appreciated, and if I need to give more info, just ask. Thanks.
 


Hi,

We have a lot of hardware bug checks here, so we may end up requiring a kernel dump. We'll see.

WHEA_UNCORRECTABLE_ERROR (124)

A fatal hardware error has occurred. This fatal error displays data from the Windows Hardware Error Architecture (WHEA).

If we run !errrec on the 2nd parameter of the bug check (the address of the WHEA_ERROR_RECORD structure), we get the following:

Code:
===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ ffffe000040bc0a8
Section       @ ffffe000040bc180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Cache error
Operation     : Generic
Flags         : 0x00
Level         : 0
CPU Version   : 0x00000000000306c3
Processor ID  : 0x0000000000000007


Code:
===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ ffffe000040bc138
Section       @ ffffe000040bc2c0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL0_WR_ERR (Proc 7 Bank 1)
  Status      : 0xbf80000000000124
  Address     : 0x000000022195b080
  Misc.       : 0x0000000000000086

DCACHEL0_WR_ERR = L0 data cache write error. It was reported specifically on processor #7, machine-check bank 1.
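
For anyone who wants to cross-check the mnemonic by hand, here is a minimal sketch that decodes the Status value shown above. It assumes the "cache hierarchy error" compound code layout (000F 0001 RRRR TTLL) and the flag bit positions described in the Machine-Check Architecture chapter of the Intel SDM; the function and table names are made up for illustration, and only the top seven flag bits are decoded.

```python
# Sketch: decode an IA32_MCi_STATUS value as printed by !errrec.
# Assumes the Intel SDM "cache hierarchy error" compound format:
#   000F 0001 RRRR TTLL  (low 16 bits of the status register)

LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}            # LL: cache level
KIND = {0: "I", 1: "D", 2: "G"}                          # TT: instruction / data / generic
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}  # RRRR: request type
FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}


def decode_mci_status(status):
    """Return the set flag names and, for cache errors, the mnemonic."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    mnemonic = None
    # Bits 15-13 and 11-9 must be 0 and bit 8 must be 1; bit 12 (F) is ignored.
    if (code & 0xEF00) == 0x0100:
        request = REQUEST.get((code >> 4) & 0xF, "?")
        kind = KIND.get((code >> 2) & 0x3, "?")
        level = LEVEL[code & 0x3]
        mnemonic = "{}CACHE{}_{}_ERR".format(kind, level, request)
    return {"flags": flags, "mnemonic": mnemonic}


result = decode_mci_status(0xBF80000000000124)
print(result["mnemonic"])  # DCACHEL0_WR_ERR
print(result["flags"])     # ['VAL', 'UC', 'EN', 'MISCV', 'ADDRV', 'PCC']
```

UC (uncorrected) and PCC (processor context corrupt) both being set is consistent with the "Fatal" severity in the record.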

CLOCK_WATCHDOG_TIMEOUT (101)


This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.

We'll need a kernel dump to use any useful commands for debugging a *101.

APC_INDEX_MISMATCH (1)

This indicates that there has been a mismatch in the APC state index.

The most common cause of this bug check is when a file system or driver has a mismatched sequence of calls to disable and re-enable APCs. The key data item is the Thread->CombinedApcDisable field. The CombinedApcDisable field consists of two separate 16-bit fields: SpecialApcDisable and KernelApcDisable. A negative value of either field indicates that a driver has disabled special or normal APCs (respectively) without re-enabling them. A positive value indicates that a driver has enabled special or normal APCs too many times.
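
As a rough illustration of how the two 16-bit fields can be read out of a raw CombinedApcDisable value, here is a small sketch. The field order (KernelApcDisable in the low half, SpecialApcDisable in the high half) is an assumption on my part and should be checked against the actual structure with `dt nt!_KTHREAD` on the dump in question; the helper names are hypothetical.

```python
import struct


def split_combined_apc_disable(combined):
    """Split a 32-bit CombinedApcDisable value into two signed 16-bit fields.

    ASSUMPTION: low 16 bits = KernelApcDisable, high 16 bits = SpecialApcDisable.
    """
    raw = struct.pack("<I", combined & 0xFFFFFFFF)
    kernel, special = struct.unpack("<hh", raw)
    return {"KernelApcDisable": kernel, "SpecialApcDisable": special}


def describe(value):
    """Interpret one field the way the bug check description does."""
    if value < 0:
        return "APCs disabled without being re-enabled"
    if value > 0:
        return "APCs enabled too many times"
    return "balanced"


fields = split_combined_apc_disable(0x0000FFFF)
for name, value in fields.items():
    print(name, value, "-", describe(value))
```

With the example value 0x0000FFFF, the low field reads as -1 and the high field as 0 under the assumed layout.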

MEMORY_MANAGEMENT (1a)

This indicates that a severe memory management error occurred.

BugCheck 1A, {41287, 580000000a8, 0, 0}

- The 1st parameter of the bug check is 41287, which indicates that an illegal page fault occurred while holding working set synchronization.

-- PROCESS_NAME: prime95.exe
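
When there are several dumps to compare, the BugCheck line can be pulled apart mechanically. The following is only a sketch: the regular expression assumes the exact `BugCheck XX, {a, b, c, d}` shape shown above, the name table covers only the bug checks that come up in this thread, and the function name is hypothetical.

```python
import re

# Names for the bug check codes discussed in this thread only.
BUGCHECK_NAMES = {
    0x1: "APC_INDEX_MISMATCH",
    0x1A: "MEMORY_MANAGEMENT",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xC4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0xF7: "DRIVER_OVERRAN_STACK_BUFFER",
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}

BUGCHECK_LINE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return code, name and parameters from a WinDbg 'BugCheck' line."""
    match = BUGCHECK_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "UNKNOWN"),
        "params": params,
    }


info = parse_bugcheck("BugCheck 1A, {41287, 580000000a8, 0, 0}")
print(info["name"], [hex(p) for p in info["params"]])
```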

---------------


1. You have various problematic and unnecessary Asus bloatware installed, such as Asus PC Probe. Please uninstall it ASAP.

2. Remove and replace AVG with Windows 8's built-in Windows Defender for temporary troubleshooting purposes:

AVG removal - http://www.avg.com/us-en/utilities

Windows Defender (how to turn on after removal) - Windows Defender - Turn On or Off in Windows 8

3. SUPERAntiSpyware is also installed, which is likely conflicting with AVG, so please remove it as well.

4. If you're still crashing after the above, please set your system to generate kernel dumps - Creating a Kernel-Mode Dump File (Windows Debuggers)

Regards,

Patrick
 
OK, after doing the above steps and more testing, here are the next minidump files.

I'll also add that the latest Nvidia driver release notes mention problems with 144 Hz monitors, and I am running three of them, but the problem persisted when I tested every driver back to about November 2013.
 


You want the Memory.dmp file, correct? If so, the forum won't let me attach the zip of that file. I may be able to split the file into parts if the problem is the forum's attachment size limit.
 
Correct, you'll need to upload it to a 3rd party host such as OneDrive, MediaFire, Dropbox, etc.

Regards,

Patrick
 
My pleasure, and not a problem!

Right, so the kernel-dump is of the DRIVER_OVERRAN_STACK_BUFFER (f7) bug check.

This indicates that a driver has overrun a stack-based buffer.

Code:
5: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd000`207cdac8 fffff800`685bc441 : 00000000`000000f7 4d140000`000089ee 0000a0fc`844448ef ffff5f03`7bbbb710 : nt!KeBugCheckEx
ffffd000`207cdad0 fffff800`6845afbc : ffffd000`207a3180 ffffd000`207cdb4c ffffd000`207cdb50 ffffd000`207cdb58 : nt!_report_gsfailure+0x25
ffffd000`207cdb10 fffff800`685567bc : ffffd000`207a3180 ffffd000`207a3180 ffffd000`207af340 00000000`00000000 : nt!PoIdle+0x2ac
ffffd000`207cdc60 00000000`00000000 : ffffd000`207ce000 ffffd000`207c8000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x2c

-- DEFAULT_BUCKET_ID: GS_FALSE_POSITIVE_MISSING_GSFRAME

Was Driver Verifier enabled at the time of this crash? Hopefully not, because enabling it is our best chance of finding out what specifically caused this buffer overrun. It doesn't appear to be enabled:

5: kd> !verifier

Verify Level 0 ...

Driver Verifier:

What is Driver Verifier?

Driver Verifier is included in Windows 8, 7, Windows Server 2008 R2, Windows Vista, Windows Server 2008, Windows 2000, Windows XP, and Windows Server 2003 to promote stability and reliability; you can use this tool to troubleshoot driver issues. Windows kernel-mode components can cause system corruption or system failures as a result of an improperly written driver, such as an earlier version of a Windows Driver Model (WDM) driver.

Essentially, if there's a 3rd party driver believed to be at issue, enabling Driver Verifier will help flush out the rogue driver if it detects a violation.

Before enabling Driver Verifier, it is recommended to create a System Restore Point:

Vista - START | type rstrui - create a restore point
Windows 7 - START | type create | select "Create a Restore Point"
Windows 8 - Restore Point - Create in Windows 8

How to enable Driver Verifier:

Start > type "verifier" without the quotes > Select the following options -

1. Select - "Create custom settings (for code developers)"
2. Select - "Select individual settings from a full list"
3. Check the following boxes -
- Special Pool
- Pool Tracking
- Force IRQL Checking
- Deadlock Detection
- Security Checks (Windows 7 & 8)
- DDI compliance checking (Windows 8)
- Miscellaneous Checks
4. Select - "Select driver names from a list"
5. Click on the "Provider" tab. This will sort all of the drivers by the provider.
6. Check EVERY box that is NOT provided by Microsoft / Microsoft Corporation.
7. Click on Finish.
8. Restart.
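
The same selection can also be expressed as a `verifier /flags` bitmask instead of clicking through the GUI. The sketch below only builds the command string; it does not run anything. The bit values are the ones I recall from Microsoft's Driver Verifier options documentation, so treat them as an assumption and confirm with `verifier /?` on your system. The function name and the example driver list are for illustration only.

```python
# Flag bits for the options ticked in step 3 (assumed from Microsoft's
# Driver Verifier documentation; verify with "verifier /?").
VERIFIER_FLAGS = {
    "Special Pool": 0x00000001,
    "Force IRQL Checking": 0x00000002,
    "Pool Tracking": 0x00000008,
    "Deadlock Detection": 0x00000020,
    "Security Checks": 0x00000100,
    "Miscellaneous Checks": 0x00000800,
    "DDI compliance checking": 0x00020000,
}


def verifier_command(options, drivers):
    """Combine the chosen options into one mask and build the command line."""
    mask = 0
    for name in options:
        mask |= VERIFIER_FLAGS[name]
    command = "verifier /flags 0x{:X} /driver {}".format(mask, " ".join(drivers))
    return mask, command


mask, command = verifier_command(VERIFIER_FLAGS.keys(), ["asramdisk.sys"])
print(command)  # verifier /flags 0x2092B /driver asramdisk.sys
```

The command has to be run from an elevated command prompt and still needs the restart from step 8. The `/driver` list should name the non-Microsoft drivers you would have ticked in step 6.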

Important information regarding Driver Verifier:

- If Driver Verifier finds a violation, the system will BSOD.

- After enabling Driver Verifier and restarting, you may not be able to get back into normal Windows if the culprit driver loads at start-up, because Driver Verifier will flag it and, as stated above, force a BSOD.

If this happens, do not panic, do the following:

- Boot into Safe Mode by repeatedly tapping the F8 key during boot-up.

- Once in Safe Mode - Start > Search > type "cmd" without the quotes.

- To turn off Driver Verifier, type in cmd "verifier /reset" without the quotes.
- Restart and boot into normal Windows.

If your OS became corrupt or you cannot boot into Windows after disabling verifier via Safe Mode:

- Boot into Safe Mode by repeatedly tapping the F8 key during boot-up.

- Once in Safe Mode - Start > type "system restore" without the quotes.

- Choose the restore point you created earlier.

How long should I keep Driver Verifier enabled for?

It varies; experts and analysts have different recommendations. Personally, I recommend keeping it enabled for at least 24 hours. If you don't BSOD by then, disable Driver Verifier.

My system BSOD'd, where can I find the crash dumps?

They will be located in %systemroot%\Minidump
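
If you want to grab the newest dumps quickly, something like the following sketch lists them newest first. It assumes the default location above (falling back to `C:\Windows` if the `SystemRoot` variable is not set) and the `.dmp` extension; the function name is hypothetical. Reading that folder may require an elevated prompt.

```python
import os
from pathlib import Path


def list_minidumps(dump_dir=None):
    """Return .dmp files in the minidump folder, newest first."""
    if dump_dir is None:
        system_root = os.environ.get("SystemRoot", r"C:\Windows")
        dump_dir = Path(system_root) / "Minidump"
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []  # no dumps written yet, or a non-default location
    return sorted(dump_dir.glob("*.dmp"),
                  key=lambda p: p.stat().st_mtime,
                  reverse=True)


for dump in list_minidumps():
    print(dump.name, dump.stat().st_size, "bytes")
```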

Any other questions can most likely be answered by this article:
Using Driver Verifier to identify issues with Windows drivers for advanced users

Regards,

Patrick
 
I did not have Driver Verifier active for any of the previous dumps. Any time I try to activate it, the system BSODs before Windows can launch.
Here is the latest minidump with the driver verifier currently active.
 


Great, thanks for the kernel-dump!

*C4 bug check as usual, but the call stack is different this time as we have a kernel dump:

Code:
2: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd000`2222d548 fffff802`944e46a8 : 00000000`000000c4 00000000`00000000 00000000`00000000 00000000`00000001 : nt!KeBugCheckEx
ffffd000`2222d550 fffff802`944fac20 : ffffe000`00000000 00000000`00000057 ffffe000`04c81040 fffff802`93e78000 : nt!VerifierBugCheckIfAppropriate+0x3c
ffffd000`2222d590 fffff802`944d9bb5 : 00000000`00000000 00000000`00000020 00000000`70617257 00000000`00000000 : nt!ExAllocatePoolSanityChecks+0xd4
ffffd000`2222d5e0 fffff800`002d2a17 : 00000000`00000081 00000000`00000000 00000000`70617257 00000000`70617257 : nt!VeAllocatePoolWithTagPriority+0x89
ffffd000`2222d650 fffff802`944d9e91 : 00000000`00000081 00000000`00000000 ffffe000`05d7d000 fffff800`03561d85 : VerifierExt!ExAllocatePoolWithTagPriority_internal_wrapper+0x7b
ffffd000`2222d690 fffff800`03561d98 : 00000000`00000000 ffffe000`05d7d000 ffffe000`05d7d000 00000000`00000082 : nt!VerifierExAllocatePool+0x61
ffffd000`2222d6d0 fffff800`035603f3 : ffffd000`2222d7bc ffffd000`2222d950 ffffe000`05d7d000 00000000`00000000 : asramdisk+0x2d98
ffffd000`2222d770 fffff802`942bf476 : ffffe000`05d0e380 ffffe000`05d7d000 ffffd000`2222d950 ffffe000`05d7d000 : asramdisk+0x13f3
ffffd000`2222d850 fffff802`94330caa : 00000000`00000000 00000000`00000000 fffff802`941171c0 ffffe000`04c81040 : nt!IopLoadDriver+0x5e2
ffffd000`2222db10 fffff802`93f191b9 : fffff802`00000000 ffffffff`80000958 fffff802`94330c5c ffffe000`039d3540 : nt!IopLoadUnloadDriver+0x4e
ffffd000`2222db50 fffff802`93f052e4 : ffffd000`20673340 ffffe000`04c81040 ffffe000`04c81040 ffffe000`000dd900 : nt!ExpWorkerThread+0x2b5
ffffd000`2222dc00 fffff802`93fcc2c6 : ffffd000`20667180 ffffe000`04c81040 ffffd000`20673340 fffff802`944e8de3 : nt!PspSystemThreadStartup+0x58
ffffd000`2222dc60 00000000`00000000 : ffffd000`2222e000 ffffd000`22228000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

-- FAILURE_BUCKET_ID: 0xc4_0_VRF_asramdisk+2d98
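
Spotting the odd one out in a `kv` stack like the one above can be sketched in a few lines. This is only an illustration: the set of "known Microsoft" module names is a hand-picked assumption covering the modules seen in this thread, not an authoritative list, and the function name is hypothetical. The sample frames are copied from the stack above.

```python
import re

# Hand-picked for this thread; NOT a complete list of Microsoft modules.
KNOWN_MICROSOFT = {"nt", "verifierext", "dxgkrnl", "dxgmms1"}

# A call site looks like "module!symbol+0x.." or, without symbols, "module+0x..".
CALL_SITE = re.compile(r"^([A-Za-z_][\w.]*)[!+]")

SAMPLE_STACK = """\
ffffd000`2222d650 fffff802`944d9e91 : 00000000`00000081 00000000`00000000 ffffe000`05d7d000 fffff800`03561d85 : VerifierExt!ExAllocatePoolWithTagPriority_internal_wrapper+0x7b
ffffd000`2222d690 fffff800`03561d98 : 00000000`00000000 ffffe000`05d7d000 ffffe000`05d7d000 00000000`00000082 : nt!VerifierExAllocatePool+0x61
ffffd000`2222d6d0 fffff800`035603f3 : ffffd000`2222d7bc ffffd000`2222d950 ffffe000`05d7d000 00000000`00000000 : asramdisk+0x2d98
ffffd000`2222d850 fffff802`94330caa : 00000000`00000000 00000000`00000000 fffff802`941171c0 ffffe000`04c81040 : nt!IopLoadDriver+0x5e2
"""


def third_party_modules(stack_text, known=KNOWN_MICROSOFT):
    """Return module names on the stack that are not in the known set."""
    found = []
    for line in stack_text.splitlines():
        if " : " not in line:
            continue  # not a frame line
        call_site = line.rsplit(" : ", 1)[1].strip()
        match = CALL_SITE.match(call_site)
        if match is None:
            continue  # header row or a raw address
        module = match.group(1)
        if module.lower() not in known and module not in found:
            found.append(module)
    return found


print(third_party_modules(SAMPLE_STACK))  # ['asramdisk']
```

A frame with no symbol name after the module (just `module+offset`) is itself a hint that the module is third-party, since no public symbols were available for it.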

Verifier flagged asramdisk.sys, which is AsRamDisk from Asus. I am not sure which Asus software bundles it, but my guess is that it also ships with PC Probe. I am unsure why you cannot find any mention of Asus PC Probe in Control Panel, Revo, etc. Is there any mention of Asus software in general, such as AI Booster, even if not PC Probe? If so, uninstall all Asus software. If there is no mention of any Asus software, we're going to need to do this in an interesting way:

1. Create a Restore Point - Restore Point - Create in Windows 8

2. After the restore point is created, navigate to C:\Windows\System32\Drivers

3. Once inside the Drivers directory, find and rename the following drivers:

AsIO.sys > AsIO.old

AsUpIO.sys > AsUpIO.old

AsInsHelp64.sys > AsInsHelp64.old

If AsRamDisk is not in Control Panel either, do the same: asramdisk.sys > asramdisk.old

Restart.

Regards,

Patrick
 
OK, it looks like I am able to boot with Verifier running now. I found the AsRamDisk program and removed it, but none of the other Asus drivers were in the usual C:\Windows\System32\Drivers. I did manage to find them in the C:\Windows\SysWOW64 folder and renamed them there. At this point there shouldn't be anything from Asus left running.
I'll send the next memory.dmp if this fails soon.

Thanks Again,
Mason
 
Alright, so it looks like we are no longer dealing with any issues regarding software, as I originally assumed with the *124, but I wanted to be sure given DV flagged Asus. The attached DMP file is of the PAGE_FAULT_IN_NONPAGED_AREA (50) bug check.

This indicates that invalid system memory has been referenced.

Bug check 0x50 usually occurs after the installation of faulty hardware or in the event of failure of installed hardware (usually related to defective RAM, be it main memory, L2 RAM cache, or video RAM).

Another common cause is the installation of a faulty system service.

Antivirus software can also trigger this error, as can a corrupted NTFS volume.


Code:
4: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd000`28685508 fffff803`9f77bb1f : 00000000`00000050 ffffc000`2a95c108 00000000`00000000 ffffd000`286856f0 : nt!KeBugCheckEx
ffffd000`28685510 fffff803`9f6425ad : 00000000`00000000 ffffe000`01d8d080 ffffd000`286856f0 00000000`00000001 : nt! ?? ::FNODOBFM::`string'+0x1797f
ffffd000`286855b0 fffff803`9f75df2f : 00000000`00000000 ffffc000`018f7010 00000000`00000000 ffffd000`286856f0 : nt!MmAccessFault+0x7ed
ffffd000`286856f0 fffff800`01c97c60 : ffffe000`01d70c70 ffff1cfe`de8368bd ffffd000`207a3180 00000000`00000000 : nt!KiPageFault+0x12f (TrapFrame @ ffffd000`286856f0)
ffffd000`28685880 fffff800`01c67696 : 00000000`00000000 fffff800`01c9e2c7 00000000`00000002 00000000`00000000 : dxgmms1!VIDMM_GLOBAL::ReferenceDmaBuffer+0x200
ffffd000`28685c40 fffff800`01b3bf35 : 00000000`00000200 ffffd000`28686950 00000000`00000200 ffffd000`28685dc0 : dxgmms1!VidMmReferenceDmaBuffer+0x56
ffffd000`28685ca0 fffff800`01b3b8e5 : ffffc000`08b2e000 00000000`00000000 00000000`00000000 ffffc000`013846c0 : dxgkrnl!DXGCONTEXT::Render+0x225
ffffd000`286867f0 fffff803`9f75f4b3 : ffffe000`01d8d080 ffffe000`01d8d080 00000000`0fac0000 ffffe000`01139e70 : dxgkrnl!DxgkRender+0x325
ffffd000`28686b00 00000000`77d777fa : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`28686b00)
00000000`1299e6b8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77d777fa

^^ We have various DirectX MMS (dxgmms1) and DirectX kernel (dxgkrnl) routines being called.

-- FAILURE_BUCKET_ID: AV_VRF_dxgmms1!VIDMM_GLOBAL::ReferenceDmaBuffer

Verifier has flagged DirectX MMS.



At this point, it's likely either faulty RAM or GPU. Let's start with Memtest for NO LESS than ~8 passes (several hours):

Memtest86+:

Download Memtest86+ here:

Memtest86+ - Advanced Memory Diagnostic Tool

Which should I download?

You can either download the pre-compiled ISO that you would burn to a CD and then boot from the CD, or you can download the auto-installer for the USB key. What this will do is format your USB drive, make it a bootable device, and then install the necessary files. Both do the same job, it's just up to you which you choose, or which you have available (whether it's CD or USB).

How Memtest works:

Memtest86 writes a series of test patterns to most memory addresses, reads back the data written, and compares it for errors.

The default pass does 9 different tests, varying in access patterns and test data. A tenth test, bit fade, is selectable from the menu. It writes all memory with zeroes, then sleeps for 90 minutes before checking to see if bits have changed (perhaps because of refresh problems). This is repeated with all ones for a total time of 3 hours per pass.
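
To make the write / read back / compare idea concrete, here is a toy sketch. It runs against an ordinary Python buffer, not physical RAM, and the four patterns are illustrative choices of mine, not the actual Memtest86+ test set. The `StuckBitMemory` class is a hypothetical stand-in for a faulty stick where one bit always reads back as 1.

```python
def pattern_test(memory, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, read it back, and collect mismatches.

    Returns a list of (address, expected, actual) tuples.
    """
    errors = []
    for pattern in patterns:
        for addr in range(len(memory)):      # write phase
            memory[addr] = pattern
        for addr in range(len(memory)):      # read-back and compare phase
            actual = memory[addr]
            if actual != pattern:
                errors.append((addr, pattern, actual))
    return errors


class StuckBitMemory:
    """Simulated RAM where some bits of one cell always read back as 1."""

    def __init__(self, size, bad_addr, stuck_mask):
        self._cells = bytearray(size)
        self._bad_addr = bad_addr
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._cells)

    def __setitem__(self, addr, value):
        self._cells[addr] = value

    def __getitem__(self, addr):
        value = self._cells[addr]
        if addr == self._bad_addr:
            value |= self._stuck_mask
        return value


print(pattern_test(bytearray(64)))                  # healthy buffer: []
print(pattern_test(StuckBitMemory(64, 10, 0x01)))   # [(10, 0, 1), (10, 170, 171)]
```

Note that the stuck bit only shows up with patterns that try to store a 0 in that position, which is why a real tester cycles through many different patterns and why more passes give more confidence.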

Many chipsets can report RAM speeds and timings via SPD (Serial Presence Detect) or EPP (Enhanced Performance Profiles), and some even support changing the expected memory speed. If the expected memory speed is overclocked, Memtest86 can test that memory performance is error-free with these faster settings.

Some hardware is able to report the "PAT status" (PAT: enabled or PAT: disabled). This is a reference to Intel Performance Acceleration Technology; there may be BIOS settings which affect this aspect of memory timing.

This information, if available to the program, can be displayed via a menu option.

Any other questions can most likely be answered by reading this great guide:

FAQ : please read before posting

Regards,

Patrick
 
OK, I just finished with Memtest86+, which ran for over 24 hours with no errors detected. I am also about 90% sure that it isn't the 660 Ti, because I still receive BSODs when using the integrated GPU or a single 660 that I had already run for about a year. I also ran both 660s flawlessly for about 2-3 months on the old system before I upgraded the CPU, motherboard, and RAM. The only other two new parts in the machine are the CPU itself and the motherboard. I personally had been thinking that the motherboard may be the culprit, but from what I've sent, does that seem like a viable cause given the variety of errors?

I can try to get it to do another memory.dmp with both 660s not installed.

Thanks,
Mason
 
Thanks a bunch!

It's a *124 bug check with the same WHEA output we've seen above (DCACHEL0_WR_ERR (Proc 0 Bank 1)). The only difference this time is that it was processor #0 (main CPU core) and not #7.

At this point, I believe it's a safe call to make that the CPU is faulty. Follow the list:

1. Ensure your temperatures are within standard and nothing's overheating. You can use a program such as Speccy if you'd like to monitor temps - Speccy - System Information - Free Download

2. Clear your CMOS (or load optimized BIOS defaults) to ensure there's no improper BIOS setting - How To Clear CMOS (Reset BIOS)

3. Ensure your BIOS is up to date.

4. The only software that can usually cause *124 bug checks is OS-to-BIOS utilities from manufacturers, such as Asus' AI Suite. If you have something like this installed, remove it ASAP.

5. If all of the above fail, the only thing left to do is replace your processor, as it is faulty.

Regards,

Patrick
 

