The Special BSOD Reason: 0xEF and 0xF4

axe0

BSOD crashes normally originate in kernel mode, because only problems in kernel mode are fatal to Windows. That is a pretty strict rule, but like many rules it has an exception: the 0xEF or CRITICAL_PROCESS_DIED bugcheck. To understand why 0xEF is the exception, we need to understand part of the boot process.

In the boot process, just before the login screen is shown, Smss.exe, Wininit.exe, and Csrss.exe are at work. Smss.exe (Session Manager Subsystem) is a native application, which means that it uses only the native API and not the regular Windows API. Smss is also considered a trusted part of Windows, which allows it to perform actions that few other processes can. This is what makes Smss special and distinguishes it from other user-mode processes.
Smss doesn't use the Windows subsystems because one of its first tasks is to initialize them. The very first thing Smss does is mark itself as a critical process and its main thread as a critical thread. That means that if the critical thread or the process itself exits for any reason, Windows crashes, because Windows simply cannot run without the processes that are marked as critical. Smss is also responsible for starting Wininit and Csrss. Smss initializes session 0, and the same command that initializes session 0 also launches the Wininit process. Csrss (Client/Server Runtime Subsystem) is started a few tasks later with the call SmpStartCsr. Csrss, in turn, loads win32k.sys, which uses the video driver to change the resolution so you aren't greeted with a low resolution.

After Smss has finished its initialization, Wininit continues to run, also marks itself as critical, and a few tasks later lets Winlogon take over. Meanwhile, Csrss has also been marked as critical.

Smss, Csrss, and Wininit are responsible for loading various Windows internal systems, the desktop so you can work, the login screen, and a lot more.

If any critical process or critical thread exits for any reason, Windows has no choice but to stop running, so the kernel forces a BSOD with the reason CRITICAL_PROCESS_DIED. While a BSOD can only be raised from kernel mode, the reason behind it is not limited to kernel mode.
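
The rule above can be modelled in a few lines. This is a toy sketch in Python, not kernel code: the `Process` class, its `critical` flag, and the `terminate` function are hypothetical names made up for illustration. It only shows the decision being described: a critical process exiting leads to bugcheck 0xEF, any other process simply exits.

```python
from dataclasses import dataclass

CRITICAL_PROCESS_DIED = 0xEF  # bugcheck code discussed in this post


class BugCheck(Exception):
    """Stands in for the kernel halting the system (toy model only)."""

    def __init__(self, code: int, process_name: str):
        super().__init__(f"BSOD 0x{code:X} caused by {process_name}")
        self.code = code
        self.process_name = process_name


@dataclass
class Process:
    name: str
    critical: bool = False  # hypothetical flag: Windows cannot run without this process


def terminate(process: Process) -> str:
    """Model what happens when a process exits, for any reason."""
    if process.critical:
        # The exit started in user mode, but the crash itself is raised by the kernel.
        raise BugCheck(CRITICAL_PROCESS_DIED, process.name)
    return f"{process.name} exited, system keeps running"


if __name__ == "__main__":
    print(terminate(Process("notepad.exe")))
    try:
        terminate(Process("csrss.exe", critical=True))
    except BugCheck as crash:
        print(crash)
```

Note that the model does not care why the process exited, which matches the point above: the cause can sit anywhere, only the final crash is raised from kernel mode.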

This is the general picture behind critical processes and threads. Not every critical process is mentioned here.

While we now understand the reason behind the 0xEF crash, that does not make debugging it any easier. The 0xEF crash starts in user mode, which brings a drawback. We are used to looking into kernel-mode operations and, thanks to Microsoft, have the resources (symbols) to do just that. Looking into user-mode operations requires the largest dump that Windows can make, because a normal kernel memory dump (not a minidump) contains all kernel-mode data but nothing else. A complete memory dump contains both kernel-mode and user-mode data but is very large: roughly the size of the installed RAM, so these days anywhere from 8 GB to 32 GB.

That is not the main drawback, however. The main drawback is that to look properly at all the data in such a dump we need symbols that are mostly private, namely the symbols that allow WinDbg to recognize code from third-party programs. Many third-party vendors are paid for the code they write, and there is no way they will hand over those symbols just so we can look at a complete memory dump for a 0xEF crash that happens once in a while. We can still look into user-mode data, but without the right symbols we are limited in what we can do. That is why understanding the reason behind the 0xEF does not make debugging it any easier.

Speaking of causes, if you have looked at Carrona's BSOD Index you will have noticed that there is barely any information available about the 0xEF crash. We therefore cannot depend on Carrona's website in this instance and have to get the information another way. One way is through experience. Fortunately, after a few years, it is possible to provide a shortlist of the most probable causes. From what I've seen, this crash is mostly caused by issues with the drive or the RAM. Other parts, such as the motherboard or PSU, could in theory cause it too, but in practice that rarely happens, if ever, so it is not realistic to extend the list of probable causes beyond the drive and the RAM. Malware should not be forgotten either. Although malware hasn't been what it used to be over the past couple of years, sophisticated malware can be a reason for this crash. Since many malware authors have changed direction, though, it is very unlikely these days to see malware that could cause the 0xEF crash without first being blocked by current antivirus solutions.
 
I've actually come across another reason for this bugcheck: hardware-based stack protection. Essentially, the processor maintains a second copy of the return addresses on a separate stack, known as the shadow stack. The shadow stack records the intended control flow of the running thread, and if the return addresses on the two stacks do not match, a special hardware exception is thrown, which causes Windows to terminate the process.
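
To make the mechanism a bit more concrete, here is a minimal sketch in Python of the comparison the processor performs. It is a toy model under my own assumptions, not how the hardware is implemented: `ToyCpu`, `call`, `ret`, and `ControlProtectionFault` are hypothetical names used only for illustration.

```python
class ControlProtectionFault(Exception):
    """Toy stand-in for the #CP (Control Protection) hardware exception."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # normal stack: writable by the running code
        self.shadow_stack = []  # second copy: holds return addresses only

    def call(self, return_address: int) -> None:
        # A call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        # A return pops both copies and compares them.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlProtectionFault(
                f"return to 0x{address:x}, shadow stack says 0x{expected:x}"
            )
        return address


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.call(0x1000)
    print(hex(cpu.ret()))       # both copies match: normal return

    cpu.call(0x2000)
    cpu.stack[-1] = 0xDEAD      # simulate an overwritten return address
    try:
        cpu.ret()
    except ControlProtectionFault as fault:
        print("fault:", fault)
```

In the real crash below, the process hit by this fault was a critical one, which is why the mismatch ended in a 0xEF instead of a simple process crash.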

Rich (BB code):
CRITICAL_PROCESS_DIED (ef)
        A critical system process died
Arguments:
Arg1: ffffaf08b56a90c0, Process object or thread object
Arg2: 0000000000000000, If this is 0, a process died. If this is 1, a thread died.
Arg3: 0000000000000000
Arg4: 0000000000000000
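
As a small illustration of how to read these arguments, the sketch below turns Arg1 and Arg2 into a one-line summary. It is written in Python purely for illustration; `describe_0xef` is a hypothetical helper, and it only knows the two Arg2 values that the output above documents.

```python
def describe_0xef(arg1: int, arg2: int) -> str:
    """Summarise the first two CRITICAL_PROCESS_DIED (0xEF) arguments."""
    if arg2 == 0:
        kind = "process"
    elif arg2 == 1:
        kind = "thread"
    else:
        # The output above only documents 0 and 1 for Arg2.
        kind = "unknown object"
    return f"critical {kind} died, object at {arg1:016x}"


if __name__ == "__main__":
    # Values copied from the bugcheck arguments above.
    print(describe_0xef(0xFFFFAF08B56A90C0, 0))
```

The address in Arg1 is what you would then hand to `!process` in WinDbg (or `!thread` when Arg2 is 1) to see which process or thread died.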

Rich (BB code):
3: kd> knL
 # Child-SP          RetAddr               Call Site
00 ffff8c0a`14b3ed38 fffff801`0ed0d122     nt!KeBugCheckEx
01 ffff8c0a`14b3ed40 fffff801`0ec0c7a3     nt!PspCatchCriticalBreak+0x10e
02 ffff8c0a`14b3ede0 fffff801`0ea99290     nt!PspTerminateAllThreads+0x172917
03 ffff8c0a`14b3ee50 fffff801`0ea9908c     nt!PspTerminateProcess+0xe0
04 ffff8c0a`14b3ee90 fffff801`0e80f8f8     nt!NtTerminateProcess+0x9c << Terminate our svchost.exe process which then bugchecks the system
05 ffff8c0a`14b3ef00 fffff801`0e800ca0     nt!KiSystemServiceCopyEnd+0x28
06 ffff8c0a`14b3f098 fffff801`0e860d9d     nt!KiServiceLinkage
07 ffff8c0a`14b3f0a0 fffff801`0e8106a4     nt!KiDispatchException+0x17941d
08 ffff8c0a`14b3f8e0 fffff801`0e80e03c     nt!KiFastFailDispatch+0xe4
09 ffff8c0a`14b3fac0 00007ffc`f18833c6     nt!KiControlProtectionFault+0x2fc << Throws #CP (Control Protection) exception
0a 0000002f`4637f820 000001b1`ae000340     ntdll!RtlpGetActivationContextData+0x52
0b 0000002f`4637f828 000001b1`ae002480     0x000001b1`ae000340
0c 0000002f`4637f830 00000000`00000001     0x000001b1`ae002480
0d 0000002f`4637f838 000001b1`000000f0     0x1
0e 0000002f`4637f840 00000000`00000002     0x000001b1`000000f0
0f 0000002f`4637f848 00000050`00000000     0x2
10 0000002f`4637f850 00000000`00000002     0x00000050`00000000
11 0000002f`4637f858 00000000`000000f0     0x2
12 0000002f`4637f860 00000000`00000000     0xf0

If you examine the first parameter passed to the nt!KiDispatchException function, you will see the type of exception being thrown.

Rich (BB code):
07 ffff8c0a14b3f0a0 fffff8010e8106a4 nt!KiDispatchException+17941d (perf)
    Parameter[0] = ffff8c0a14b3fa18
    Parameter[1] = 0000000000000000
    Parameter[2] = ffffffffffffff80
    Parameter[3] = 0000002f4637f820

Rich (BB code):
3: kd> .exr ffff8c0a14b3fa18
ExceptionAddress: 00007ffcf18833c6 (ntdll!RtlpGetActivationContextData+0x0000000000000052)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 0000000000000039
Subcode: 0x39 FAST_FAIL_CONTROL_INVALID_RETURN_ADDRESS Shadow stack violation
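
The way the exception record is read can be sketched as follows. This is an illustrative Python helper, not part of any debugger: `classify` and the `FAST_FAIL_CODES` table are my own names, and the table is deliberately a tiny, incomplete subset of the fast-fail codes defined in the Windows headers. Only 0x39 comes from the output above.

```python
from typing import Sequence

STATUS_STACK_BUFFER_OVERRUN = 0xC0000409  # exception code used for fast-fail requests

# Small, incomplete subset of fast-fail codes; 0x39 is the one in the output above.
FAST_FAIL_CODES = {
    0x02: "FAST_FAIL_STACK_COOKIE_CHECK_FAILURE",
    0x03: "FAST_FAIL_CORRUPT_LIST_ENTRY",
    0x39: "FAST_FAIL_CONTROL_INVALID_RETURN_ADDRESS",
}


def classify(exception_code: int, parameters: Sequence[int]) -> str:
    """Interpret ExceptionCode and Parameter[0] of an exception record."""
    if exception_code != STATUS_STACK_BUFFER_OVERRUN:
        return f"not a fast-fail exception (0x{exception_code:08x})"
    if not parameters:
        return "fast fail without a subcode"
    subcode = parameters[0]  # Parameter[0] carries the fast-fail subcode
    name = FAST_FAIL_CODES.get(subcode, "unknown to this sketch")
    return f"fast fail 0x{subcode:x}: {name}"


if __name__ == "__main__":
    # Values copied from the .exr output above.
    print(classify(0xC0000409, [0x39]))
```

The point is that c0000409 on its own says very little; it is the subcode in Parameter[0] that tells you this was a shadow stack violation and not, for example, a stack cookie failure.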

This exception is more likely to be caused by drivers than by hardware, although I wouldn't completely rule hardware out as a cause.

Here is a very good read about how shadow stacks are used with Windows: Developer Guidance for Hardware-enforced Stack Protection
 
If the 0xEF is commonly the result of a RAM or drive problem, wouldn't we also expect to see other problems in addition to the 0xEF? Different bugchecks for example, or errors and failures in the system log?
 
RAM and drive problems can cause a wide variety of symptoms or one very specific symptom; it depends on the exact problem. With hard drives, for example, bad blocks are a frequent problem that can cause file corruption. Depending on how many blocks go bad, the problem may affect a single file (or a program using that file) or many files and programs, without affecting anything else. Thanks to filesystem healing capabilities and reserve blocks, data on a bad block is moved to another block wherever possible, so the user might not even notice.

Over the past years, whenever I saw 0xEF or 0xF4 mentioned, there was a good chance I would also spot 0x7A and/or 0x124 in the logs. That combination of crashes almost always led to either the RAM or the HDD.

I wouldn't expect to always see different bugchecks whenever the RAM or hard drive is suspect. Yes, it happens, but you may also see just 0xEF or just 0xF4; I've seen that a couple of times too.
 