There is a kernel level driver installed on a terminal server.It works fine for certain period of time on that terminal sever. later on
that terminal server itself getting into freezed state where nobody can RDP & web console to connect with server. In my case,
CPU is always hitting to 100% in freezed state and i had to hard reboot only by using VM option "power off". After unstalling that driver the terminal server works fine or even responds properly always.Even if it is 100% CPU usage and gets slow but still reponds to the RDP & web console.
That scenario is kind of hard to reproduce it. but still i got successful to fetch complete memory dump out of that machine in that scenario then i analyzed full memory dump using microsoft WinDbg tool. WinDbg tool displayed faulty driver module name and call stack as below
Module Name: MMTEProxy (Installed Driver)
0: kd> !analyze -v
***
* *
* Bugcheck Analysis *
* *
***
NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction. The hardware supplier should
be called.
Arguments:
Arg1: 00000000004f4454
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000
Debugging Details:
------------------
KEY_VALUES_STRING: 1
PROCESSES_ANALYSIS: 1
SERVICE_ANALYSIS: 1
STACKHASH_ANALYSIS: 1
TIMELINE_ANALYSIS: 1
DUMP_CLASS: 1
DUMP_QUALIFIER: 402
BUILD_VERSION_STRING: 9600.17415.amd64fre.winblue_r4.141028-1500
SYSTEM_MANUFACTURER: VMware, Inc.
VIRTUAL_MACHINE: VMware
SYSTEM_PRODUCT_NAME: VMware Virtual Platform
SYSTEM_VERSION: None
BIOS_VENDOR: Phoenix Technologies LTD
BIOS_VERSION: 6.00
BIOS_DATE: 04/05/2016
BASEBOARD_MANUFACTURER: Intel Corporation
BASEBOARD_PRODUCT: 440BX Desktop Reference Platform
BASEBOARD_VERSION: None
DUMP_TYPE: 0
BUGCHECK_P1: 4f4454
BUGCHECK_P2: 0
BUGCHECK_P3: 0
BUGCHECK_P4: 0
CPU_COUNT: 2
CPU_MHZ: bb8
CPU_VENDOR: GenuineIntel
CPU_FAMILY: 6
CPU_MODEL: 3e
CPU_STEPPING: 4
CPU_MICROCODE: 6,3e,4,0 (F,M,S,R) SIG: 42C'00000000 (cache) 42C'00000000 (init)
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: 0x80
PROCESS_NAME: svchost.exe
CURRENT_IRQL: 0
ANALYSIS_SESSION_HOST: INPN01LAP107
ANALYSIS_SESSION_TIME: 03-26-2019 16:30:13.0120
ANALYSIS_VERSION: 10.0.18317.1001 amd64fre
LAST_CONTROL_TRANSFER: from fffff8005ae205b2 to fffff8009a6601a7
STACK_TEXT:
nt!KxWaitForLockOwnerShip+0x27
MMTEProxy!SVSessionLutTranslatePort+0x2c2 [c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\sessionlut.c @ 873]
MMTEProxy!PerformProxySocketRedirection+0xba7 [c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\filteralebindredirect.c @ 247]
MMTEProxy!TriggerProxyByALERedirectInline+0x244 [c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\filteralebindredirect.c @ 690]
MMTEProxy!DDProxyBindRedirectClassify+0x537 [c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\filteralebindredirect.c @ 881]
THREAD_SHA1_HASH_MOD_FUNC: 03f7fb5fd041c46c9b4dff8f1685ccff753d3642
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 7f4a5e830d38804e610244f134268d53640c97a0
THREAD_SHA1_HASH_MOD: 2a8f232a3e3c38ad2a6b44b0d2253b97c2ac4b2a
FOLLOWUP_IP:
MMTEProxy!SVSessionLutTranslatePort+2c2 [c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\sessionlut.c @ 873]
fffff800`5ae205b2 c644244000 mov byte ptr [rsp+40h],0
FAULT_INSTR_CODE: 402444c6
FAULTING_SOURCE_LINE: c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\sessionlut.c
FAULTING_SOURCE_FILE: c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\sessionlut.c
FAULTING_SOURCE_LINE_NUMBER: 873
FAULTING_SOURCE_CODE:
No source found for 'c:\users\dkelone\git\MMTE\MMTE\MMTEdriver\sessionlut.c'
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: MMTEProxy!SVSessionLutTranslatePort+2c2
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: MMTEProxy
IMAGE_NAME: MMTEProxy.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 5a60d5f0
STACK_COMMAND: .thread ; .cxr ; kb
BUCKET_ID_FUNC_OFFSET: 2c2
FAILURE_BUCKET_ID: 0x80_MMTEProxy!SVSessionLutTranslatePort
BUCKET_ID: 0x80_MMTEProxy!SVSessionLutTranslatePort
PRIMARY_PROBLEM_CLASS: 0x80_MMTEProxy!SVSessionLutTranslatePort
TARGET_TIME: 2019-02-26T11:15:36.000Z
OSBUILD: 9600
OSSERVICEPACK: 0
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 16
PRODUCT_TYPE: 3
OSPLATFORM_TYPE: x64
OSNAME: Windows 8.1
OSEDITION: Windows 8.1 Server TerminalServer
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2014-10-29 06:08:48
BUILDDATESTAMP_STR: 141028-1500
BUILDLAB_STR: winblue_r4
BUILDOSVER_STR: 6.3.9600.17415.amd64fre.winblue_r4.141028-1500
ANALYSIS_SESSION_ELAPSED_TIME: 685
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x80_MMTEProxy!svsessionluttranslateport
FAILURE_ID_HASH: {c64b7e97-0bf3-daf1-ad95-9f39cbf37a9a}
Followup: MachineOwner
---------
Since i am not expert in kernel level driver development,But i tried to google about driver. Internally it uses the following lock to perform any operation at process table or session table
#Code snippet
{
...
...
KeAcquireInStackQueuedSpinLock(&gProcessTableLock,&processTableLockHandle);
tempNode = processTableListHead;
while (processTableListHead != tempNode->Flink)
{
tempNode = tempNode->Flink;
if (CONTAINING_RECORD(tempNode, PROCESS_TABLE, list_entry)->processId == processId &&
CONTAINING_RECORD(tempNode, PROCESS_TABLE, list_entry)->inUse == TRUE)
{
*sessionID = CONTAINING_RECORD(tempNode, PROCESS_TABLE, list_entry)->sessionId;
found = TRUE;
break;
}
}
....
....
....
KeReleaseInStackQueuedSpinLock(&processTableLockHandle);
}
With help of WinDbg tool, What i observed here, Mostly it is failling at source line no where assinging the value to a variables and that variables defined before accuiring the lock. You can see it in above driver code snippet. my driver is a WFP ALE filtered driver. it inspects traffic it works in a multhreaded environment and my driver allocates/freed memory in non-paged pool
Still i am not getting what causing this issue. whether its lock is not handled properly at code level or some particular situation.
Can you please help me with pointer or direction?