Server 2016 - Random crash with "a runtime critical stop occurred" on iDRAC

Sunfyre

I have a Dell server that has randomly shut down three times in the last three months. It is a Hyper-V server running Windows Server 2016.
It shows no health issues with the hardware and has dual PSUs, one of which is connected to a UPS. There are no critical alerts in the OS, on the UPS or in iDRAC; the only entry is "a runtime critical stop occurred" in iDRAC.
I've seen that MBAM has caused these symptoms but this server is running Vipre and no other AV.
Any advice would be greatly appreciated ;)
 
So, it appears that you have had a few BSOD crashes in the past. Here's the latest, which is a month old now.

Rich (BB code):
22: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffff8681`2e089d88 fffff802`d4439507 nt!KeBugCheckEx
01 ffff8681`2e089d90 fffff802`d4436778 nt!KeAccumulateTicks+0x407
02 ffff8681`2e089df0 fffff802`d4c264e5 nt!KeClockInterruptNotify+0xb8
03 ffff8681`2e089f40 fffff802`d44b13f6 hal!HalpTimerClockIpiRoutine+0x15
04 ffff8681`2e089f70 fffff802`d4564ada nt!KiCallInterruptServiceRoutine+0x106
05 ffff8681`2e089fb0 fffff802`d4564fc7 nt!KiInterruptSubDispatchNoLockNoEtw+0xea
06 ffff8681`2e081d50 fffff802`d441a314 nt!KiInterruptDispatchNoLockNoEtw+0x37
07 ffff8681`2e081ee0 fffff802`d441a2c4 nt!KxWaitForLockOwnerShip+0x34
08 ffff8681`2e081f10 fffff801`ec5272b4 nt!KeAcquireInStackQueuedSpinLock+0x44 << Implicitly raises the IRQL to 2 (DISPATCH_LEVEL)
09 ffff8681`2e081f40 00000000`00000001 sbwfw+0x72b4 << Our issue
0a ffff8681`2e081f48 00000000`0000dd86 0x1
0b ffff8681`2e081f50 ffff8681`2e082190 0xdd86
0c ffff8681`2e081f58 fffff801`ec535750 0xffff8681`2e082190
0d ffff8681`2e081f60 ffff8681`2e082190 sbwfw+0x15750
0e ffff8681`2e081f68 fffff801`ec5250af 0xffff8681`2e082190
0f ffff8681`2e081f70 ffffb181`3910bcac sbwfw+0x50af
10 ffff8681`2e081f78 00000000`001f419a 0xffffb181`3910bcac
11 ffff8681`2e081f80 ffff8681`2e082190 0x1f419a
12 ffff8681`2e081f88 ffffa98d`24a2aee0 0xffff8681`2e082190
13 ffff8681`2e081f90 fffff801`ec5666c8 0xffffa98d`24a2aee0
14 ffff8681`2e081f98 fffff801`ec527741 sbwfw+0x466c8
15 ffff8681`2e081fa0 00000000`00000001 sbwfw+0x7741
16 ffff8681`2e081fa8 ffff8681`2e082068 0x1
17 ffff8681`2e081fb0 ffff8681`2e18d350 0xffff8681`2e082068
18 ffff8681`2e081fb8 fffff801`ec566dd9 0xffff8681`2e18d350
19 ffff8681`2e081fc0 00000000`00000002 sbwfw+0x46dd9
1a ffff8681`2e081fc8 fffff801`ec5666d8 0x2
1b ffff8681`2e081fd0 ffffa98d`050f0230 sbwfw+0x466d8
1c ffff8681`2e081fd8 ffffa98d`06d9bd01 0xffffa98d`050f0230
1d ffff8681`2e081fe0 00000000`00000001 0xffffa98d`06d9bd01
1e ffff8681`2e081fe8 ffffa98d`06ffd7a0 0x1
1f ffff8681`2e081ff0 00000000`00000001 0xffffa98d`06ffd7a0
20 ffff8681`2e081ff8 fffff801`ec5279b4 0x1
21 ffff8681`2e082000 00000000`00000000 sbwfw+0x79b4

Rich (BB code):
22: kd> lmvm sbwfw
Browse full module list
start             end                 module name
fffff801`ec520000 fffff801`ec578000   sbwfw    T (no symbols)           
    Loaded symbol image file: sbwfw.sys
    Image path: \SystemRoot\system32\DRIVERS\sbwfw.sys
    Image name: sbwfw.sys
    Browse all global symbols  functions  data
    Timestamp:        Thu Apr 23 18:38:32 2020 (5EA1D298)
    CheckSum:         0005E67F
    ImageSize:        00058000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

It appears that your ThreatTrack Firewall driver (sbwfw.sys) is causing the problem here. I would suggest checking whether an updated version of the program is available or, if you're unable to remove the software from the server, opening a support ticket with the vendor.
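For anyone who wants to repeat this kind of triage on their own dumps, here is a minimal Python sketch that scans `knL` output for the first frame that does not belong to the OS. The function name `first_third_party_frame` and the `OS_MODULES` allow-list are hypothetical and only cover the modules seen in this stack; it is an illustration, not a replacement for reading the stack in WinDbg.

```python
import re

# Assumption: a tiny illustrative allow-list of OS modules taken from this
# stack only; real triage would need a much fuller list.
OS_MODULES = {"nt", "hal"}

# Matches frame lines such as:
#   09 ffff8681`2e081f40 00000000`00000001 sbwfw+0x72b4
FRAME_RE = re.compile(
    r"^\s*([0-9a-f]+)\s+[0-9a-f`]+\s+[0-9a-f`]+\s+(\S+)", re.IGNORECASE
)

def first_third_party_frame(stack_text):
    """Return (frame_number, call_site) for the first frame outside OS_MODULES."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # prompt line, header line, or blank line
        frame, call_site = match.group(1), match.group(2)
        if call_site.lower().startswith("0x"):
            continue  # raw value with no module name
        module = re.split(r"[!+]", call_site, maxsplit=1)[0]
        if module.lower() not in OS_MODULES:
            return frame, call_site
    return None

SAMPLE = """
06 ffff8681`2e081d50 fffff802`d441a314 nt!KiInterruptDispatchNoLockNoEtw+0x37
07 ffff8681`2e081ee0 fffff802`d441a2c4 nt!KxWaitForLockOwnerShip+0x34
08 ffff8681`2e081f10 fffff801`ec5272b4 nt!KeAcquireInStackQueuedSpinLock+0x44
09 ffff8681`2e081f40 00000000`00000001 sbwfw+0x72b4
0a ffff8681`2e081f48 00000000`0000dd86 0x1
"""

print(first_third_party_frame(SAMPLE))  # ('09', 'sbwfw+0x72b4')
```

Without symbols for the driver, the frames below the first third-party one are unreliable, so the sketch stops at the first hit rather than listing every `sbwfw` entry.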
 
"It shows no health issues with the hardware" ;)

This is preventative maintenance: the logs reported WHEA corrected hardware errors (Memory, Single-Bit ECC) on 01/14 and 04/14.

The computer has 12 RAM modules of 16 GB each. They are matching ECC modules, and there should be no ECC errors.

See if you can swap in test RAM, since testing the modules 4 at a time would mean significant downtime.

ECC memory - Wikipedia

Code:
Event[11933]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-14T00:48:33.420
  Event ID: 23
  Task: N/A
  Level: Warning
  Opcode: Info
  Keyword: N/A
  User: S-1-5-19
  User Name: NT AUTHORITY\LOCAL SERVICE
  Computer: AbertayVM2
  Description:
A corrected hardware error has occurred.

Component: Memory
Error Source: Generic
Error Type: Single-Bit ECC

The details view of this entry contains further information.


Event[29174]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-01-14T16:01:22.763
  Event ID: 23
  Task: N/A
  Level: Warning
  Opcode: Info
  Keyword: N/A
  User: S-1-5-19
  User Name: NT AUTHORITY\LOCAL SERVICE
  Computer: AbertayVM2
  Description:
A corrected hardware error has occurred.

Component: Memory
Error Source: Generic
Error Type: Single-Bit ECC

The details view of this entry contains further information.
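
If you want to check how often these corrected errors occur, a rough Python sketch like the one below can pull them out of a text dump of the System log. It assumes the dump uses the same `Event[n]:` / `Key: value` layout as the entries above; the function name `whea_corrected_errors` is hypothetical and the sample is trimmed from those entries.

```python
def whea_corrected_errors(dump_text):
    """Return (date, error_type) for each WHEA-Logger Event ID 23 memory entry."""
    records, current = [], None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Event["):
            current = {}  # a new record starts at each "Event[n]:" line
            records.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")  # split at the first colon only
            current[key.strip()] = value.strip()
    return [
        (rec.get("Date", "")[:10], rec.get("Error Type", ""))
        for rec in records
        if rec.get("Source") == "Microsoft-Windows-WHEA-Logger"
        and rec.get("Event ID") == "23"
        and rec.get("Component") == "Memory"
    ]

SAMPLE_LOG = """
Event[11933]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-14T00:48:33.420
  Event ID: 23
  Level: Warning
  Description:
A corrected hardware error has occurred.

Component: Memory
Error Source: Generic
Error Type: Single-Bit ECC

Event[29174]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-01-14T16:01:22.763
  Event ID: 23
  Level: Warning
  Description:
A corrected hardware error has occurred.

Component: Memory
Error Source: Generic
Error Type: Single-Bit ECC
"""

print(whea_corrected_errors(SAMPLE_LOG))
# [('2021-04-14', 'Single-Bit ECC'), ('2021-01-14', 'Single-Bit ECC')]
```

Two corrected errors three months apart is a low rate, but a count that starts climbing would be a stronger reason to schedule the RAM swap.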
 
Thank you, my friend. I will arrange to remove Vipre from the Hyper-V server tonight and will update on how it goes.

@zbook - I'll replace Vipre first as that's a much easier operation to deal with, thanks.
 
Just a quick bit of info that might help....
"The previous system shutdown at 11:34:49 on 07/06/2021 was unexpected."
 
Nothing, no. That's what made it all look a bit strange: there is no information about it in Event Viewer, iDRAC or the UPS (APC).
 
