Server 2012 R2 possible memory leak?

bascotie

Well-known member
Joined
Feb 14, 2021
Posts
206
Hi guys,

I don't know if this 100% qualifies as a BSOD issue; it's a bit of a long story, so I'll summarize it as succinctly as possible.

- It's a 2012 R2 server which hosts AD/DNS/SQL/RDS. It's mainly used for a rental program called Point of Rentals, so on an average day we have anywhere from 30-60 people remoted into the server, mostly to use that program, which is SQL-based (we have SQL Server Management Studio installed on the server). The server has 112GB RAM.
- A few months ago, the server crashed (BSOD, I assume) and wouldn't boot. Boot repair didn't work, but a chkdsk got it booting again.
- 2 months later, a similar issue except this time we had to restore the most recent backup image to get it working.
- To my dismay, a week later, it happens again. A tech finds that the RAID card in the Dell server is extremely hot / looks loose so we replace it.
- Things work pretty well for a while, but then we start running into issues where the server gets very slow. Sometimes it'll be slow while only showing 65% memory usage, but many other times it'll show 97% or so memory usage, even though in Task Manager the most consuming process is SQL, which is using about 65GB (the limit we set for it; we've tried other limits as well for testing).
- Rebooting the server fixes this for a couple days, or more, then it happens again.
- I've checked with Dell OpenManage and everything checks out (no failing drives, etc.). This past weekend, I updated the RAID drivers (but could not update the BIOS and RAID firmware remotely), so we'll see if that has any impact. I also cleared out and ran chkdsk on some drives and all looks good for the most part. An sfc scan found one error it couldn't fix, but a DISM RestoreHealth looks to have fixed it.
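
For a rough sense of scale, the figures in the list above can be turned into a quick calculation. This is only a sketch: the 112GB total, the 65GB SQL limit, and the 65% / 97% readings come from the post, while the function name is made up for illustration, and it assumes Task Manager's percentage is measured against the full 112GB.

```python
# Figures taken from the post above; the helper name is hypothetical.
TOTAL_RAM_GB = 112
SQL_LIMIT_GB = 65


def unaccounted_gb(total_gb, used_fraction, known_gb):
    """Memory in use that the known top consumer does not explain."""
    return total_gb * used_fraction - known_gb


for used in (0.65, 0.97):
    gap = unaccounted_gb(TOTAL_RAM_GB, used, SQL_LIMIT_GB)
    print(f"{used:.0%} used -> about {gap:.1f} GB outside the SQL limit")
```

At the 97% reading, that leaves a large amount of memory that is not attributed to the SQL process in Task Manager, which is why the pool usage is worth looking at.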

I'm technically testing it right now, but I'm trying to be proactive so I collected some logs a few days ago in hopes that we might get a hint of what it may be in case it happens again.


Attached is the sysnative log zip.

Also, I've run poolmon while the server was hitting high memory usage last week and here are some screenshots (keep in mind, this is with task manager saying SQL was the top offender with 65GB of RAM, and everything else didn't seem to use much at all)

poolmon.png

Here's a snapshot of task manager during this climbing memory usage:

rental.png

I tried running this command to see which drivers CM31 and MmSt were tied to, but I assume it couldn't find them because it was paged memory?

output.png

Any help is appreciated and this is also a precious learning experience for me. Thank you
 


The logs displayed:
a) BSOD
b) There were no collected dump files
c) Multiple drive file system corruptions including the MFT and page file



Code:
A corruption was discovered in the file system structure on volume C:

The Master File Table (MFT) contains a corrupted file record.  The file reference number is 0x17a000000002c45.  The name of the file is "\pagefile.sys".

A corruption was discovered in the file system structure on volume F:

A corruption was found in a file system index structure.  The file reference number is 0x60000003bff6e.  The name of the file is "\SecondPor\POR\Attachments".  The corrupted index attribute is ":$I30:$INDEX_ALLOCATION".

A corruption was discovered in the file system structure on volume New Volume.





1) Run chkdsk with the /r /v switches on all drives.
Schedule downtime so that you can run chkdsk on the Windows drive.

2) Modify the startup and recovery system failure settings from small memory dump to automatic or kernel memory dump.

3) Reevaluate the server's stability after all chkdsk reports show no corruption.
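
For step 2, the dump type chosen in Startup and Recovery is stored in the registry as the `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`. The sketch below only reads and describes that value; the function names are made up for illustration, and `read_dump_setting` works on Windows only.

```python
import sys

# Documented CrashDumpEnabled values and the dump type each one selects.
CRASH_DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return CRASH_DUMP_TYPES.get(value, f"Unknown ({value})")


def is_recommended(value):
    """True when the setting is kernel or automatic, as suggested above."""
    return value in (2, 7)


def read_dump_setting():
    """Read CrashDumpEnabled from the local registry (Windows only)."""
    import winreg  # imported here so the rest runs on any platform

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if sys.platform == "win32":
    current = read_dump_setting()
    print(describe_dump_setting(current), "- recommended:", is_recommended(current))
```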






Open an administrative command prompt and type or copy and paste:
chkdsk /r /v
This may take hours to run so plan to run overnight.
Run on all drives using the syntax: chkdsk /r /v C: or chkdsk /r /v D: changing the drive letter to the applicable drive.

C:\Windows\system32>chkdsk /r /v
The type of the file system is NTFS.
Cannot lock current drive.

Chkdsk cannot run because the volume is in use by another
process. Would you like to schedule this volume to be
checked the next time the system restarts? (Y/N)

Type: Y
reboot
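
If there are many volumes, the per-drive commands described above can be generated rather than typed. This is a minimal sketch that only builds the command lines and does not run chkdsk; the function name and the drive list are illustrative.

```python
import string


def chkdsk_commands(drive_letters, switches=("/r", "/v")):
    """Build one chkdsk command line per drive letter (nothing is executed)."""
    commands = []
    for letter in drive_letters:
        letter = letter.strip().rstrip(":").upper()
        if len(letter) != 1 or letter not in string.ascii_uppercase:
            raise ValueError(f"not a drive letter: {letter!r}")
        commands.append(f"chkdsk {' '.join(switches)} {letter}:")
    return commands


# Example drive list; adjust to the volumes present on the server.
for line in chkdsk_commands(["C", "D", "F", "H", "I"]):
    print(line)
```

Each printed line can then be pasted into the administrative command prompt, answering Y for any volume that cannot be locked.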



After running chkdsk:

Download ListChkdskResult.exe (by SleepyDude) from the link below and save the file to your desktop.
https://www.dropbox.com/s/xfsr4yyg5yun3k1/ListChkdskResult.exe?dl=1
Once the file has been downloaded, go to your desktop and double-click ListChkdskResult.exe.
The scan only takes a few seconds to run. Once it completes, a pop-up will open with all the CHKDSK results from the most recent scan plus any previous scans.


Code:
Event[8972]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-12T12:39:03.755
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:

A corruption was discovered in the file system structure on volume C:.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[8971]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-12T12:39:06.377
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume F:.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[8624]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-12T20:52:39.583
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume New Volume.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[34304]:
  Log Name: System
  Source: Ntfs
  Date: 2021-01-04T13:32:38.780
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume New Volume.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[33687]:
  Log Name: System
  Source: Ntfs
  Date: 2021-01-05T06:20:42.823
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume F:.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[32811]:
  Log Name: System
  Source: Ntfs
  Date: 2021-01-05T20:40:38.016
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume F:.

A corruption was found in a file system index structure.  The file reference number is 0x60000003bff6e.  The name of the file is "\SecondPor\POR\Attachments".  The corrupted index attribute is ":$I30:$INDEX_ALLOCATION".


Event[12728]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-09T20:45:37.710
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume OS.

The Master File Table (MFT) contains a corrupted file record.  The file reference number is 0x17a000000002c45.  The name of the file is "\pagefile.sys".


Event[330]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-25T16:28:48.800
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume F:.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.


Event[972]:
  Log Name: System
  Source: Ntfs
  Date: 2021-02-25T04:07:56.959
  Event ID: 55
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: ENTERPRISE.RENTAL.LOCAL
  Description:
A corruption was discovered in the file system structure on volume New Volume.

The exact nature of the corruption is unknown.  The file system structures need to be scanned online.
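
Since the age of these entries matters, a listing like the one above can be reduced to the most recent corruption date per volume. The sketch below assumes the text layout shown in this post (an `Event[n]:` header, a `Date:` field, and a description ending in "on volume X."); the function name and the shortened sample are illustrative.

```python
import re

# Two entries copied from the listing above, trimmed to the fields used here.
SAMPLE = """\
Event[8972]:
  Source: Ntfs
  Date: 2021-02-12T12:39:03.755
  Event ID: 55
  Description:
A corruption was discovered in the file system structure on volume C:.

Event[330]:
  Source: Ntfs
  Date: 2021-02-25T16:28:48.800
  Event ID: 55
  Description:
A corruption was discovered in the file system structure on volume F:.
"""


def ntfs_corruption_events(text):
    """Return (date, volume) pairs for each Event ID 55 block in text."""
    results = []
    for block in re.split(r"(?m)^Event\[\d+\]:\s*$", text):
        if not re.search(r"(?m)^\s*Event ID:\s*55\s*$", block):
            continue
        date = re.search(r"(?m)^\s*Date:\s*(\S+)", block)
        volume = re.search(r"(?m)on volume (.+?)\.?\s*$", block)
        if date and volume:
            # Keep only the YYYY-MM-DD part of the timestamp.
            results.append((date.group(1)[:10], volume.group(1)))
    return results


latest = {}
for date, volume in ntfs_corruption_events(SAMPLE):
    latest[volume] = max(date, latest.get(volume, ""))
print(latest)
```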



Code:
------------------------
Disk & DVD/CD-ROM Drives
------------------------
      Drive: C:
Free Space: 95.7 GB
Total Space: 570.7 GB
File System: NTFS
      Model: DELL PERC H730P Adp SCSI Disk Device

      Drive: D:
Free Space: 30.4 GB
Total Space: 285.6 GB
File System: NTFS
      Model: DELL PERC H730P Adp SCSI Disk Device

      Drive: F:
Free Space: 28.1 GB
Total Space: 1907.2 GB
File System: NTFS
      Model: DELL PERC H730P Adp SCSI Disk Device

      Drive: H:
Free Space: 37.0 GB
Total Space: 953.7 GB
File System: NTFS
      Model: ST1000DM003-1ER162

      Drive: I:
Free Space: 50.8 GB
Total Space: 1907.7 GB
File System: NTFS
      Model: Dell Portable SCSI Disk Device

      Drive: E:
      Model: PLDS DVD-ROM DH-16D8S
     Driver: c:\windows\system32\drivers\cdrom.sys, 6.03.9600.18878 (English), 12/5/2017 07:24:08, 165376 bytes
 
@zbook some of those logs are almost two months old, the rest are several weeks old. The OP has mentioned that they've run chkdsk.
 
I tried running this command to see which drivers CM31 and MmSt were tied to, but I assume it couldn't find them because it was paged memory?
They're pool tags and not driver names. You can find them in the pooltag.txt file.

Rich (BB code):
8: kd> !pooltag CM31
Pooltag CM31
Description: Internal Configuration manager allocations
Driver!Module: nt!cm

Rich (BB code):
8: kd> !pooltag MmSt
Pooltag MmSt
Description: Mm section object prototype ptes
Driver!Module: nt!mm
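
The same lookup can be done without a debugger by searching pooltag.txt directly. This is a sketch that assumes each entry has the form `TAG - module - description`; the sample lines are rebuilt from the `!pooltag` output above, and the function name is made up for illustration.

```python
def parse_pooltag_lines(lines):
    """Map tag -> (module, description) from 'TAG - module - description' lines."""
    tags = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("rem ", "//")):
            continue
        parts = [part.strip() for part in line.split(" - ", 2)]
        if len(parts) == 3:
            tag, module, description = parts
            tags[tag] = (module, description)
    return tags


# Sample entries reconstructed from the debugger output above.
SAMPLE_LINES = [
    "rem sample entries only",
    "CM31 - nt!cm - Internal Configuration manager allocations",
    "MmSt - nt!mm - Mm section object prototype ptes",
]

tags = parse_pooltag_lines(SAMPLE_LINES)
for tag in ("CM31", "MmSt"):
    module, description = tags[tag]
    print(f"{tag}: {module} ({description})")
```

Both tags map to kernel components (`nt!cm` and `nt!mm`) rather than to a third-party driver.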

- Things work pretty well for a while, but then we start running into issues where the server gets very slow. Sometimes it'll be slow while only showing 65% memory usage, but many other times it'll show 97% or so memory usage, even though in Task Manager the most consuming process is SQL, which is using about 65GB (the limit we set for it; we've tried other limits as well for testing).
By slow, do you mean the entire server or just the SQL server instance? As in, are queries taking a long time to process?
 
@zbook some of those logs are almost two months old, the rest are several weeks old. The OP has mentioned that they've run chkdsk.




Many of the January and February log entries show that the chkdsk switches used during those two months left drive problems unfixed:

Code:
Windows has examined the list of previously identified potential issues and found problems.

Windows cannot perform an online scan on the volume because it is in the "Full Chkdsk Needed" state.

Windows has examined the list of previously identified potential issues and found problems.

Windows cannot perform an online scan on the volume because it is in the "Full Chkdsk Needed" state.

Windows has examined the list of previously identified potential issues and found problems.

Windows has found problems that must be fixed offline.
Please run chkdsk /spotfix to fix the issues.

Windows has examined the list of previously identified potential issues and found problems.
Please run chkdsk /scan to fully analyze the problems and queue them for repair.
 
Zbook, I edited your first post. The way the code boxes were used made your post unnecessarily long, so I merged them into one code box. The same goes for post #5: the number of code boxes and the blank lines between them make your posts unnecessarily long and no easier to read.
 
Thanks for the replies, everyone. I'm attaching new logs as of today, since the ones I posted were collected before the weekend, when we ran chkdsk on at least the "New Volume", or F:, drive. I believe the C: drive also had a chkdsk run last week, so let's see if these are still reporting errors. When I try to schedule a chkdsk, it says all drives are clean and it's not necessary.

@BlueRobot : It seems like the whole server gets slow when this happens. Users have especially complained about remoting into their sessions, since this is primarily what they use. I have personally been on the server during one of these episodes, and it feels like it takes a while for various things to respond.

BTW, is there a way to tag users on replies besides using the reply/quote feature?
 


The opening post log collector started on 2/25.
The new log collector on 3/1 reported:

Code:
F:
CHKDSK discovered free space marked as allocated in the volume bitmap.

Windows has made corrections to the file system.

I:
Code:
Windows has scanned the file system and found no problems.


Please run each as per the earlier post:
chkdsk /r /v C:
chkdsk /r /v D:
chkdsk /r /v H:

Typically, chkdsk can be run on the non-Windows drives without any server downtime.

Plan to check C: when downtime can be planned.
 
I'll try that this weekend since they have some downtime between sat night and sunday morning. That'll also give me a chance to see how the week goes. Thanks!
 
The BIOS is old (approximately 13 missed updates).

The BIOS: Version/Date Dell Inc. 1.3.6, 6/8/2015

1) Upgrade the BIOS: 1.3.6 > 2.11
Support for PowerEdge T630 | Drivers & Downloads | Dell US

2) In Startup and Recovery, under System failure, change "Write debugging information" to kernel or automatic memory dump


Code:
Version 2.12.1
Enhancements:
- Enhancement to address the security vulnerabilities (Common Vulnerabilities and Exposures - CVE) such as CVE-2020-0592, CVE-2020-8696, CVE-2020-8698, CVE-2020-8705, CVE-2020-8749, and CVE-2020-8755.
- Updated the Intel Management Engine firmware to version SPS_E5_03.01.03.079.0_GR_WBG_REL.
- Updated the Intel SINIT to v3.1.4_20191029.

Version 2.11.0
Fixes
- Fixed an issue where the system stops responding when trying to boot from a PXE.
- Fixed an issue in EFI_RESET_SYSTEM runtime service that resulted in the Oracle VM having kernel dump record after system reboot.
- Removed the Correctable Error Warning threshold SEL event.

Enhancements
- Enhancement to address the security vulnerabilities (Common Vulnerabilities and Exposures-CVE) such as CVE-2019-0124 and CVE-2019-0151.
- Updated the Intel SINIT to v3.1.3_20190718.

Version 2.10.5
Enhancements
- Enhancement to address the security vulnerabilities (Common Vulnerabilities and Exposures-CVE) such as CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091, and CVE-2019-0089.
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x0b000038.
- Updated the Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x43.
- Updated the Intel Management Engine firmware to version SPS_E5_03.01.03.072.0_GR_WBG_REL.
- Support for the iDRAC8 2.70.70.70 version.

Fixes
- When booting to RHEL 8, the following message is displayed:
Kernel panic - not syncing: Fatal hardware error!


Version 2.9.1
Fixes
- Fixed an issue pertaining to the PCR2 measurement. When TPM is enabled, the PCR2 values were sometimes inconsistent when the CSIOR was enabled.

Enhancements
- Enhanced the BIOS security protection features.
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x0B000033.
- Updated the Intel Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x41.


Version 2.8.0
Fixed issues (this release)
- For memory sizes less than 1 TB, updated the MTRR algorithm to the same behavior as 2.4.2 and earlier BIOS versions.

New and enhanced features
- Enhancement to address security vulnerability CVE-2018-3639 (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-3639).
- Enhancement to address security vulnerability CVE-2018-3640 (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-3640).
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x0b00002E.
- Updated the Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x3D.
- Added setup option QPI Link L1 Power Management. Default is set to Enabled.
- Added setup option Lower Memory Mapped I/O Base to 512 GB. Default is set to Disabled.
- Added proper identification for maximum DIMM speed of 2666 MHz.


Version 2.7.1
Fixes
- None

Enhancements
- Updated the Intel Xeon Processor Microcode to address CVE-2017-5715 (http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5715)
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x0b00002A.
- Updated the Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x3C.
- CVE-2017-5753 and CVE-2017-5754 are addressed by Operating System & Hypervisor updates.
- Please see more information at http://www.dell.com/support/article/SLN308588


Version 2.6.0
Fixes
- Added a workaround to fix false Multibit memory errors after a CPU IERR.

Enhancements
-N/A


Version 2.5.4
Fixes
- Added workaround to address the uncorrectable errors on RDIMMs with specific vendor or the rev Register Clock Driver (RCD).
- Updated ECRC (end-to-end CRC checking) to prevent the OS from changing the BIOS settings.
- Fixed CPU machine check issue when the Write Data Cyclic Redundancy Check (CRC) is enabled.

Enhancements
- Updated the Intel Processor and Memory Reference Code (MRC) to PLR11.
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x21.
- Updated the Intel Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x3A.
- Secure MOR change to support the Windows 2016 R3 server version.
- Added support for the Redfish management interfaces.
- Added improvements to Non-Volatile Memory Express (NVMe).
- Added improvements to the Secure Boot Custom Policy settings.
- Added improvements to the S130 storage.


Version 2.0.2
Fixes
- Fixed an issue where iSCSI boot got disabled when configuration of connection2 setting failed under the UEFI boot mode.
- Fixed the system unexpected issues if running warm reset from iDRAC after BIOS setup change.
- Fixed an issue where mouse device trail is seen while moving cursor quickly in the HII browser.

Enhancements
- Updated the Intel processor and memory reference code to version 3.0.0.
- Added support for Intel Xeon processor E5-2600 V4 product family.
- Updated the Intel Management Engine (ME) firmware to SPS_E5_03.01.03.021.0_WBG_REL.
- Updated PERC S130 option ROM (OPROM) and Unified Extensible Firmware Interface (UEFI) drivers to version 4.2.0-0009.
- Added support for a new PM1725 category.
- Improved NVMe version 1.1 export log.
- Updated NVMe Unified Extensible Firmware Interface (UEFI) driver to version 2.5.
- Updated to Unified Extensible Firmware Interface (UEFI) 2.4 support.
- Added Global Slot Boot Driver Disable option.



Version 2.4.2
Fixes
- Export log issues in the Non-Volatile Memory Express (NVMe) Human Interface Infrastructure (HII).
- Rarely, the system may stop responding because of a power failure during the boot process.

Enhancements
- Updated the Intel Processor and Memory Reference Code to PLR8.
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x1F.
- Updated the Intel Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x39.


Version 2.2.5
Enhancements:
- Updated the Intel Processor and Memory Reference Code to PLR4.
- Updated the Intel Xeon Processor E5-2600 v4 Product Family Processor Microcode to version 0x1E.
- Updated the Intel Xeon Processor E5-2600 v3 Product Family Processor Microcode to version 0x38.
- Updated the Intel Trusted Execution Technology (Intel TXT) BIOS and SINIT Authenticated Code Module (ACM) to version 3.1.0.
- The Intel TXT feature is supported with Trusted Platform Module (TPM) version 2.0.
- Updated TPM version 2.0 support.
- Updated the integrated Dell Remote Access Controller (iDRAC) Human Interface Infrastructure (HII) to version 2.40.40.05.
- Updated text in the BIOS Setup menu help content.
- Changed the default setting of BIOS Setup option In-System-Characterization to Disabled.

Fixes:
- The boot order may change after updating the BIOS version.
- The cause of system internal error (IERR) is not getting logged.


Version 2.1.5
Enhancements
- Updated the Intel Xeon Processor E5-2600 v3 Product Family Processor Microcode to 0x37.
- Updated the Intel Management Engine (ME) firmware to SPS_E5_03.01.03.030.0_WBG_REL.
- Updated the Intel Trusted Execution Technology (Intel TXT) BIOS and SINIT Authenticated Code Module (ACM) to version v3.0.5.
- Added support to JEDEC serial present detect (SPD) 1.1.
- Added support for memory module with 128 GB DIMM size.
- Updated text in the BIOS Setup Menu help content.

Fixes
- Intermittent PCIe slot training errors.
- While rebooting, server that used E5-2603 or 2609 processor (6-core Low Core Count (LCC) processor) displayed the Red Screen of Death (RSOD) error.
- After importing a platform key, the Secure Boot feature is forced to get enabled.
- Unused DIMM clocks are not disabled for E5-2600 v3 CPU-based system.
- Watchdog timer event log is missing from the ELog and Windows Event Log.
- CPU is not frequently polling the DIMM temperature sensor.
- System cannot boot by using DVD or HDD.
- During the POST stage on the console redirection, an option to press the F12 key is not displayed on the monitor.


Version 2.0.3
Fixes
- None.

Enhancements
- Updated the Intel Processor and Memory Reference Code to MR2.
- Added support for Intel Xeon processor E5-2600 V4 Product Family processor Microcode to 0x17.


Version 1.5.4
Fixes
- Fixed an issue where sometimes systems with Intel SSD drives freeze at PXE boot.
- Fixed an issue where hotkeys are still available after abnormal exit from Life Cycle Controller.
- Fixed an issue where execution of "Ctrl-p" command fails through Serial-Over-Lan.
- Fixed some HII display issues for NVMe PCIe SSDs in System Setup menu.

Enhancements
- Updated the Intel processor and memory reference code to PLR9.1.
- Updated the Intel Xeon processor E5-2600 V3 product family processor microcode to 0x36.
- Updated the Intel Management Engine (ME) firmware version to SPS_E5_03.00.07.173.0_PLR9-G_WBG_REL.
- Added the new I/O Non-Posted Prefetch option in System Setup that can be used to control the PCIe throughput by enabling or disabling the PCI IO non-posted prefetch mode.
- Added the new Form Factor field for NVMe PCIe SSDs in the HII menu.
- Added TPM2 support.
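
To check how far behind the installed BIOS is, the version numbers in the changelog above can be compared numerically. This is a small sketch: the version list is copied from the changelog in the order shown, the installed version comes from the system information, and the helper name is illustrative. Comparing as tuples of integers matters because a plain string comparison would rank "2.9.1" above "2.12.1".

```python
def version_key(text):
    """Turn '2.12.1' into (2, 12, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Versions as listed in the changelog above.
CHANGELOG_VERSIONS = [
    "2.12.1", "2.11.0", "2.10.5", "2.9.1", "2.8.0", "2.7.1", "2.6.0",
    "2.5.4", "2.0.2", "2.4.2", "2.2.5", "2.1.5", "2.0.3", "1.5.4",
]
INSTALLED = "1.3.6"

newer = sorted(
    (v for v in CHANGELOG_VERSIONS if version_key(v) > version_key(INSTALLED)),
    key=version_key,
)
print(len(newer), "listed releases are newer than", INSTALLED)
print("latest listed:", newer[-1])
```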
 
So I forgot to mention in response to blue's post that the reason I changed from auto dump to small was we were not getting any bsod dumps the couple times it happened last year.

I've changed it now to kernel dump.

Also, according to Dell OpenManage, we had a bad RAM stick, which we replaced a couple of months ago.

Also, I tried to update the BIOS and the RAID firmware, but both updates failed when run from Windows, so I will have to schedule a time to do this in person.
 
BTW, is there a way to tag users on replies besides using the reply/quote feature?
There is, you almost nailed it too. For x BlueRobot you missed the 'x' and ' ' (space).

Tagging/mentioning a user works as follows
@<full username>
e.g.
@axe0
Not a shortened name, no @BlueRobot or @blue (blue is a different user here), but the full username. It's strict about this because the username works like an ID. If you know the ID of a user and the BBCode for tagging/mentioning, then you can use shortened names.
BBcode = [USER=9865]@axe0[/USER]
 
So I forgot to mention in response to blue's post that the reason I changed from auto dump to small was we were not getting any bsod dumps the couple times it happened last year.

I've changed it now to kernel dump.
That setting will produce only a full kernel memory dump (\windows\memory.dmp), no minidumps. The full kernel memory dump is overwritten with each BSOD. Windows stores up to 50 minidumps as of Windows 7 or Windows 8/8.1. Prior to that, the number of minidumps Windows retained was unlimited.

Change system crash setting to Automatic. This setting results in Windows producing a full kernel dump and a mini kernel dump.
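
If you want to confirm which dump type is currently configured without opening the System Properties dialog, here is a rough Python sketch. It assumes the setting is stored as the CrashDumpEnabled DWORD under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl; the function names are made up for illustration, and the registry read only works on Windows.

```python
# Names of the dump types as shown in the Startup and Recovery dialog,
# keyed by the CrashDumpEnabled registry value (REG_DWORD).
CRASH_DUMP_MODES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",
}


def describe_crash_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return CRASH_DUMP_MODES.get(value, f"Unknown ({value})")


def read_crash_dump_setting():
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any OS

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value
```

On the server itself, `print(describe_crash_dump_setting(read_crash_dump_setting()))` should print "Automatic memory dump" once the setting has been changed as suggested.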

You likely got no dumps because of the page file corruption reported in one of the earlier messages:
The Master File Table (MFT) contains a corrupted file record. The file reference number is 0x17a000000002c45. The name of the file is "\pagefile.sys".

When a BSOD occurs, kernel memory is written to the page file. That is what is occurring while the countdown screen is displayed. Upon system reboot after BSOD, Windows uses the page file data to create the kernel memory dumps (both full kernel dump and mini kernel dump if the system crash setting is "Automatic"). If the data in the page file is corrupted, I would think that no dumps would be produced. There should be evidence of this in one of the Event Viewer files (usually, anyway) - System or Application log; forget which one.

If chkdsk cannot fix the corrupted MFT file record/page file, delete and reallocate the page file - Deletion + Reallocation of the Page File (Windows 10, 8.1, 8, 7 & Vista) | Sysnative Forums

You can check/verify that your system can produce a dump by doing this - Forcing a System Crash from the Keyboard - Windows drivers | Microsoft Docs

Regards. . .

John
 
If the data in the page file is corrupted, I would think that no dumps would be produced. There should be evidence of this in one of the Event Viewer files (usually, anyway)
In that case the data won't be written to the pagefile at all, to prevent further issues on the drive, because it's unknown where the corruption starts when the data is written. This shows up in the event logs as something like 'dump file creation failed due to error during dump file creation'.
 
Apologies about the page file tutorial - it has not been updated since our forum software was changed.
 
Just an update, so far so good. No major slowdowns or complaints:

1) Ran chkdsk on a couple drives previous to posting this
2) This last weekend, I did another chkdsk on some other drives which came up clean
3) We upgraded the RAID firmware and drivers. Unfortunately, we could not upgrade the BIOS no matter which version or method we tried; neither Windows nor the Lifecycle Controller would work. Skipped that for now.
4) Got some advice on another forum on cleaning up SQL indexes and tweaking those settings which we did.
5) Added more RAM.

Technically, things were okay week one but adding all that made things better. Just keeping an eye on it but I think we can mark this as solved! Thanks so much guys
 
Please upload new Sysnative log collector results.
Gathering the new logs now. It always takes a very long time on network statistics. Any way to speed it up? It's usually a matter of hours before that phase finishes in the sysnative program
 
These were some findings in the new log results:


There were no new collected BSOD.
And there were no BSOD seen in the collected logs.



Chkdsk C: displayed cleaning


chkdsk /r /v D: no results seen
chkdsk /r /v H: no results seen


BIOS upgrade failures:
1.5.4
1.5.4
2.0.3
2.4.2
2.4.2


These txt files may have additional information:
C:\ProgramData\Dell\UpdatePackage\log\\BIOS_P8KHV_WN64_1.5.4.txt
C:\ProgramData\Dell\UpdatePackage\log\\BIOS_PRY6P_WN64_2.0.3.txt
C:\ProgramData\Dell\UpdatePackage\log\\BIOS_GK2F7_WN64_2.4.2.txt
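
If more of these logs pile up, a small Python sketch like the one below can pull the package ID and BIOS version out of each file name. The pattern is only an assumption based on the three names listed above (BIOS_<package ID>_WN64_<version>.txt), and the helper name is made up for illustration.

```python
import ntpath
import re

# Assumed naming pattern, inferred from the three log names above.
LOG_NAME = re.compile(r"^BIOS_([A-Z0-9]+)_WN64_(\d+(?:\.\d+)+)\.txt$")


def parse_bios_log_name(path):
    """Return (package_id, version) for a Dell BIOS update log path, or None."""
    # ntpath splits on backslashes regardless of the OS running the script.
    name = ntpath.basename(path)
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


paths = [
    r"C:\ProgramData\Dell\UpdatePackage\log\\BIOS_P8KHV_WN64_1.5.4.txt",
    r"C:\ProgramData\Dell\UpdatePackage\log\\BIOS_PRY6P_WN64_2.0.3.txt",
    r"C:\ProgramData\Dell\UpdatePackage\log\\BIOS_GK2F7_WN64_2.4.2.txt",
]

for path in paths:
    print(parse_bios_log_name(path))
```

Anything that does not match the assumed pattern comes back as None, so unrelated files in the same folder are simply skipped.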


There were application crashes, one had dump files:
Code:
03/06/2021  10:03 PM        91,345,739 MEMORY~1.HDM memory.hdmp
03/06/2021  10:03 PM           297,991 TRIAGE~1.DMP triagedump.dmp



Recent app crashes:
araavl.exe
databaseedits.exe
dsm_sa_datamgr64.exe
biosie.exe
chipsetdriver.exe
psdup.exe


See if reinstalling the Intel chipset drivers makes any difference:
Downloads for Chipsets

For the computer sluggishness you can try clean boot:
How to perform a clean boot in Windows
 
