Hi Guys,
Once again, I ran into an interesting case that is helping me improve my knowledge of Windows Internals, and I thought I'd share it. Note that I'm still studying this, so any correspondence would be grand, or simply pop back in for updates so you can learn as well. I'll answer any questions to the best of my ability, so don't choke when you don't understand something, simply ask!
First of all, what is an MDL? It stands for Memory Descriptor List. A thorough description of MDLs is available here. Here's the first paragraph:
An I/O buffer that spans a range of contiguous virtual memory addresses can be spread over several physical pages, and these pages can be discontiguous. The operating system uses a memory descriptor list (MDL) to describe the physical page layout for a virtual memory buffer.
I'm sure this is a pretty heavy description for many of you. However, if you remember how I described virtual memory and physical memory in my BSOD Methods & Tips thread, you'll understand that virtual memory for the most part should be fairly contiguous (side-by-side), whereas the physical memory can be all over the place. That physical memory is the discontiguous "physical pages" the article is referring to. So the MDL describes the physical memory layout that backs the virtual memory buffer. Since all work is ultimately done on physical memory, something has to describe and map this memory properly.
Typically, this is unnecessary, as normal memory management provides everything needed to deal with the physical memory and translate it to virtual memory for the driver. However, for situations involving heavy data flow (like most networking operations and other DMA - Direct Memory Access - uses), you want to access the physical memory that's holding your data as quickly as possible, and you don't want the memory manager to fiddle with the data in any way (like moving it, paging it to disk, etc.), so that it remains reliably in place until you're done using it. That's where MDLs come in: to describe the physical memory that's been effectively pinned down for use.
For more information on MDLs, I've found this article along with this one to be beneficial. The book Windows Internals also explains them a bit on pages 570-571 and 872 (5th edition).
In this recent case, whose thread is here (I've attached the dmp file to this thread as well), I had to deal with MDLs and get a bit of an understanding of how they work. Previously I'd seen them in certain WinDbg outputs and had a general idea what they were for, but this is one case where MDLs are listed as the cause of the crash:
Code:
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 00000000000000b0, MmProbeAndLockPages called on an MDL having incorrect flags.
For example, calling MmProbeAndLockPages for an MDL set-up
by calling MmBuildMdlForNonPagedPool is incorrect.
Arg2: fffffa80073f1010, MDL address.
Arg3: 0000000000000004, MDL flags.
Arg4: 0000000000000004, Incorrect MDL flags.
The stack in this case is a bit irrelevant - to my understanding - but I do notice that it involves a GUI message of some sort. What we need to do is first understand what actually took place that was wrong, and that means interpreting the output of the error code.
The error says that a routine, MmProbeAndLockPages, was called on an MDL that had incorrect flags. MDLs are data structures, and one of their fields is MdlFlags, which contains bit flags that help describe the MDL. The routine MmProbeAndLockPages is somewhat self-explanatory (details here). It probes the virtual memory addresses, makes sure their pages are resident in physical memory, and locks them there. Once it's done, it updates the MDL to describe the physical pages it locked. The issue here, then, is that Driver Verifier detected that the MDL described memory that was not in a state that permits MmProbeAndLockPages to do what it wanted to do. For example, the memory might already be locked and loaded by something else, or be in a state that doesn't allow it to be probed or locked.
In any case, this can be discerned from the flags. The error code reports that the flags currently set (that is, the bits that are set to 1) are 4. Bugcheck arguments are displayed in hex, though 4 is the same value in decimal. That's 100 in binary, so just one flag is set. The erroneous flag is, well, 4 (100) as well. Understand that this isn't a number signifying the quantity of flags set, per se. It's actually telling you which flags are set. You simply translate it into binary and figure out which bits are set and which aren't.
We'll discern what the flag means later.
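To make the "translate it into binary" step concrete, here is a minimal Python sketch (Python is just my choice for illustration, it isn't part of the debugger). It assumes the argument is copied by hand from the bugcheck output as a hex string, and the variable names are made up for this example.

```python
# Bugcheck arguments are printed as hexadecimal strings without a 0x prefix.
arg3_text = "0000000000000004"   # Arg3 from the bugcheck above (MDL flags)

flags = int(arg3_text, 16)

# MdlFlags is a 16-bit field, so show the value as 16 binary digits.
binary = format(flags, "016b")

# Collect the positions of the bits that are 1 (bit 0 is the rightmost).
set_bits = [bit for bit in range(16) if flags & (1 << bit)]

print(binary)    # 0000000000000100
print(set_bits)  # [2]
```

Only bit 2 is set, which is the value 4 (1 shifted left twice). Each bit position corresponds to one flag, so the number names the flags rather than counting them.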
Moving on, at the very least I can use this error code to get a bit more info on things. It provides the MDL address, so I can actually look at the MDL itself. Since MDLs are data structures, we will use the dt command. To use it properly, we need to provide it the right symbols so WinDbg can translate the data into something coherent and readable for us humans. To understand what I mean, here's the raw data, presented by the dw command:
Code:
2: kd> dw fffffa80073f1010
fffffa80`073f1010 0000 0000 0000 0000 0038 0004 0000 0000
fffffa80`073f1020 0000 0000 0000 0000 1000 073f fa80 ffff
fffffa80`073f1030 1000 073f fa80 ffff 1000 0000 0000 0000
fffffa80`073f1040 40d9 0010 0000 0000 0000 0000 0000 0000
fffffa80`073f1050 0000 0000 0000 0000 0000 0000 0000 0000
fffffa80`073f1060 0000 0000 0000 0000 748b 3824 8348 20c4
fffffa80`073f1070 c35f 9090 9090 9090 f3ff 8348 30ec 8b48
fffffa80`073f1080 4101 01b8 0000 4800 188b 6183 df50 c033
Unless you're a serious pro like Mark Russinovich or Dmitry Vostokov, this stuff is just gibberish to you. We need symbols to translate this into something appropriate. It's the same with anything in debugging. Without symbols, or with the wrong symbols, the data that's present in a crash dump or live memory debug is nearly indecipherable. Now, here's what it looks like with the proper symbols. Since MDLs are Windows-based objects, Microsoft provides symbols for them. The symbol is nt!_MDL. So the dt command should be dt <symbol> <address>.
Code:
2: kd> dt nt!_MDL fffffa80073f1010
+0x000 Next : (null)
+0x008 Size : 0n56
+0x00a MdlFlags : 0n4
+0x010 Process : (null)
+0x018 MappedSystemVa : 0xfffffa80`073f1000 Void
+0x020 StartVa : 0xfffffa80`073f1000 Void
+0x028 ByteCount : 0x1000
+0x02c ByteOffset : 0
Look at that. That looks much better. Now, for those curious, where did those values - like 56 for Size and fffffa80`073f1000 for MappedSystemVa - come from within the data we saw with the dw command? Well, the hexadecimal offsets at the start of each line (+0x000, +0x008) tell you where the data for each field starts, relative to the MDL address. See if you can find them in the data. Note that Size and MdlFlags are displayed in decimal (0n) format as opposed to the rest, which are in hexadecimal (0x), so you'll want to translate them into hex before looking for them in the data. Also note that larger values are broken into sections and appear 'backwards', because dw shows 16-bit words and the lowest-order word of a value is stored first. You can fix that by using a different d command. In this case, the big guys are pointers, so use dp, which is designed to show data in pointer-sized sections. For those who don't want to try on their own (you should), here's what the data looks like with dp:
Code:
2: kd> dq fffffa80`073f1000
fffffa80`073f1000 00000000`00000000 00000000`00000000
fffffa80`073f1010 00000000`00000000 00000000`00040038
fffffa80`073f1020 00000000`00000000 fffffa80`073f1000
fffffa80`073f1030 fffffa80`073f1000 00000000`00001000
fffffa80`073f1040 00000000`001040d9 00000000`00000000
fffffa80`073f1050 00000000`00000000 00000000`00000000
fffffa80`073f1060 00000000`00000000 20c48348`3824748b
fffffa80`073f1070 90909090`9090c35f 8b4830ec`8348f3ff
EDIT: Oops! Thanks to JC for pointing this out. At the end I say this is what it looks like with the dp command, yet the command I use in the example ends up being dq. On this target the two produce identical output. However, dp is preferable because it automatically adjusts the size of each section based on the type of system the target was running - whether it was 32-bit or 64-bit. In this case, the system we're analyzing is 64-bit, so each section is quad word size (8 bytes, or 16 hex digits: 0x################) as opposed to double word size (4 bytes, or 8 hex digits: 0x########). The fixed-size counterparts of dp are dd for 32-bit and dq for 64-bit. This is all done because memory addresses are different sizes on 32-bit and 64-bit systems. The variant dps is perfect for dumping raw stacks.
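For those who'd like to check their answers to the "find the values in the raw data" exercise, here is a rough Python sketch that rebuilds the bytes from the first three lines of the dw output and carves them up at the offsets dt reported. The layout string is my own transcription of those offsets for this 64-bit dump (including 4 bytes of padding between MdlFlags at +0x00a and Process at +0x010), so treat it as an assumption rather than an official definition.

```python
import struct

# First three lines of the dw output above (address column removed).
DW_OUTPUT = """
0000 0000 0000 0000 0038 0004 0000 0000
0000 0000 0000 0000 1000 073f fa80 ffff
1000 073f fa80 ffff 1000 0000 0000 0000
"""

# dw prints 16-bit words; pack them back into little-endian bytes.
words = [int(w, 16) for w in DW_OUTPUT.split()]
raw = struct.pack("<%dH" % len(words), *words)   # 24 words -> 48 bytes

# Q = 8-byte pointer, h = signed 16-bit, 4x = padding, I = unsigned 32-bit.
names = ("Next", "Size", "MdlFlags", "Process",
         "MappedSystemVa", "StartVa", "ByteCount", "ByteOffset")
mdl = dict(zip(names, struct.unpack("<Qhh4xQQQII", raw)))

print(mdl["Size"], mdl["MdlFlags"])      # 56 4
print(hex(mdl["MappedSystemVa"]))        # 0xfffffa80073f1000
print(hex(mdl["ByteCount"]))             # 0x1000
```

The values match what dt printed, which is all dt is really doing: applying a known layout to raw bytes. The 'backwards' words in dw come out in the right order here because the unpack step reads each field as one little-endian value.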
Since we now understand how to extract the goodies from the MDL structure, and we know what to look for (the MDL flags), we can use this information to interpret the MDL and figure out what about it is causing all the fuss. Obviously, to do this we need a reference that explains what the MdlFlags value of 0n4 actually means. To assist us, we have the trusty WDK. Make sure to install the Build Environments, as they include the header files, one of which (wdm.h) has the details we need. It's located in the inc\ddk subdirectory. Why they haven't put all this stuff into the Help documentation as well eludes me, but it's in the actual header files. In fact, details that are missing from the WDK help documentation or the MSDN website often end up being tucked away in the header files, so keep that in mind when you are searching for information on something you're debugging.
Tip: how did I figure out which header file to look in for info on MDLs? Look up MDLs in MSDN or the WDK help documentation. In the MDL article, note the header required to use the MDL functions. The primary header is wdm.h.
Do a search in wdm.h for MDL and you'll find the section related to them. Scroll down some and you'll find a list of #defines, as follows:
Code:
#define MDL_MAPPED_TO_SYSTEM_VA 0x0001
#define MDL_PAGES_LOCKED 0x0002
#define MDL_SOURCE_IS_NONPAGED_POOL 0x0004
#define MDL_ALLOCATED_FIXED_SIZE 0x0008
#define MDL_PARTIAL 0x0010
#define MDL_PARTIAL_HAS_BEEN_MAPPED 0x0020
#define MDL_IO_PAGE_READ 0x0040
#define MDL_WRITE_OPERATION 0x0080
#define MDL_PARENT_MAPPED_SYSTEM_VA 0x0100
#define MDL_FREE_EXTRA_PTES 0x0200
#define MDL_DESCRIBES_AWE 0x0400
#define MDL_IO_SPACE 0x0800
#define MDL_NETWORK_HEADER 0x1000
#define MDL_MAPPING_CAN_FAIL 0x2000
#define MDL_ALLOCATED_MUST_SUCCEED 0x4000
#define MDL_INTERNAL 0x8000
Note the value for each. These are the flags we're looking for. Obviously we're only looking for 1 flag (0n4) so in this case it's MDL_SOURCE_IS_NONPAGED_POOL.
Tip: What about other cases where the number is not exactly listed here? Let's take an MdlFlags value of 0n38, for example. First, we need to translate it into a hexadecimal number because, as you can see, it's prefixed with 0n (decimal) and the values listed for the MDL flags are in 0x (hexadecimal). Either use a calculator or do it in your head (hopefully you can). It will turn out as 0x26. Now, split it up based on the flags provided, starting with the highest available flag:
0x20 - MDL_PARTIAL_HAS_BEEN_MAPPED
0x4 - MDL_SOURCE_IS_NONPAGED_POOL
0x2 - MDL_PAGES_LOCKED
So in the example, the MDL would have those three flags set, and can be described as such. The example is only based on a random number I chose, so it's obviously not representative of a genuine case that you'll find out in the real world.
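If you'd rather not do the splitting in your head, the same procedure can be sketched in a few lines of Python. The table is copied from the wdm.h listing above; the function name decode_mdl_flags is just something I made up for this example, not anything from the WDK or the debugger.

```python
# Flag values copied from the wdm.h listing above.
MDL_FLAGS = {
    0x0001: "MDL_MAPPED_TO_SYSTEM_VA",
    0x0002: "MDL_PAGES_LOCKED",
    0x0004: "MDL_SOURCE_IS_NONPAGED_POOL",
    0x0008: "MDL_ALLOCATED_FIXED_SIZE",
    0x0010: "MDL_PARTIAL",
    0x0020: "MDL_PARTIAL_HAS_BEEN_MAPPED",
    0x0040: "MDL_IO_PAGE_READ",
    0x0080: "MDL_WRITE_OPERATION",
    0x0100: "MDL_PARENT_MAPPED_SYSTEM_VA",
    0x0200: "MDL_FREE_EXTRA_PTES",
    0x0400: "MDL_DESCRIBES_AWE",
    0x0800: "MDL_IO_SPACE",
    0x1000: "MDL_NETWORK_HEADER",
    0x2000: "MDL_MAPPING_CAN_FAIL",
    0x4000: "MDL_ALLOCATED_MUST_SUCCEED",
    0x8000: "MDL_INTERNAL",
}

def decode_mdl_flags(value):
    """Return the names of the flags set in value, highest flag first."""
    return [name for bit, name in sorted(MDL_FLAGS.items(), reverse=True)
            if value & bit]

print(hex(38))               # 0x26
print(decode_mdl_flags(38))  # the made-up 0n38 example
print(decode_mdl_flags(4))   # the MdlFlags value from this crash
```

For 38 this lists MDL_PARTIAL_HAS_BEEN_MAPPED, MDL_SOURCE_IS_NONPAGED_POOL and MDL_PAGES_LOCKED, and for 4 it lists only MDL_SOURCE_IS_NONPAGED_POOL, matching the manual breakdown.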
Now that we know what this is about, we can look back at the error code present in the bugcheck and notice an example issue that sounds awfully relevant:
Code:
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 00000000000000b0, MmProbeAndLockPages called on an MDL having incorrect flags.
For example, calling MmProbeAndLockPages for an MDL set-up
by calling MmBuildMdlForNonPagedPool is incorrect.
Arg2: fffffa80073f1010, MDL address.
Arg3: 0000000000000004, MDL flags.
Arg4: 0000000000000004, Incorrect MDL flags.
Sounds like we have this problem after all. However, you might be asking, "Ok, so why is this a problem?" Well, my initial understanding is that the routine MmProbeAndLockPages involves, well, locking pages so they can't be paged out. Yet MmBuildMdlForNonPagedPool sets up the MDL to describe nonpaged pool, which by definition is never paged out, so there is nothing left to lock and the two functions contradict each other. Here's what's described in the WDK:
Because the pages described by the MDL are already nonpageable and are already mapped to the system address space, drivers must not try to lock them by using the MmProbeAndLockPages routine, or to create additional system-address-space mappings by using the MmMapLockedPagesSpecifyCache routine.
The result of doing so, as the article states, is "undefined". That means doing this isn't guaranteed to fail, per se, but it is a risky maneuver and far from best practice. As such, Driver Verifier cracks down on poor coding methods like this, whereas without Driver Verifier an error of this type would not be generated. The code itself could actually end up running fine with this happening, but it's considered rather sloppy (as in risky) and is not recommended.
And that concludes what I have for now regarding MDLs. I'll always be delighted to hear any questions, comments or corrections you may have. Thanks for reading!