Deliciously solid article. Thanks for bringing this to my attention; surprised this one slipped under my radar.
@Digerati
As for a "black box" recorder, it's called a live kernel debugging session using another PC hooked up to the victim PC. Fortunately, Windows 8 and newer have incorporated network-based live kernel debugging, but I'm not sure exactly how it works as I've personally never tried it. Heard good things about it, though.
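From what I've read, wiring it up on the target end is supposed to be just a couple of bcdedit commands, with the host PC then attaching over the network, roughly like this (again, untested by me, so take it as a rough sketch; the IP, port, and key are nothing but placeholders):

bcdedit /debug on
bcdedit /dbgsettings net hostip:192.168.1.10 port:50005 key:1.2.3.4

and then on the host PC:

windbg -k net:port=50005,key=1.2.3.4

The hostip is supposed to be the address of the machine running the debugger, and the target needs a reboot before the settings take effect.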
Also, do remember that logging every change to a system's environment can get really resource-intensive, and fast. Take Process Monitor, for example. Have that run for a few minutes and you're already looking at a pretty hefty file. Now imagine adding all memory allocations and all the data pertaining to them, plus all I/O and code execution data, and you can see how things start getting out of control. A complete memory dump of a system is already huge, so imagine having that plus incremental additions to it for every execution made, AND expecting the victim PC to run swell through all of that!
Only logging activity from the past few cycles is OK and all, but just so you know, crashdumps already do that! Call stacks, processor contexts, even PnP triage data are stored and preserved at all times on Windows, giving you a window into what happened previously. Driver Verifier even goes a step further by providing logs for various activities of the drivers it watches, which can be accessed in a crashdump using the !verifier extension (kernel dump required, of course). Even so, many problems occur minutes, sometimes hours, days, weeks, or even months before an actual symptom manifests (especially with something like a bad memory allocation), so it does no good to only record the most recent activity when the poison has long since settled outside the range of the logging.
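For example, to keep the overhead sane you can point Verifier at a single suspect driver rather than everything (the driver name here is made up), then pull its findings out of the kernel dump later:

verifier /standard /driver suspect.sys

Reboot, reproduce the problem, and then in the dump:

!verifier

The plain form already gives a decent summary of what Verifier was tracking, and it takes additional flag bits for more detailed output if you need to dig further.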
Suffice it to say, a full-blown log-everything process can't be done. However, live kernel debugging is the next best thing, because it allows someone to lay 'traps' (conditional breakpoints) so that when the culprit goes to commit the crime, it gets caught in the act. Of course, to do this one needs to sit down with all the data currently in hand, evaluate it, and determine where the suspect is likely to strike next, but the more acquainted one becomes with this, the easier it gets to make a strong educated guess.
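To illustrate the general shape of such a 'trap' on an x64 target, here's a conditional breakpoint on pool allocations that only stops when the pool tag matches one particular value (the tag is purely a made-up example; on x64 the tag argument rides in r8):

bp nt!ExAllocatePoolWithTag "j (@r8 == 0x6b636148) ''; 'gc'"

Every other allocation sails right through thanks to the 'gc', so the target keeps running until the suspect actually shows up, at which point a quick k gives you the guilty call stack.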
Btw, back to the logging thing: a somewhat more lightweight maneuver than logging every bit of activity on a system is to turn on the appropriate gflags for that system. Of course, personal experience has shown that tripping all of these flags has some undesirable effects, namely slowing a system down to a horrendous crawl in many cases. However, if one only triggers the flags for the data that's actually relevant to the problem, it won't be so much of an issue. Another option, as presented by the original article on DPCs, is an Event Viewer Trace Log.
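To give a concrete idea of what I mean by flipping only the flags you care about, something like this turns on page heap plus the user-mode stack trace database for a single image rather than the whole system (the image name is just a placeholder):

gflags /i badapp.exe +hpa +ust

That way only that one process pays the price, and heap corruption in it stands a much better chance of being caught near the point of corruption instead of whenever the damage happens to surface.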
I think one of the biggest misconceptions about debugging is that people think, "More data, more data, more data!" There are indeed cases where a broad general sweep is necessary to track down a cause, but that's only if all other speculation has missed its mark. An anecdotal example is here, where I suffered long and hard, doing everything from Procmon logs, to Wireshark logs, to trace logs, to triggered crashdumps, and more, and I was still stumped on what was going on. I could see the effects in the data, and I could pinpoint it as far as some network service issue, but for the life of me I could not tell what was causing it. It was only later on that it was discovered that the router the PC was hooked up to was causing the problem. The pile of logs could only tell one side of the tale; without an in-depth analysis of all the relevant components, I was only working with flimsy data that couldn't tell any more of the story. Crashdumps and the like are much the same when it comes to hardware problems. Oftentimes you can see the effects and make guesses about what hardware might be doing it, but that's all it really is: guessing. Without proper testing and diagnostic procedures on the hardware, one can't tell from the data alone what's going on. Doing any of this, though, is going to require prior experience, skill, and knowledge to view the problem, make a hypothesis, then gather the appropriate data and coordinate a solution. No matter how thorough !analyze -v can be, neither it nor any other automated tool is going to be as precise as a sharp mind.