Hardware Interrupts and IRQs:
Interrupts are an essential part of the operating system. They ensure that vital tasks are completed as needed, and they come in two forms: hardware and software. In this guide, we'll be focusing predominantly on hardware interrupts and how they are handled by the processor.
Hardware interrupts are represented in the operating system by IRQs, or Interrupt Requests. These requests are issued by hardware devices such as a keyboard or a hard disk. When a request is made, a signal is sent to the processor, which then consults a special table called the Interrupt Dispatch Table (IDT; Intel's documentation calls it the Interrupt Descriptor Table) to find the appropriate interrupt handler to process the request.
Before we look into the IDT, let's look at how the signal reaches the processor and how the processor knows which device sent the interrupt request.
Handling Device Interrupts
Each request has its own IRQ number, and these numbers are associated with different input lines on what is known as an Advanced Programmable Interrupt Controller, or APIC. Older systems used the legacy PIC (Programmable Interrupt Controller); since PICs are largely obsolete now, we'll only focus on APICs. The APIC architecture consists of a local APIC, one per processor, and an I/O APIC, which sits between the hardware devices and the local APICs. We can dump this information in WinDbg using !ioapic and !apic.
A hardware device raises an IRQ on one of the I/O APIC's input lines. The I/O APIC translates the IRQ into an interrupt vector, using the redirection table entry programmed for that line, and sends it to the local APIC of the target processor, which delivers it to the processor. The processor uses the vector to index into its IDT (each processor has its own IDT) and transfers control to the routine recorded in that entry. The kernel's interrupt dispatch code then raises the processor's Interrupt Request Level (IRQL) if needed and calls the corresponding Interrupt Service Routine (ISR). The address of the IDT is stored within a register called IDTR, which is loaded with the LIDT instruction and read with the SIDT instruction.
Once the interrupt has begun, the kernel acquires the interrupt object's spin lock at the IRQL specified within the interrupt object's SynchronizeIrql field and then calls the device driver's ISR. Other routines in the same driver that access data shared with the ISR call KeSynchronizeExecution, which acquires the same spin lock at the same IRQL, so that the data required to service the interrupt cannot be corrupted by code running concurrently on another processor. The spin lock is released once the ISR (or the routine passed to KeSynchronizeExecution) returns.
Since holding the processor at a device IRQL (DIRQL) blocks lower-priority work on that processor, the ISR typically does only the minimum work required and queues a DPC (Deferred Procedure Call) object, by calling IoRequestDpc, to complete the rest of the interrupt processing at a later time. Once queued, the DPC object is added to the processor's DPC queue, and its deferred routine is called at DISPATCH_LEVEL once the processor's IRQL drops below DISPATCH_LEVEL.
APIC
Code:
5: kd> !apic
Apic (x2Apic mode) ID:4fc73880 (4fc73880) LogDesc:4fc73880 TPR 4FC73880
TimeCnt: 4fc73880tmbase SpurVec:80 FaultVec:4fc73880 error:4fc73880 DISABLED
Ipi Cmd: 000000a6`f7afd4c8 Vec:C8 NMI Dest=Othrs-Pend lvl high rirr
Timer..: 00000000`4fc73880 Vec:80 FixedDel Dest=Self-Pend edg low m
Linti0.: 00000000`4fc73880 Vec:80 FixedDel Dest=Self-Pend edg low m
Linti1.: 00000000`4fc73880 Vec:80 FixedDel Dest=Self-Pend edg low m
TMR: 05-6, 0B-10, 12, 15-16, 1A-1c, 1F, 25-26, 2B-30, 32, 35-36, 3A-3c, 3F, 45-46, 4B-50, 52, 55-56, 5A-5c, 5F, 65-66, 6B-70, 72, 75-76, 7A-7c, 7F, 85-86, 8B-90, 92, 95-96, 9A-9c, 9F, A5-a6, AB-b0, B2, B5-b6, BA-bc, BF, C5-c6, CB-d0, D2, D5-d6, DA-dc, DF, E5-e6, EB-f0, F2, F5-f6, FA-fc, FF
IRR: 05-6, 0B-10, 12, 15-16, 1A-1c, 1F, 25-26, 2B-30, 32, 35-36, 3A-3c, 3F, 45-46, 4B-50, 52, 55-56, 5A-5c, 5F, 65-66, 6B-70, 72, 75-76, 7A-7c, 7F, 85-86, 8B-90, 92, 95-96, 9A-9c, 9F, A5-a6, AB-b0, B2, B5-b6, BA-bc, BF, C5-c6, CB-d0, D2, D5-d6, DA-dc, DF, E5-e6, EB-f0, F2, F5-f6, FA-fc, FF
ISR: 05-6, 0B-10, 12, 15-16, 1A-1c, 1F, 25-26, 2B-30, 32, 35-36, 3A-3c, 3F, 45-46,
Let's examine the output from the command. Linti0 and Linti1 refer to the two local interrupt pins (LINT0 and LINT1) of the local APIC. Hardware devices can connect to these pins and send interrupts to the local APIC, which in turn forwards them to the processor. These are examples of what are known as local interrupts. Local interrupt sources are configured through a set of registers known collectively as the local vector table (LVT). Depending on how a particular LVT entry has been configured, the interrupt may be delivered through the IDT. The Vec field refers to the entry within the IDT; the IDT will be explored in more depth in the next section.
NMI is a delivery mode, which states how the interrupt should be delivered. NMI stands for non-maskable interrupt; if the delivery mode is set to NMI, the interrupt vector number is ignored and the interrupt is delivered as an NMI. Fixed delivery (shown as FixedDel), on the other hand, uses the vector number to index into the IDT and deliver the corresponding interrupt to the processor.
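The APIC mode, shown as x2Apic mode on the first line of the output, refers to the operating mode of the local APIC. x2APIC is the newest APIC architecture and is an extension of the older xAPIC architecture. The key differences are that x2APIC accesses the APIC registers through MSRs rather than memory-mapped I/O, which is faster, and that it widens APIC IDs from 8 to 32 bits, allowing far more processors to be addressed.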
More information regarding the APIC can be found in the Intel developer documentation (Intel 64 and IA-32 Architectures Software Developer's Manual), Volume 3, Section 10.1.
Interrupt Dispatch Table (IDT)
We can dump the IDT in WinDbg using the !idt extension:
Code:
5: kd> !idt
Dumping IDT: fffff880030b06c0
00: fffff800034b8700 nt!KiDivideErrorFault
01: fffff800034b8800 nt!KiDebugTrapOrFault
02: fffff800034b89c0 nt!KiNmiInterruptStart Stack = 0xFFFFF880030B00C0
03: fffff800034b8d40 nt!KiBreakpointTrap
04: fffff800034b8e40 nt!KiOverflowTrap
05: fffff800034b8f40 nt!KiBoundFault
06: fffff800034b9040 nt!KiInvalidOpcodeFault
07: fffff800034b9280 nt!KiNpxNotAvailableFault
08: fffff800034b9340 nt!KiDoubleFaultAbort Stack = 0xFFFFF880030AC0C0
09: fffff800034b9400 nt!KiNpxSegmentOverrunAbort
0a: fffff800034b94c0 nt!KiInvalidTssFault
0b: fffff800034b9580 nt!KiSegmentNotPresentFault
0c: fffff800034b96c0 nt!KiStackFault
0d: fffff800034b9800 nt!KiGeneralProtectionFault
0e: fffff800034b9940 nt!KiPageFault
10: fffff800034b9d00 nt!KiFloatingErrorFault
11: fffff800034b9e80 nt!KiAlignmentFault
12: fffff800034b9f80 nt!KiMcheckAbort Stack = 0xFFFFF880030AE0C0
13: fffff800034ba300 nt!KiXmmException
1f: fffff800034af860 nt!KiApcInterrupt
2c: fffff800034ba4c0 nt!KiRaiseAssertion
2d: fffff800034ba5c0 nt!KiDebugServiceTrap
2f: fffff80003507510 nt!KiDpcInterrupt
37: fffffa800ca66ef0 hal!HalpApicSpuriousService (KINTERRUPT fffffa800ca66e60)
3f: fffffa800ca66f90 hal!HalpApicSpuriousService (KINTERRUPT fffffa800ca66f00)
50: fffffa800ca670d0 hal!HalpCmciService (KINTERRUPT fffffa800ca67040)
The address of the IDT, fffff880030b06c0, is shown on the first line of the output. This matches the address which is stored within the IDTR register:
Code:
5: kd> r @idtr
idtr=fffff880030b06c0
The numbers along the left-hand side are the interrupt vectors which are used to index into the IDT and find the corresponding ISR. For example, for page faults, the interrupt vector is 0e. We can dump a particular entry of the IDT by doing the following:
Code:
5: kd> !idt 0e
Dumping IDT: fffff880030b06c0
0e: fffff800034b9940 nt!KiPageFault
You may have noticed that a KINTERRUPT structure is listed for some entries but not others. This is because entries such as the page fault handler service exceptions rather than device interrupts: they are handled directly by a kernel trap handler and therefore have no interrupt object. Let's investigate what this structure tells us:
Code:
5: kd> dt nt!_KINTERRUPT fffffa800cb4fa80
+0x000 Type : 0n22
+0x002 Size : 0n160
+0x008 InterruptListEntry : _LIST_ENTRY [ 0x00000000`00000000 - 0x00000000`00000000 ]
+0x018 ServiceRoutine : 0xfffff800`0347da10 unsigned char nt!KiInterruptMessageDispatch+0
+0x020 MessageServiceRoutine : 0xfffff880`0482d62c unsigned char +0
+0x028 MessageIndex : 0
+0x030 ServiceContext : 0xfffffa80`0e4c4bb0 Void
+0x038 SpinLock : 0
+0x040 TickCount : 0
+0x048 ActualLock : 0xfffffa80`0e7b3230 -> 0
+0x050 DispatchAddress : 0xfffff800`034b7670 void nt!KiInterruptDispatch+0
+0x058 Vector : 0xa0
+0x05c Irql : 0xa ''
+0x05d SynchronizeIrql : 0xa ''
+0x05e FloatingSave : 0 ''
+0x05f Connected : 0x1 ''
+0x060 Number : 5
+0x064 ShareVector : 0x1 ''
+0x065 Pad : [3] ""
+0x068 Mode : 1 ( Latched )
+0x06c Polarity : 0 ( InterruptPolarityUnknown )
+0x070 ServiceCount : 0
+0x074 DispatchCount : 0
+0x078 Rsvd1 : 0
+0x080 TrapFrame : 0xfffff880`030cdab0 _KTRAP_FRAME
+0x088 Reserved : (null)
+0x090 DispatchCode : [4] 0x8d485550
The Vector field refers to the interrupt vector number which is used to index into the IDT. The Irql field refers to the IRQL at which the interrupt's service routine runs. We can translate this hexadecimal value to a decimal IRQL number using the following:
Code:
5: kd> ? 0xa
Evaluate expression: 10 = 00000000`0000000a
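Judging from the IRQL of 10, we can see that this is a device interrupt (IRQLs are explained in a later section). The ServiceRoutine field refers to the routine which will be called to handle the interrupt. For an ordinary line-based interrupt this is the driver's ISR; here it is nt!KiInterruptMessageDispatch, because this particular interrupt is message-signaled, which we'll come back to later.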
This brings us to shared interrupts. Since there can be more devices present than available interrupt lines, IRQs may be shared between multiple devices, which means that one interrupt vector may be used to service several devices. We can examine this through the InterruptListEntry field.
Code:
!idt -a
81: fffff802aad5d608 USBPORT!USBPORT_InterruptService (KINTERRUPT ffffa18196d53280)
1394ohci!Interrupt::WdfEvtInterruptIsr (KMDF) (KINTERRUPT ffffa18198314c80)
USBPORT!USBPORT_InterruptService (KINTERRUPT ffffa18198314a00)
Notice how multiple interrupts are all associated with the same IDT entry? Let's dump the first interrupt in the list.
Code:
1: kd> dt nt!_KINTERRUPT ffffa18196d53280
+0x000 Type : 0n22
+0x002 Size : 0n256
+0x008 InterruptListEntry : _LIST_ENTRY [ 0xffffa181`98314c88 - 0xffffa181`98314a08 ]
+0x018 ServiceRoutine : 0xfffff807`25eb6470 unsigned char USBPORT!USBPORT_InterruptService+0
+0x020 MessageServiceRoutine : (null)
+0x028 MessageIndex : 0
+0x030 ServiceContext : 0xffffd607`5ade0050 Void
+0x038 SpinLock : 0
+0x040 TickCount : 0
+0x048 ActualLock : 0xffffd607`5af45df0 -> 0
+0x050 DispatchAddress : 0xfffff802`aac381d0 void nt!KiChainedDispatch+0
+0x058 Vector : 0x81
+0x05c Irql : 0x8 ''
+0x05d SynchronizeIrql : 0x8 ''
+0x05e FloatingSave : 0 ''
+0x05f Connected : 0x1 ''
+0x060 Number : 1
+0x064 ShareVector : 0x1 ''
+0x065 EmulateActiveBoth : 0 ''
+0x066 ActiveCount : 0
+0x068 InternalState : 0n4
+0x06c Mode : 0 ( LevelSensitive )
+0x070 Polarity : 0 ( InterruptPolarityUnknown )
+0x074 ServiceCount : 0
+0x078 DispatchCount : 0
+0x080 PassiveEvent : (null)
+0x088 TrapFrame : (null)
+0x090 DisconnectData : (null)
+0x098 ServiceThread : (null)
+0x0a0 ConnectionData : 0xffffd607`5af45e00 _INTERRUPT_CONNECTION_DATA
+0x0a8 IntTrackEntry : 0xffffd607`5afb2320 Void
+0x0b0 IsrDpcStats : _ISRDPCSTATS
+0x0f0 RedirectObject : (null)
+0x0f8 Padding : [8] ""
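We can see that the two associated interrupts are linked through the InterruptListEntry linked list. Since InterruptListEntry sits at offset 0x8 within _KINTERRUPT, the Flink and Blink pointers point 8 bytes into the other two interrupt objects: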
Code:
1: kd> ? ffffa18198314c80+8
Evaluate expression: -103897000489848 = ffffa181`98314c88
1: kd> ? ffffa18198314a00+8
Evaluate expression: -103897000490488 = ffffa181`98314a08
Now, looking at the DispatchAddress field, we can see that the interrupt will be dispatched using nt!KiChainedDispatch. The chained dispatch function is used for interrupt objects that share their vector with other interrupt objects, i.e. when multiple devices share the same IRQ: it calls the ISRs registered on the vector in turn, so that the driver whose device raised the interrupt can claim it. On the other hand, nt!KiInterruptDispatch (as seen in the first KINTERRUPT dump) is used for interrupts which are not shared.
The Mode and Polarity fields are also of note, and are used in combination with each other to determine how an interrupt will be delivered to the processor. Both of these fields are enumerations. These can be dumped in WinDbg using the following commands.
Code:
1: kd> dt _KINTERRUPT_MODE
nt!_KINTERRUPT_MODE
LevelSensitive = 0n0
Latched = 0n1
Interrupt mode refers to how the interrupt signal is recognised. There are currently two modes: edge-triggered (Latched) and level-triggered (LevelSensitive). Message-signaled interrupts are always edge-triggered, whereas line-based interrupts, which are limited by the number of physical lines available to the APIC (shared PCI interrupt lines, for example), are typically level-triggered. We'll discuss the difference between line-based and message-based interrupts in more depth in a moment.
Code:
1: kd> dt _KINTERRUPT_POLARITY
nt!_KINTERRUPT_POLARITY
InterruptPolarityUnknown = 0n0
InterruptActiveHigh = 0n1
InterruptRisingEdge = 0n1
InterruptActiveLow = 0n2
InterruptFallingEdge = 0n2
InterruptActiveBoth = 0n3
InterruptActiveBothTriggerLow = 0n3
InterruptActiveBothTriggerHigh = 0n4
The polarity describes whether an interrupt is signalled by a high or low level, or by a rising or falling edge. It is important to note that the rising and falling edges refer to signal edges, which describe the transition of a signal from 0 to 1 and vice versa: a signal transitioning from 0 to 1 has a rising edge, whereas a signal transitioning from 1 to 0 has a falling edge.
More details on the enumeration values can be found in the Microsoft documentation: _KINTERRUPT_POLARITY (wdm.h) - Windows drivers.
Line-based and Message-based interrupts
Instead of asserting a signal on a particular interrupt pin and sending it along the associated interrupt line, a device using message-signaled interrupts (MSIs) writes a value to a particular memory address, which is typically in a memory-mapped I/O region of the address space.
Now, let's go back to the first interrupt structure we dumped (at fffffa800cb4fa80) and compare it with the line-based USBPORT interrupt. The USBPORT interrupt object has a null MessageServiceRoutine and a Mode of LevelSensitive, as expected for a shared, line-based interrupt. The first structure, however, belongs to a message-signaled interrupt. The first key difference is that its ServiceRoutine field points to nt!KiInterruptMessageDispatch rather than to the driver's own ISR. The driver's routine is instead stored in the MessageServiceRoutine field, which has the same meaning as its line-based counterpart, ServiceRoutine, and is called by KiInterruptMessageDispatch. Its Mode is also Latched, since message-signaled interrupts are edge-triggered.
The MessageIndex field is used as an index into the MessageInfo array, which is part of the _IO_INTERRUPT_MESSAGE_INFO structure. This index is passed to the message service routine as its MessageID parameter. Please see KMESSAGE_SERVICE_ROUTINE (wdm.h) - Windows drivers for further details.
It should also be mentioned that each entry within the IDT is represented by a structure called _KIDTENTRY64 (_KIDTENTRY on x86 systems). We can dump this structure in WinDbg:
Code:
5: kd> dt nt!_KIDTENTRY64
+0x000 OffsetLow : Uint2B
+0x002 Selector : Uint2B
+0x004 IstIndex : Pos 0, 3 Bits
+0x004 Reserved0 : Pos 3, 5 Bits
+0x004 Type : Pos 8, 5 Bits
+0x004 Dpl : Pos 13, 2 Bits
+0x004 Present : Pos 15, 1 Bit
+0x006 OffsetMiddle : Uint2B
+0x008 OffsetHigh : Uint4B
+0x00c Reserved1 : Uint4B
+0x000 Alignment : Uint8B
Technically, the fields here correspond to those of an IDT gate descriptor, a system descriptor with the same general format as a segment descriptor. To keep this article in scope, I'll assume that you understand what segment descriptors are; in short, they describe a region of memory along with its access rights and other attributes.
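The OffsetLow, OffsetMiddle and OffsetHigh fields are combined to form the 64-bit address of the interrupt handler for that vector. The Selector field holds the code segment selector used when the handler runs, and IstIndex selects an Interrupt Stack Table entry; a non-zero IstIndex is why the NMI, double fault and machine check entries in the !idt output show a separate Stack. The Dpl field describes the privilege level (ring) the processor must be running at in order to invoke the gate with a software interrupt instruction such as INT n; hardware interrupts and processor exceptions ignore it. The Type field describes the type of gate descriptor: on x64 it will be either an interrupt gate or a trap gate (task gates are only supported in 32-bit mode). The Present flag indicates whether the descriptor is present and valid.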
Interrupt Flag
The Interrupt Flag (IF, bit 9 of the EFLAGS/RFLAGS register) indicates whether the processor should respond to maskable interrupts. Maskable interrupts are all hardware interrupts apart from NMIs. If this flag is cleared (set to 0), the processor will not service any maskable interrupts until the flag is set again. The STI (Set Interrupt Flag) and CLI (Clear Interrupt Flag) instructions set and clear this flag.
IRQLs:
IRQLs are interrupt priority levels, and each IRQ is mapped to a particular IRQL by the HAL. On x64 systems, IRQLs run from 0 (the lowest) to 15 (the highest); on x86 systems, they run from 0 to 31.
IRQLs determine which interrupts a processor can be interrupted by and subsequently process: an interrupt at a higher IRQL takes precedence over one at a lower IRQL. Every processor has a current IRQL, which can be found within its Processor Control Region (PCR). We can dump this structure in WinDbg using the !pcr extension:
Code:
5: kd> !pcr
KPCR for Processor 5 at fffff880030a5000:
Major 1 Minor 1
NtTib.ExceptionList: fffff880030b0640
NtTib.StackBase: fffff880030aa040
NtTib.StackLimit: 00000000001ad328
NtTib.SubSystemTib: fffff880030a5000
NtTib.Version: 00000000030a5180
NtTib.UserPointer: fffff880030a57f0
NtTib.SelfTib: 000007fffffde000
SelfPcr: 0000000000000000
Prcb: fffff880030a5180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffffa800d716060
NextThread: 0000000000000000
IdleThread: fffff880030b00c0
DpcQueue:
If a processor receives an interrupt whose IRQL is lower than or equal to its current IRQL, the interrupt is masked: it either stays pending until the processor's IRQL has been lowered, or is serviced by a different processor.
If the processor is able to service the interrupt, the current IRQL is saved along with the current thread context, the processor's IRQL is raised to the IRQL associated with the interrupt, and the appropriate ISR is called. Windows always tries to return the processor to a lower IRQL as quickly as possible, so that pending lower-priority interrupts can be serviced efficiently.
We can dump the saved IRQL for the processor using the !irql extension:
Code:
5: kd> !irql
Debugger saved IRQL for processor 0x5 -- 0 (LOW_LEVEL)
Alternatively, the processor's current IRQL can also be found within the PRCB (Processor Control Block), which can be dumped in WinDbg using !prcb:
Code:
5: kd> !prcb
PRCB for Processor 5 at fffff880030a5180:
Current IRQL -- 0
Threads-- Current fffffa800d716060 Next 0000000000000000 Idle fffff880030b00c0
Processor Index 5 Number (0, 5) GroupSetMember 20
Interrupt Count -- 001e9395
Times -- Dpc 00000218 Interrupt 00000839
Kernel 0000e9d7 User 000030aa