So it’s been a while since I wrote one of my “ludicrously technical”(TM) blogs, so I guess it’s time for a little treatise on modern computing once again.
Computers have come a long way over the past few decades, and in particular since the personal computing boom of the 1990s. When I first got involved with computers, they were things like the BBC Master (naturalized Americans: think Apple II, Commodore, and other MOS Technology 6502 based systems), which was very limited indeed and had very little in the way of peripherals one could add. Ok, so I’ve never used punched cards directly – I’m too young for that guilty pleasure.
With the introduction of the IBM PC in the early 1980s (I still remember my dad’s 80386 computer, which he bought in 1988 for 3000GBP – that’s well over $10K in today’s monetary terms – I hated being “forced” to use a PC instead of my BBC, although he probably had the right idea in retrospect!), along came various user-extendable system buses, such as ISA, EISA, MCA, and later PCI, AGP, PCI-X and the modern-day PCIe (used for the vast majority of “home” computing systems). These allowed users to install various add-on “expansion” cards.
Fundamentally, that peripheral card you plug into your computer is nothing more than a circuit board containing a few chips with a bus, or metallic pin, connector on the lower edge of the card. These pins convey signals to the computer, at various points during a “bus cycle”, and depending upon the bus in use, this can be very complicated indeed. One of the things that all of these devices need to do, no matter how complex, or how simple, is to signal to the computer when they need service. The CPU doesn’t just know when this is, it has to be explicitly told.
Interrupts are the mechanism through which devices tell the CPU that they would like to get some service, please, if you don’t mind (and can I get whip on that triple-extra-venti-frappa-interrupt?). Depending on the bus in use, legacy buses especially, an interrupt is simply an electrical signal transition on one of those pins on the lower edge of the card, conveyed to the processor via a bus line. Such wire interrupts come in two flavors:
*). Edge. These are more common in legacy systems. An interrupt is signaled as the result of a rising (or falling) edge of the electrical voltage transition of the corresponding interrupt line. Only one device can signal an interrupt on the same line at the same time, and it’s possible to miss an interrupt if the processor doesn’t sample the bus at the right moment, or is too late responding.
*). Level. These are very common today, especially “active low” (a low voltage state signals that the interrupt line is active, whereas its default is to be pulled up to a higher voltage level). An interrupt is asserted as a voltage state transition and remains asserted until such time as it is cleared. Multiple devices can assert at the same time, in a wired-OR fashion, and the line remains asserted until all devices have quiesced from the interrupt state.
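To make the difference concrete, here is a toy model in Python. It is only a sketch: the function names are mine, "1" means "asserted" regardless of the real voltage polarity, and time is reduced to a list of samples. It shows two devices overlapping on a shared line looking like a single edge, while a level-style line simply stays asserted until every device has been serviced.

```python
def wired_or(*device_lines):
    """Shared line: asserted (1) at each sample while ANY device asserts it."""
    return [int(any(states)) for states in zip(*device_lines)]


def count_edge_interrupts(samples):
    """Count inactive-to-active transitions, as an edge-triggered input would."""
    count = 0
    previous = 0
    for level in samples:
        if previous == 0 and level == 1:
            count += 1
        previous = level
    return count


def service_level_line(pending):
    """Keep servicing devices for as long as the shared level line is asserted."""
    serviced = []
    while any(pending.values()):                 # line is still asserted
        name = next(n for n, p in pending.items() if p)
        pending[name] = False                    # handler quiesces that device
        serviced.append(name)
    return serviced


# Two hypothetical devices whose requests overlap in time on one line.
dev_a = [0, 1, 1, 0, 0]
dev_b = [0, 0, 1, 1, 0]
shared = wired_or(dev_a, dev_b)

print(shared)                                    # the line as the CPU sees it
print(count_edge_interrupts(shared))             # edges seen on the shared line
print(service_level_line({"a": True, "b": True}))
```

With edge signalling the second device's request is invisible, because the line never dropped in between. With level signalling nothing is lost: the handler just keeps going until the line goes quiet.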
The original (XT) PC made use of an Intel 8259-type PIC – Programmable Interrupt Controller – whose job was to relay to the CPU whenever one of many devices had asserted an interrupt. The CPU would only be interrupted once (CPUs typically don’t have more than a couple of interrupt inputs even today – and usually only one of those is actually for normal external interrupts, more on that later on), and would inquire of the XT-PIC for the corresponding line that triggered it.
The XT-PIC (actually two of them, they were usually daisy-chained, limiting total IRQs) was a very simple piece of technology but it had its day. Many of us recall actively configuring hardware devices for specific interrupt lines, using physical hardware jumpers to avoid conflicts with other devices, etc. PCI later made the latter easier, because devices could be configured automatically (that really helped to get end-users involved, because they didn’t need to understand).
Other computing systems advanced considerably over the PC Architecture, which is still largely antiquated and a waste of time – but is used by the vast majority of computer users today, so is nonetheless the most relevant. Nobody cares about the OpenPIC used on older PowerPC systems, even if it was ahead of its time. I am only focusing on PC technology here, because it’s what people use, not because I particularly find it the most interesting or the most enjoyable.
After the XT-PIC came the APIC. Intel invented the “Intel APIC Architecture”, which is essentially a fancy way of saying they invented the LAPIC – local APIC – and the IO-APIC, as well as the original APIC bus that was used to connect these two devices together. Every modern Intel-compatible processor has an LAPIC built right into it (mapped at an address determined via an MSR, with a default of 0xFEE0_0000; the first IO-APIC conventionally sits nearby at 0xFEC0_0000), and this talks to one or more external IO-APICs over a designated bus – used to be the APIC bus, but it’s now the system bus, or the HyperTransport bus in AMD systems.
You can’t do SMP – multiprocessing – without something like an APIC architecture. The IO-APIC(s) in a system all talk to every LAPIC, using a prioritization algorithm to determine which CPU will get to deal with a given interrupt (there’s also a task priority register for this purpose, but nobody uses it properly). CPUs signal each other with IPIs (special Inter-processor interrupts) via their local LAPIC. End devices signal their interrupt to the processor as follows:
*). Device internally decides to assert an interrupt.
*). State transition on the interrupt line (active high/low, edge/level).
*). IO-APIC receives interrupt at one of its interrupt pins.
*). IO-APIC uses a vector table to map the pin to an IRQ vector.
*). IO-APIC asserts interrupt on its APIC bus.
*). CPU LAPIC receives interrupt and vector.
*). CPU calls appropriate vector routine.
*). Operating System signals ACK/EOI.
*). LAPIC acknowledges with IO-APIC.
*). (Operating system might mask directly in the IO-APIC).
*). IO-APIC internally resets the line.
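The sequence above can be sketched as a toy model. Everything here is hypothetical (class names, the pin and vector numbers, the dictionary standing in for the IO-APIC's table); real hardware does this with registers and bus messages, not method calls, and real systems do priority arbitration that I am ignoring.

```python
class LocalApic:
    """Toy LAPIC: receives a vector, runs the handler, forwards the EOI."""

    def __init__(self, cpu_id, handlers):
        self.cpu_id = cpu_id
        self.handlers = handlers        # vector -> callable(lapic)
        self.in_service = None          # (io_apic, pin) currently being handled

    def deliver(self, vector, io_apic, pin):
        self.in_service = (io_apic, pin)
        self.handlers[vector](self)     # "CPU calls appropriate vector routine"

    def eoi(self):
        io_apic, pin = self.in_service
        self.in_service = None
        io_apic.reset_line(pin)         # "LAPIC acknowledges with IO-APIC"


class IoApic:
    """Toy IO-APIC: maps input pins to (vector, destination CPU)."""

    def __init__(self, lapics):
        self.lapics = lapics            # cpu_id -> LocalApic
        self.table = {}                 # pin -> routing entry
        self.asserted = set()

    def route(self, pin, vector, cpu):
        self.table[pin] = {"vector": vector, "cpu": cpu, "masked": False}

    def assert_pin(self, pin):
        entry = self.table[pin]
        if entry["masked"]:             # the OS may mask directly in the IO-APIC
            return False
        self.asserted.add(pin)
        self.lapics[entry["cpu"]].deliver(entry["vector"], self, pin)
        return True

    def reset_line(self, pin):
        self.asserted.discard(pin)      # "IO-APIC internally resets the line"


log = []

def nic_handler(lapic):
    log.append(("handled", lapic.cpu_id))
    lapic.eoi()                         # "Operating System signals ACK/EOI"

lapic0 = LocalApic(0, {0x31: nic_handler})
io_apic = IoApic({0: lapic0})
io_apic.route(pin=18, vector=0x31, cpu=0)
delivered = io_apic.assert_pin(18)      # device raises its interrupt line
```

The point of the model is only the ordering: the pin is looked up, the vector goes to one LAPIC, the handler runs, and the EOI travels back before the line is considered idle again.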
IO-APICs are not that complex, but interrupt routing is. There are many possible ways to hook up an interrupt from a physical device to the corresponding pin on a particular IO-APIC chip (each might support 24 interrupt lines, and there might be up to 6 or more of these in a particular system, though often fewer). Fortunately, the Operating System doesn’t have to guess this because the ACPI tables (provided by the system level firmware – you call it a BIOS in peecee land) supposedly provide comprehensive interrupt routing information.
When Linux boots, and initializes the IO-APIC(s), it reads the ACPI tables, checks the corresponding physical pin routing to physical APIC IDs in the system (every APIC has an individual ID – depending upon the mode it is in, determined partly by the maximum number of CPUs in the system, e.g. flat virtual/physical mode, cluster mode, or other horrible modes besides) and wires those up to specific software vectors via the vector tables contained within each IO-APIC. These vectors are used to call different routines for different IRQs – in reality, typically the same routine, just via an extra set of hoops to jump through.
You can see your interrupt configuration on a sensible operating system (read: Linux) via the /proc/interrupts special kernel virtual pseudo file:
           CPU0       CPU1
  0:        105          0   IO-APIC-edge      timer
  1:         17         32   IO-APIC-edge      i8042
  6:          0          6   IO-APIC-edge      floppy
  7:          0          0   IO-APIC-edge      parport0
  8:     149818  267265667   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:     592506    8616725   IO-APIC-edge      i8042
 14:         42    4229761   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:   96132501    1629692   IO-APIC-fasteoi   firewire_ohci, radeon@pci:0000:02:00.0
 18:          0   15875643   IO-APIC-fasteoi   eth0
 20:   34727470        163   IO-APIC-fasteoi   HDA Intel
 21:    5933715       5746   IO-APIC-fasteoi   sata_nv
 22:      16715     177692   IO-APIC-fasteoi   ohci_hcd:usb2
 23:   15138783    1130425   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0
LOC:  413043921  418168360
ERR:          0
(Notice the NMI line. These are special Non-Maskable Interrupts. They only happen when something really critical happens that needs to be handled right now, even if the CPU is otherwise ignoring interrupts – like the machine is on fire, or something similar is about to happen. Errors are also logged on the ERR line, and the LOC line counts the local APIC timer interrupts on each CPU).
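If you want to poke at this output programmatically, a minimal parser might look like the following. It is a sketch that assumes the layout shown above (a header row of CPU names, then "label: counts... chip devices"); the function name and the dictionary shape are my own, and the embedded sample is just a few lines copied from the listing.

```python
SAMPLE = """\
           CPU0       CPU1
  0:        105          0   IO-APIC-edge      timer
  8:     149818  267265667   IO-APIC-edge      rtc
 18:          0   15875643   IO-APIC-fasteoi   eth0
NMI:          0          0
ERR:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, rows), where rows maps an IRQ label to its fields."""
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")    # split on the first colon only
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts (at most one per CPU).
        while fields and len(counts) < len(cpus) and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        rows[label.strip()] = {
            "counts": counts,
            "chip": fields[0] if fields else "",
            "devices": " ".join(fields[1:]),
        }
    return cpus, rows


cpus, rows = parse_interrupts(SAMPLE)
for label, row in rows.items():
    print(label, sum(row["counts"]), row["chip"], row["devices"])
```

On a live Linux box you would feed it the real thing with `parse_interrupts(open("/proc/interrupts").read())`, though newer kernels print more summary lines than the sample here.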
But this is all wrong on very recent systems. Revision 2.2 of the PCI standard introduced Message Signalled Interrupts (MSI), and PCI-X and PCIe (PCI Express) devices are expected to support them. But what are these? They are fake memory write operations from devices on the PCI bus, used to do away with out-of-band interrupt signalling lines, the hassle those cause, and the extra state machine complexity inherent in their usage.
A modern PCI device asserts an interrupt by signalling an MSI – a fake write to a special address window that is trapped by the PCI bridge and not propagated to system memory directly – which signals an interrupt assertion message. Interrupts are de-asserted via a special interrupt message. They are not exactly “edge”, they are more “level”, since they are both asserted and de-asserted, but there’s no easy mapping for them in the sense of traditional wire interrupts (I’m sure the PC columnists will continue to misunderstand MSI in their columns).
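To show what "a fake write to a special address window" means in practice, here is a sketch of how the address and data of such a write are put together on x86. This follows the message layout as I understand it from Intel's documentation (window at 0xFEEx_xxxx, destination APIC ID in address bits 19:12, vector in the low 8 bits of the data); the function names are mine, and every other field (redirection hint, destination mode, delivery mode and so on) is simply left at zero.

```python
MSI_ADDRESS_BASE = 0xFEE00000    # x86 interrupt message window


def compose_msi(dest_apic_id, vector):
    """Build the (address, data) pair a device would write to raise `vector`."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("destination APIC ID must fit in 8 bits")
    if not 0x10 <= vector <= 0xFE:
        raise ValueError("vector must be in the range 0x10..0xFE")
    address = MSI_ADDRESS_BASE | (dest_apic_id << 12)
    data = vector                # all other data fields left at zero
    return address, data


def decode_msi(address, data):
    """Recover (dest_apic_id, vector) from a captured message write."""
    if address & 0xFFF00000 != MSI_ADDRESS_BASE:
        raise ValueError("not a write into the interrupt message window")
    return (address >> 12) & 0xFF, data & 0xFF


address, data = compose_msi(dest_apic_id=1, vector=0x41)
print(hex(address), hex(data))
print(decode_msi(address, data))
```

The operating system programs the address and data into the device's MSI capability registers at setup time; after that the device raises the interrupt by itself, with one ordinary-looking posted write.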
The IO-APIC will still receive the interrupt, and the processor will handle it, even though no interrupt pin was ever directly asserted. Things get more complex in legacy systems (that’s pretty much every system you buy today, as they still tend to support legacy PCI or PCI-X devices, and even if you think your system doesn’t, it probably has ye olde PCI devices on board). In this case, the PCI(-X)-to-PCIe bridge(s) will propagate interrupts that aren’t already MSIs by converting them to Virtual Wire Interrupt Messages. These are pseudo-encapsulated ye olde interrupts, sent as MSIs.
You can tell an MSI on sensible Operating Systems, from the output in /proc/interrupts, or a similar file:
            CPU0       CPU1       CPU2       CPU3
   0:       1166          0          1          0   IO-APIC-edge      timer
   1:          1          0          0          1   IO-APIC-edge      i8042
   3:   10348681      56181   10332107      56041   IO-APIC-edge      serial
   6:          0          0          0          0   IO-APIC-edge      myirq
   8:          0          0          0          0   IO-APIC-edge      rtc
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:          0          2          1          1   IO-APIC-edge      i8042
  14:     466688     129920     459730     129979   IO-APIC-edge      ide0
  16:          0          0          0          0   IO-APIC-fasteoi   qla2xxx
  17:        557       1102        559       1079   IO-APIC-fasteoi   qla2xxx
  20:     352354      57654      51367      56437   IO-APIC-fasteoi   uhci_hcd:usb2
  21:         10          5          6          6   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb4
  78:    2989449       1567    2854570       1555   IO-APIC-fasteoi   megasas
8406:         89         90    1238035       4091   PCI-MSI-edge      eth0
 NMI:          0          0          0          0
 LOC:  133189596  133312432  133346309  133841761
 ERR:          0
(MSI interrupts always have high numbers, like 8406, on Linux anyway).
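Rather than eyeballing the numbers, you can pick out the MSI lines by the chip-name column. This is a small sketch assuming the column layout of the listing above and that the chip name starts with "PCI-MSI" (as it does there); the function name is mine and the sample is a few lines copied from that listing.

```python
SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3
  17:        557       1102        559       1079   IO-APIC-fasteoi   qla2xxx
  78:    2989449       1567    2854570       1555   IO-APIC-fasteoi   megasas
8406:         89         90    1238035       4091   PCI-MSI-edge      eth0
 NMI:          0          0          0          0
 ERR:          0
"""


def msi_irqs(text):
    """Map IRQ label -> device names for lines whose chip column is PCI-MSI*."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())           # header row names one column per CPU
    found = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # The chip name is the first field after the per-CPU counts.
        if len(fields) > ncpus and fields[ncpus].startswith("PCI-MSI"):
            found[label.strip()] = " ".join(fields[ncpus + 1:])
    return found


print(msi_irqs(SAMPLE))
```

Keying on the chip name avoids relying on the IRQ number being "high", which is only an observation about the systems shown here.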
There’s more fun. Modern Linux systems tend to handle interrupts using the special EOI (or “fasteoi” naming), which is a fancy way of saying that you can tell the LAPIC that it’s EOI time and it’ll go tell the IO-APIC concerned to shut up. There’s just one write required to shut up an interrupt and get the next one. Except on even more modern Linux systems, using the realtime-preempt superuberdupermagicalness kernel patchset. These systems handle interrupts in two stages, with only the first running as a traditional “top half” handler:
*). Receive the hardware interrupt, schedule a thread.
*). Thread takes care of processing.
Yes, Linux does threaded interrupts. Eat that Solaris, Windows, whatever. But to achieve this transition, interrupts must be masked out when first received (no fast EOI), and then re-enabled after the threaded interrupt handler has done its magic. You can’t call EOI later, because you might be running on a different CPU by then, and you don’t know what you’re EOIing about anyway any more.
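The mask / schedule / handle / unmask dance can be modelled in user space with an ordinary thread and a queue. To be clear, this is a toy: the class and method names are made up, a Python thread stands in for the kernel's IRQ thread, and a boolean stands in for the mask bit in the interrupt controller.

```python
import queue
import threading


class ThreadedIrq:
    """Toy model of one interrupt line serviced by a dedicated thread."""

    def __init__(self, handler):
        self.handler = handler
        self.masked = False
        self.work = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def hard_irq(self, payload):
        # First stage: do almost nothing. Mask the line and wake the thread.
        if self.masked:
            return False                 # a masked line cannot fire again
        self.masked = True
        self.work.put(payload)
        return True

    def _run(self):
        # Second stage: the real work happens here, in thread context.
        while True:
            payload = self.work.get()
            self.handler(payload)
            self.masked = False          # re-enable the line only when done
            self.work.task_done()


handled = []
release = threading.Event()

def slow_handler(payload):
    release.wait()                       # pretend the device takes a while
    handled.append(payload)

irq = ThreadedIrq(slow_handler)
first = irq.hard_irq("a")                # accepted, line is now masked
second = irq.hard_irq("b")               # refused, the thread has not finished
release.set()
irq.work.join()                          # wait for the thread to finish "a"
third = irq.hard_irq("c")                # line was unmasked, so this is accepted
irq.work.join()
print(first, second, third, handled)
```

The second request is refused because the line stays masked for the whole time the thread is working, which is exactly why a fast EOI on its own is not enough in this scheme.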
This has been a random wibble on modern interrupts. I might turn this into a more comprehensive paper, if there’s interest. I’ve spent the last few weeks reading up on IO-APIC, APIC, PCI, PCI-X, PCIe and a lot more besides. We can discuss PCI subordinate bridge enumeration in depth-first fashion later.
Jon.
Interrupts were threaded in Solaris back when I hacked on it, that is… 2.5.1 or so. Put it to 1999 or about that. Frankly I’m not convinced of their merits and it was very clear that the whole idea was pushed by RT and low-lat people (this is the area where Solaris was way ahead of Linux as well… their glacial pace of development in the little proprietary world allowed us to catch up).
“(MSI interrupts always have high numbers, like 8406, on Linux anyway).”
Can you elaborate more on why is that? And how is the number determined?
Thanks.