I’m a total sucker for lounge versions of 80s pop hits. And for Bruce Campbell.
-
16 May 2007 / Personal
-
11 May 2007 / Code
It’s easy to think of a C pointer as containing a physical memory location; if my program calls
int x = 42; int *p = &x; printf("p = %p\n", (void *)p);
and the output is p = 0xdeadbeef, I could imagine that dereferencing p would actually set the address bus to 11011110101011011011111011101111 = 0xdeadbeef to read its value. The real story of the x86 is a bit more complicated…
The Intel 8086 has a 20-bit address bus, capable of addressing one megabyte (2^20 bytes) of memory, but only a 16-bit data bus, and all of its registers are 16 bits wide. To access the entire address space, then, an offset stored in one of the general purpose registers is added to the base address of a memory segment. The segment number is stored in one of the segment registers (CS, where program code resides; SS, for the stack; DS, for global data; and ES, often used for extra program data like strings), and the final address is computed as segment*16 + offset.
The 80286 introduces a 16-bit protected mode, in which the segment registers no longer index physical locations in memory, but rather contain a 13-bit index into a table of 8-byte segment descriptors. Twenty-four bits of each descriptor hold a base address, and physical addresses are computed by adding the offset directly to it, allowing 2^24 bytes = 16 MiB to be addressed. The descriptors also contain segment limits, allowing the kernel to detect when a piece of code addresses memory it’s not supposed to.
There are three table types: the global descriptor table, the interrupt descriptor table, and local descriptor tables. There is a single global table which is available to all processes, but each process can have its own local table. The tables can be located anywhere in memory, and there are special instructions (lgdt, lidt, and lldt) for setting them up. These instructions tell the CPU about both the location and size of the tables; since the table index from the segment register is 13 bits, the maximum table length is 8192 entries.
In addition to the 13-bit table index, the segment register also contains a bit which selects between the global table and the current local table (the remaining two bits specify a privilege level, which I won’t discuss here). The x86 architecture distinguishes between the logical address stored in the segment and offset registers visible to the programmer and the linear address, which the CPU forms using the segmentation table lookup. Since we have 13 + 1 bits in the segment register used for addressing and 16 bits in the offset register, logical addresses in the 286 are 30 bits long. The 286 documentation refers to 1 GiB of virtual address space, which is this logical address space. Of course, the address bus on the 286 is only 24 bits, and the mapping of logical addresses to linear addresses is not bijective, since the segments described in a descriptor table can overlap. In practice, the CPU raises an exception when a segment register is loaded with a table index which is out of bounds or which names a descriptor marked not present, and the kernel can trap this exception to retrieve the data from its virtual memory implementation.
When protected mode was introduced with the 80286, the old 8086 style of addressing came to be referred to as real mode. To remain compatible with older code, the 80286 defaults to real mode and must be explicitly told to go into protected mode, from which it cannot return without resetting the chip.
The Intel 80386 was introduced in 1985 and extended the data bus to 32 bits. The segment registers were still 16 bits, but the segment descriptors they indexed were extended to allow 32-bit base addresses and handle 32-bit offset limits. This scheme is known as 32-bit protected mode.
Since the 386 allows each segment to address 2^32 bytes = 4 GiB of linear address space, it is possible to set up one segment each for code and data, and not worry about segmentation; in fact, this is how the Linux kernel operates. The nice protection features allowed by segment limits can be implemented using the hardware’s paging mechanisms, which are not so x86-specific. I’ll write about paging in a future article.
-
09 May 2007 / Personal
I’m still not sure whether I made the decision, or if it was made for me, but it’s been made and it feels right: I’m staying in Seattle. I love it here; I’m happy to be back to my first love, engineering; and there’s nothing more important to me at this point in my life than supporting my family. It really was a no-brainer, but I can’t say that I’m leaving my Pittsburgh life behind without regrets. It will be hard to be across the country from some of my best friends in life.
So now it’s time to get moving on setting myself up here. I’ve spent this week dusting off the old resume and ironing my collared shirts for some interviews. Next month will be apartment hunting time. Then, I will proceed to take over the world. The first commenter gets Australia.
-
09 May 2007 / Code
I was talking with a friend the other day about questions we’ve used in interviews for programmers. One of my pet peeves is the obscenely specific questions which boil down to “how would you solve this particular problem we have right now?” If I were asked that kind of question in an interview, I would have to wonder if the company was at all interested in the long-term effectiveness of their employees, and whether they care to let them learn and grow into more productive programmers. Interviews need to establish a basic level of competence, but most importantly they need to determine whether the candidate knows what he doesn’t know and can learn on the job.
On the other hand, I always tried to avoid asking questions that had nothing to do with programming, such as “why are manhole covers round?” My friend told me of a great one he uses: “if I type ‘ping www.kqk4663.com’ into a terminal at a UNIX box and hit return, what happens?” It’s been a while since I thought about the low-level hardware and software that makes ping possible, so as an exercise I wrote up the chain of events that occur on an 80x86 box running Linux:
A typical PS/2 keyboard has an onboard chip which detects keypresses and handles things like “debouncing” the switches. Different keyboards have different key mechanisms, so there is a variety of encoder chips (the Intel 8048 was popular early on), but the end result is a serial data signal sent through the keyboard cord to a controller chip on the motherboard. This data consists of (possibly multibyte) sequences called “scan codes.” For each key there are two unique scan codes: a “make code” which is sent when the key is pressed, and a “break code” which is sent when it is released.
The controller chip is an Intel 8042-compatible device which handles decoding the serial stream from the keyboard, and telling the CPU about keypresses. It may also be integrated into the motherboard’s chipset. The CPU talks to it via I/O space addressing at locations 0x60 (data buffer) and 0x64 (status/commands).
A typical PC contains two 8259 Programmable Interrupt Controller chips (which may be part of a chipset), or one of its more advanced descendants. The first one controls the INT line of the CPU. It has eight interrupt inputs, IRQ0 – IRQ7, each of which can be hooked up to a peripheral device, one of which is the second “cascaded” 8259 on IRQ2. This allows for fifteen devices to interrupt the PC. The keyboard controller’s interrupt line is connected to IRQ1.
When the INT line is raised, the processor finishes the current instruction and fetches the interrupt vector (a number identifying which IRQ line generated the interrupt) by lowering the INTA line to retrieve it from the 8259. Before it jumps anywhere, it saves the flags register and a return address on the stack.
Given the IRQ vector, the CPU must look up the address of the code which will handle the interrupt (a.k.a. the interrupt service routine, or ISR). On the original 8086, the CPU expects to find a table of 32-bit ISR pointers at address 0x0. But the 80286 and later processors have a register called idtr, which points to the system’s interrupt descriptor table in memory. Either way, the CPU uses the vector as an index into the table, retrieves a pointer to the ISR, and, with the flags register and return address saved on the stack, jumps to that address. The ISR will return using the IRET instruction rather than RET to tell the processor to pop the flags register.
Now, assuming that it set up the interrupt vector table properly at bootup, the kernel software is handling the interrupt. Next post, I’m going to dig into the Linux kernel and find out how it deals.
