Cap'n Arbyte's


Too Complicated

A few days ago I complained briefly that I couldn't boot OpenBSD into MP mode on my new computer. I've done some more investigating, and since it has occupied a lot of my time, I might as well blog about it. See the references at the end of this post for the source files and specifications mentioned here.

The crash that throws me into the kernel debugger when I boot the MP kernel occurs in the intr_find_mpmapping function of intr.c. At the point of failure, it's trying to look up the mapping for my network interface (em0 at pci4 dev 0 function 0 "Intel PRO/1000MT (82573E)" rev 0x03), which, as you can see, is on PCI bus #4. The code indexes the mp_busses array without bounds checking, which is arguably a bug. In my case, it reads past the end of the array.

To figure out why, I had to look at where the array is allocated: the mpbios_scan function of mpbios.c. It's allocated with zero length because mp_nbus is 0. Why is that? The loop immediately before the malloc traverses the MP Configuration Table, a structure defined in the Intel MultiProcessor Specification and created by BIOS.

The MP Configuration Table (hereafter mptable) being traversed by the kernel looks valid, but it's only a stub. The header indicates that only a single entry exists in the table. Examining memory reveals that it's an entry for a processor. There are no bus entries at all, which explains why 0 bytes were allocated for the mp_busses array. I expected to see four processor entries (because my CPU supports four threads), several bus entries, and a smattering of APIC and interrupt entries too. Instead, I have a stub mptable.

Is this a BIOS bug? I'm not sure yet. The MP spec is 9 years old and has been largely supplanted by the ACPI spec. I don't know whether BIOS is still expected to create a proper mptable, or whether operating systems are expected to parse the ACPI tables instead. But I have a nice social network: I know people who know people who wrote this BIOS, so I'll ask them. :)

It's also possible that I'm looking at a corrupted or otherwise incorrect mptable. There are several possible locations for this structure, and the MP spec defines the proper search sequence. Maybe the kernel searched in the wrong order and found a stub. (Which would be weird, but possible.) Maybe I'm looking at a modified copy of BIOS's original mptable.

Another mystery is why the system boots successfully with the non-MP kernel. Does it use a different and correct mptable or does it ignore the mptable entirely and do something else?

I'm long on questions and short on answers at this point. Computers are Too Complicated™.

References:
