This Space for Rent

Annoying open source®™© featureette of the day

Since we use R*dh*t Linux at work, we end up using their not-quite-proprietary but documented as if it were proprietary package management software, which isn't completely horrible (at least not compared to the other package management software I've used; I've been able to actually write rpm command files to generate packages, which is not something I've been able to do with some of the other offerings in the package management world), but which has some extraordinarily annoying features. The run-of-the-mill annoying features (it's really slow, even when you're doing nothing more than looking at a list of packages) can be easily worked around, but it's got one feature that occasionally locks the whole shebang up and forces a reboot.

What's this feature? Well, there are many ways to arbitrate access to a multi-user database, and rpm uses what I consider to be one of the worst possible ones: They use something that is hilariously called a "fast userspace locking system call", or futex if you look for the system call that has locked up your rpm -ivh on you. This system call drops a lock into shared memory (or somewhere else) so that subsequent calls will hang on the lock until it is removed, and the really fun thing (that separates it from a simple flock) is that if the process terminates without unlocking this lock, the lock sticks around forever, or until you give the three finger salute to the offending computer.
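To make the flock comparison concrete, here is a minimal Python sketch. It is not rpm's code and it does not call futex at all; the "lock" in the second half is just a byte in an anonymous shared mapping, standing in for any lock state the kernel doesn't track on the process's behalf. The function name `stale_lock_demo` is made up for illustration, and the sketch assumes a Unix-like system where `os.fork` and `fcntl.flock` are available.

```python
import fcntl
import mmap
import os
import tempfile


def stale_lock_demo():
    """Return (flock_still_held, shared_word_still_set) after the holder dies."""
    # Case 1: a child takes an flock() and exits without unlocking.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid == 0:
        child_fd = os.open(path, os.O_RDWR)
        fcntl.flock(child_fd, fcntl.LOCK_EX)
        os._exit(0)  # die while still holding the lock
    os.waitpid(pid, 0)

    parent_fd = os.open(path, os.O_RDWR)
    try:
        # Non-blocking attempt: raises if somebody still holds the lock.
        fcntl.flock(parent_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        flock_still_held = False
    except BlockingIOError:
        flock_still_held = True
    os.close(parent_fd)
    os.unlink(path)

    # Case 2: a child sets a "locked" byte in shared memory and exits.
    # This byte is a stand-in for a userspace lock word, not a real futex.
    shared = mmap.mmap(-1, 1)  # anonymous mapping, shared across fork()
    pid = os.fork()
    if pid == 0:
        shared[0] = 1  # mark "locked"
        os._exit(0)    # die without clearing it
    os.waitpid(pid, 0)
    shared_word_still_set = shared[0] == 1
    shared.close()

    return flock_still_held, shared_word_still_set
```

Calling `stale_lock_demo()` should show the flock gone once its holder exits, because the kernel releases it when the file descriptor is closed, while the byte in shared memory stays set because nothing cleans it up.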

We have a dedicated build machine that builds proprietary drivers for our Linux distribution (because, as you know, a fixed public driver interface would allow people to build proprietary drivers for Linux, which would be bad) and as part of the build process unpacks kernel and driver source packages, then packs up a finished driver binary package. Periodically (every 6-9 months), we'll find that our build process has come to a shrieking halt because the proprietary driver machine is hung up on a futex lock and no rpms will be packed or unpacked until we clear the lock by the simple process of turning off the machine.

Futexes were introduced to the Linux kernel by a team of coders that is 50% IBM employees. This is worth a few moments of sick humor when I'm rebooting an IBM x346 because that's the only way I can clear an IBM-designed software lock. It's not quite so much fun when I'm rebooting the IBM (aka Lenovo) workstation that I'm running RHEL3 on, because I tend to fill up the screen with xterms and I loathe having to go back and reopen a dozen or so windows just because some stupid badly-implemented kernel locking scheme has gone to hell again.

Comments


I don’t know if it’s related, but we have a situation where we are upgrading 20-30 machines at the same time, and generally about 3 of them will hang in the rpm -Uvh stage while it’s doing a bunch of rpms at once. We use apt to drive these upgrades, but I’ve seen the same thing happen using yum. It’s definitely the rpm process that’s locking up.

Paul Tomblin Fri Mar 31 06:09:04 2006

I’ve experienced this problem also, but searching revealed this lkml post, which resolves the issue without booting:

http://article.gmane.org/gmane.linux.kernel/459204

Asgeir S. Nilsen Tue Nov 14 01:27:56 2006
