This Space for Rent

Sneaky software bug

At work, there's a little piece of code that walks /dev, looking for interesting devices that it can build softlinks to. It works in a pretty straightforward manner:

     function dig(directory d):
         foreach (de in d)
             if (about = stat(d+"/"+de))
                 if (isachardev(about) || isablockdev(about))
                     dosomethingfunwith(d+"/"+de, about);
                 else if (isadir(about))
                     dig(d+"/"+de);
                 end;
             end;
         end;
     end;

Pretty simple and bug-free, right? Well, um, no. On modern Linuxes, /dev/stdin, /dev/stdout, and /dev/stderr are symbolic links that resolve to the stdin, stdout, and stderr of the currently running process. Normally those are the tty that the process is attached to, or /dev/null if the process is running as a daemon, but there are, um, certain times when they are attached somewhere else.

One of those certain times is when you're running the process as part of the Linux hotplug architecture. (The linked page is about Debian GNU/Linux, but the kernel part of the hotplug code is in the baseline kernel, and is thus found on every Linux distribution that uses at least a 2.4 kernel.) The architecture lets you tell the kernel the name of a userland callback program to run whenever a hotplug event happens, and when one does, the kernel runs that program a few times. BUT, when the kernel runs the program, it doesn't seem to actually set up a stdin/stdout/stderr for it, and, as I discovered today with the version of the Linux kernel we use, stdout ends up pointing at /dev.
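A program that gets launched this way can at least check what it was handed before trusting its own stdout. Here is a minimal sketch in Python, chosen only for illustration; the names `fd_is_directory` and `demo` are made up for this example. It uses fstat() to ask whether an open file descriptor refers to a directory:

```python
import os
import stat
import tempfile


def fd_is_directory(fd):
    """Return True if the open file descriptor fd refers to a directory."""
    return stat.S_ISDIR(os.fstat(fd).st_mode)


def demo():
    """Open a directory and a regular file, then classify both descriptors."""
    with tempfile.TemporaryDirectory() as tmp:
        dir_fd = os.open(tmp, os.O_RDONLY)
        file_fd = os.open(os.path.join(tmp, "plain"),
                          os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            return fd_is_directory(dir_fd), fd_is_directory(file_fd)
        finally:
            os.close(dir_fd)
            os.close(file_fd)
```

In a hotplug callback the interesting question would be `fd_is_directory(1)`, assuming descriptor 1 is open at all; if it isn't, fstat() fails with an error instead.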

So. dig("/dev") goes strolling through /dev, looking for interesting things. It finds /dev/stdout (->/dev), which, thanks to the magic of stat(), is a (drum roll) directory. dig("/dev/stdout") goes strolling through /dev/stdout, looking for interesting things. It finds /dev/stdout/stdout (->/dev), which, thanks to the magic of stat(), is a (drum roll) directory. dig("/dev/stdout/stdout") goes strolling through ....
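The trap is easy to reproduce without touching the real /dev. The sketch below (Python, with a temporary directory standing in for /dev and a hypothetical helper named `describe_loop`) makes a symlink called stdout that points back at the directory containing it, then asks stat() and lstat() what they think it is:

```python
import os
import stat
import tempfile


def describe_loop():
    """Build a directory with a symlink back to itself and classify it both ways."""
    with tempfile.TemporaryDirectory() as fake_dev:
        link = os.path.join(fake_dev, "stdout")
        os.symlink(fake_dev, link)  # stdout -> the directory that contains it

        # Each extra "/stdout" is just another trip around the loop.
        nested = os.path.join(link, "stdout", "stdout")
        return {
            # stat() follows the link, so it reports a directory every time.
            "stat_says_dir": stat.S_ISDIR(os.stat(link).st_mode),
            "nested_stat_says_dir": stat.S_ISDIR(os.stat(nested).st_mode),
            # lstat() describes the link itself.
            "lstat_says_link": stat.S_ISLNK(os.lstat(link).st_mode),
            "lstat_says_dir": stat.S_ISDIR(os.lstat(link).st_mode),
        }
```

Every stat() answer is "directory", no matter how many times stdout is stacked onto the path, which is exactly what keeps dig() recursing.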

And this goes on and on and on, at full speed, until memory fills up and the machine either (a) goes on a process-killing spree or (b) just falls over dead. What's the solution? Replace the stat() call with lstat(), because lstat() doesn't follow symlinks: /dev/stdout is reported as a symlink rather than as a directory, so dig() never descends into it. This gives you back an amazing number of CPU cycles, which you can then waste on programs that actually do something useful for your customers.
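For the curious, here is roughly what the fixed walker looks like when written out in Python. This is a sketch under assumptions, not the code from work: instead of calling dosomethingfunwith(), it just collects the devices it finds, and the `found` parameter and the `demo_tree` helper are invented for the example.

```python
import os
import stat
import tempfile


def dig(directory, found=None):
    """Collect (path, lstat result) for each char/block device under directory."""
    if found is None:
        found = []
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return found  # unreadable directory; nothing to report
    for name in names:
        path = os.path.join(directory, name)
        try:
            about = os.lstat(path)  # lstat() describes the link, not its target
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        mode = about.st_mode
        if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
            found.append((path, about))
        elif stat.S_ISDIR(mode):
            dig(path, found)  # only real directories, never symlinks to them
    return found


def demo_tree():
    """Run dig() on a temp tree holding a real subdirectory and a looping symlink."""
    with tempfile.TemporaryDirectory() as fake_dev:
        os.mkdir(os.path.join(fake_dev, "sub"))
        os.symlink(fake_dev, os.path.join(fake_dev, "stdout"))
        return dig(fake_dev)
```

The temporary tree contains no device nodes, so `demo_tree()` comes back with an empty list. The point is that it comes back at all. Note that this version also skips symlinks that point at real devices; that matches lstat() semantics, and whether it is what you want depends on what the links are for.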