Proposal: A file system for Live CDs
CDs or DVDs containing a full Linux system for installation, testing, repair or other special purposes are quite common these days. Chances are high that people take their first steps with Linux, BSD or Solaris using these so-called Live CDs: They are convenient (no need to install the OS), they are safe (they don’t write anything to disk unless you really want them to) … but they are slow. Booting from a Live CD like Knoppix or the Ubuntu Desktop CD takes ages and makes you wonder if your CD/DVD drive will actually survive the whole operation, considering that it is permanently seeking. And even once you have made it to the desktop, you’ll still have to be patient whenever you open an application, because the drive has to spin up again and load libraries and data for whatever program you start. Or even worse: In modern GUI-based environments you have to wait for icons to load even if you just click on a launcher menu. As useful as those Live CDs might be, this is a major source of annoyance.
In this post, I will present a method to solve this problem. I do not claim to be the first one to invent it – in fact, I refuse to believe that no one had this idea before me.
The basic idea: Load it into RAM!
We’re living in times where even the cheapest netbooks are equipped with no less than 1 GiB of RAM. At least as long as we’re talking about CDs and not DVDs here, it’s perfectly feasible just to load the (typically compressed) image of the root filesystem into memory and still have plenty of RAM left for doing actual work. The CD drive would not have to do any time-consuming seeks any longer, it can spin down after booting, and you can use the CD drive for other things (like burning discs).
This option is so obvious that it is already implemented in many Live CDs: Knoppix has it, Fedora apparently has it, and it looks as if it might be available for Ubuntu some day, too.
However, this solution is far from perfect, because it means that you have to wait for the CD to be read before booting continues. This usually takes 2 to 4 minutes during which you can’t do anything but wait.
The wait-for-copying problem can be solved by reading the CD in the background: Start copying the CD into RAM as early as possible during the boot process and continue until the whole CD has been read. If someone tries to read a block from the live filesystem that has already been loaded into RAM, the copy in RAM is used instead of fetching the block from the CD again. If a block is requested that is not in RAM yet, you’re out of luck, though, as it needs to be loaded from the CD anyway. This means that background preloading alone isn’t enough: You won’t gain much speed during booting, because in this phase the CD drive is mostly busy fetching blocks from all over the filesystem image. But if you give the system a few minutes of rest after the desktop appears, you won’t be bothered by the CD drive spinning up to load an application ever again. At least we have an improvement.
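The lookup rule described above can be sketched in a few lines of Python. This is only an illustration, not code from any existing Live CD: the class name `PreloadingCache` and the `read_block` callable (which stands in for the slow CD drive) are made up for this example.

```python
import threading


class PreloadingCache:
    """Serve blocks from RAM if already preloaded, else from the slow medium."""

    def __init__(self, read_block, num_blocks):
        self._read_block = read_block  # hypothetical callable: block index -> bytes
        self._num_blocks = num_blocks
        self._blocks = {}              # block index -> bytes already held in RAM
        self._lock = threading.Lock()

    def preload_all(self):
        """Copy all blocks into RAM front to back; run this in a background thread."""
        for index in range(self._num_blocks):
            with self._lock:
                if index in self._blocks:
                    continue           # already fetched by an earlier cache miss
            data = self._read_block(index)
            with self._lock:
                self._blocks.setdefault(index, data)

    def read(self, index):
        """Return one block, preferring the copy in RAM."""
        with self._lock:
            cached = self._blocks.get(index)
        if cached is not None:
            return cached
        # Cache miss: the block has to come from the medium after all.
        data = self._read_block(index)
        with self._lock:
            return self._blocks.setdefault(index, data)
```

A caller would start the preloader with something like `threading.Thread(target=cache.preload_all, daemon=True).start()` and route all reads through `cache.read()`. Note that the sketch does nothing to stop a cache miss from competing with the preloader for the drive, which is exactly the weakness described above.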
To my knowledge, background preloading is not implemented in any Live CD. The reason is simple: It just isn’t possible with the built-in tools. The load-to-RAM implementations for Linux Live CDs currently create a large tmpfs and copy the squashfs image there before mounting it. Even if they copied it in the background, there would be no standard way to use the preloaded data. They could re-mount the root filesystem once the preloading process is finished, but this would invalidate all open file handles and thus confuse quite a few programs. And even if re-mounting worked, you still would not be able to use the parts of the image that are already in RAM until loading has finished.
The only way to overcome these limitations is writing a new filesystem with exactly this kind of usage in mind – think of an improved squashfs, for example.
But if we’re going to reinvent the wheel anyway, we might as well do it thoroughly and improve boot performance, too, right?
Sort the files for extra speed
During booting, the order in which files are read from disk is more or less constant: /bin/sh and a few other tools from /bin, a few libraries from /usr/lib, some files from /lib/modules … you get the idea. If these files were stored on the CD in the order in which they are read, chances are high that they would already be preloaded when they are requested, thus speeding up the boot process.
Determining which files should be moved to the front of the image can be a tricky task. Still, it can be automated to some extent: When mastering a Live CD, create an »unordered« image first. Boot it (in a VM, preferably) and enable some secret option that logs all opened files. Then use this file list to build a new, final root filesystem image. All files in the list will be included in the image in the order of first appearance. All other files are then written to the image in no particular order (just like it is done today).
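The ordering step could look roughly like the following Python function. It is a sketch under assumptions: the name `image_order` is invented here, `access_log` is assumed to be the list of paths in the order they were opened during the test boot, and `all_files` is assumed to list every file in the image exactly once.

```python
def image_order(access_log, all_files):
    """Return the files in the order they should be written to the image."""
    known = set(all_files)
    seen = set()
    ordered = []
    for path in access_log:
        # Keep only the first appearance, and skip paths that are not part of
        # the image at all (for example files created at runtime).
        if path in known and path not in seen:
            seen.add(path)
            ordered.append(path)
    # Everything that was never opened follows in whatever order it came in.
    rest = [path for path in all_files if path not in seen]
    return ordered + rest
```

For example, a log of `/bin/sh`, `/lib/libc.so`, `/bin/sh`, `/bin/ls` puts `/bin/sh` first, then `/lib/libc.so`, then `/bin/ls`, followed by all files that were never touched.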
An image created using this method might still not be optimal, though – after all, it was created by booting only once on a single (maybe virtual) machine. That means it might make sense to manually alter the file list to include e.g. other drivers than just those that were used on the test machine. It may also be useful to include some additional actions in the logged test run after the machine booted – stuff that a normal user is likely to do right after booting, like browsing through menus, adjusting WiFi settings, opening a web browser or launching OpenOffice.
Don’t seek while booting
The file reordering trick should already improve boot times quite a bit, but I think it can still be made better. By sorting the files properly, we made sure (or at least made it probable) that the files that are going to be loaded next are already close to the point on the CD we’re reading anyway. However, if the CD drive is just the tiniest bit too slow, a file may still be opened before it has reached the cache. This forces three time-consuming seeks: one to the requested file, one back to the original preloading position once the file has been loaded, and finally one to skip over the already loaded file after the blocks between the original position and the start of that file have been read.
That’s sad. This time, however, the solution is really simple: Just don’t allow seeking while booting! If a file is requested that has not been loaded from disk yet, just wait until the preloader gets to it. This will delay booting a bit, but I’m certain that waiting a second for a few files is still faster than seeking even once a second.
In practice, there should still be a timeout of 2 or 3 seconds after which a seek is forced – users might not be amused if a non-standard driver requires a file that happens to be at the end of the disc and they are forced to wait 4 minutes for it ;) The timeout can be set to a lower value or even zero after booting, e.g. by having the desktop environment auto-start a small script that changes the timeout just after all essential files have been loaded.
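The wait-then-seek rule can be sketched with a condition variable from Python’s standard `threading` module. Everything named here is an assumption made for illustration: `NoSeekReader`, the `seek_and_read` callable (standing in for a forced seek on the drive) and `block_preloaded` (the hook a sequential preloader would call) are not part of any existing implementation.

```python
import threading


class NoSeekReader:
    """Wait for the sequential preloader first; seek only after a timeout."""

    def __init__(self, seek_and_read, timeout=2.0):
        self._seek_and_read = seek_and_read  # hypothetical: block index -> bytes
        self.timeout = timeout               # seconds; 0 means seek immediately
        self._blocks = {}                    # block index -> bytes held in RAM
        self._cond = threading.Condition()

    def block_preloaded(self, index, data):
        """Called by the preloader for every block it has read in order."""
        with self._cond:
            self._blocks[index] = data
            self._cond.notify_all()          # wake up readers waiting for a block

    def read(self, index):
        with self._cond:
            # Sleep until the block shows up or the timeout expires.
            arrived = self._cond.wait_for(
                lambda: index in self._blocks, timeout=self.timeout
            )
            if arrived:
                return self._blocks[index]
        # Timed out: give up waiting and force a seek.
        data = self._seek_and_read(index)
        with self._cond:
            return self._blocks.setdefault(index, data)
```

The script that runs once the desktop is up would then only have to set `reader.timeout = 0`, or whatever lower value turns out to work well.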
A proof-of-concept implementation
I have prepared a proof-of-concept implementation that realizes a lightweight version of the proposed filesystem. It goes by the name »ProFS«, which is an abbreviation for »preloading read-only file system«. It is written in Python and uses FUSE, so it’s not very fast. It is also missing some features (for example, it always seeks to requested blocks and doesn’t know about the booting phase), and there are still lots of issues left. The filesystem images are split into an index file and a data file. The index is a compressed dump of a Python data structure describing all files and directories, and the data file is a squashfs-like block-by-block compressed image of the data itself. The file system driver simulates slow media by loading no more than two 4k blocks of the data image per second. The implementation is public domain, so do with it what you want:
- profs.tgz (13k)
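To give an idea of what »block-by-block compressed« means, here is a minimal Python sketch using `zlib`. It is not the actual ProFS data format: the function names, the offset table and the choice of `zlib` are assumptions made for this example only.

```python
import zlib

BLOCK_SIZE = 4096  # 4k blocks, as in the text above


def pack_blocks(data, block_size=BLOCK_SIZE):
    """Compress data block by block; return (offsets, packed bytes)."""
    offsets = [0]  # offsets[i] is where compressed block i starts
    chunks = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        chunks.append(chunk)
        offsets.append(offsets[-1] + len(chunk))
    return offsets, b"".join(chunks)


def read_block(offsets, packed, index):
    """Decompress a single block without touching any other block."""
    return zlib.decompress(packed[offsets[index]:offsets[index + 1]])
```

Because every block is compressed on its own, a driver can decompress whichever blocks are already in RAM without having to wait for the rest of the image.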