This document describes a filesystem organizational technique that solves several problems associated with software package management and distribution under a Unix-like operating system. Though the document uses examples from development in a GNU/Linux (hereafter referred to simply as "Linux") environment, it is straightforward to mimic the process on other Unix systems.
The original motivation for the /pkg hierarchy was to find a generic solution for situations such as this:
To install package A, I needed library L version n (L.n), but I only had version m (L.m) installed. So I downloaded and installed L.n, but this overwrote L.m, which broke package B. In order to upgrade package B to work with library L.n, I had to perform a system-wide (distribution) upgrade, which left package C in an unusable state. So I downloaded the source to package C, but when I tried to compile it against library L.n, it reported the following errors... [etc]
A brief search through the Web or Usenet reveals that this is hardly an uncommon situation, and that no Linux distribution is entirely immune to this problem of "dependency management". The approach Linux distributors have generally taken in solving this problem is to find a collection of software packages that more-or-less work together, and then version the collection (i.e. give a version number to the distribution). However, this approach has problems, the two most prominent being that (1) it is often difficult to integrate new software packages that were not in the original distribution, and (2) third-party library version upgrades can potentially put the entire system into an unstable state.
The /pkg hierarchy has its roots in being a solution to dependency management; however, it turns out to be an adequate solution for several common problems:
While many of these problems have already been solved independently, the advantage of the /pkg hierarchy is that it simultaneously addresses all of these problems in an elegant and comprehensive manner.
The /pkg hierarchy derives its name from the way packages are installed on the system. Every time a package is compiled from source, it is installed in a unique location similar to the following:
/pkg/glibc/2.2.5/.karmaki686/.000
These path elements will be referred to in this document as:
/pkg: The package root.
glibc: The package name.
2.2.5: The package version.
.karmaki686: The package distribution.
.000: The package build.
It is beneath a path like this that all files related to a given package are confined. The traditional root-level directories are re-created as subdirectories here, giving something like:
/pkg/glibc/2.2.5/.karmaki686/.000/
|-bin/
|-etc/
|-include/
|-lib/
|-var/
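As a rough illustration of the naming scheme above, the five path elements can be recovered from a build directory path with a few lines of Python. This is only a sketch: the names `PkgPath` and `parse_pkg_path` are hypothetical and not part of any existing tool, and the check that the distribution and build are dotfile directories is an assumption drawn from the example path.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PkgPath(NamedTuple):
    """The five path elements named in the text (hypothetical helper type)."""
    root: str          # e.g. "/pkg"
    name: str          # e.g. "glibc"
    version: str       # e.g. "2.2.5"
    distribution: str  # e.g. ".karmaki686"
    build: str         # e.g. ".000"


def parse_pkg_path(path: str) -> PkgPath:
    """Split a build directory path into its five named elements."""
    parts = PurePosixPath(path).parts  # ('/', 'pkg', 'glibc', '2.2.5', ...)
    if len(parts) != 6 or parts[0] != "/":
        raise ValueError(f"not a build directory path: {path!r}")
    _, root, name, version, dist, build = parts
    # Assumption: distribution and build are always dotfile directories.
    if not (dist.startswith(".") and build.startswith(".")):
        raise ValueError(f"distribution and build must start with a dot: {path!r}")
    return PkgPath("/" + root, name, version, dist, build)


print(parse_pkg_path("/pkg/glibc/2.2.5/.karmaki686/.000"))
```

A path that stops short of the build directory, such as /pkg/glibc/2.2.5, is rejected rather than guessed at.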
Once a package is installed using this technique, symlinks are created to the package subdirectories all the way up the hierarchy. The resulting structure looks like the following:
/pkg/glibc/
|-bin -> 2.2.5/bin/
|-etc -> 2.2.5/etc/
|-lib -> 2.2.5/lib/
|-2.2.5/
  |-bin -> .karmaki686/bin/
  |-etc -> .karmaki686/etc/
  |-lib -> .karmaki686/lib/
  |-.karmaki686/
  | |-bin -> .002/bin/
  | |-etc -> .002/etc/
  | |-lib -> .002/lib/
  | |-.001/
  | |-.002/
  |
  |-.johndoei386/
    |-.000/
    |-.001/
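The chain of relative symlinks in this tree could be generated along the following lines. This is a minimal sketch, not the author's build scripts: the helper names `point`, `link_up`, and `demo` are hypothetical, only the three subdirectories shown in the tree are linked, and the demonstration runs in a temporary directory rather than under a real /pkg.

```python
import os
import tempfile
from pathlib import Path

SUBDIRS = ("bin", "etc", "lib")  # the subdirectories linked in the tree above


def point(link: Path, target: str) -> None:
    """Create a relative symlink, replacing an existing one."""
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target)


def link_up(pkg_dir: Path, version: str, dist: str, build: str) -> None:
    """Link each subdirectory from the build level up to the package level."""
    version_dir = pkg_dir / version
    dist_dir = version_dir / dist
    for sub in SUBDIRS:
        if not (dist_dir / build / sub).is_dir():
            continue  # this build does not provide the subdirectory
        point(dist_dir / sub, f"{build}/{sub}")    # lib -> .002/lib
        point(version_dir / sub, f"{dist}/{sub}")  # lib -> .karmaki686/lib
        point(pkg_dir / sub, f"{version}/{sub}")   # lib -> 2.2.5/lib


def demo() -> dict:
    """Build a throwaway glibc tree and report where lib ends up pointing."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "pkg" / "glibc"
        build_lib = pkg / "2.2.5" / ".karmaki686" / ".002" / "lib"
        build_lib.mkdir(parents=True)
        link_up(pkg, "2.2.5", ".karmaki686", ".002")
        return {
            "package": os.readlink(pkg / "lib"),
            "version": os.readlink(pkg / "2.2.5" / "lib"),
            "distribution": os.readlink(pkg / "2.2.5" / ".karmaki686" / "lib"),
            "resolves_to_build": (pkg / "lib").resolve() == build_lib.resolve(),
        }


print(demo())
```

Because every link is relative to the directory that contains it, the whole package directory can be moved or copied without any link needing to be rewritten.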
/pkg/glibc/: By putting all packages under /pkg, we get rid of the mess that has become /opt/package, /usr/package, /home/package, and the process of spreading package contents all over the filesystem to the point where a custom tool and a database are required to track it all.
/pkg/glibc/2.2.5/: Each version of a package is given its own directory. This makes compiling and installing new programs incredibly easy. With minimal effort, it completely fixes problems with failed dependencies or breaking packages with an upgrade.
/pkg/glibc/2.2.5/.karmaki686/: Dotfile directories at this level represent distributions, and packages from several distributions may be intermixed without conflict. Each version directory will be symlinked to the subdirectories of a particular distribution, the preference of which is easy to set on both a system-wide and individual-package basis.
/pkg/glibc/2.2.5/.karmaki686/.000/: Dotfile directories at this level allow for sequential package builds. By giving each build its own directory, we guarantee that essential files are never overwritten. If a faulty build gets inadvertently installed or distributed, it is trivial to perform a rollback to the working build.
Consider the ldd output from the ping binary:
karmak@ariel$ ldd /bin/ping
libm.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libm.so.6 (0x40016000)
libreadline.so.4.1 => /pkg/readline/4.3/.karmaki686/lib/libreadline.so.4.1 (0x40033000)
libresolv.so.2 => /pkg/glibc/2.2.5/.karmaki686/lib/libresolv.so.2 (0x40059000)
libnsl.so.1 => /pkg/glibc/2.2.5/.karmaki686/lib/libnsl.so.1 (0x40068000)
libncurses.so.5 => /pkg/ncurses/5.2/.karmaki686/lib/libncurses.so.5 (0x4007f000)
libc.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libc.so.6 (0x400c1000)
/pkg/glibc/2.2.5/.karmaki686/lib/ld.so => /pkg/glibc/2.2.5/.karmaki686/lib/ld.so (0x40000000)
What we see here is that packages in the /pkg hierarchy are
not linked against the standard locations (/lib and
/usr/lib), but instead are linked against the
distribution directories. Thus it is possible to have
different applications linked against different library versions,
even when those libraries share the same name. By taking the linking
as far as the distribution directory, we can support multiple
distributions under the same hierarchy, and cross compilation
becomes simply a matter of a few changes to the standard build scripts.
Furthermore, by not linking against the build directories,
we are free to rebuild a package as many times as necessary, and
freely experiment with cross-distributor package compatibility.
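To make the linking scheme concrete, the following sketch reads ldd-style output and reports which package, version, and distribution each library resolves to. It assumes the /pkg/name/version/.distribution/lib/ layout described above; the names `PKG_LIB`, `library_origins`, and `SAMPLE` are hypothetical, and the sample text is a subset of the ldd output shown earlier.

```python
import re

# Matches "=> /pkg/<name>/<version>/<.distribution>/lib/<file>"
PKG_LIB = re.compile(
    r"=>\s+/pkg/(?P<name>[^/]+)/(?P<version>[^/]+)/(?P<dist>\.[^/]+)/lib/(?P<lib>\S+)"
)


def library_origins(ldd_output: str) -> dict:
    """Map each library file name to its (package, version, distribution)."""
    origins = {}
    for line in ldd_output.splitlines():
        match = PKG_LIB.search(line)
        if match:  # lines that do not point into /pkg are ignored
            origins[match["lib"]] = (match["name"], match["version"], match["dist"])
    return origins


SAMPLE = """\
libm.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libm.so.6 (0x40016000)
libreadline.so.4.1 => /pkg/readline/4.3/.karmaki686/lib/libreadline.so.4.1 (0x40033000)
libncurses.so.5 => /pkg/ncurses/5.2/.karmaki686/lib/libncurses.so.5 (0x4007f000)
"""

for lib, origin in library_origins(SAMPLE).items():
    print(lib, origin)
```

Since no build directory appears in any of these paths, a rebuilt package takes effect for every linked program as soon as the distribution-level symlinks are re-pointed.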
The symlinks may appear to be a point of vulnerability in the system, but this is not the case. As the ldd output shows, almost all of the symlinks are there for the user's convenience. The only exceptions are the symlinks to the build directories, which require only a statically linked version of 'ln' or 'sash' to repair. The alternative, overwriting files during an upgrade, is no less error-prone and much harder to fix when things go wrong.
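Repairing or rolling back the build symlinks amounts to re-pointing a handful of links inside one distribution directory. The sketch below is one possible way to do that, assuming the layout described earlier; `select_build` and `demo` are hypothetical names, and the scratch-link-plus-rename step is a choice made here to avoid a moment where the link is missing, not something the original scripts are known to do.

```python
import os
import tempfile
from pathlib import Path


def select_build(dist_dir: Path, build: str) -> list:
    """Re-point the subdirectory symlinks in dist_dir at the given build."""
    build_dir = dist_dir / build
    if not build_dir.is_dir():
        raise FileNotFoundError(f"no such build: {build_dir}")
    switched = []
    for sub in sorted(p.name for p in build_dir.iterdir() if p.is_dir()):
        scratch = dist_dir / f"{sub}.tmp"  # hypothetical scratch name
        if scratch.is_symlink():
            scratch.unlink()
        scratch.symlink_to(f"{build}/{sub}")
        os.replace(scratch, dist_dir / sub)  # rename over the old link in one step
        switched.append(sub)
    return switched


def demo() -> list:
    """Switch a throwaway distribution to build .002, then roll back to .001."""
    with tempfile.TemporaryDirectory() as tmp:
        dist = Path(tmp) / "pkg" / "glibc" / "2.2.5" / ".karmaki686"
        for build in (".001", ".002"):
            (dist / build / "lib").mkdir(parents=True)
        seen = []
        select_build(dist, ".002")
        seen.append(os.readlink(dist / "lib"))
        select_build(dist, ".001")  # the rollback
        seen.append(os.readlink(dist / "lib"))
        return seen


print(demo())
```

Note that this sketch only re-points links for subdirectories present in the chosen build; a link left over from a build that had an extra subdirectory would need to be removed separately.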
Because of the highly structured layout, it is easy to write scripts that automate everything from the build procedure to nightly backups. In the long run, this structure is much more efficient than the traditional filesystem hierarchy. Some examples of the efficiency and power:
Michael Carmack
karmak@karmak.org