A symlink is a filesystem feature shared by all filesystems native to Unix variants that I can think of - although, as Mindfoot's writeup reminds us, it wasn't always present in Unix; and if your Unix variant supports non-Unix filesystem types such as FAT, you will find that symlinks cannot be used on them.

Like a Windows shortcut, a symlink is a file that contains the name of another file; to be precise, it is an absolute or relative pathname that describes the file's location within the directory tree.

Unlike a Windows shortcut, symlinks are supported by the OS kernel: system calls to open a file, or to stat or change its attributes, attempt to follow symlinks.

This makes the normal Unix file attributes of symlinks, such as owner and permissions, quite useless, as the operations that use and set them work on those of the file referenced by the symlink instead. For this reason, you often see symlinks with all permissions set and owned by root.

Most Unix variants now have special system calls, e.g. lstat next to stat, that work on a symlink itself instead of trying to follow it. There is also a readlink function, which does not follow a symlink but returns the pathname stored in it.
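
The difference between following a symlink and looking at the symlink itself can be seen from the shell. This is a minimal sketch, assuming a POSIX-like shell with the mktemp and readlink utilities available; the names target.txt, link and dangling are made up for the illustration.

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "hello" > target.txt
ln -s target.txt link          # symlink to an existing file
ln -s no-such-file dangling    # symlink whose target does not exist

# cat opens the file, so the kernel follows the symlink.
content=$(cat link)

# readlink does not follow: it prints the pathname stored in the symlink.
stored=$(readlink link)

# "test -L" examines the symlink itself (as lstat does);
# "test -e" follows it (as stat does), so it fails for a dangling symlink.
[ -L dangling ] && is_link=yes || is_link=no
[ -e dangling ] && exists=yes || exists=no

echo "content=$content stored=$stored is_link=$is_link exists=$exists"
```

This should print content=hello stored=target.txt is_link=yes exists=no: the dangling symlink is there as a directory entry, but anything that follows it finds nothing.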

Command line utilities such as ln, chmod, chown also follow symlinks by default, and have special options to make them operate on the symlinks themselves. ls however does not follow symlinks unless the -L option tells it to.
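
As a small illustration of these defaults, the sketch below runs chmod on a symlink and then compares ls -l with ls -lL. It assumes a POSIX-like shell with mktemp; file and link are hypothetical names, and the permission strings are cut to their first ten characters because some systems append an extra marker.

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "data" > file
chmod 644 file
ln -s file link

# chmod follows the symlink: it is the permissions of "file" that change.
chmod 600 link
file_mode=$(ls -l file | cut -c1-10)

# Plain "ls -l" describes the symlink itself (type "l");
# "ls -lL" follows it and describes the file it refers to (type "-").
link_type=$(ls -l link | cut -c1)
followed_type=$(ls -lL link | cut -c1)

echo "file_mode=$file_mode link_type=$link_type followed_type=$followed_type"
```

This should print file_mode=-rw------- link_type=l followed_type=-, showing that the chmod landed on the referenced file and not on the symlink.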

Programs that operate on files must take special care in dealing with symlinks. Like hard links and filesystem mounting, they allow files to appear in the same directory tree under multiple names; on top of that, symlinks can create recursion. For example, the command


  ln a b  # create a hard link "b" to the file already known as "a"

finds the file to which a refers, and creates an additional directory entry b for it; therefore,


  ln a a
will always fail, whether or not there is a file a. By contrast, the command

  ln -s a b  # create the symlink "b" to refer to whatever "a" refers to
does not involve the file known as a at all: it only checks the syntactic validity of the reference "a", but no attempt is made to actually follow it and see if it refers to an existing file. Therefore, the sequence

  ln -s a b
  ln -s b a
is perfectly valid, and any program that operates on whole directory trees of files must take care to follow symlinks recursively to reach the eventual file they refer to, if any, without getting into infinite loops.
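
The whole sequence can be tried out safely in a scratch directory. This is a sketch assuming a POSIX-like shell with mktemp; the variables hard, soft and open exist only to record what happened.

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hard link to itself: fails, because ln must first find the file "a",
# and if it did find one, the name "a" would already be taken.
if ln a a 2>/dev/null; then hard=ok; else hard=failed; fi

# Symlinks referring to each other: both commands succeed.
if ln -s a b && ln -s b a; then soft=ok; else soft=failed; fi

# Both names now exist as symlinks, but opening either one fails
# (typically reported as "Too many levels of symbolic links").
if cat a >/dev/null 2>&1; then open=ok; else open=failed; fi

echo "hard=$hard soft=$soft open=$open"
```

This should print hard=failed soft=ok open=failed.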

Symlink recursion is an example of a user error that is easy to make with symlinks, while hard links or filesystem mounts do not allow it. Everything that follows symlinks, including all Unix system calls that accept filename arguments, must prevent infinite looping. Some implementations take the lazy way out, by assuming recursion once too many symlinks have been followed.
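
The "give up after too many hops" strategy can be sketched in the shell. This only illustrates the idea and is not how any particular kernel implements it: resolve and MAX_HOPS are hypothetical names, the limit of 8 is arbitrary, and the function only deals with a symlink in the last component of the path, not with symlinked directories along the way.

```shell
MAX_HOPS=8   # arbitrary limit chosen for this sketch

# Print the non-symlink path that "$1" eventually refers to,
# or fail once more than MAX_HOPS symlinks have been followed.
# (Plain sh has no local variables, so path, hops and target are global.)
resolve() {
    path=$1
    hops=0
    while [ -L "$path" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt "$MAX_HOPS" ]; then
            echo "resolve: too many levels of symbolic links: $1" >&2
            return 1
        fi
        target=$(readlink "$path")
        case $target in
            /*) path=$target ;;                      # absolute target
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
    done
    printf '%s\n' "$path"
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

echo "x" > "$demo/file"
ln -s file "$demo/one"      # one -> file
ln -s one "$demo/two"       # two -> one -> file
ln -s loop2 "$demo/loop1"   # loop1 -> loop2 -> loop1 -> ...
ln -s loop1 "$demo/loop2"

resolve "$demo/two"                            # prints the path of "file"
resolve "$demo/loop1" || echo "gave up on loop1"
```

A hop counter cannot tell a genuine loop from a merely long chain, which is why I call it the lazy way out; remembering every path already visited would detect real loops, at the price of more bookkeeping.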

A more common example of such a user error is to make symlinks too absolute. For example, suppose I want to make two versions of a file a available. A common solution is to have two files, say, a.1 and a.2, plus a symlink a that refers to one of them. For instance, /lib/libc.so is always a symlink to /lib/libc.so.1 or /lib/libc.so.2 or some other version.

The common user mistake here is to actually make an absolute symlink to /lib/libc.so.1. The correct way to do it is to make a relative symlink, e.g., to libc.so.1 or ./libc.so.1. That way, the files remain relocatable: the whole tree of libraries can be moved, e.g., to /usr/lib, without the symlinks breaking. In general, make every symlink as relative as possible, so that the trees they are in stay as relocatable as possible.
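
The effect of moving a tree is easy to reproduce in a scratch directory. In this sketch, which assumes a POSIX-like shell with mktemp, libdemo.so.1, relative.so and absolute.so are made-up names standing in for a real library and its symlinks.

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

mkdir "$demo/lib"
echo "version 1" > "$demo/lib/libdemo.so.1"

# Relative symlink: stores only "libdemo.so.1".
ln -s libdemo.so.1 "$demo/lib/relative.so"
# Absolute symlink: stores the full current path of the file.
ln -s "$demo/lib/libdemo.so.1" "$demo/lib/absolute.so"

# Relocate the whole tree.
mkdir "$demo/usr"
mv "$demo/lib" "$demo/usr/lib"

# "test -e" follows symlinks, so it reports whether each one still resolves.
[ -e "$demo/usr/lib/relative.so" ] && relative=works || relative=broken
[ -e "$demo/usr/lib/absolute.so" ] && absolute=works || absolute=broken
echo "relative=$relative absolute=$absolute"
```

This should print relative=works absolute=broken: the absolute symlink still points at the old location, where nothing lives anymore.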

A related user mistake, primarily committed by software installation scripts, is to follow symlinks prematurely. This also breaks relocatability. If a software setup program asks me to supply a specific location, e.g. a directory where the binaries should be installed, and I supply a symlink, I only have one reason for doing so: I may want to change the location of these files later, by replacing the symlink with a different one, without having to redo this installation procedure. Many installation scripts defy this by following the symlinks and hardcoding the results into the references the installed software makes to itself.
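
What goes wrong can be shown with a toy "installer" that records a location in two ways. This is only a sketch, assuming a POSIX-like shell with mktemp; v1, v2, current and tool are hypothetical names, and the two variables stand in for whatever references an installed program would keep to itself.

```shell
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

mkdir "$demo/v1" "$demo/v2"
echo "one" > "$demo/v1/tool"
echo "two" > "$demo/v2/tool"
ln -s v1 "$demo/current"        # the location the user supplies

supplied=$demo/current

# Respecting the user: record the path exactly as it was given.
as_given=$supplied
# Following prematurely: resolve the symlink and record the result.
resolved=$(cd "$supplied" && pwd -P)

# Later, the user points the symlink at a new location.
rm "$demo/current"
ln -s v2 "$demo/current"

via_given=$(cat "$as_given/tool")
via_resolved=$(cat "$resolved/tool")
echo "as given: $via_given, resolved: $via_resolved"
```

This should print "as given: two, resolved: one": the reference that kept the symlink follows the user's change, while the prematurely resolved one is stuck at the old location.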

To summarize, symlinks are a useful feature, but they can easily be misused, and making software robust against the presence of such misused symlinks takes a nontrivial amount of work. Among experienced Unix users, many resent symlinks, and some would ban them outright if they could.

Thanks to ariels for advice.