ext2 filesystem basics (idea) by tribbel

Knowing your filesystem is an essential part of using a computer, not only on Linux but on every operating system.

This writeup covers four kinds of things you will meet on the ext2 (Linux) filesystem: regular files, directories, hard links and soft links.

You should know that your harddrive is divided up into "blocks". The size of these blocks depends on how you formatted your harddrive. These blocks can hold data. But what good is data without meaning?
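If you are curious about the block size on your own machine, here is a minimal Python sketch. It assumes a Unix-like system where `os.statvfs` is available; the helper name `block_info` is made up for this example.

```python
import os


def block_info(path="."):
    """Return (block_size_in_bytes, total_blocks) for the filesystem holding path."""
    vfs = os.statvfs(path)
    # f_frsize is the fundamental block size; f_blocks counts blocks of that size.
    return vfs.f_frsize, vfs.f_blocks


if __name__ == "__main__":
    size, count = block_info(".")
    print(f"block size: {size} bytes, total blocks: {count}")
```

The numbers depend entirely on how the filesystem was formatted, so no particular output is promised here.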

Why such a complicated fs?

"Seven and four!" That made no sense to you, did it? If I were to tell you that those are the ages of my niece and my nephew, it would make sense. We have to label the data. These labels are called "filenames". The problem is that we also need to store the label somewhere, and point it at the data. This is where "inodes" (no, these have nothing to do with iMacs) come in. The "i" is usually taken to stand for "index". And the node part... well, you're on Everything, so you can make that out. These inodes are placed on your filesystem. They store the permissions, the owner, the size, the timestamps, and which blocks on your harddrive hold the data. The filename itself is not in the inode: it is stored in a directory, next to the number of the inode it refers to. If you read a file, the name is looked up to find the inode, and the inode is queried to find out the information about the file. Your harddrive will then "seek" to the appropriate magnetic data on it, and pass n blocks of data to your CPU. The CPU will process that data, and give it back to your browser.

Of course, this is nice, but we don't want the age of my nephew anywhere near your physics thesis. This is a job for directories. Directories hold files, or more directories (which are then called subdirectories). A directory is itself a small file that lists names together with the inode numbers they refer to. This way you can organize information so that you can find it again.
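To see how names and inodes fit together, here is a small Python sketch. It assumes a Unix-like system; the helper names `list_entries` and `describe_inode` and the file names are invented for this example. The directory listing gives names and inode numbers, while `os.stat` gives what is recorded about the file itself. Note that the stat result has no filename in it.

```python
import os
import stat
import tempfile


def list_entries(directory):
    """Return {name: inode_number} for every entry in directory."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


def describe_inode(path):
    """Return some of the metadata the filesystem keeps for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "permissions": stat.filemode(st.st_mode),
        "size_bytes": st.st_size,
        "blocks_512": st.st_blocks,  # space used, counted in 512-byte units
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "thesis"))
        with open(os.path.join(top, "ages.txt"), "w") as f:
            f.write("seven and four!\n")
        for name, ino in sorted(list_entries(top).items()):
            print(ino, name)
        print(describe_inode(os.path.join(top, "ages.txt")))
```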

Introduction to file attributes

Files are used to store data. This can be textual data, like your thesis, or "binary" data, like the game you play to get your mind off the thesis. Binary data cannot be read by humans, but the filesystem doesn't care about that, since it isn't human. Files on ext2 filesystems have permissions and attributes. Here's the output of a command to show permissions on a regular text file:

tribbel:~/docs/culture]% ls -li hamlet.all.txt
  98700 -rw-r--r--    1 tribbel  staff      200059 May 30 17:41 hamlet.all.txt

Just so you know, this is the full Hamlet play by William Shakespeare.

This output shows:

  98700    - The number of the inode which provides information about the 
             file. Usually, this is irrelevant. But we need it to explain 
             hard links later on.

-rw-r--r-- - The permissions on the file. The first position (which is a -
             in this case) indicates the type of file: - for a regular
             file, d for a directory, l for a soft link. The next three
             indicate the owner's permissions (in this case "tribbel").
             The 'r' is read, the 'w' write, and the - in the third spot
             would be an 'x', which means execute, if the file were
             executable. The next three are for the group's permission
             (staff). Same story here. The group can only read (r) this
             file. The last three are for everyone who is not the owner,
             and not in the group staff. The so-called "sticky bit" shows
             up as a 't' in the very last position. On old Unix systems
             it kept a program in swap for faster execution; Linux
             ignores it on regular files, and on a directory it means
             that only a file's owner (or the directory's owner, or
             root) may delete or rename that file.

   1       - The number of links to a file or directory. We'll get to this
             later.

tribbel    - The owner of the file.
staff      - The group who "owns" this file. 
200059     - The size of the file in bytes.
May 30 17:41 - The date and time the file was last modified.
hamlet.all.txt - The filename.
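The same fields can be pulled out by hand. Below is a rough Python sketch, assuming a Unix-like system; the function name `long_listing` is made up, and it prints numeric user and group ids where ls would translate them to names. `stat.filemode` turns the raw mode bits into the ten-character string that ls shows.

```python
import os
import stat
import time


def long_listing(path):
    """Collect roughly the fields that `ls -li` prints for path."""
    st = os.lstat(path)  # lstat: describe a soft link itself, like ls does
    return {
        "inode": st.st_ino,
        "mode": stat.filemode(st.st_mode),
        "links": st.st_nlink,
        "uid": st.st_uid,  # ls would show the user name instead
        "gid": st.st_gid,  # ls would show the group name instead
        "size": st.st_size,
        # Close to the ls date format, though ls pads the day differently.
        "modified": time.strftime("%b %d %H:%M", time.localtime(st.st_mtime)),
        "name": os.path.basename(path),
    }


if __name__ == "__main__":
    # Decoding raw mode bits without touching the disk:
    print(stat.filemode(0o100644))  # regular file, rw-r--r--
    print(stat.filemode(0o040755))  # directory, rwxr-xr-x
    print(stat.filemode(0o041777))  # directory with the sticky bit set
    print(long_listing(__file__))
```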

Here's one for a directory:

tribbel:~/docs% ls -ldi culture
  98691 drwxr-xr-x    3 tribbel  staff        1024 Sep  6 16:24 culture

As you can see, the first character in the permissions list is a "d". This indicates that it is a directory. Also, note that it is executable. For a directory, the execute bit means you may enter it and reach the files inside, so if it is not set you can't change to that directory. Directories take up space, depending on how many subdirectories and files they hold. A small directory usually takes up one block, which is 1024 bytes (1 kilobyte) here.

The `3' is the number of hard links to the directory. For a directory this is two plus the number of subdirectories: one link for its name in the parent directory, one for its own `.' entry, and one for the `..' entry inside each subdirectory. This directory has one "real" subdirectory.
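You can watch the link count of a directory change with a short Python sketch. The directory names are made up, and the behaviour is an assumption about ext2-style filesystems: there the count should go from 2 to 3 when one subdirectory is added, but some other filesystems (btrfs, for example) report directory link counts differently.

```python
import os
import tempfile


def dir_link_counts():
    """Return the link count of a directory before and after adding a subdirectory."""
    with tempfile.TemporaryDirectory() as top:
        culture = os.path.join(top, "culture")
        os.mkdir(culture)
        before = os.stat(culture).st_nlink
        # The new subdirectory's `..' entry is one more link to `culture'.
        os.mkdir(os.path.join(culture, "plays"))
        after = os.stat(culture).st_nlink
        return before, after


if __name__ == "__main__":
    before, after = dir_link_counts()
    print(f"links before: {before}, after adding a subdirectory: {after}")
```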

Hard links

Now for the famous hard link. A hard link is fairly simple. It is basically a second filename for the same inode. Normally when you copy a file it becomes something like this:

 NAME a -> INODE 234 -> DATA-START .... DATA-END
 NAME b -> INODE 235 -> DATA-START .... DATA-END

There are now two inodes, pointing to two different places on the harddrive with the same data. A hardlink works like so:

 NAME a -> INODE 234 -> DATA-START .... DATA-END
 NAME b __/^

The data will be stored only once on the harddrive, and there is still only one inode, but there are now two filenames pointing to it. The link count shown by ls goes up to 2, and the data is only freed when the last name is removed. Simple, eh?
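Here is a small Python sketch of a hard link in action, assuming a filesystem that supports hard links; the file names are invented. `os.link` adds a second name, both names report the same inode number, and the data survives when the first name is removed.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file and a hard link to it, then remove the original name."""
    with tempfile.TemporaryDirectory() as top:
        original = os.path.join(top, "hamlet.txt")
        alias = os.path.join(top, "hamlet-link.txt")
        with open(original, "w") as f:
            f.write("To be, or not to be\n")

        os.link(original, alias)  # second name, same inode

        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        links = os.stat(original).st_nlink

        os.remove(original)  # removes one name only
        with open(alias) as f:
            survived = f.read()
        return same_inode, links, survived


if __name__ == "__main__":
    same_inode, links, survived = hard_link_demo()
    print("same inode:", same_inode)
    print("link count:", links)
    print("data after removing the first name:", survived.strip())
```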

Soft links

Another concept is the soft link, or symbolic link. These are much like hard links, only different (well, duh). This is a visualization of a soft link:

 INODE 234 -> DATA-START .... DATA-END
 INODE 235 -> DATA-START "path/to/original" DATA-END

The soft link gets its own inode, and its data is just the path name of the original file. The filesystem follows that path for you, so it looks as if both point to the same place. The softlink can be deleted at will, and the original file will not be altered. When you "edit a softlink" you will actually be editing the original file. If the original is deleted or moved, the soft link is left pointing at nothing.
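A soft link can be tried out the same way. This is a minimal Python sketch, assuming a Unix-like system; the file names are made up. `os.lstat` looks at the link itself, `os.readlink` shows the path stored in it, and writing through the link changes the original file.

```python
import os
import tempfile


def soft_link_demo():
    """Create a soft link, edit through it, then delete the original."""
    with tempfile.TemporaryDirectory() as top:
        target = os.path.join(top, "hamlet.txt")
        link = os.path.join(top, "hamlet-soft.txt")
        with open(target, "w") as f:
            f.write("To be, or not to be\n")

        os.symlink(target, link)

        # The link has an inode of its own; lstat does not follow the link.
        own_inode = os.lstat(link).st_ino != os.stat(target).st_ino
        stored_path = os.readlink(link)

        # "Editing the softlink" really edits the original file.
        with open(link, "a") as f:
            f.write("that is the question\n")
        with open(target) as f:
            content = f.read()

        # Removing the original leaves the link pointing at nothing.
        os.remove(target)
        dangling = os.path.islink(link) and not os.path.exists(link)
        return own_inode, stored_path == target, content, dangling


if __name__ == "__main__":
    print(soft_link_demo())
```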

Back to Linux for Monkeys
