LESS IS MORE…

FILE COMPRESSION IN PLAIN ENGLISH

WHAT IS FILE COMPRESSION?

File compression is the removal of redundant, unnecessary information from a file on your computer in order to reduce the size of the file. For example, consider this sentence:

Quizzical Quinton questioned whether the quaking quarry was the quaint destination of the quest.
The above sentence is 96 characters in length. Notice any patterns in the sentence? Every Q is followed by a U, as in Quinton and quest. So, those U's are redundant information. They do not add any meaning to the sentence, since the reader can be expected to know that a U follows every Q. So let's take the U's that follow the Q's out of the sentence:
Qizzical Qinton qestioned whether the qaking qarry was the qaint destination of the qest.
Though it is true that the resulting sentence takes a little longer to read, you could easily put the U's back after the Q's in your head. The new sentence is 89 characters in length, about 7% shorter than the original. We managed to reduce the size of the sentence without changing its meaning. What we have just done is compression.
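
If you like to see ideas as code, here is a minimal Python sketch of the U-after-Q trick. The function names drop_u and restore_u are made up for this illustration, and the trick only works on text where every Q really is followed by a U.

```python
import re

ORIGINAL = ("Quizzical Quinton questioned whether the quaking quarry "
            "was the quaint destination of the quest.")

def drop_u(text: str) -> str:
    """'Compress' by removing every u that directly follows a q or Q."""
    return re.sub(r"([qQ])u", r"\1", text)

def restore_u(text: str) -> str:
    """'Decompress' by putting a u back after every q or Q."""
    return re.sub(r"([qQ])", r"\1u", text)

squeezed = drop_u(ORIGINAL)
print(squeezed)
print(len(ORIGINAL), len(squeezed))       # 96 89
print(restore_u(squeezed) == ORIGINAL)    # True
```

Notice that restore_u has to know the rule that drop_u used. Real compression works the same way: the program that decompresses must know exactly what the compressing program did.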

File compression occurs when your computer removes unnecessary information from a file to make it smaller, which is something like removing all the unnecessary U's after the Q's. Of course, your computer doesn't actually remove U's from after Q's when it's compressing something. It uses mathematical algorithms to find and remove redundant information at the binary level. The kind of redundant information that is removed, and the way it is removed, determines the type of compression. One of the most common types of compression is ZIP compression. The ZIP specification determines what type of redundant information your computer will remove from a regular file to compress it, resulting in a compressed ZIP file.

WHY COMPRESS?

Initially, file compression was a handy way to reduce disk space usage. It was often used to fit oversized files onto floppy disks. File compression can still be used for these purposes, but now it has a much more popular use: the Internet. When you download files from the Internet, a smaller file means a shorter download time. Many of the things you download from the Internet are compressed, even web pages. Many of the compressed items that are not web pages are compressed using the ZIP specification. The ZIP format has become so popular that the ability to compress and decompress ZIP files is built into Windows XP.

File compression is also useful for email attachments. The vast majority of personal email accounts are free accounts, and most free email providers put a limit on the size of emails that can be sent and/or received by their users. Compression can reduce the size of a file so it will fit comfortably within that limit. For example, I'm a co-editor of a newsletter that we email to the print shop. Uncompressed, the email would be quite large. However, we can use file compression to significantly reduce the size of the email we send to the print shop.

LET'S GET TECHNICAL

ZIP Compression

ZIP compression doesn't work by removing the U's after the Q's, but the idea is similar. In simplified terms, what ZIP compression does is look for repeating patterns, make a table of these patterns, and replace the patterns with unique tokens corresponding to the table. If that doesn't mean anything to you, consider the following sentence:

Crazy people are crazy when my crazy pets are biting those crazy people.
As you can see, the words "crazy", "people", and "are" are repeated. So let's assign each of those words a unique "token".
Token | Word
  1   | crazy
  2   | people
  3   | are
Now, we can replace every occurrence of a repeating word in the sentence with its unique token:
Crazy 2 3 1 when my 1 pets 3 biting those 1 2.
Note that the first "Crazy" was not replaced. This is because it is capitalized, so it is not the same as "crazy". As you can see, the resulting sentence is much shorter, but it still has the same meaning, since you can refer back to the table to find out what the 1's, 2's, and 3's mean. ZIP compression would store this table along with the compressed file so the decompressor would know what each token that the compression inserted means.
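
The token idea can be sketched in a few lines of Python. This is only an illustration of the table lookup described above, not how a real ZIP program is written. The names TABLE, encode, and decode are made up for the sketch, and it assumes the original text contains no digits of its own.

```python
import re

SENTENCE = "Crazy people are crazy when my crazy pets are biting those crazy people."

# The table from the article: token -> word.
TABLE = {"1": "crazy", "2": "people", "3": "are"}

def encode(text: str, table: dict) -> str:
    """Replace each word that appears in the table with its token."""
    word_to_token = {word: token for token, word in table.items()}
    # The lookup is case-sensitive, so "Crazy" is left alone.
    return re.sub(r"[A-Za-z]+",
                  lambda m: word_to_token.get(m.group(), m.group()),
                  text)

def decode(text: str, table: dict) -> str:
    """Replace each token with the word it stands for."""
    return re.sub(r"\d+", lambda m: table.get(m.group(), m.group()), text)

encoded = encode(SENTENCE, TABLE)
print(encoded)                              # Crazy 2 3 1 when my 1 pets 3 biting those 1 2.
print(decode(encoded, TABLE) == SENTENCE)   # True
```

Without the table, the encoded sentence is meaningless, which is why the table has to travel with the compressed data.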

As you can see, though, storing the table and the compressed sentence would take up more space than storing the original sentence in our case. This is because the original sentence is quite small. If you try to compress small files, they tend to get bigger. You can probably imagine how this would save space in a large file, though.
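
You can check the small-file effect yourself with Python's built-in zipfile module. The sketch below builds two ZIP archives in memory: one holding the short sentence, and one holding the same sentence repeated 1,000 times as a stand-in for a large file. The helper name zipped_size and the file name sentence.txt are made up for the example.

```python
import io
import zipfile

def zipped_size(data: bytes) -> int:
    """Return the size in bytes of a ZIP archive that holds `data` as one file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("sentence.txt", data)
    return len(buffer.getvalue())

sentence = b"Crazy people are crazy when my crazy pets are biting those crazy people."
large_file = sentence * 1000   # pretend this is a big, repetitive document

print("small:", len(sentence), "bytes ->", zipped_size(sentence), "bytes zipped")
print("large:", len(large_file), "bytes ->", zipped_size(large_file), "bytes zipped")
```

The small archive comes out larger than the sentence it contains, because a ZIP file carries bookkeeping information (such as the file name) in addition to the compressed data. The large archive comes out far smaller than its contents.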

ZIP compression is a lossless type of compression. This is because no data is lost between compression and decompression. However, some types of compression, such as JPEG and GIF *, are not lossless.

* Technically, GIF compression is lossless, but a GIF image can only contain 256 different colors. If the original picture uses more colors than that, the extra color data is lost when the picture is saved as a GIF.

JPEG Compression

JPEG compression is a type of compression that was developed by the Joint Photographic Experts Group. JPEG is an extremely popular format for compressing images. JPEG works by separating the picture into luminance (the brightness, or amount of visible light leaving a point on a surface) and chrominance (the color information). The human eye is more sensitive to luminance than to chrominance, so JPEG keeps the luminance in full detail and sacrifices some of the chrominance detail. JPEG then cuts up the picture into 8 pixel by 8 pixel squares.

After this, JPEG applies a discrete cosine transform (DCT) to each 8 pixel by 8 pixel square. The DCT replaces the 64 pixel values in the square with 64 numbers called DCT coefficients. The first coefficient describes the "average" value of the square, and the remaining coefficients describe how far the pixels stray from that average, from broad changes down to fine detail. To actually compress the data, JPEG reduces the precision of the coefficients. Small coefficients are rounded off to 0 (for example, replacing a coefficient of 0.001 with 0), and large coefficients are stored less precisely (in the spirit of replacing 9238889 with 9230000). In the end, this results in a smaller file, though some picture data is lost. Since some data is lost, JPEG is a lossy type of compression.
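
For the curious, here is a rough Python sketch of the transform-and-round step on a single 8 by 8 square, using the discrete cosine transform (DCT). It is a teaching sketch under simplifying assumptions, not a real JPEG encoder: it handles one grayscale square, it divides every coefficient by the same made-up step of 16 (real JPEG uses a table with a different step for each coefficient), and it skips the later stages that pack the numbers into a file. The names dct_2d and quantize are invented for the example.

```python
import math

N = 8  # JPEG works on squares of 8 by 8 pixels

def dct_2d(block):
    """Return the 2-D DCT (type II, orthonormal scaling) of an N x N block."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coefficients = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coefficients[u][v] = scale(u) * scale(v) * total
    return coefficients

def quantize(coefficients, step):
    """Divide each coefficient by `step` and round. This is where data is lost."""
    return [[round(value / step) for value in row] for row in coefficients]

# A square whose brightness rises smoothly from left to right.
gradient = [[100 + 2 * y for y in range(N)] for _ in range(N)]

quantized = quantize(dct_2d(gradient), step=16)
nonzero = sum(1 for row in quantized for value in row if value != 0)
print(nonzero, "of", N * N, "coefficients are not zero after rounding")
```

For a smooth square like this one, almost all of the rounded coefficients turn out to be zero, and long runs of zeros are very cheap to store. That is where the space saving comes from.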

COMPRESSED PROBLEMS

Like all technologies, compression has its drawbacks. One of the biggest is compatibility. When your computer compresses (removes unnecessary information from) a file one way and another computer doesn't know the exact details of how your computer compressed the file, this second computer will be unable to read the file. For example, if your computer compresses a file using the RAR compression specification, but your friend's computer only knows how to decompress (put back into regular form) files that were compressed using the ZIP specification, you will not be able to share compressed files with your friend.

Another drawback is file corruption. Sometimes an error occurs when your computer is compressing a file. This results in a compressed file that will not look like the original file when it is decompressed. Sometimes, though much less frequently, an error occurs when your computer is decompressing a file. This will cause an invalid version of the original file to be reassembled. In either case, you will experience data loss.
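
To see what a damaged compressed file looks like to a program, here is a small Python sketch. It uses the built-in zlib module as a stand-in for a ZIP program, which is an assumption made for the example. zlib stores a checksum at the end of its compressed data, and ZIP files carry a similar check for each file they hold. The helper name try_decompress is made up.

```python
import zlib

original = b"Quizzical Quinton questioned whether the quaking quarry was quaint. " * 20
compressed = zlib.compress(original)

# Simulate damage by flipping every bit of the last byte, which is part of
# the checksum that zlib stores at the end of the compressed data.
damaged = compressed[:-1] + bytes([compressed[-1] ^ 0xFF])

def try_decompress(data: bytes):
    """Return the decompressed bytes, or None if the data fails its check."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None

print(try_decompress(compressed) == original)   # True
print(try_decompress(damaged))                  # None
```

In this sketch the damage is detected, so the program can at least tell you that something went wrong. Detecting the damage does not bring the lost data back, though.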

Another concern is speed. It's true that our second sentence above provided just as much information as the first sentence, but it took longer to read. The same is true of compressed files. It takes time to decompress them into a readable format. Also, it took a few seconds to remove the U's after the Q's from the first sentence to create the second sentence. In a similar way, it can take a while to compress a file. As a general rule, larger files take longer to compress and longer to decompress than smaller files.

Lastly, less experienced computer users may attempt to compress JPG, GIF, or PNG picture files, or WMV, AVI, or MP3 multimedia files. These types of files, like most types of picture and multimedia files, are already compressed according to a separate specification (such as MP3). Compressing these files again usually has little or no effect on their size. Sometimes, compressing one of these files may actually increase its size. The same is true for very small files.
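
Here is a quick Python sketch of why this happens. Data that has already been compressed has very little redundancy left, so to a compression program it looks much like random noise. The sketch uses 10,000 pseudo-random bytes as a stand-in for an already-compressed file (an assumption made for the example, not a real JPG or MP3) and compares it with some repetitive text.

```python
import random
import zlib

text = b"Crazy people are crazy when my crazy pets are biting those crazy people. " * 200

# Stand-in for an already-compressed file: 10,000 pseudo-random bytes.
# The fixed seed makes the example give the same bytes every time.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(10_000))

for label, data in (("repetitive text", text), ("random-looking data", noise)):
    print(label + ":", len(data), "bytes ->", len(zlib.compress(data)), "bytes")
```

The repetitive text shrinks dramatically. The random-looking data does not shrink at all, and it comes out slightly bigger because of the bookkeeping that the compressed format adds.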

COMPRESSING FILES IN WINDOWS XP

The ability to compress and decompress ZIP files is built into Windows XP. If you ever come across a ZIP file, Windows XP will automatically open and decompress it for you. If, however, you need to compress a file, follow these instructions:

  1. Highlight the file(s) and/or folder(s) you want to compress.
  2. Right-click one of the highlighted files or folders.
  3. Select: Send To >> Compressed (zipped) Folder.
This will cause a compressed version of the file(s) and/or folder(s) you selected to be created.
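
If you would rather compress files from a program than from the right-click menu, Python's built-in zipfile module can do a similar job. The sketch below is one possible way to do it, not the way Windows does it internally. The function names compress_to_zip and extract_zip and the file name newsletter.txt are made up for the example, and the demo runs in a temporary folder so it does not touch your own files.

```python
import tempfile
import zipfile
from pathlib import Path

def compress_to_zip(paths, zip_path):
    """Write the given files into a new ZIP archive at zip_path."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            # arcname stores just the file name, not the full folder path.
            archive.write(path, arcname=path.name)
    return Path(zip_path)

def extract_zip(zip_path, destination):
    """Unpack every file in the archive into the destination folder."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)

# Demo in a throwaway folder that is deleted when the block finishes.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "newsletter.txt"
    source.write_text("All the news that fits. " * 500)

    archive_path = compress_to_zip([source], Path(folder) / "newsletter.zip")
    print(source.stat().st_size, "bytes ->", archive_path.stat().st_size, "bytes")

    extract_zip(archive_path, Path(folder) / "unpacked")
    restored = (Path(folder) / "unpacked" / "newsletter.txt").read_text()
    print(restored == source.read_text())   # True, because ZIP is lossless
```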


Further reading: HTTP Compression