A quick and dirty, unsophisticated method of data compression wherein runs of identical entities are stored as a count followed by the value they share. "gggggggg" could be stored as "8g"; "foofoofoofoo" could be stored as something like "4foo". Obviously, you'll need to preserve a distinction between the count and the "real" data. That becomes trivial when compressing bitmaps which use indexed color; if there are only 256 colors, then each pixel is represented by a one-byte index into a color table. You'll often get runs of multiple pixels having the same index (this is more likely if there are fewer colors, naturally). You can then store each run of like pixels as a one-byte count followed by a one-byte index. Windows adds a few refinements1 to this scheme and sometimes supports it in their own little bitmap format, a.k.a. BMP. Usually run-length encoded BMP files are identified with a file extension of .rle, but they're not common. Their deal only happens with color depths of eight bits or less; when you think about it, runs of like pixels in a 24-bit image are unlikely (and if there are any, the whole thing will probably fit into eight bits anyway).

This method is not very useful for compressing text unless the author fell asleep with his forehead on the keyboard (see Slashdot).

The above represents the sum total of my firm knowledge of data compression. I Am Not An Algorithm-Monger. All corrections are welcome.



1 .rle refinements: One of them goes like this: If a count byte is zero, then it will be followed by another count byte, which represents a number of unadorned single-byte color indices to follow. Each of those indices will appear without a count and will represent a single pixel. This is necessary because there will often be areas in a bitmap, even with only 256 colors, where each pixel will differ from the next. If each of the pixels in such an area were represented as a count of one followed by an index, your worst-case compression ratio for the whole file would be 1:2, which is not so good because it's actually expansion rather than compression.

Another improvement is to encode the image as the first pixel, and then the difference between each pixel and the last. This works best if the pixels are arranged lightest to darkest, or otherwise organized. Runs of a single color become runs of 0s. Some gradients become runs of 1s or 2s. So, more things can be represented by runs. The disadvantage is that it takes another bit per pixel to encode differences; depending on the image, this technique may or may not be more efficient.

Log in or registerto write something here or to contact authors.