A screwed-up Cyrillic character set.
KOI stands for "Kod Obmena Informatsiey" (Code for Information Exchange), while the 8 indicates an 8-bit character set. It was defined in a GOST (Soviet state standard) in 1974 and has caught on since, especially on UNIX systems, thus becoming the "primary" Cyrillic encoding of the early Internet.
Structure
In this character set, the Cyrillic characters are placed in the upper half of the code table (bytes 128 to 255, with the letters themselves at 192 to 255), in such a fashion that masking off the highest bit of each byte results in a rough phonetic equivalent in Latin letters. Of course, to make this hack possible, the letters don't follow their natural Cyrillic alphabet order.
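To see the trick in action, here is a minimal Python sketch. It assumes Python's built-in `koi8_r` codec is a fair stand-in for KOI8 as a whole, and the helper name `strip_high_bit` is purely illustrative: encode the text, clear bit 7 of every byte, and read what is left as ASCII.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear bit 7 of each byte, and read the result as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    # Masking with 0x7F keeps only the low 7 bits, as a 7-bit channel would.
    seven_bit = bytes(b & 0x7F for b in koi8_bytes)
    return seven_bit.decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("привет"))  # PRIWET
    print(strip_high_bit("Москва"))  # mOSKWA
```

Note that the case comes out inverted: lowercase Cyrillic turns into uppercase Latin and vice versa.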
Here's the lowercase order of this character set, brought to you in glorious Unicode:
юабцдефгхийклмнопярстужвьызшэщчъ
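The order above can be regenerated rather than typed in. This sketch again assumes Python's `koi8_r` codec, and `koi8_lowercase_table` is a hypothetical helper: it walks bytes 0xC0 to 0xDF, decodes each one, and pairs it with the ASCII character left after clearing the high bit.

```python
def koi8_lowercase_table() -> list[tuple[int, str, str]]:
    """Return (byte, Cyrillic letter, 7-bit ASCII character) for bytes 0xC0-0xDF."""
    rows = []
    for code in range(0xC0, 0xE0):
        cyrillic = bytes([code]).decode("koi8_r")  # the letter KOI8-R puts at this byte
        latin = chr(code & 0x7F)                   # what survives with bit 7 cleared
        rows.append((code, cyrillic, latin))
    return rows


if __name__ == "__main__":
    table = koi8_lowercase_table()
    print("".join(row[1] for row in table))
    for code, cyr, lat in table:
        print(f"0x{code:02X}  {cyr}  {lat}")
```

Reading the pairs side by side makes the phonetic logic visible: а/A, б/B, ц/C, д/D, and so on, with the leftovers (ю, ы, ш, щ, ч, ъ) parked on punctuation such as `@`, `[` and `]`.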
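In KOI7, the earlier 7-bit set, Cyrillic capital letters replaced the Latin small letters, so that even on Latin-only systems the Cyrillic text would come out transliterated. There, the order actually made sense. This heritage is also why stripping the high bit from KOI8 text inverts the case: lowercase Cyrillic becomes uppercase Latin and vice versa.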
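The KOI7 side can be sketched too. This is a hypothetical decoder (`decode_koi7` is an illustrative name) built on an assumption worth stating loudly: that KOI7's Cyrillic capitals occupy 0x60 to 0x7E in the same order as KOI8's 0xE0 to 0xFE, so setting the high bit and decoding as KOI8-R recovers them. Every other byte is treated as plain ASCII, which is a simplification.

```python
def decode_koi7(data: bytes) -> str:
    """Hypothetical KOI7 decoder sketch: bytes 0x60-0x7E become Cyrillic capitals."""
    out = []
    for b in data:
        if 0x60 <= b <= 0x7E:
            # Assumption: KOI7 capitals line up with KOI8-R 0xE0-0xFE,
            # so setting bit 7 lets the koi8_r codec do the lookup.
            out.append(bytes([b | 0x80]).decode("koi8_r"))
        else:
            # Simplification: everything else is read as plain ASCII.
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    raw = b"moskwa"
    print(raw.decode("ascii"))  # what a Latin-only terminal shows
    print(decode_koi7(raw))     # what a KOI7 terminal shows
```

The same six bytes read as "moskwa" on a Latin terminal and as "МОСКВА" on a Cyrillic one, which is exactly the graceful degradation the odd letter order was designed for.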
Variants
Cyril and Methodius's alphabet went through some changes wherever it was adopted, usually adding some unique characters not present in other alphabets. KOI8 variants were developed for that reason:
KOI8-R (for Russian), KOI8-U (for Ukrainian) and KOI8-E (for Ukrainian, Byelorussian, Serbian and Macedonian letters).
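The practical difference between KOI8-R and KOI8-U shows up as soon as a Ukrainian-only letter appears. A sketch using Python's `koi8_r` and `koi8_u` codecs (`encodes_in` is a hypothetical helper; KOI8-E is left out of the sketch) checks whether a word fits in each variant:

```python
def encodes_in(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for word in ("привіт", "привет"):  # Ukrainian and Russian "hello"
        print(word, encodes_in(word, "koi8_r"), encodes_in(word, "koi8_u"))
    # Letters shared with Russian keep the same byte values in both variants.
    print("привет".encode("koi8_r") == "привет".encode("koi8_u"))
```

The Ukrainian "і" has no slot in KOI8-R, while KOI8-U adds it without moving any of the Russian letters.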
Other Cyrillic character sets
ISO-8859-5, CP-1251 (a.k.a. Windows-1251) and, of course, Unicode.
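Because these sets put the same letter at different byte values, text written in one and read in another turns into garbage rather than raising an error. A sketch comparing the bytes used for Cyrillic "а" (codec names are Python's; `code_points` is an illustrative helper):

```python
CODECS = ("koi8_r", "iso8859_5", "cp1251", "utf_8")


def code_points(letter: str) -> dict[str, str]:
    """Map each codec name to the uppercase hex bytes it uses for the given letter."""
    return {codec: letter.encode(codec).hex().upper() for codec in CODECS}


if __name__ == "__main__":
    for codec, hex_bytes in code_points("а").items():
        print(f"{codec:10} {hex_bytes}")
    # KOI8-R bytes read as Windows-1251 decode "successfully" into the wrong letters.
    print("привет".encode("koi8_r").decode("cp1251"))
```

The three 8-bit sets each use a different single byte, and UTF-8 needs two.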