A screwed-up Cyrillic character set
KOI stands for "Kod dla Obmena i obrabotki Informacii" (Code for Exchange and Processing of Information), and the 8 indicates that it is an 8-bit character set. It was defined as a GOST standard in 1974 and has caught on since, especially on UNIX systems, which made it the "primary" Cyrillic encoding of the early Internet.
In this character set, the Cyrillic letters are placed in the upper half of the code table (byte values 128 to 255) in such a way that masking off the highest bit of each byte yields a rough phonetic equivalent in Latin letters. Of course, to make this hack possible, the letters don't follow their natural Cyrillic alphabetical order.
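The high-bit trick can be tried out in a few lines of Python. This is a minimal sketch, assuming the `koi8_r` codec from Python's standard library stands in for KOI8 (KOI8-R being the Russian variant); the helper name `strip_high_bit` is hypothetical.

```python
# Sketch of the KOI8 high-bit trick, assuming Python's built-in koi8_r codec.

def strip_high_bit(text: str) -> str:
    """Hypothetical helper: encode as KOI8-R, clear bit 7 of every byte, read back as ASCII."""
    raw = text.encode("koi8_r")
    # Every byte is now in 0..127, so plain ASCII decoding always succeeds.
    return bytes(b & 0x7F for b in raw).decode("ascii")

if __name__ == "__main__":
    print(strip_high_bit("привет"))  # → PRIWET
    print(strip_high_bit("Привет"))  # → pRIWET
```

Note that the case comes out inverted (lowercase Cyrillic sits where clearing bit 7 lands on uppercase Latin, and vice versa), and that в maps to W rather than V.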
Here's the lowercase order of this character set, brought to you in glorious Unicode:
юабцдефгхийклмнопярстужвьызшэщчъ
(Hint: Switch to UTF-8 character set on stupid browsers.)
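The lowercase order can also be reproduced straight from the encoding table. A small sketch, assuming Python's standard `koi8_r` codec; the function name `koi8r_lowercase_order` is hypothetical. It decodes bytes 0xC0 through 0xDF, where KOI8-R keeps its lowercase letters, and pairs each letter with the ASCII character left after clearing bit 7.

```python
# Sketch: list KOI8-R's lowercase Cyrillic block in byte order,
# assuming Python's built-in koi8_r codec.

def koi8r_lowercase_order() -> str:
    """Hypothetical helper: decode bytes 0xC0..0xDF, the KOI8-R lowercase block."""
    return bytes(range(0xC0, 0xE0)).decode("koi8_r")

if __name__ == "__main__":
    order = koi8r_lowercase_order()
    print(order)
    # Show which ASCII character each letter turns into once bit 7 is masked off.
    for byte, letter in zip(range(0xC0, 0xE0), order):
        print(letter, chr(byte & 0x7F))
```

The uppercase block (0xE0 to 0xFF) follows the same order, which is why a single masking rule works for both cases.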
In KOI7 (the earlier 7-bit variant, in which Cyrillic capital letters replaced the Latin small letters, so that Cyrillic text would still come out transliterated on Latin-only systems), the ordering actually made sense.
Cyril and Methodius's alphabet went through some changes wherever it was adopted, usually gaining a few letters not present in the other national alphabets. KOI8 variants were developed for that reason, among them KOI8-R and KOI8-E.
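To see how such variants diverge in practice, the sketch below compares KOI8-R with KOI8-U, the Ukrainian variant. KOI8-U is chosen here only as an illustration, because Python's standard library ships codecs for both (`koi8_r` and `koi8_u`); the helper `differing_bytes` is hypothetical. It prints every byte value that the two codecs decode differently.

```python
# Sketch: find byte values where two single-byte KOI8 variants disagree,
# assuming Python's built-in koi8_r and koi8_u codecs (both map all 256 bytes).

def differing_bytes(codec_a: str, codec_b: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (byte, char_in_a, char_in_b) for each mismatch."""
    diffs = []
    for b in range(256):
        char_a = bytes([b]).decode(codec_a)
        char_b = bytes([b]).decode(codec_b)
        if char_a != char_b:
            diffs.append((b, char_a, char_b))
    return diffs

if __name__ == "__main__":
    for b, r, u in differing_bytes("koi8_r", "koi8_u"):
        print(f"0x{b:02X}: KOI8-R {r!r}  KOI8-U {u!r}")
```

The differences should show up only outside the shared Russian letter block, where KOI8-U trades some of KOI8-R's box-drawing characters for Ukrainian letters such as і.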
Other Cyrillic character sets exist as well, and then, of course, there is Unicode.