A screwed-up Cyrillic character set. KOI stands for "Kod dla Obmena i obrabotki Informacii" (Code for Exchange and Processing of Information), and the 8 indicates an 8-bit character set. It was defined as a GOST standard in 1974 and caught on, especially on UNIX systems, which made it the "primary" Cyrillic encoding of the early Internet.

Structure

In this character set, the Cyrillic letters sit in the upper half of the byte range (128-255): lowercase at 192-223, capitals at 224-255. They are arranged so that masking off the highest bit of each byte leaves a rough phonetic equivalent in Latin letters, with the case flipped. Of course, to make this hack possible, the letters don't follow their natural Cyrillic alphabet order.
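A minimal Python sketch of the high-bit hack, assuming Python's built-in `koi8_r` codec (which follows the standard KOI8-R table). The helper name `strip_high_bit` is made up for illustration. It encodes text as KOI8-R, clears bit 7 of every byte, and reads the result back as ASCII:

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, then clear the top bit of every byte.

    Every masked byte is below 0x80, so the result always decodes as ASCII.
    (strip_high_bit is a hypothetical helper, not a library function.)
    """
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("Привет"))  # pRIWET
print(strip_high_bit("КОИ8"))    # koi8
```

Note the flipped case: lowercase Cyrillic (192-223) masks down to the uppercase Latin range (64-95), and the capitals land on lowercase Latin. Readable, if a bit shouty.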

Here's the lowercase order of this character set, brought to you in glorious Unicode:

юабцдефгхийклмнопярстужвьызшэщчъ
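To check the row above, a short Python sketch can decode bytes 192-223 with the built-in `koi8_r` codec and pair each letter with the ASCII character left after masking off the high bit (bytes 64-95):

```python
# Lowercase KOI8-R letters live at bytes 0xC0-0xDF (192-223).
cyrillic = bytes(range(0xC0, 0xE0)).decode("koi8_r")
# Clearing bit 7 moves each of those bytes down to 0x40-0x5F (64-95).
latin = bytes(b & 0x7F for b in range(0xC0, 0xE0)).decode("ascii")

print(cyrillic)
for cyr, lat in zip(cyrillic, latin):
    print(cyr, "->", lat)
```

The letters without a good Latin match end up on punctuation: ю lands on @, and ш, э, щ, ч, ъ land on [, \, ], ^ and _.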


The ordering makes more sense in light of KOI7, the earlier 7-bit code, in which the Cyrillic capital letters took the place of the Latin small letters, so that even on a Latin-only system the Cyrillic text would come out transliterated.

Variants

Cyril and Methodius's alphabet changed wherever it was adopted, usually gaining letters not used elsewhere. KOI8 variants were developed for that reason: KOI8-R (Russian), KOI8-U (Ukrainian) and KOI8-E (Ukrainian, Byelorussian, Serbian and Macedonian letters).
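Python ships codecs for two of these variants, `koi8_r` and `koi8_u`. A quick sketch shows that both put the shared letters at the same bytes, but only KOI8-U can encode a specifically Ukrainian letter such as ї. The helper `can_encode` is hypothetical, written just for this check:

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Letters common to Russian and Ukrainian get the same bytes in both variants.
print("привет".encode("koi8_r") == "привет".encode("koi8_u"))  # True

# "ї" is Ukrainian-only, so it is missing from KOI8-R but present in KOI8-U.
print(can_encode("Київ", "koi8_r"))  # False
print(can_encode("Київ", "koi8_u"))  # True
```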

Other Cyrillic character sets

ISO-8859-5, CP1251 (a.k.a. Windows-1251) and, of course, Unicode.
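Because these encodings put the same letters at different byte values, mixing them up was a classic source of garbled text. A Python sketch using the built-in `koi8_r`, `iso8859_5`, `cp1251` and `utf_8` codecs shows one word in each, and what happens when KOI8-R bytes are mistakenly read as Windows-1251 (`bytes.hex` with a separator needs Python 3.8 or later):

```python
word = "Привет"

# The same six letters, encoded four different ways.
for codec in ("koi8_r", "iso8859_5", "cp1251", "utf_8"):
    print(f"{codec:10} {word.encode(codec).hex(' ')}")

# Reading KOI8-R bytes as Windows-1251 produces the familiar mojibake.
print(word.encode("koi8_r").decode("cp1251"))  # рТЙЧЕФ
```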

