A couple of things often cause problems with Cyrillic: the fact that there are many different character sets (well, you can get pretty far with KOI8 and ISO 8859-5...) and then there's the problem of transliteration (romanization).
Character sets
As mentioned, on the Internet it's best to stick with one of those two character sets. Most browsers seem to support at least KOI8, so that's a good start. ISO 8859-5 (Latin/Cyrillic; "Latin-5" is actually the Turkish ISO 8859-9) is pretty nice, too. There are some other character sets; the most notable is Windows-1251, aka CP1251 (thanks, AT.).
However, in many cases it's advisable to move to UTF-8. You get the freedom to use weird European characters too if you need them. Or those Oriental languages. Too bad Klingon and Elvish didn't get into the standard.
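To make the character set mess a bit more concrete, here is a minimal Python sketch. It assumes that "KOI8" means the Russian variant KOI8-R, and it uses Python's standard codec names (`koi8_r`, `iso8859_5`, `cp1251`, `utf_8`); the `convert` helper and the sample word are made up for illustration.

```python
# The same word, encoded in the character sets mentioned above.
text = "Привет"

for codec in ("koi8_r", "iso8859_5", "cp1251", "utf_8"):
    raw = text.encode(codec)
    # The legacy sets use one byte per letter, UTF-8 uses two for Cyrillic.
    print(f"{codec:10} {len(raw)} bytes: {raw.hex()}")


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one character set to another via Unicode."""
    return data.decode(src).encode(dst)


koi8 = text.encode("koi8_r")
utf8 = convert(koi8, "koi8_r", "utf_8")
print(utf8.decode("utf_8"))

# Guessing the wrong character set does not fail here; it silently
# produces different Cyrillic letters (mojibake).
print(koi8.decode("cp1251"))
```

The point of the last line: all three legacy sets put Cyrillic letters in the upper half of the byte range, just in different orders, so a wrong guess usually gives you readable-looking garbage instead of an error.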
Everything2 uses the Latin-1 character set. This means you cannot use Cyrillic as is, unless you use HTML character entities. Here's a tip: get GNU Recode, and use it to translate your text to the "html4" character set. Cyrillic HTML character entities look like &#x0436; (that is, the first two hex digits are 04) or, in decimal, &#1078;. The Russian letters fall in the decimal range 1025-1105.
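If you don't have Recode at hand, roughly the same effect can be sketched in Python. This is not a claim about what Recode outputs byte for byte; it simply relies on the standard `xmlcharrefreplace` error handler, which swaps every character that doesn't fit in the target charset for a decimal character reference. The function names are hypothetical.

```python
import html


def to_latin1_html(text: str) -> str:
    """Replace everything outside Latin-1 with decimal references (&#NNNN;)."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


def to_hex_refs(text: str) -> str:
    """Same idea, but with hexadecimal references (&#x04NN; for Cyrillic)."""
    return "".join(f"&#x{ord(ch):04X};" if ord(ch) > 0xFF else ch for ch in text)


encoded = to_latin1_html("жук")
print(encoded)
print(to_hex_refs("жук"))

# html.unescape turns the references back into the original characters.
print(html.unescape(encoded))
```

Characters that Latin-1 can already represent are passed through untouched, so only the Cyrillic gets turned into references.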
Transliteration
Transliteration does not have a widely used fixed system! Personally, when I was studying Russian, all transliterated passages naturally followed Finnish orthography. On English-language websites, I've seen English-style orthography being used. Then there are those bastards who don't follow the letter-for-letter thing and end up making a pronunciation-based transliteration.
(In keis juu didn't get wai it's wroong: Wot wud juu sei if ai'd staat wraiting inglish laik this? Juu probabli wudn't laik it! =)
Transliterating foreign names to Cyrillic usually follows Russian phonetic rules, but that's more of a necessity... I've often seen people's names in Russian text given in both transliterated and Latin forms.
There is a standard for Cyrillic-to-Latin transliteration, however. The current international Cyrillic-to-Latin transliteration standard is ISO 9:1995. What follows is a quick look at it. (This was taken from Jukka Korpela's excellent document "Venäjän translitterointi", http://www.cs.tut.fi/~jkorpela/iso9.htm8). This covers only modern Russian, though.
- The mostly easy, unambiguous, simple cases
-
а ⇒ a
б ⇒ b
в ⇒ v
г ⇒ g
д ⇒ d
и ⇒ i
к ⇒ k
л ⇒ l
м ⇒ m
н ⇒ n
о ⇒ o
п ⇒ p
р ⇒ r
т ⇒ t
у ⇒ u
ф ⇒ f
- The "h"
-
х ⇒ h
Seen spelled as 'h' or 'kh' (it's a hard 'h' sound). It's h. Trust me.
- The "Yee-yee gang"
-
е ⇒ e
ё ⇒ ë (Unicode U+00EB)
э ⇒ è (Unicode U+00E8)
ю ⇒ û (Unicode U+00FB)
я ⇒ â (Unicode U+00E2)
Most problems are caused by e (should it be "e", "ye" or "je"???)
- The "i" variants
-
й ⇒ j
ы ⇒ y
- The Seven Sibilants
-
с ⇒ s
ш ⇒ š (Unicode U+0161)
щ ⇒ ŝ (Unicode U+015D)
ц ⇒ c
ч ⇒ č (Unicode U+010D)
з ⇒ z
ж ⇒ ž (Unicode U+017E)
Here, a huge pool of really, really nasty stuff. Variations on the spelling of this stuff cannot be counted with just fingers, I think I need toes too.
- Inaudible
-
ь ⇒ ʹ (Unicode U+02B9)
ъ ⇒ ʺ (Unicode U+02BA)
Some transliterations leave these out entirely. (Well, the "hard sign" is pretty rare though...)
A helpful/helpless transliteration example
Но я очень плохо говорю по-русски! = No â očenʹ ploho govorû po-russki!
Hey, them Switzerlanders invented a way to automatically translate Russian to Czech! =)
And, how it would have gone in...
- English, pronunciation-wise:
- Nu ya oochen plookha gavaryu pa-russki! (Ack)
- English, somewhat better:
- No ya ochen ploha gavaryu po-russki!
- Finnish, likewise:
- No ja otshen ploha gavarju po-russki!
(In case you're wondering what that means: "Well, I speak Russian pretty badly!")
This "official" transliteration is pretty painful to use without a Unicode-capable text editor, but at least it's never ambiguous, works both ways, and is mostly understandable!
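The "works both ways" part is easy to check for yourself. Below is a small Python sketch under the same assumptions as the table in this writeup (modern Russian only, one Latin character per Cyrillic letter); the table names are hypothetical. One caveat is baked in: ʹ and ʺ have no uppercase form, so the reverse table always turns them back into lowercase ь and ъ.

```python
CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
LATIN = "abvgdeëžzijklmnoprstufhcčšŝʺyʹèûâ"

lower = dict(zip(CYRILLIC, LATIN))
upper = dict(zip(CYRILLIC.upper(), LATIN.upper()))

TO_LATIN = str.maketrans({**lower, **upper})
# Invert the pairs. The lowercase pairs go last so that the caseless signs
# map back to lowercase ь and ъ rather than to their uppercase forms.
TO_CYRILLIC = str.maketrans(
    {**{lat: cyr for cyr, lat in upper.items()},
     **{lat: cyr for cyr, lat in lower.items()}}
)

original = "Но я очень плохо говорю по-русски!"
latin = original.translate(TO_LATIN)
restored = latin.translate(TO_CYRILLIC)

print(latin)
print(restored)
print(restored == original)
```

Try the same trick with the English-style or Finnish-style versions above and you'll quickly hit letter combinations that could be read back in more than one way.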