Here is a quick guide to Unicode characters used with some non-Western-European languages, organized by language.
For Western languages, see HTML symbol reference. Those letters have named HTML entities, each an ampersand and a semicolon around a name, for example &eacute; for é. Most of them can also be typed on your keyboard using a combination with the Alt, Ctrl, or Option keys: see Special Alt key characters & accents. The Western European character set covers English, French, Spanish, Italian, Portuguese, German, Danish, Swedish, Norwegian, Finnish, and in theory Icelandic, though in practice the letters thorn and edh often come out wrong. Blame your browser. Greek letters can also be represented by named HTML entities, such as &alpha; for α.
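For the curious, here is a minimal Python sketch of how named entities map to characters. Python is just my choice here; E2 doesn't need it. The sketch uses the standard library's html and html.entities modules, and the helper names resolve and named_entity are made up. The names come from HTML 4's entity list, which is almost entirely Western. That is why the letters further down need numeric codes instead.

```python
import html
from html.entities import codepoint2name
from typing import Optional


def resolve(markup: str) -> str:
    """Replace named or numeric HTML entities with the characters they stand for."""
    return html.unescape(markup)


def named_entity(ch: str) -> Optional[str]:
    """Return the HTML 4 named entity for ch, e.g. '&eacute;', or None if it has no name."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


print(resolve("caf&eacute; &alpha;"))  # café α
print(named_entity("é"))               # &eacute;
print(named_entity("ő"))               # None: Hungarian ő has no HTML 4 name
```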
For brevity I am not repeating those letters that are found in the Western set, with acute, grave, circumflex, umlaut, and so on. See Accent marks used with the Latin alphabet for a list of Western and Eastern accented letters arranged by accent.
In general, do not use accented letters in node titles or in hard links. Even if you think they're better that way. They're not. What's better is if other noders can find them. The E2 Search facility is limited in what it can find: it cannot find ü if you search for u, nor vice versa. Acutes and graves are okay, but umlauts won't work. It is better to leave other accents off. E2 is written in English, not Hungarian, and in English we usually leave all accents off. Please do not put in title edit requests asking for them to be added. If you want the accents to appear in your text, pipelink them, e.g. [Lowenbrau|Löwenbräu]. See E2 FAQ: Using Special HTML Characters for more detail on this.
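If you want to work out the unaccented title for a pipelink mechanically, here is a rough Python sketch. It is not how E2's search works internally. It just decomposes each letter (Unicode NFD) and throws away the combining accents. The helpers fold_accents and pipelink are hypothetical. Letters such as Ł, Đ and ı are letters in their own right rather than accented ones, so they come through unchanged and you will have to replace them by hand.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining accents: decompose (NFD), then keep only non-combining characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def pipelink(text: str) -> str:
    """Make an E2 link: a plain hard link if there are no accents, else [plain|accented]."""
    plain = fold_accents(text)
    return f"[{text}]" if plain == text else f"[{plain}|{text}]"


print(pipelink("Löwenbräu"))  # [Lowenbrau|Löwenbräu]
print(fold_accents("Łódź"))   # Łodz: Ł is not L plus an accent, so it survives
```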
Never use HTML entities or Unicode characters in node titles. Don't be pedantic about names. Pedantry is bad. Usefulness is good.
In the following tables capital letters come before lowercase. If you can't see them properly, this won't be of use to you. That's a limitation of your browser. A lot of browsers won't be able to show them, and they'll just appear as rectangles or question marks. And I use proper human numbers, not hexadecimal, which means there's no "x" in the code, just &#nnn;.
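Converting between a character and its decimal &#nnn; code takes one line each way. The same goes for the hexadecimal &#xhhh; form, which I'm avoiding. Here is a Python sketch with made-up function names. Python's built-in "xmlcharrefreplace" encoding error handler does the whole-string version and produces the same decimal form.

```python
def to_decimal_ref(ch: str) -> str:
    """Decimal numeric character reference, the &#nnn; form: 'ő' -> '&#337;'."""
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    """The same reference in hexadecimal, with the 'x': 'ő' -> '&#x151;'."""
    return f"&#x{ord(ch):x};"


def encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with its decimal reference, leave ASCII alone."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_decimal_ref("ő"))       # &#337;
print(to_hex_ref("ő"))           # &#x151;
print(encode_non_ascii("Győr"))  # Gy&#337;r
```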
Large scripts like Chinese and Devanagari are beyond the scope of this write-up, as are extras like the vowel pointing of Hebrew and Arabic. Go to www.unicode.org/charts for all the rest: Mongolian, Tamil, Ogham, the lot.
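If you would rather generate a table like the ones below than copy one, Python's unicodedata module knows the official name of every assigned code point. Here is a sketch, assuming Python 3; the helper chart is hypothetical. The exact set of names depends on the Unicode version your Python ships with. Unassigned code points are skipped, which leaves gaps in the numbers.

```python
import unicodedata
from typing import List


def chart(start: int, end: int) -> List[str]:
    """List 'glyph &#code; name' rows for code points start..end inclusive."""
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:  # unassigned or unnamed: skip it, leaving a gap in the numbers
            continue
        rows.append(f"{ch} &#{cp}; {name.lower()}")
    return rows


for row in chart(5760, 5788):  # the Ogham block, U+1680 to U+169C
    print(row)
```

The names are the official ones, lowercased, so you get "hebrew letter alef" rather than my "aleph".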
Albanian
No non-Western letters. Has Ç and Ë.
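Arabic
ا &#1575; alif
ب &#1576; ba
ة &#1577; ta marbuta
ت &#1578; ta
ث &#1579; tha
ج &#1580; jim
ح &#1581; ha emphatic
خ &#1582; kha
د &#1583; dal
ذ &#1584; dhal
ر &#1585; ra
ز &#1586; za
س &#1587; sin
ش &#1588; shin
ص &#1589; sad
ض &#1590; dad
ط &#1591; ta emphatic
ظ &#1592; za emphatic
ع &#1593; ain
غ &#1594; ghain
(a gap in the numbering here)
ف &#1601; fa
ق &#1602; qaf
ك &#1603; kaf
ل &#1604; lam
م &#1605; mim
ن &#1606; nun
ه &#1607; ha
و &#1608; waw
ى &#1609; ya undotted
ي &#1610; ya dotted
Letters with hamza:
ء &#1569; no bearer
أ &#1571; alif hamza above
ؤ &#1572; waw hamza
إ &#1573; alif hamza below
ئ &#1574; ya hamza
Other diacritics:
آ &#1570; alif maddah
ً &#1611; fathah with nunation
ٌ &#1612; dammah with nunation
ٍ &#1613; kasrah with nunation
َ &#1614; fathah
ُ &#1615; dammah
ِ &#1616; kasrah
ّ &#1617; shaddah
ْ &#1618; sukun
Numerals:
٠ &#1632; 0
١ &#1633; 1
٢ &#1634; 2
٣ &#1635; 3
٤ &#1636; 4
٥ &#1637; 5
٦ &#1638; 6
٧ &#1639; 7
٨ &#1640; 8
٩ &#1641; 9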
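The Arabic-Indic digits run in order from &#1632; (٠) to &#1641; (٩), so converting a number is just arithmetic on the code. Here is a minimal Python sketch with hypothetical function names. Going the other way, Python's int() already accepts any Unicode decimal digits, these included.

```python
ARABIC_INDIC_ZERO = 1632  # ٠ is &#1632;, and ١ to ٩ follow in order up to &#1641;


def to_arabic_indic(number: int) -> str:
    """Write a non-negative integer with Arabic-Indic digits."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    return "".join(chr(ARABIC_INDIC_ZERO + int(d)) for d in str(number))


def from_arabic_indic(text: str) -> int:
    """Read Arabic-Indic digits back; int() accepts any Unicode decimal digits."""
    return int(text)


print(to_arabic_indic(1975))
print(from_arabic_indic("٤٢") + 1)  # 43
```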
Arabic transliteration
Ā &#256; ā &#257; A-macron
Ḍ &#7692; ḍ &#7693; D-dot-below
Ḥ &#7716; ḥ &#7717; H-dot-below
Ī &#298; ī &#299; I-macron
Ṣ &#7778; ṣ &#7779; S-dot-below
Ṭ &#7788; ṭ &#7789; T-dot-below
Ū &#362; ū &#363; U-macron
Azerbaijani
Ə &#399; ə &#601; schwa
Ğ &#286; ğ &#287; G-breve (yumuşak-G)
İ &#304; dotted capital I
ı &#305; dotless lowercase i
Ş &#350; ş &#351; S-cedilla
Also uses Ç, Ö, Ü. Formerly used Ä for Ə, and Ä is still used when Ə is unavailable.
Belarusian uses (part of) the Cyrillic alphabet (see under Russian below) with the following additional letters:
Ґ &#1168; ґ &#1169; G-hook
І &#1030; і &#1110; I
Ў &#1038; ў &#1118; U-breve
Bulgarian uses (part of) the Cyrillic alphabet (see under Russian below) but with no additional letters.
Catalan
Ŀ &#319; ŀ &#320; L-mid-dot
Has a new Roman alphabet which however has numerous letters not yet representable in Unicode.
Croatian
Ć &#262; ć &#263; C-acute
Č &#268; č &#269; C-hacek
Đ &#272; đ &#273; D-bar
Š &#352; š &#353; S-hacek
Ž &#381; ž &#382; Z-hacek
Czech
Č &#268; č &#269; C-hacek
Ď &#270; ď &#271; D-hook
Ě &#282; ě &#283; E-hacek
Ň &#327; ň &#328; N-hacek
Ř &#344; ř &#345; R-hacek
Š &#352; š &#353; S-hacek
Ť &#356; ť &#357; T-hook
Ů &#366; ů &#367; U-circle
Ž &#381; ž &#382; Z-hacek
Also uses Á, É, Í, Ó, Ú, Ý.
Esperanto
Ĉ &#264; ĉ &#265; C-circumflex
Ĝ &#284; ĝ &#285; G-circumflex
Ĥ &#292; ĥ &#293; H-circumflex
Ĵ &#308; ĵ &#309; J-circumflex
Ŝ &#348; ŝ &#349; S-circumflex
Ŭ &#364; ŭ &#365; U-breve
Estonian
No non-Western letters. Has Õ, Ö, Ü.
Hawaiian
ʻ &#699; 'okina
Ā &#256; ā &#257; A-macron
Ē &#274; ē &#275; E-macron
Ī &#298; ī &#299; I-macron
Ō &#332; ō &#333; O-macron
Ū &#362; ū &#363; U-macron
Hebrew
(These letter names are Biblical Hebrew, because I know more about that.)
א &#1488; aleph
ב &#1489; beth
ג &#1490; gimel
ד &#1491; daleth
ה &#1492; he
ו &#1493; waw
ז &#1494; zayin
ח &#1495; heth
ט &#1496; teth
י &#1497; yod
ך &#1498; kaph final
כ &#1499; kaph
ל &#1500; lamedh
ם &#1501; mem final
מ &#1502; mem
ן &#1503; nun final
נ &#1504; nun
ס &#1505; samekh
ע &#1506; ayin
ף &#1507; pe final
פ &#1508; pe
ץ &#1509; sadhe final
צ &#1510; sadhe
ק &#1511; qoph
ר &#1512; resh
ש &#1513; shin/sin
ת &#1514; taw
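Five letters (kaph, mem, nun, pe, sadhe) have final forms, and these are separate code points rather than something the font does for you. If you build Hebrew text from codes, you have to pick the final form yourself at the end of each word. Here is a rough Python sketch; the helper names are hypothetical. It assumes words are separated by single spaces, so real text with punctuation needs more care.

```python
# Ordinary form -> final form, by decimal code point
FINAL_FORMS = {
    chr(1499): chr(1498),  # kaph  -> kaph final
    chr(1502): chr(1501),  # mem   -> mem final
    chr(1504): chr(1503),  # nun   -> nun final
    chr(1508): chr(1507),  # pe    -> pe final
    chr(1510): chr(1509),  # sadhe -> sadhe final
}


def apply_final_forms(text: str) -> str:
    """Swap in the final form for the last letter of each space-separated word."""
    words = []
    for word in text.split(" "):
        if word and word[-1] in FINAL_FORMS:
            word = word[:-1] + FINAL_FORMS[word[-1]]
        words.append(word)
    return " ".join(words)


shalom = chr(1513) + chr(1500) + chr(1493) + chr(1502)  # shin lamedh waw mem
print(apply_final_forms(shalom))  # now ends in mem final, &#1501;
```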
Hungarian
Ő &#336; ő &#337; O-double-acute
Ű &#368; ű &#369; U-double-acute
Also has Ö, Ü, and Á, É, Í, Ó, Ú.
Japanese
See the nodes hiragana and katakana.
Japanese transliteration
Ā &#256; ā &#257; A-macron
Ē &#274; ē &#275; E-macron
Ō &#332; ō &#333; O-macron
Ū &#362; ū &#363; U-macron
Korean transliteration
In one common romanization of Hangul (no longer officially used), these two are used:
Ŏ &#334; ŏ &#335; O-breve
Ŭ &#364; ŭ &#365; U-breve
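Latin
Ā &#256; ā &#257; A-macron
Ă &#258; ă &#259; A-breve
Ē &#274; ē &#275; E-macron
Ĕ &#276; ĕ &#277; E-breve
Ī &#298; ī &#299; I-macron
Ĭ &#300; ĭ &#301; I-breve
Ō &#332; ō &#333; O-macron
Ŏ &#334; ŏ &#335; O-breve
Ū &#362; ū &#363; U-macron
Ŭ &#364; ŭ &#365; U-breve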
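Latvian
Ā &#256; ā &#257; A-macron
Č &#268; č &#269; C-hacek
Ē &#274; ē &#275; E-macron
Ģ &#290; ģ &#291; G-cedilla
Ī &#298; ī &#299; I-macron
Ķ &#310; ķ &#311; K-cedilla
Ļ &#315; ļ &#316; L-cedilla
Ņ &#325; ņ &#326; N-cedilla
Ō &#332; ō &#333; O-macron
Ŗ &#342; ŗ &#343; R-cedilla
Š &#352; š &#353; S-hacek
Ū &#362; ū &#363; U-macron
Ž &#381; ž &#382; Z-hacek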
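Lithuanian
Ą &#260; ą &#261; A-ogonek
Č &#268; č &#269; C-hacek
Ę &#280; ę &#281; E-ogonek
Ė &#278; ė &#279; E-dot-above
Į &#302; į &#303; I-ogonek
Š &#352; š &#353; S-hacek
Ū &#362; ū &#363; U-macron
Ų &#370; ų &#371; U-ogonek
Ž &#381; ž &#382; Z-hacek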
Macedonian uses (part of) the Cyrillic alphabet (see under Russian below) with the following additional letters:
Ѓ &#1027; ѓ &#1107; GJ (G-acute)
Ѕ &#1029; ѕ &#1109; DZ
Ј &#1032; ј &#1112; J
Љ &#1033; љ &#1113; LJ
Њ &#1034; њ &#1114; NJ
Ќ &#1036; ќ &#1116; KJ (K-acute)
Џ &#1039; џ &#1119; DZ-hacek
Maltese
Ċ &#266; ċ &#267; C-dot-above
Ġ &#288; ġ &#289; G-dot-above
Ħ &#294; ħ &#295; H-bar
Ż &#379; ż &#380; Z-dot-above
Maori
Ā &#256; ā &#257; A-macron
Ē &#274; ē &#275; E-macron
Ī &#298; ī &#299; I-macron
Ō &#332; ō &#333; O-macron
Ū &#362; ū &#363; U-macron
The following are additions to the Arabic alphabet used in Persian.
پ &#1662; p
چ &#1670; ch
ژ &#1688; zh
گ &#1711; g
Polish
Ą &#260; ą &#261; A-ogonek
Ć &#262; ć &#263; C-acute
Ę &#280; ę &#281; E-ogonek
Ł &#321; ł &#322; L-slash
Ń &#323; ń &#324; N-acute
Ś &#346; ś &#347; S-acute
Ź &#377; ź &#378; Z-acute
Ż &#379; ż &#380; Z-dot-above
Also has Ó.
Romanian
Ă &#258; ă &#259; A-breve
Ş &#350; ş &#351; S-cedilla
Ţ &#354; ţ &#355; T-cedilla
The Romanians actually prefer commas below the letter instead of cedillas, and there are code points defined for these too, but they are less likely to display properly:
Ș &#536; ș &#537; S-comma
Ț &#538; ț &#539; T-comma
Also has Â, Î.
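If you have Romanian text typed with the cedilla forms and want the comma forms instead, it is a straight four-character substitution. Here is a Python sketch; the name romanian_commas is made up. Don't run it over Turkish or Azerbaijani text, where Ş really is S-cedilla.

```python
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş &#350; -> Ș &#536;
    "\u015F": "\u0219",  # ş &#351; -> ș &#537;
    "\u0162": "\u021A",  # Ţ &#354; -> Ț &#538;
    "\u0163": "\u021B",  # ţ &#355; -> ț &#539;
})


def romanian_commas(text: str) -> str:
    """Replace S/T-cedilla with the S/T-comma letters Romanian prefers."""
    return text.translate(CEDILLA_TO_COMMA)


print(romanian_commas("ştiinţă"))  # știință
```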
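Russian
А &#1040; а &#1072; a
Б &#1041; б &#1073; b
В &#1042; в &#1074; v
Г &#1043; г &#1075; g
Д &#1044; д &#1076; d
Е &#1045; е &#1077; ye
Ё &#1025; ё &#1105; yo (N.B. out of order!)
Ж &#1046; ж &#1078; zh
З &#1047; з &#1079; z
И &#1048; и &#1080; i
Й &#1049; й &#1081; y
К &#1050; к &#1082; k
Л &#1051; л &#1083; l
М &#1052; м &#1084; m
Н &#1053; н &#1085; n
О &#1054; о &#1086; o
П &#1055; п &#1087; p
Р &#1056; р &#1088; r
С &#1057; с &#1089; s
Т &#1058; т &#1090; t
У &#1059; у &#1091; u
Ф &#1060; ф &#1092; f
Х &#1061; х &#1093; kh
Ц &#1062; ц &#1094; ts
Ч &#1063; ч &#1095; ch
Ш &#1064; ш &#1096; sh
Щ &#1065; щ &#1097; shch
Ъ &#1066; ъ &#1098; hard sign
Ы &#1067; ы &#1099; y
Ь &#1068; ь &#1100; soft sign
Э &#1069; э &#1101; e
Ю &#1070; ю &#1102; yu
Я &#1071; я &#1103; ya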
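The out-of-order Ё matters if you sort Russian words by code. A plain sort puts Ё (&#1025;) before А and ё (&#1105;) after я. One common dictionary convention is to treat ё as е for sorting. Here is a minimal Python sketch of that convention; the key function name is hypothetical, and real collation libraries handle far more than this.

```python
def russian_key(word: str):
    """Sort key that treats ё as е; ties are broken by the original word."""
    return (word.lower().replace("\u0451", "\u0435"), word)  # ё -> е


words = ["ёж", "ель", "жук", "дом"]
print(sorted(words))                   # ['дом', 'ель', 'жук', 'ёж']  (plain code order)
print(sorted(words, key=russian_key))  # ['дом', 'ёж', 'ель', 'жук']
```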
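Sanskrit transliteration
Ā &#256; ā &#257; A-macron
Ḍ &#7692; ḍ &#7693; D-dot-below
Ḥ &#7716; ḥ &#7717; H-dot-below
Ī &#298; ī &#299; I-macron
Ḷ &#7734; ḷ &#7735; L-dot-below
Ṃ &#7746; ṃ &#7747; M-dot-below
Ṅ &#7748; ṅ &#7749; N-dot-above
Ṇ &#7750; ṇ &#7751; N-dot-below
Ṛ &#7770; ṛ &#7771; R-dot-below
Ṝ &#7772; ṝ &#7773; R-dot-and-macron
Ś &#346; ś &#347; S-acute
Ṣ &#7778; ṣ &#7779; S-dot-below
Ṭ &#7788; ṭ &#7789; T-dot-below
Ū &#362; ū &#363; U-macron
Also uses Ñ.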
Serbian uses (part of) the Cyrillic alphabet (see under Russian above) with the following additional letters:
Ђ &#1026; ђ &#1106; D-bar
Ј &#1032; ј &#1112; J
Љ &#1033; љ &#1113; LJ
Њ &#1034; њ &#1114; NJ
Ћ &#1035; ћ &#1115; C-acute
Џ &#1039; џ &#1119; DZ-hacek
Slovak
Č &#268; č &#269; C-hacek
Ď &#270; ď &#271; D-hook
Ĺ &#313; ĺ &#314; L-acute
Ľ &#317; ľ &#318; L-apostrophe
Ň &#327; ň &#328; N-hacek
Ŕ &#340; ŕ &#341; R-acute
Š &#352; š &#353; S-hacek
Ť &#356; ť &#357; T-hook
Ž &#381; ž &#382; Z-hacek
Also has Á, É, Í, Ó, Ú, Ý, and Ô.
Turkish
Ğ &#286; ğ &#287; G-breve (yumuşak-G)
İ &#304; dotted capital I
ı &#305; dotless lowercase i
Ş &#350; ş &#351; S-cedilla
Also has Ç, Ö, Ü.
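The dotted/dotless pair breaks naive case conversion. Python's str.upper() turns i into I rather than İ, and "İ".lower() gives i plus a combining dot above, which is two code points. Here is a sketch of Turkish-aware conversion in Python; the function names are hypothetical, and the same rules apply to Azerbaijani.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ (&#304;), ı (&#305;) -> I."""
    # Default upper() maps i -> I, so turn dotted i into İ first; 'ı'.upper() is already 'I'.
    return text.replace("i", "\u0130").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i."""
    # Map both capitals explicitly so default lower() never sees them.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


print("istanbul".upper())            # ISTANBUL  (wrong for Turkish)
print(turkish_upper("istanbul"))     # İSTANBUL
print(turkish_lower("DİYARBAKIR"))   # diyarbakır
```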
Turkmen
Ň &#327; ň &#328; N-hacek
Ş &#350; ş &#351; S-cedilla
Ž &#381; ž &#382; Z-hacek
Also uses Ä, Ç, Ö, Ü, Ý. Originally reported as using the currency symbols $, ¢, ¥, but it seems these have since been replaced.
Ukrainian uses the Cyrillic alphabet (see under Russian above) with the following additional letters:
Є &#1028; є &#1108; curved-E
І &#1030; і &#1110; I
Ї &#1031; ї &#1111; I-umlaut
Ґ &#1168; ґ &#1169; G-hook
Vietnamese
Ă &#258; ă &#259; A-breve
Đ &#272; đ &#273; D-bar
Ơ &#416; ơ &#417; O-hook
Ư &#431; ư &#432; U-hook
These and Â, Ê, Ô are letters of the Vietnamese alphabet; there are also numerous other accents for tone marks, which may be combined with any of the vowels.
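You don't have to hunt down every precomposed vowel-plus-tone letter. You can type the base letter followed by combining marks and let Unicode normalization (NFC) fuse them into a single code point wherever one exists. Here is a Python sketch. The constant names and the compose helper are mine. The code points are the standard combining grave (huyền tone), acute (sắc tone) and dot below (nặng tone).

```python
import unicodedata

COMBINING_GRAVE = "\u0300"      # huyền tone
COMBINING_ACUTE = "\u0301"      # sắc tone
COMBINING_DOT_BELOW = "\u0323"  # nặng tone


def compose(base: str, *marks: str) -> str:
    """Attach combining marks to a base letter and normalize to NFC, which fuses
    them into one precomposed code point whenever Unicode defines one."""
    return unicodedata.normalize("NFC", base + "".join(marks))


# ơ plus the acute tone mark, and ê plus the dot-below tone mark
for letter in (compose("\u01a1", COMBINING_ACUTE), compose("\u00ea", COMBINING_DOT_BELOW)):
    print(letter, len(letter), f"&#{ord(letter)};")  # each is a single code point
```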
Welsh
Ŵ &#372; ŵ &#373; W-circumflex
Ŷ &#374; ŷ &#375; Y-circumflex
Also has Â, Ê, Î, Ô, Û, and occasionally others such as Ï.
Yoruba
Ẹ &#7864; ẹ &#7865; E-dot-below
Ọ &#7884; ọ &#7885; O-dot-below
Ṣ &#7778; ṣ &#7779; S-dot-below