Khmer is an Austroasiatic language. Much like the situation between Chinese and Japanese, though the Khmer script resembles Lao or Thai, the Khmer language has a completely different structure. Here is a sample phrase in Khmer:

"neuv m'dohm nih mian miin reu te?" = "Are there any land mines around here?"
Land mines are quite a problem in Cambodia.

The Khmer alphabet is the largest alphabet in the world (Guinness Book of World Records, 1995). It consists of 33 consonants, 23 vowels and 12 independent vowels. The 33 consonants are separated into 2 groups: the first series (small sound) and the second series (big sound). Furthermore, each consonant has two representations, regular and subscript (like upper and lower case, except that in most cases the two do not resemble one another at all). Lastly, 10 of the most used consonants have a more “formal” appearance when written in titles, in advertising, on temple walls, etc. Hence, there are 101 distinct symbols.

Each of the 23 vowels has two distinct sounds, one when it is paired with a consonant from the first series and another when it is paired with a consonant from the second series. Sometimes, a vowel takes on a third sound when the pairing is followed by a certain other consonant. Confused yet? Too bad. There is more. There are 6 accents that change all the rules and add even more vowel sounds. They appear over the top of words and act to either shorten, lengthen or emphasize sounds. The bontop, which looks a lot like a straightened-out apostrophe, shortens a vowel sound and at the same time alters it. The bontop pii, which looks like two straightened apostrophes (pii = two), changes the sound of a first series consonant to a second series sound. There is another symbol that changes the sound of a second series consonant to a first series sound, but it applies only to a limited number of consonants. The other 3 accents occur infrequently.

Khmer words are written without spaces in between. This leads to confusion because many words are compounds of two others. For example, the word for pencil is “black hand”. If you start reading a story about little Phanith learning how to read and write in the countryside with his favorite pencil, and you don’t know this, you might wonder if he is a psychotic serial killer and, furthermore, question why anyone wrote a story about him.

The Khmer language, unlike others in the area, is not tonal (compare: Thai has 5 tones, Laotian has 6 tones, Vietnamese sounds like it is spoken backwards AND it has 6 tones). This might lead a new learner of the language to the delusional idea that it is a simple language to acquire. The joke, of course, is on that learner, or in the case of my illustration, me.

Reading the above description of the division of the alphabet and the schizophrenic nature of its vowels, one can quickly calculate there is a vast array of vowel sounds. To the untrained (in this case English speaking) ear, many of these seem virtually indistinguishable. This can lead to problems.

The Khmer word for thief/robber is jao.
There is often occasion to use this word in Phnom Penh, the capital.

The Khmer word for grandchild is jao!
I do not have any of these. I am only 27.

NB. The exclamation point is my attempt to denote the slight difference in pronunciation.

Picture a young foreign woman, hours of language training under her belt, in a crowd of Khmers at a football game, waving her arms in the air, shouting:

My grandchild has taken my wallet!!

Can you hear the laughter? I still can.


Khmer, also known as Cambodian, is the official language of Kampuchea. Mutually intelligible dialects are also spoken in northeastern Thailand and the Mekong Delta region of Vietnam.

The Khmer script, called a'saa kmae (Khmer letters), is descended from the Brahmi script of South India, as are Thai, Myanmar, Old Mon and others. There is a great similarity between the earliest Khmer inscriptions and the Pallawa script of the Coromandel coast of India. There are two basic styles of the script: a'saa criang (slanted script) and a'saa muul (round script), but there is no structural difference between them.

U+17D2 Khmer sign coeng plays the conjunct formation role of the Indic virama: it kills the vowel of its preceding consonant and indicates that the following consonant should be treated as a subscript. Sign coeng should not be confused with U+17D1 Khmer sign viriam, which has a name similar to virama but an unrelated function: indicating that the base character is part of the previous word.
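As a rough illustration of how coeng ties a subscript consonant to its base, the sketch below groups a string into orthographic clusters. The helper names (is_khmer_mark, split_clusters) and the simplified attachment rule are assumptions made for this example; real text shaping follows more detailed rules than this.

```python
COENG = "\u17D2"  # KHMER SIGN COENG


def is_khmer_mark(ch):
    """True for dependent vowels and signs U+17B6..U+17D3, plus U+17DD."""
    return "\u17B6" <= ch <= "\u17D3" or ch == "\u17DD"


def split_clusters(text):
    """Group characters so that marks and coeng + consonant stay with their base."""
    clusters = []
    for ch in text:
        attaches = bool(clusters) and (
            is_khmer_mark(ch) or clusters[-1].endswith(COENG)
        )
        if attaches:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# kha, coeng, mo, vowel sign ae, ro: the word "Khmer" in Khmer script
word = "\u1781\u17D2\u1798\u17C2\u179A"
print(split_clusters(word))
```

Here the mo that follows coeng stays in the first cluster as a subscript, and ro starts a second cluster.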

The Khmer consonants are organized into two series or registers, whose inherent vowels are nominally a and o. Two shifter signs convert a consonant from one series to another. The dependent vowel signs then do not have a single phonetic value, but rather are interpreted in the context of the consonant to which they are attached.

The marks U+17C9 Khmer sign muusikatoan and U+17CA Khmer sign triisap are used to shift the base consonant between registers. In the presence of other superscript glyphs, both of these signs may be rendered via the same glyph shape as U+17BB Khmer vowel sign u. Selection of the proper rendering form is left to the display software.
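The register logic can be sketched in a few lines. This example assumes that the final vowel of each Unicode character name (a or o) reflects the register, with U+178E and U+179E treated as first-register exceptions as noted in the character list; the function names are made up for illustration.

```python
import unicodedata

FIRST, SECOND = 1, 2
MUUSIKATOAN = "\u17C9"  # shifts the second register to the first
TRIISAP = "\u17CA"      # shifts the first register to the second

# Names end in "o" but the letters belong to the first register.
NAME_EXCEPTIONS = {"\u178E": FIRST, "\u179E": FIRST}


def base_register(consonant):
    """Register of a bare consonant in the range U+1780..U+17A2."""
    if not "\u1780" <= consonant <= "\u17A2":
        raise ValueError("not a Khmer consonant")
    if consonant in NAME_EXCEPTIONS:
        return NAME_EXCEPTIONS[consonant]
    return FIRST if unicodedata.name(consonant).endswith("A") else SECOND


def effective_register(consonant, marks=""):
    """Register after applying any shifter found among the attached marks."""
    if MUUSIKATOAN in marks:
        return FIRST
    if TRIISAP in marks:
        return SECOND
    return base_register(consonant)


print(base_register("\u1780"))                    # ka
print(effective_register("\u1794", TRIISAP))      # ba with triisap
```

A renderer or transliterator would consult the effective register before choosing the sound of a dependent vowel sign.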

Khmer does not use any whitespace between words.
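Because there is no whitespace to split on, software has to find word boundaries some other way. One common approach is dictionary lookup; the sketch below uses greedy longest match against a tiny, hypothetical two-word lexicon. It is only an illustration of the idea, not a production segmenter.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation; unknown characters come out one by one."""
    longest = max(map(len, lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


LANGUAGE = "\u1797\u17B6\u179F\u17B6"     # pho, aa, sa, aa
KHMER = "\u1781\u17D2\u1798\u17C2\u179A"  # kha, coeng, mo, ae, ro
lexicon = {LANGUAGE, KHMER}               # hypothetical toy lexicon

print(segment(LANGUAGE + KHMER, lexicon))
```

Greedy matching can pick the wrong boundary when one word is a prefix of another, which is why real segmenters usually add statistical or contextual scoring.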

Unicode's Khmer code block reserves the 128 code points from U+1780 to U+17FF, of which 114 are currently assigned.


Number of characters added in each version of the Unicode standard :
Unicode 3.0 : 103
Unicode 4.0 : 11

Number of characters in each General Category :

Letter, Modifier         Lm :  1
Letter, Other            Lo : 53
Mark, Non-Spacing        Mn : 20
Mark, Spacing Combining  Mc : 11
Number, Decimal Digit    Nd : 10
Number, Other            No : 10
Punctuation, Other       Po :  6
Symbol, Currency         Sc :  1
Other, Format            Cf :  2
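A tally like the one above can be reproduced with Python's unicodedata module. Note that this is a sketch under one assumption: the module reflects whatever Unicode version ships with the interpreter, so individual category counts may differ from the 4.0-era figures listed here if characters have since been reclassified.

```python
import unicodedata
from collections import Counter


def khmer_block_categories():
    """Count assigned characters in U+1780..U+17FF by General Category."""
    counts = Counter()
    for cp in range(0x1780, 0x1800):
        ch = chr(cp)
        # Unassigned code points have no character name.
        if unicodedata.name(ch, None) is not None:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = khmer_block_categories()
print(sum(counts.values()), dict(counts))
```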

Number of characters in each Bidirectional Category :

Left To Right                 L : 83
European Number Terminator   ET :  1
Non Spacing Mark            NSM : 20
Other Neutral                ON : 10

The columns below should be interpreted as :

  1. The Unicode code for the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode General Category for the character
  5. The Unicode Bidirectional Category for the character
  6. The Unicode version when this character was added

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

     Consonants

U+1780   ក   Khmer letter ka Lo L 3.0
U+1781   ខ   Khmer letter kha Lo L 3.0
U+1782   គ   Khmer letter ko Lo L 3.0
U+1783   ឃ   Khmer letter kho Lo L 3.0
U+1784   ង   Khmer letter ngo Lo L 3.0
U+1785   ច   Khmer letter ca Lo L 3.0
U+1786   ឆ   Khmer letter cha Lo L 3.0
U+1787   ជ   Khmer letter co Lo L 3.0
U+1788   ឈ   Khmer letter cho Lo L 3.0
U+1789   ញ   Khmer letter nyo Lo L 3.0
U+178A   ដ   Khmer letter da Lo L 3.0
U+178B   ឋ   Khmer letter ttha Lo L 3.0
U+178C   ឌ   Khmer letter do Lo L 3.0
U+178D   ឍ   Khmer letter ttho Lo L 3.0
U+178E   ណ   Khmer letter nno Lo L 3.0
* as this character belongs to the first register, its correct transliteration is nna, not nno
U+178F   ត   Khmer letter ta Lo L 3.0
U+1790   ថ   Khmer letter tha Lo L 3.0
U+1791   ទ   Khmer letter to Lo L 3.0
U+1792   ធ   Khmer letter tho Lo L 3.0
U+1793   ន   Khmer letter no Lo L 3.0
U+1794   ប   Khmer letter ba Lo L 3.0
U+1795   ផ   Khmer letter pha Lo L 3.0
U+1796   ព   Khmer letter po Lo L 3.0
U+1797   ភ   Khmer letter pho Lo L 3.0
U+1798   ម   Khmer letter mo Lo L 3.0
U+1799   យ   Khmer letter yo Lo L 3.0
U+179A   រ   Khmer letter ro Lo L 3.0
U+179B   ល   Khmer letter lo Lo L 3.0
U+179C   វ   Khmer letter vo Lo L 3.0
U+179D   ឝ   Khmer letter sha Lo L 3.0
* used only for Pali/Sanskrit transliteration
U+179E   ឞ   Khmer letter sso Lo L 3.0
* used only for Pali/Sanskrit transliteration
* as this character belongs to the first register, its correct transliteration is ssa, not sso
U+179F   ស   Khmer letter sa Lo L 3.0
U+17A0   ហ   Khmer letter ha Lo L 3.0
U+17A1   ឡ   Khmer letter la Lo L 3.0
U+17A2   អ   Khmer letter qa Lo L 3.0
* glottal stop

     Independent vowel (deprecated)

U+17A3   ឣ   Khmer independent vowel qaq Lo L 3.0
* originally intended only for Pali/Sanskrit transliteration
* use of this character is strongly discouraged; 17A2 should be used instead

     Independent vowels

U+17A4   ឤ   Khmer independent vowel qaa Lo L 3.0
* used only for Pali/Sanskrit transliteration
* use of this character is discouraged; the sequence 17A2 17B6 should be used instead
U+17A5   ឥ   Khmer independent vowel qi Lo L 3.0
U+17A6   ឦ   Khmer independent vowel qii Lo L 3.0
U+17A7   ឧ   Khmer independent vowel qu Lo L 3.0
U+17A8   ឨ   Khmer independent vowel quk Lo L 3.0
* obsolete ligature for the sequence 17A7 1780
* use of the sequence is now preferred
U+17A9   ឩ   Khmer independent vowel quu Lo L 3.0
U+17AA   ឪ   Khmer independent vowel quuv Lo L 3.0
U+17AB   ឫ   Khmer independent vowel ry Lo L 3.0
U+17AC   ឬ   Khmer independent vowel ryy Lo L 3.0
U+17AD   ឭ   Khmer independent vowel ly Lo L 3.0
U+17AE   ឮ   Khmer independent vowel lyy Lo L 3.0
U+17AF   ឯ   Khmer independent vowel qe Lo L 3.0
U+17B0   ឰ   Khmer independent vowel qai Lo L 3.0
U+17B1   ឱ   Khmer independent vowel qoo type one Lo L 3.0
U+17B2   ឲ   Khmer independent vowel qoo type two Lo L 3.0
* this is a variant for 17B1, used in only two words
* 17B1 is the normal variant of this vowel
U+17B3   ឳ   Khmer independent vowel qau Lo L 3.0

     Inherent vowels
These are for phonetic transcription to distinguish Indic language inherent vowels from Khmer inherent vowels. These characters are included solely for compatibility with particular applications; their use in other contexts is discouraged.

U+17B4   ឴   Khmer vowel inherent aq Cf L 3.0
U+17B5   ឵   Khmer vowel inherent aa Cf L 3.0

     Dependent vowel signs

U+17B6   ា   Khmer vowel sign aa Mc L 3.0
U+17B7   ិ   Khmer vowel sign i Mn NSM 3.0
U+17B8   ី   Khmer vowel sign ii Mn NSM 3.0
U+17B9   ឹ   Khmer vowel sign y Mn NSM 3.0
U+17BA   ឺ   Khmer vowel sign yy Mn NSM 3.0
U+17BB   ុ   Khmer vowel sign u Mn NSM 3.0
U+17BC   ូ   Khmer vowel sign uu Mn NSM 3.0
U+17BD   ួ   Khmer vowel sign ua Mn NSM 3.0

     Two-part dependent vowel signs
These two-part dependent vowel signs have glyph pieces which stand on both sides of the consonant. These vowel signs follow the consonant in logical order, and should be handled as a unit for processing.

U+17BE   ើ   Khmer vowel sign oe Mc L 3.0
U+17BF   ឿ   Khmer vowel sign ya Mc L 3.0
U+17C0   ៀ   Khmer vowel sign ie Mc L 3.0
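To make the logical-order point concrete, the short sketch below lists the stored code points of a syllable written with a two-part vowel sign. The helper name describe is an assumption for this example.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point, in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# ka followed by vowel sign oe: the vowel is a single code point after the
# consonant, even though its glyph pieces surround the consonant on screen.
for line in describe("\u1780\u17BE"):
    print(line)
```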

     Dependent vowel signs

U+17C1   េ   Khmer vowel sign e Mc L 3.0
U+17C2   ែ   Khmer vowel sign ae Mc L 3.0
U+17C3   ៃ   Khmer vowel sign ai Mc L 3.0

     Two-part dependent vowel signs
These two-part dependent vowel signs have glyph pieces which stand on both sides of the consonant. These vowel signs follow the consonant in logical order, and should be handled as a unit for processing.

U+17C4   ោ   Khmer vowel sign oo Mc L 3.0
U+17C5   ៅ   Khmer vowel sign au Mc L 3.0

     Various signs

U+17C6   ំ   Khmer sign nikahit Mn NSM 3.0
aka srak am
aka anusvara
* final nasalization
* this character is usually regarded as a vowel sign am, along with om and aam
ref U+0E4D   ํ   Thai character nikhahit (Thai)
ref U+1036   ံ   Myanmar sign anusvara (Myanmar)
U+17C7   ះ   Khmer sign reahmuk Mc L 3.0
aka srak ah
aka visarga
ref U+1038   း   Myanmar sign visarga (Myanmar)
U+17C8   ៈ   Khmer sign yuukaleapintu Mc L 3.0
* inserts a short inherent vowel with abrupt glottal stop
* the preferred transliteration is yukaleakpintu

     Consonant shifters
These signs shift the base consonant between registers.

U+17C9   ៉   Khmer sign muusikatoan Mn NSM 3.0
* changes the second register to the first
* the preferred transliteration is muusekatoan
U+17CA   ៊   Khmer sign triisap Mn NSM 3.0
* changes the first register to the second
* the preferred transliteration is treisap

     Various signs

U+17CB   ់   Khmer sign bantoc Mn NSM 3.0
* shortens the vowel sound in the previous orthographic syllable
* the preferred transliteration is bantak
U+17CC   ៌   Khmer sign robat Mn NSM 3.0
* a diacritic historically corresponding to the repha form of ra in Devanagari
U+17CD   ៍   Khmer sign toandakhiat Mn NSM 3.0
* indicates that the base character is not pronounced
U+17CE   ៎   Khmer sign kakabat Mn NSM 3.0
* sign used with some exclamations
U+17CF   ៏   Khmer sign ahsda Mn NSM 3.0
* denotes stressed intonation in some single-consonant words
U+17D0   ័   Khmer sign samyok sannya Mn NSM 3.0
* denotes deviation from the general rules of pronunciation, mostly used in loan words from Pali/Sanskrit, French, and so on
U+17D1   ៑   Khmer sign viriam Mn NSM 3.0
* mostly obsolete, a "killer"
* indicates that the base character is the final consonant of a word without its inherent vowel sound
U+17D2   ្   Khmer sign coeng Mn NSM 3.0
* functions to indicate that the following Khmer letter is to be rendered subscripted
* shape shown is arbitrary and is not visibly rendered

     Lunar date sign (deprecated)

U+17D3   ៓   Khmer sign bathamasat Mn NSM 3.0
* originally intended as part of lunar date symbols
* use of this character is strongly discouraged in favor of the complete set of lunar date symbols
ref U+19E0   ᧠   Khmer symbol pathamasat (Khmer Symbols)

     Various signs

U+17D4   ។   Khmer sign khan Po L 3.0
* functions as a full stop, period
ref U+0E2F   ฯ   Thai character paiyannoi (Thai)
ref U+104A   ၊   Myanmar sign little section (Myanmar)
U+17D5   ៕   Khmer sign bariyoosan Po L 3.0
* indicates the end of a section or a text
ref U+0E5A   ๚   Thai character angkhankhu (Thai)
ref U+104B   ။   Myanmar sign section (Myanmar)
U+17D6   ៖   Khmer sign camnuc pii kuuh Po L 3.0
* functions as colon
* the preferred transliteration is camnoc pii kuuh
ref U+00F7   ÷   division sign (Latin-1 Supplement)
ref U+0F14   ༔   Tibetan mark gter tsheg (Tibetan)
U+17D7   ៗ   Khmer sign lek too Lm L 3.0
* repetition sign
ref U+0E46   ๆ   Thai character maiyamok (Thai)
U+17D8   ៘   Khmer sign beyyal Po L 3.0
* et cetera
* use of this character is discouraged; other abbreviations for et cetera also exist
* preferred spelling: 17D4 179B 17D4
U+17D9   ៙   Khmer sign phnaek muan Po L 3.0
* indicates the beginning of a book or a treatise
* the preferred transliteration is phnek moan
ref U+0E4F   ๏   Thai character fongman (Thai)
U+17DA   ៚   Khmer sign koomuut Po L 3.0
* indicates the end of a book or treatise
* this forms a pair with 17D9
* the preferred transliteration is koomoot
ref U+0E5B   ๛   Thai character khomut (Thai)
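Several notes in this chart recommend a preferred character or sequence in place of a discouraged one (17A3, 17A4, 17A8 and 17D8). A minimal sketch of applying those four recommendations is shown below; the function name prefer_sequences is made up here, and whether a blanket replacement is appropriate depends on the text being processed.

```python
# Discouraged character -> preferred character or sequence, per the notes in this chart.
PREFERRED = {
    "\u17A3": "\u17A2",              # independent vowel qaq -> letter qa
    "\u17A4": "\u17A2\u17B6",        # independent vowel qaa -> qa + vowel sign aa
    "\u17A8": "\u17A7\u1780",        # ligature quk -> qu + ka
    "\u17D8": "\u17D4\u179B\u17D4",  # beyyal -> khan + lo + khan
}
_TABLE = str.maketrans(PREFERRED)


def prefer_sequences(text):
    """Replace discouraged Khmer characters with their preferred spellings."""
    return text.translate(_TABLE)


print(prefer_sequences("\u17D8") == "\u17D4\u179B\u17D4")
```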

     Currency symbol

U+17DB   ៛   Khmer currency symbol riel Sc ET 3.0
* Riel in Cambodia

     Various signs

U+17DC   ៜ   Khmer sign avakrahasanya Lo L 3.0
* rare, shows an omitted Sanskrit vowel, like an apostrophe
* the preferred transliteration is avakraha sannya
ref U+093D   ऽ   Devanagari sign avagraha (Devanagari)
U+17DD   ៝   Khmer sign atthacan Mn NSM 4.0
* mostly obsolete
* indicates that the base character is the final consonant of a word with its inherent vowel sound
ref U+17D1   ៑   Khmer sign viriam (Khmer)

     Digits

U+17E0   ០   Khmer digit zero Nd L 3.0
U+17E1   ១   Khmer digit one Nd L 3.0
U+17E2   ២   Khmer digit two Nd L 3.0
U+17E3   ៣   Khmer digit three Nd L 3.0
U+17E4   ៤   Khmer digit four Nd L 3.0
U+17E5   ៥   Khmer digit five Nd L 3.0
U+17E6   ៦   Khmer digit six Nd L 3.0
U+17E7   ៧   Khmer digit seven Nd L 3.0
U+17E8   ៨   Khmer digit eight Nd L 3.0
U+17E9   ៩   Khmer digit nine Nd L 3.0

     Numeric symbols for divination lore
These characters have numeric values 0-9, respectively, but are not used for calculation.

U+17F0   ៰   Khmer symbol lek attak son No ON 4.0
U+17F1   ៱   Khmer symbol lek attak muoy No ON 4.0
U+17F2   ៲   Khmer symbol lek attak pii No ON 4.0
U+17F3   ៳   Khmer symbol lek attak bei No ON 4.0
U+17F4   ៴   Khmer symbol lek attak buon No ON 4.0
U+17F5   ៵   Khmer symbol lek attak pram No ON 4.0
U+17F6   ៶   Khmer symbol lek attak pram muoy No ON 4.0
U+17F7   ៷   Khmer symbol lek attak pram pii No ON 4.0
U+17F8   ៸   Khmer symbol lek attak pram bei No ON 4.0
U+17F9   ៹   Khmer symbol lek attak pram buon No ON 4.0
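The difference between the decimal digits (Nd) and the divination numerals (No) shows up directly in code. In the sketch below, Python's int() accepts the Khmer digits, while the lek attak symbols only expose a value through unicodedata.numeric; the helper name lek_attak_value is an assumption for this example.

```python
import unicodedata


def lek_attak_value(ch):
    """Numeric value (0-9) of a divination numeral U+17F0..U+17F9."""
    if not "\u17F0" <= ch <= "\u17F9":
        raise ValueError("not a lek attak symbol")
    return int(unicodedata.numeric(ch))


print(int("\u17E1\u17E2\u17E3"))  # Khmer digits one, two, three -> 123
print(lek_attak_value("\u17F3"))  # lek attak bei -> 3

try:
    int("\u17F3")  # not a decimal digit, so int() rejects it
except ValueError:
    print("lek attak symbols are not usable as decimal digits")
```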
