16-bit characters

16-bit characters (as in Unicode) are enough to represent any single language. What they aren't enough to do is represent all languages at the same time, which is what you need in order to mix various Asian languages in the same document(1). To work around this, East Asian encodings (e.g., Shift-JIS) use so-called shift codes: some values in the string that would normally be character codes are instead defined to be escapes. When an escape is encountered, the following values are read from the string and combined with it to find the actual character. This allows an arbitrary number of bits per character, but is a pain to program with.
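
To make the "pain to program with" concrete, here is a minimal Python sketch of walking a Shift-JIS byte string one character at a time. The helper names (is_lead_byte, split_chars) are made up for illustration, and the lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are the commonly cited ones; vendor variants of Shift-JIS differ in detail, so treat this as an assumption rather than a full decoder.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    # Assumed Shift-JIS lead-byte ranges; a byte in these ranges means
    # "combine me with the next byte to get one character".
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(data: bytes) -> List[bytes]:
    # Walk the byte string, taking 2 bytes after a lead byte and 1 otherwise.
    # (No validation of trail bytes or truncated input; this is only a sketch.)
    chars = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


# Three characters, but four bytes: the middle one needs a lead byte.
encoded = "Aあ1".encode("shift_jis")
pieces = split_chars(encoded)
decoded = [p.decode("shift_jis") for p in pieces]
```

Note that you cannot index the n-th character directly or find a character boundary without scanning from the start, which is exactly the inconvenience that fixed-width 16-bit characters avoid.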

(1) If I remember correctly, there is an extra constraint, too: people are unwilling to have the same glyph (graphical symbol) encode to the same value when it has different semantic meanings. If we were encoding English with one value per word, that would be like wanting a different value for the "to" in "Go to London" and the "to" in "To be or not to be."

