Standard Unicode Disclaimer:
Not all the Unicode characters represented below or in other writeups may be viewable in your browser. In fact, some of the characters may not be viewable in any browser. This is because Unicode is an evolving and ever-growing standard with room for more than a million characters and symbols from hundreds of languages and cultures past and present, and not all software or fonts can display all of them. If your browser has no glyph for a character, it will usually display a small square box or a question mark. This is normal and expected behavior, and does not mean there is a problem with this writeup or your web browser. For additional information see Using Unicode on E2. In addition, you may have luck changing your font in the ekw Preferences to something such as "Arial Unicode MS", which has better Unicode support.
Unicode is an international standard for representing letters, numbers, characters, glyphs, ideograms, and other symbols on a computer. Basically, Unicode assigns a number to every symbol that has been defined, and the computer uses this number to show you the symbol you are supposed to see. Before Unicode was developed, there were hundreds of different encodings and character sets to deal with the problem of character display. More often than not, these encodings conflicted with one another, so sharing documents that contained unusual symbols, or moving them between languages or platforms, caused lots of problems.
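As a small illustration of the "one number per symbol" idea, here is a minimal Python sketch. The helper name describe is made up for this example; ord, chr, and unicodedata.name come from the Python standard library.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point of a single character as 'U+XXXX NAME'."""
    # ord() gives the Unicode number assigned to the character.
    # unicodedata.name() looks up its official name; the second
    # argument is a fallback for characters that have no name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

for ch in "A±€":
    print(ch, ord(ch), describe(ch))

# chr() goes the other way: from the number back to the character.
assert chr(177) == "±"
```

The number is the same on every platform; whether the character actually shows up still depends on having a font that contains it.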
The Unicode Standard, developed by the Unicode Consortium, addresses these problems by being platform and language independent. It is versioned and kept in step with a corresponding ISO standard (ISO/IEC 10646), and has been accepted by all major computing giants such as Sun, Microsoft, Oracle, HP, IBM, Xerox, Apple, Adobe Systems, and many others. It is the character standard relied on by other computer and internet standards such as XML, ECMAScript, JavaScript, Java, LDAP, CORBA, WML, and again, many others.
While ASCII is a 7-bit code (usually stored one character per 8-bit byte), Unicode encodings use units of 8 (byte), 16 (word), or 32 (double word/dword) bits. The encoding forms UTF-8, UTF-16, and UTF-32 all represent the same Unicode characters, and allow them to be stored in different formats depending on whether the characters need to be compacted into a small memory space or be quickly accessible.
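To make the space trade-off concrete, the following Python sketch counts how many bytes one character takes in each encoding form. The function name encoded_sizes is hypothetical; the little-endian codec names are used only so that Python does not prepend a byte order mark to the result.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes a single character needs in each form."""
    # "utf-16-le" and "utf-32-le" fix the byte order, so no BOM is added.
    return {name: len(ch.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}

# An ASCII letter, the plus-minus sign, the euro sign, and a musical
# symbol that lies outside the first 65,536 code points.
for ch in ("A", "\u00B1", "\u20AC", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 is the most compact for ASCII-heavy text, UTF-32 always uses four bytes per character, and UTF-16 sits in between, needing two 16-bit units for characters beyond U+FFFF.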
Using Unicode on the Web
First, read Using Unicode on E2. Then read HTML Symbol Reference (not Unicode-specific, but it may give you what you want without resorting to Unicode).
After this, you need to determine what character(s) you want to use and get its number. There are numerous references online, but you might as well go to the source at http://www.unicode.org/charts. You can also use some of the links below to browse the available characters (and see first-hand what they look like in your browser).
Basically there are two ways to display a Unicode character on a web page. As stated above, Unicode assigns a number to each character. This number is written in the standard in hexadecimal, meaning base 16 (using the digits 0-9 and A-F). However, on a web page the character can be referenced using either its hexadecimal or its decimal representation.
If you know the decimal (base 10) representation of the number, simply add an ampersand (&) and a hash (#) in front of the number and a semicolon (;) after it. For example, the decimal value of the plus minus symbol is 177, so to display it simply type "&#177;" and you will see "±" show up on your page. The decimal representation seems to be the standard on E2 for node titles, so this is probably the best option.
If you know the hexadecimal (base 16) representation of the number, simply add an ampersand (&), a hash (#), and an x in front of the number and a semicolon after it. For example, the hexadecimal value of the plus minus symbol is B1, so to display it simply type "&#xB1;" and you will again see "±" show up on your page.
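The two forms described above can be generated mechanically. This is a minimal Python sketch; decimal_ref and hex_ref are names invented for the example, while html.unescape is the standard library function that decodes such references much as a browser would.

```python
import html

def decimal_ref(ch: str) -> str:
    """Build the decimal form: ampersand, hash, number, semicolon."""
    return f"&#{ord(ch)};"

def hex_ref(ch: str) -> str:
    """Build the hexadecimal form: ampersand, hash, x, number, semicolon."""
    return f"&#x{ord(ch):X};"

plus_minus = "\u00B1"  # the plus minus symbol

print(decimal_ref(plus_minus))
print(hex_ref(plus_minus))

# Both references decode back to the same character.
assert html.unescape(decimal_ref(plus_minus)) == plus_minus
assert html.unescape(hex_ref(plus_minus)) == plus_minus
```

Either form works on a page; the decimal one matches the convention mentioned above for E2 node titles.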
Available Unicode Characters
Unicode Scripts
African Scripts
American Scripts - Canadian Syllabics, Cherokee, Deseret
Ancient Scripts
- Ancient Greek - Ancient Greek Numbers, Ancient Greek Musical Symbols
- Cuneiform - Cuneiform (scheduled for Unicode 5.0), Cuneiform Numbers, Old Persian, Ugaritic
- Linear B - Linear B Syllabary, Linear B Ideograms
- Other Ancient Scripts - Aegean Numbers, Counting Rod Numbers, Cypriot Syllabary, Gothic, Old Italic, Ogham, Runic, Phoenician
Central Asian Scripts - Kharoshthi, Mongolian, Phags-Pa ('Phags Pa), Tibetan
East Asian Scripts
- Han Ideographs - Unified CJK Ideographs, CJK Ideographs Extended-A, CJK Ideographs Extended-B, Compatibility Ideographs, Compatibility Ideographs Supplement, Kanbun
- Radicals and Strokes - CJK Radicals, KangXi Radicals, CJK Strokes, Ideographic Description
- Chinese-specific - Bopomofo
- Japanese-specific - Hiragana, Katakana, Katakana Phonetic Extensions, Halfwidth Katakana
- Korean-specific - Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo, Halfwidth Jamo
- Yi - Yi, Yi Radicals
European Alphabets
- Armenian - Armenian, Armenian Ligatures
- Coptic - Coptic
- Cyrillic - Cyrillic, Cyrillic Supplement
- Georgian - Georgian, Georgian Supplement
- Greek - Greek, Greek Extended
- Latin - Basic Latin, Latin-1, Latin Extended A, Latin Extended B, Latin Extended C (scheduled for Unicode 5.0), Latin Extended Additional, Latin Ligatures, Fullwidth Latin Letters, Small Forms
Indic Scripts - Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Limbu, Malayalam, Oriya, Sinhala, Syloti Nagri, Tamil, Telugu
Middle Eastern Scripts
Philippine Scripts - Buhid, Hanunoo, Tagalog, Tagbanwa
South East Asian Scripts - Buginese, Balinese, Khmer, Lao, Myanmar, New Tai Lue, Tai Le, Thai
Unicode Other Scripts - Shavian, Osmanya, Glagolitic
Unicode Symbols and Punctuation
Combining Diacritical Marks - Combining Diacritical Marks, Combining Diacritical Marks for Symbols, Combining Diacritical Marks Supplement, Combining Half Marks
Enclosed and Square - Enclosed Alphanumerics, CJK Letters and Months, CJK Compatibility
Mathematical Symbols
- Numbers and Digits - ASCII Digits, Fullwidth ASCII Digits, Number Forms, Superscripts and Subscripts
- Letterlike Symbols - Letterlike Symbols, Math Alphanumeric Symbols
- Arrows and Operators - Arrows, Mathematical Operators, Supplemental Mathematical Operators, Miscellaneous Mathematical Symbols A, Miscellaneous Mathematical Symbols B, Supplemental Arrows A, Supplemental Arrows B, Miscellaneous Symbols and Arrows
- Geometrical Symbols - Geometrical Shapes, Box Drawing, Block Elements
- Technical Symbols - Control Pictures, Miscellaneous Technical, OCR
Phonetic Symbols - IPA Extensions, Phonetic Extensions, Phonetic Extensions Supplement, Modifier Tone Letters, Spacing Modifier Letters
Private Use - Private Use Area, Supplementary Private Use Area A, Supplementary Private Use Area B
Unicode Punctuation
Specials - Controls C0 and C1, Layout Controls, Invisible Operators, Specials, Tags, Variation Selectors, Variation Selectors Supplement
Surrogates - High Surrogates, High Private Use Surrogates, Low Surrogates
Symbols
- Miscellaneous Symbols - Dingbats, Miscellaneous Symbols, Tai Xuan Jing Symbols, Yijing Hexagrams, Braille Patterns
- Musical Notation - Ancient Greek Musical Notation, Byzantine Musical Symbols, Western Musical Symbols
- Currency Symbols - Dollar Sign, Yen, Pound, Cent, Mark, Pfennig, Rial, Currency Symbols, Fullwidth Currency Symbols
Specific Letter, Number, and Symbol Representations
Unicode Versions and Encodings
See Also