Unicode 3.0 was released in February 2000, updated to Unicode 3.0.1 in August 2000, and later updated once again to Unicode 3.0.1 with Corrigendum. The previous version was Unicode 2.1 and the next version is Unicode 3.1.
Unicode 3.0.1 with Corrigendum
Unicode 3.0.1 with Corrigendum is defined by: The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5) as amended by the Unicode 3.0.1 Update Notice (http://www.unicode.org/versions/Unicode3.0.1.html), Corrigendum #1: UTF-8 Shortest Form (http://www.unicode.org/versions/corrigendum1.html) and Corrigendum #2: Yod with Hiriq Normalization (http://www.unicode.org/versions/corrigendum2.html).
Corrigendum #1: UTF-8 Shortest Form states:
The conformance clause C12 in The Unicode Standard, Version 3.0 forbids the generation of "non-shortest form" UTF-8, and forbids the interpretation of illegal sequences, but not the interpretation of "non-shortest form". Where software does interpret the non-shortest forms, security issues can arise. For example:
- Process A performs security checks, but does not check for non-shortest forms.
- Process B accepts the byte sequence from process A, and transforms it into UTF-16 while interpreting non-shortest forms.
- The UTF-16 text may then contain characters that should have been filtered out by process A.
To address this issue, the Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conformance clauses.
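The Process A / Process B scenario can be sketched in Python. This is only an illustration: `process_a_filter` and `lenient_decode_2byte` are hypothetical helpers written for this example, and the bytes C0 AF are a non-shortest two-byte encoding of U+002F "/". Python's built-in UTF-8 decoder enforces shortest form, so it rejects the sequence.

```python
# An overlong (non-shortest) two-byte form of "/" (U+002F).
OVERLONG_SLASH = b"\xc0\xaf"

def process_a_filter(data):
    """Hypothetical "Process A": a byte-level check that only looks for 0x2F."""
    return b"/" not in data

def lenient_decode_2byte(data):
    """Hypothetical "Process B": decodes a two-byte pattern with NO shortest-form check.

    Deliberately non-conformant; it exists only to show the problem.
    """
    lead, trail = data
    if lead & 0xE0 != 0xC0 or trail & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 pattern")
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

# The filter sees no "/" byte, yet the lenient decoder produces one.
passed_filter = process_a_filter(OVERLONG_SLASH)
smuggled = lenient_decode_2byte(OVERLONG_SLASH)

# A conformant decoder refuses to interpret the sequence at all.
try:
    OVERLONG_SLASH.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

Here the character that Process A meant to filter out only appears after the lenient decoding step, which is the situation the corrigendum is meant to rule out.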
Corrigendum #2: Yod with Hiriq Normalization states:
In the production of the normalization tables for Unicode 3.0, the character U+FB1D יִ Hebrew letter yod with hiriq was mistakenly omitted from Composition Exclusions. During the public review period, this mistake was reported, but the report was misinterpreted and thus overlooked. In Unicode 3.1, this character is now included in Composition Exclusions.
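The effect of a composition exclusion can be observed with Python's standard `unicodedata` module. Note the assumption: that module implements a much later Unicode version than 3.0 or 3.1, so this sketch shows the corrected behavior, not the original omission.

```python
import unicodedata

YOD_WITH_HIRIQ = "\ufb1d"       # HEBREW LETTER YOD WITH HIRIQ
DECOMPOSED = "\u05d9\u05b4"     # HEBREW LETTER YOD + HEBREW POINT HIRIQ

nfd = unicodedata.normalize("NFD", YOD_WITH_HIRIQ)
nfc = unicodedata.normalize("NFC", YOD_WITH_HIRIQ)

# Because U+FB1D is a composition exclusion, NFC decomposes it
# and does not recompose the pair back into U+FB1D.
stays_decomposed = (nfd == DECOMPOSED) and (nfc == DECOMPOSED)
```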
Version 3.0.1
Version 3.0.1 is defined by: The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5) as amended by the Unicode 3.0.1 Update Notice (http://www.unicode.org/versions/Unicode3.0.1.html).
Unicode 3.0.1 does not contain character additions or major normative changes.
Three new data files have been added to the Unicode 3.0.1 release:
- BidiMirroring.txt (see UAX #9: The Bidirectional Algorithm)
- Informative properties for substituting characters in an implementation of bidirectional mirroring.
- CaseFolding.txt (see UTR #21: Case Mappings)
- Informative file mapping characters to their case-folded form.
- NormalizationTest.txt (see UAX #15: Unicode Normalization Forms)
- Normative test file for conformance to Unicode Normalization Forms.
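As a rough sketch of how a file like NormalizationTest.txt can be used, the following checks one test line against Python's `unicodedata`. Two assumptions are made here: the line follows the five-column layout (source; NFC; NFD; NFKC; NFKD) of the currently published file, and the sample line is illustrative, not quoted from the 3.0.1 release. The helper names are hypothetical.

```python
import unicodedata

# Illustrative line in the assumed five-column layout.
SAMPLE_LINE = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"

def parse_columns(line):
    """Turn 'XXXX YYYY;...' into a list of Python strings, one per column."""
    data = line.split("#", 1)[0]                  # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    fields = [f for f in fields if f]             # drop the empty field after the last ';'
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def conforms(line):
    """Check the NFC and NFD invariants for one test line."""
    c1, c2, c3, c4, c5 = parse_columns(line)
    nfc = lambda s: unicodedata.normalize("NFC", s)
    nfd = lambda s: unicodedata.normalize("NFD", s)
    nfc_ok = all(nfc(c) == c2 for c in (c1, c2, c3)) and all(nfc(c) == c4 for c in (c4, c5))
    nfd_ok = all(nfd(c) == c3 for c in (c1, c2, c3)) and all(nfd(c) == c5 for c in (c4, c5))
    return nfc_ok and nfd_ok

result = conforms(SAMPLE_LINE)
```

A full conformance run would apply the same check to every data line of the file and also cover the NFKC and NFKD columns.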
Version 3.0
Version 3.0 is defined by: The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5).
The Unicode Standard, Version 3.0 is synchronized with ISO/IEC 10646-1 second edition and contains descriptions and properties for many new characters.
The following technical reports are approved and upgraded to the status of Unicode Standard Annex (UAX), and are thus considered part of the Unicode Standard, Version 3.0. These reports may contain either normative or informative material, or both. Any reference to version 3.0 of the standard automatically includes these technical reports.
- UAX #9: The Bidirectional Algorithm
- UAX #11: East Asian Character Width
- UAX #13: Unicode Newline Guidelines
- UAX #14: Line Breaking Properties
- UAX #15: Unicode Normalization Forms
The most significant additions to the standard include the following:
- Transformation Formats
- The precise definitions of the common Unicode Transformation Formats are provided, including UTF-8, UTF-16, UTF-16BE, and UTF-16LE. The relations between abstract characters, code points (scalar values) and code units (8, 16 or 32 bit) are clarified.
- Bidirectional properties
- Bidirectional properties are now more consistent with the General Category property, and new bidirectional properties were created. See UAX #9: The Bidirectional Algorithm.
- Case
- Case properties have been extended for those situations where there is a mapping to multiple characters and where case is locale dependent.
- Combining classes
- These were updated significantly to resolve problems of normalization and decomposition for Indic scripts in particular.
- Decomposition and Composition
- Unicode character decompositions have been significantly updated to fix errors in the original assignments, to allow correct collation weighting, and to make decompositions consistent for normalization. Certain characters are excluded from composition, and the precise algorithm for composition is provided. See UAX #15: Unicode Normalization Forms.
- General Category
- A series of general category changes were made to assist the convergence of the Unicode definition of identifier with ISO TR 10176.
- Newlines
- Line handling characteristics have been documented more fully for Unicode environments. See UAX #13: Unicode Newline Guidelines.
- Linebreak properties
- Linebreaking properties (normative and informative) are added to the standard to support consistent linebreaking behavior over all Unicode characters. See UAX #14: Line Breaking Properties.
- East-Asian width properties
- Properties for supporting correct choice of full-width vs. half-width glyphs in an East-Asian context are provided. See UAX #11: East Asian Character Width.
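The relation between code points and code units mentioned under Transformation Formats can be illustrated with a minimal Python sketch. The `code_units` helper is hypothetical. The supplementary code point U+10000 is used only to show a surrogate pair.

```python
def code_units(text):
    """Count code points and code units of text in three encoding forms."""
    return {
        "code_points": len(text),
        "utf8_units": len(text.encode("utf-8")),             # 8-bit units
        "utf16_units": len(text.encode("utf-16-be")) // 2,   # 16-bit units
        "utf32_units": len(text.encode("utf-32-be")) // 4,   # 32-bit units
    }

bmp = code_units("\u0700")              # one BMP code point (Syriac block)
supplementary = code_units("\U00010000")  # one code point outside the BMP

# UTF-16BE and UTF-16LE differ only in the byte order of each 16-bit unit.
big_endian = "\u0700".encode("utf-16-be")
little_endian = "\u0700".encode("utf-16-le")
```

A single code point therefore maps to between one and four 8-bit units in UTF-8, and to one or two 16-bit units in UTF-16.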
The major differences from Unicode 2.1 to Unicode 3.0 include:
New Code Blocks
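19 new code blocks were added in 3.0. Each row gives the block's range, its name, and the number of assigned characters out of the total code points in the block: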
U+0700 to U+074F Syriac 71/80
U+0780 to U+07BF Thaana 49/64
U+0D80 to U+0DFF Sinhala 80/128
U+1000 to U+109F Myanmar 78/160
U+1200 to U+137F Ethiopic 345/384
U+13A0 to U+13FF Cherokee 85/96
U+1400 to U+167F Unified Canadian Aboriginal Syllabics 630/640
U+1680 to U+169F Ogham 29/32
U+16A0 to U+16FF Runic 81/96
U+1780 to U+17FF Khmer 103/128
U+1800 to U+18AF Mongolian 155/176
U+2800 to U+28FF Braille Patterns 256/256
U+2E80 to U+2EFF CJK Radicals Supplement 115/128
U+2F00 to U+2FDF Kangxi Radicals 214/224
U+2FF0 to U+2FFF Ideographic Description Characters 12/16
U+31A0 to U+31BF Bopomofo Extended 24/32
U+3400 to U+4DBF CJK Unified Ideographs Extension A 6582/6592
U+A000 to U+A48F Yi Syllables 1165/1168
U+A490 to U+A4CF Yi Radicals 50/64
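The second figure in each row is the block size, which follows from the range as `last - first + 1`. The sketch below recomputes it for the rows above. The tuple layout and variable names are choices made for this example only.

```python
# (first, last, name, assigned, size) copied from the list above.
BLOCKS = [
    (0x0700, 0x074F, "Syriac", 71, 80),
    (0x0780, 0x07BF, "Thaana", 49, 64),
    (0x0D80, 0x0DFF, "Sinhala", 80, 128),
    (0x1000, 0x109F, "Myanmar", 78, 160),
    (0x1200, 0x137F, "Ethiopic", 345, 384),
    (0x13A0, 0x13FF, "Cherokee", 85, 96),
    (0x1400, 0x167F, "Unified Canadian Aboriginal Syllabics", 630, 640),
    (0x1680, 0x169F, "Ogham", 29, 32),
    (0x16A0, 0x16FF, "Runic", 81, 96),
    (0x1780, 0x17FF, "Khmer", 103, 128),
    (0x1800, 0x18AF, "Mongolian", 155, 176),
    (0x2800, 0x28FF, "Braille Patterns", 256, 256),
    (0x2E80, 0x2EFF, "CJK Radicals Supplement", 115, 128),
    (0x2F00, 0x2FDF, "Kangxi Radicals", 214, 224),
    (0x2FF0, 0x2FFF, "Ideographic Description Characters", 12, 16),
    (0x31A0, 0x31BF, "Bopomofo Extended", 24, 32),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A", 6582, 6592),
    (0xA000, 0xA48F, "Yi Syllables", 1165, 1168),
    (0xA490, 0xA4CF, "Yi Radicals", 50, 64),
]

# Rows whose listed size disagrees with the range (expected to be empty).
mismatches = [name for first, last, name, _, size in BLOCKS if last - first + 1 != size]

# Total number of characters assigned in the new blocks.
total_assigned = sum(assigned for _, _, _, assigned, _ in BLOCKS)
```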
New Bidirectional Categories
8 new Bidirectional Categories were added in 3.0:
- LeftToRightEmbedding (LRE)
- LeftToRightOverride (LRO)
- RightToLeftArabic (AL)
- RightToLeftEmbedding (RLE)
- RightToLeftOverride (RLO)
- PopDirectionalFormat (PDF)
- NonSpacingMark (NSM)
- BoundaryNeutral (BN)
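These category abbreviations are the values returned by `unicodedata.bidirectional` in Python. The sketch below looks up one sample character per category. The choice of sample characters is ours, and the module reflects a later Unicode version than 3.0, so it is an illustration and not a check of the 3.0 data.

```python
import unicodedata

# One sample character per category (samples chosen for this example).
SAMPLES = {
    "LRE": "\u202a",   # LEFT-TO-RIGHT EMBEDDING
    "LRO": "\u202d",   # LEFT-TO-RIGHT OVERRIDE
    "AL": "\u0627",    # ARABIC LETTER ALEF
    "RLE": "\u202b",   # RIGHT-TO-LEFT EMBEDDING
    "RLO": "\u202e",   # RIGHT-TO-LEFT OVERRIDE
    "PDF": "\u202c",   # POP DIRECTIONAL FORMATTING
    "NSM": "\u0300",   # COMBINING GRAVE ACCENT
    "BN": "\u200b",    # ZERO WIDTH SPACE
}

observed = {label: unicodedata.bidirectional(ch) for label, ch in SAMPLES.items()}
all_match = all(label == value for label, value in observed.items())
```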
New Characters
Excluding those in the new code blocks, there were 183 new characters added in Unicode 3.0.
Number of characters in each General Category:
Letter, Uppercase Lu : 21
Letter, Lowercase Ll : 30
Letter, Modifier Lm : 1
Letter, Other Lo : 9
Mark, Non-Spacing Mn : 22
Mark, Enclosing Me : 4
Number, Letter Nl : 4
Punctuation, Dash Pd : 1
Punctuation, Other Po : 6
Symbol, Currency Sc : 3
Symbol, Modifier Sk : 5
Symbol, Other So : 73
Separator, Space Zs : 1
Other, Format Cf : 3
Number of characters in each Bidirectional Category:
LeftToRight L : 73
RightToLeft R : 1
RightToLeftArabic AL : 9
EuropeanNumberTerminator ET : 3
NonSpacingMark NSM : 26
BoundaryNeutral BN : 3
Whitespace WS : 1
OtherNeutrals ON : 67
The columns below should be interpreted as:
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode Bidirectional Category for the character
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
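The same five columns can be reproduced for any character with Python's `unicodedata` module, as in this minimal sketch. The `describe` helper is hypothetical, and the module's data comes from a later Unicode version, so values for some characters may differ from the 3.0 listings.

```python
import unicodedata

def describe(ch):
    """Return (code, character, name, General Category, Bidirectional Category)."""
    return (
        f"U+{ord(ch):04X}",
        ch,
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

hwair = describe("\u01f6")
kip = describe("\u20ad")
```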
Latin Extended B
Additions
- U+01F6 Ƕ Latin capital letter hwair Lu L
- * lowercase is 0195
- U+01F7 Ƿ Latin capital letter wynn Lu L
- aka wen
- * lowercase is 01BF
- U+01F8 Ǹ Latin capital letter N with grave Lu L
- U+01F9 ǹ Latin small letter N with grave Ll L
- * Pinyin
Additions for Romanian
- U+0218 Ș Latin capital letter S with comma below Lu L
- U+0219 ș Latin small letter S with comma below Ll L
- * Romanian, when distinct comma below form is required
- ref U+015F Latin small letter S with cedilla (Latin Extended A)
- U+021A Ț Latin capital letter T with comma below Lu L
- U+021B ț Latin small letter T with comma below Ll L
- * Romanian, when distinct comma below form is required
- ref U+0163 Latin small letter T with cedilla (Latin Extended A)
Miscellaneous additions
- U+021C Ȝ Latin capital letter yogh Lu L
- ref U+01B7 Latin capital letter ezh (Latin Extended B)
- U+021D ȝ Latin small letter yogh Ll L
- * Middle English, Scots
- ref U+0292 Latin small letter ezh (IPA Extensions)
- ref U+2125 ounce sign (Letterlike Symbols)
- U+021E Ȟ Latin capital letter H with caron Lu L
- U+021F ȟ Latin small letter H with caron Ll L
- * Finnish Romany
- U+0222 Ȣ Latin capital letter ou Lu L
- U+0223 ȣ Latin small letter ou Ll L
- * Algonquin, Huron
- ref U+0038 digit eight (Basic Latin)
- U+0224 Ȥ Latin capital letter Z with hook Lu L
- U+0225 ȥ Latin small letter Z with hook Ll L
- * Middle High German
- U+0226 Ȧ Latin capital letter A with dot above Lu L
- U+0227 ȧ Latin small letter A with dot above Ll L
- * Uralicist usage
- U+0228 Ȩ Latin capital letter E with cedilla Lu L
- U+0229 ȩ Latin small letter E with cedilla Ll L
Additions for Livonian
- U+022A Ȫ Latin capital letter O with diaeresis and macron Lu L
- U+022B ȫ Latin small letter O with diaeresis and macron Ll L
- * Livonian
- U+022C Ȭ Latin capital letter O with tilde and macron Lu L
- U+022D ȭ Latin small letter O with tilde and macron Ll L
- * Livonian
- U+022E Ȯ Latin capital letter O with dot above Lu L
- U+022F ȯ Latin small letter O with dot above Ll L
- * Livonian
- U+0230 Ȱ Latin capital letter O with dot above and macron Lu L
- U+0231 ȱ Latin small letter O with dot above and macron Ll L
- * Livonian
- U+0232 Ȳ Latin capital letter Y with macron Lu L
- U+0233 ȳ Latin small letter Y with macron Ll L
- * Livonian, Cornish
IPA Extensions
IPA characters for disordered speech
- U+02A9 ʩ Latin small letter feng digraph Ll L
- * velopharyngeal fricative
- U+02AA ʪ Latin small letter ls digraph Ll L
- * lateral alveolar fricative (lisp)
- U+02AB ʫ Latin small letter lz digraph Ll L
- * voiced lateral alveolar fricative
- U+02AC ʬ Latin letter bilabial percussive Ll L
- * audible lip smack
- U+02AD ʭ Latin letter bidental percussive Ll L
- * audible teeth gnashing
Spacing Modifier Letters
Additions based on 1989 IPA
- U+02DF ˟ modifier letter cross accent Sk ON
- * Swedish grave accent
Tone letters
- U+02EA ˪ modifier letter yin departing tone mark Sk ON
- U+02EB ˫ modifier letter yang departing tone mark Sk ON
IPA modifiers
- U+02EC ˬ modifier letter voicing Sk ON
- U+02ED ˭ modifier letter unaspirated Sk ON
Other modifier letters
- U+02EE ˮ modifier letter double apostrophe Lm L
- * Nenets
Combining Diacritical Marks
Additions for IPA
- U+0346 ͆ combining bridge above Mn NSM
- * IPA: dentolabial
- ref U+20E9 combining wide bridge above (Combining Diacritical Marks for Symbols)
- U+0347 ͇ combining equals sign below Mn NSM
- * IPA: alveolar
- U+0348 ͈ combining double vertical line below Mn NSM
- * IPA: strong articulation
- U+0349 ͉ combining left angle below Mn NSM
- * IPA: weak articulation
- U+034A ͊ combining not tilde above Mn NSM
- * IPA: denasal
IPA diacritics for disordered speech
- U+034B ͋ combining homothetic above Mn NSM
- * IPA: nasal escape
- U+034C ͌ combining almost equal to above Mn NSM
- * IPA: velopharyngeal friction
- U+034D ͍ combining left right arrow below Mn NSM
- * IPA: labial spreading
- U+034E ͎ combining upwards arrow below Mn NSM
- * IPA: whistled articulation
Double diacritics
- U+0362 ͢ combining double rightwards arrow below Mn NSM
- * IPA: sliding articulation
Greek and Coptic
Variant letterforms
- U+03D7 ϗ Greek kai symbol Ll L
- * used as an ampersand
Archaic letters
- U+03DB ϛ Greek small letter stigma Ll L
- ref U+03C2 Greek small letter final sigma (Greek and Coptic)
- U+03DD ϝ Greek small letter digamma Ll L
- * used as a symbol with a numeric value of 6
- U+03DF ϟ Greek small letter koppa Ll L
- * used in modern Greek as a symbol with a numeric value of 90, as in the dating of legal documentation
- U+03E1 ϡ Greek small letter sampi Ll L
- * used as a symbol with a numeric value of 900
Cyrillic
Cyrillic extensions
- U+0400 Ѐ Cyrillic capital letter ie with grave Lu L
- U+040D Ѝ Cyrillic capital letter I with grave Lu L
Cyrillic extensions
- U+0450 ѐ Cyrillic small letter ie with grave Ll L
- * Macedonian
- U+045D ѝ Cyrillic small letter I with grave Ll L
- * Macedonian
Historic miscellaneous
- U+0488 ҈ combining Cyrillic hundred thousands sign Me NSM
- U+0489 ҉ combining Cyrillic millions sign Me NSM
Extended Cyrillic
- U+048C Ҍ Cyrillic capital letter semisoft sign Lu L
- U+048D ҍ Cyrillic small letter semisoft sign Ll L
- * Kildin Sami
- U+048E Ҏ Cyrillic capital letter er with tick Lu L
- U+048F ҏ Cyrillic small letter er with tick Ll L
- * Kildin Sami
- U+04EC Ӭ Cyrillic capital letter E with diaeresis Lu L
- U+04ED ӭ Cyrillic small letter E with diaeresis Ll L
- * Kildin Sami
Armenian
Punctuation
- U+058A ֊ Armenian hyphen Pd ON
- aka yentamna
Arabic
Combining maddah and hamza
- U+0653 ٓ Arabic maddah above Mn NSM
- U+0654 ٔ Arabic hamza above Mn NSM
- U+0655 ٕ Arabic hamza below Mn NSM
Extended Arabic letters
- U+06B8 ڸ Arabic letter lam with three dots below Lo AL
- U+06B9 ڹ Arabic letter noon with dot below Lo AL
- U+06BF ڿ Arabic letter tcheh with dot above Lo AL
- U+06CF ۏ Arabic letter waw with dot above Lo AL
Extended Arabic letters
- U+06FA ۺ Arabic letter sheen with dot below Lo AL
- U+06FB ۻ Arabic letter dad with dot below Lo AL
- U+06FC ۼ Arabic letter ghain with dot below Lo AL
Signs for Sindhi
- U+06FD ۽ Arabic sign sindhi ampersand So AL
- U+06FE ۾ Arabic sign sindhi postposition men So AL
Tibetan
Consonants
- U+0F6A ཪ Tibetan letter fixed form ra Lo L
- * used only in transliteration and transcription
Subjoined consonants
- U+0F96 ྖ Tibetan subjoined letter cha Mn NSM
- U+0FAE ྮ Tibetan subjoined letter zha Mn NSM
- U+0FAF ྯ Tibetan subjoined letter za Mn NSM
- U+0FB0 ྰ Tibetan subjoined letter a Mn NSM
- aka a-chung
- * rare, only used for full-sized subjoined letter
- ref U+0F71 Tibetan vowel sign aa (Tibetan)
- U+0FB8 ྸ Tibetan subjoined letter A Mn NSM
Fixed-form subjoined consonants
- U+0FBA ྺ Tibetan subjoined letter fixed form wa Mn NSM
- U+0FBB ྻ Tibetan subjoined letter fixed form ya Mn NSM
- U+0FBC ྼ Tibetan subjoined letter fixed form ra Mn NSM
Signs
- U+0FBE ྾ Tibetan ku ru kha So L
- * often repeated three times; indicates a refrain
- U+0FBF ྿ Tibetan ku ru kha bzhi mig can So L
- * marks point of text insertion or annotation
- ref U+203B reference mark (General Punctuation)
Cantillation signs
- U+0FC0 ࿀ Tibetan cantillation sign heavy beat So L
- * marks a heavy drum beat
- U+0FC1 ࿁ Tibetan cantillation sign light beat So L
- * marks a light drum beat
- U+0FC2 ࿂ Tibetan cantillation sign cang te u So L
- * symbol of a small Tibetan hand drum
- U+0FC3 ࿃ Tibetan cantillation sign sbub chal So L
- * symbol of a Tibetan cymbal
Symbols
- U+0FC4 ࿄ Tibetan symbol dril bu So L
- * symbol of a Tibetan hand bell
- U+0FC5 ࿅ Tibetan symbol rdo rje So L
- U+0FC6 ࿆ Tibetan symbol padma gdan Mn NSM
- U+0FC7 ࿇ Tibetan symbol rdo rje rgya gram So L
- U+0FC8 ࿈ Tibetan symbol phur pa So L
- U+0FC9 ࿉ Tibetan symbol nor bu So L
- U+0FCA ࿊ Tibetan symbol nor bu nyis khyil So L
- * the double body symbol
- ref U+262F yin yang (Miscellaneous Symbols)
- U+0FCB ࿋ Tibetan symbol nor bu gsum khyil So L
- * the tri-kaya or triple body symbol
- U+0FCC ࿌ Tibetan symbol nor bu bzhi khyil So L
- * the quadruple body symbol, a form of the swastika
- ref U+534D CJK Ideograph U+534D (CJK Unified Ideographs)
Astrological sign
- U+0FCF ࿏ Tibetan sign rdel nag gsum So L
General Punctuation
Formatting characters
- U+202F narrow no break space Zs WS
- ref U+00A0 no break space (Latin-1 Supplement)
Double punctuation for vertical text
- U+2048 ⁈ question exclamation mark Po ON
- U+2049 ⁉ exclamation question mark Po ON
General punctuation
- U+204A ⁊ tironian sign et Po ON
- * Irish Gaelic, ...
- U+204B ⁋ reversed pilcrow sign Po ON
- ref U+00B6 pilcrow sign (Latin-1 Supplement)
- U+204C ⁌ black leftwards bullet Po ON
- U+204D ⁍ black rightwards bullet Po ON
Currency Symbols
Currency symbols
- U+20AD ₭ kip sign Sc ET
- * Laos
- U+20AE ₮ tugrik sign Sc ET
- * Mongolia
- * also transliterated as tugrug, tugric, tugrog, togrog, tögrög
- U+20AF ₯ drachma sign Sc ET
- * Greece
Combining Diacritical Marks for Symbols
Additional enclosing diacritics
- U+20E2 ⃢ combining enclosing screen Me NSM
- ref U+239A clear screen symbol (Miscellaneous Technical)
- U+20E3 ⃣ combining enclosing keycap Me NSM
Letterlike Symbols
Additional letterlike symbols
- U+2139 ℹ information source Ll L
- * intended for use with 20DD
- U+213A ℺ rotated capital q So ON
- * a binding signature mark
Number Forms
Roman numerals
- U+2183 Ↄ Roman numeral reversed one hundred Nl L
- aka apostrophic C
- * used in combination with C and I to form large numbers
Arrows
Arrows
- U+21EB ⇫ upwards white arrow on pedestal So ON
- aka level 2 lock
- U+21EC ⇬ upwards white arrow on pedestal with horizontal bar So ON
- aka caps lock
- U+21ED ⇭ upwards white arrow on pedestal with vertical bar So ON
- aka numerics lock
- U+21EE ⇮ upwards white double arrow So ON
- aka level 3 select
- U+21EF ⇯ upwards white double arrow on pedestal So ON
- aka level 3 lock
- U+21F0 ⇰ rightwards white arrow from wall So ON
- aka group lock
- U+21F1 ⇱ north west arrow to corner So ON
- aka home
- U+21F2 ⇲ south east arrow to corner So ON
- aka end
- U+21F3 ⇳ up down white arrow So ON
- aka scrolling
Miscellaneous Technical
Miscellaneous technical
- U+2301 ⌁ electric arrow So ON
- * from ISO 2047
- * symbol for End of Transmission
Graphics for control codes
- U+237B ⍻ not check mark So ON
- * from ISO 2047
- * symbol for Negative Acknowledge
Graphics for control codes
- U+237D ⍽ shouldered open box So ON
- * from ISO 9995-7
- * keyboard symbol for No Break Space
- U+237E ⍾ bell symbol So ON
- * from ISO 2047
- U+237F ⍿ vertical line with middle dot So ON
- * from ISO 2047
- * symbol for End of Medium
Keyboard symbols from ISO 9995-7
- U+2380 ⎀ insertion symbol So ON
- U+2381 ⎁ continuous underline symbol So ON
- U+2382 ⎂ discontinuous underline symbol So ON
- U+2383 ⎃ emphasis symbol So ON
- U+2384 ⎄ composition symbol So ON
- U+2385 ⎅ white square with centre vertical line So ON
- U+2386 ⎆ enter symbol So ON
- U+2387 ⎇ alternative key symbol So ON
- U+2388 ⎈ helm symbol So ON
- aka control
- ref U+2638 wheel of dharma (Miscellaneous Symbols)
- U+2389 ⎉ circled horizontal bar with notch So ON
- U+238A ⎊ circled triangle down So ON
- U+238B ⎋ broken circle with northwest arrow So ON
- U+238C ⎌ undo symbol So ON
Electrotechnical symbols from IR 181
- U+238D ⎍ monostable symbol So ON
- U+238E ⎎ hysteresis symbol So ON
- U+238F ⎏ open circuit output h type symbol So ON
- U+2390 ⎐ open circuit output l type symbol So ON
- U+2391 ⎑ passive pull down output symbol So ON
- U+2392 ⎒ passive pull up output symbol So ON
- U+2393 ⎓ direct current symbol form two So ON
- U+2394 ⎔ software function symbol So ON
APL
- U+2395 ⎕ APL functional symbol quad So L
- ref U+2337 APL functional symbol squish quad (Miscellaneous Technical)
- ref U+25AF white vertical rectangle (Geometric Shapes)
Keyboard symbols from ISO 9995-7
- U+2396 ⎖ decimal separator key symbol So ON
- U+2397 ⎗ previous page So ON
- U+2398 ⎘ next page So ON
- U+2399 ⎙ print screen symbol So ON
- U+239A ⎚ clear screen symbol So ON
- ref U+20E2 combining enclosing screen (Combining Diacritical Marks for Symbols)
Control Pictures
Keyboard symbol
- U+2425 ␥ symbol for delete form two So ON
- * from ISO 9995-7
- * keyboard symbol for undoable delete
Specific symbol for control code
- U+2426 ␦ symbol for substitute form two So ON
- * from ISO 2047
- ref U+061F Arabic question mark (Arabic)
Geometric Shapes
Control code graphics
- U+25F0 ◰ white square with upper left quadrant So ON
- U+25F1 ◱ white square with lower left quadrant So ON
- U+25F2 ◲ white square with lower right quadrant So ON
- U+25F3 ◳ white square with upper right quadrant So ON
- U+25F4 ◴ white circle with upper left quadrant So ON
- U+25F5 ◵ white circle with lower left quadrant So ON
- U+25F6 ◶ white circle with lower right quadrant So ON
- U+25F7 ◷ white circle with upper right quadrant So ON
Miscellaneous Symbols
Miscellaneous symbol
- U+2619 ☙ reversed rotated floral heart bullet So ON
- * a binding signature mark
- ref U+2767 rotated floral heart bullet (Dingbats)
Syriac cross symbols
- U+2670 ♰ west Syriac cross So ON
- U+2671 ♱ east Syriac cross So ON
CJK Symbols and Punctuation
Additional Suzhou numerals
- U+3038 〸 Hangzhou numeral ten Nl L
- U+3039 〹 Hangzhou numeral twenty Nl L
- U+303A 〺 Hangzhou numeral thirty Nl L
Special CJK indicators
- U+303E 〾 ideographic variation indicator So ON
- * visual indicator that the following ideograph is to be taken as a variant of the intended character
Alphabetic Presentation Forms
Hebrew presentation forms
- U+FB1D יִ Hebrew letter yod with hiriq Lo R
Specials
Interlinear annotation
- U+FFF9 interlinear annotation anchor Cf BN
- * marks start of annotated text
- U+FFFA interlinear annotation separator Cf BN
- * marks start of annotating character(s)
- U+FFFB interlinear annotation terminator Cf BN
- * marks end of annotation block
Altered Characters
In addition to more than a thousand General Category changes, and the addition of eight new Bidirectional Categories, 8 characters altered their Bidirectional Category in 3.0:
Basic Latin
- U+000C form feed had its Bidirectional Category changed from ParagraphSeparator to Whitespace
Latin-1 Supplement
- U+0085 next line had its Bidirectional Category changed from OtherNeutrals to ParagraphSeparator
Spacing Modifier Letters
- U+02D0 ː modifier letter triangular colon had its Bidirectional Category changed from OtherNeutrals to LeftToRight
- U+02D1 ˑ modifier letter half triangular colon had its Bidirectional Category changed from OtherNeutrals to LeftToRight
Letterlike Symbols
- U+2118 ℘ script capital p had its Bidirectional Category changed from LeftToRight to OtherNeutrals
- U+212E ℮ estimated symbol had its Bidirectional Category changed from LeftToRight to EuropeanNumberTerminator
Katakana
- U+30FB ・ Katakana middle dot had its Bidirectional Category changed from LeftToRight to OtherNeutrals
Halfwidth and Fullwidth Forms
- U+FF65 ・ halfwidth Katakana middle dot had its Bidirectional Category changed from LeftToRight to OtherNeutrals
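The new categories of these eight characters can be compared with what Python's `unicodedata` reports. This is a sketch under two assumptions: the module implements a later Unicode version, and the long category names map to the usual abbreviations (ParagraphSeparator = B, Whitespace = WS, OtherNeutrals = ON, LeftToRight = L, EuropeanNumberTerminator = ET).

```python
import unicodedata

# Code point -> abbreviation of the category it was changed to in 3.0.
CHANGED_TO = {
    0x000C: "WS",   # form feed
    0x0085: "B",    # next line
    0x02D0: "L",    # modifier letter triangular colon
    0x02D1: "L",    # modifier letter half triangular colon
    0x2118: "ON",   # script capital p
    0x212E: "ET",   # estimated symbol
    0x30FB: "ON",   # Katakana middle dot
    0xFF65: "ON",   # halfwidth Katakana middle dot
}

# Code points whose current category differs from the 3.0 value listed above.
differences = {
    f"U+{cp:04X}": unicodedata.bidirectional(chr(cp))
    for cp, expected in CHANGED_TO.items()
    if unicodedata.bidirectional(chr(cp)) != expected
}
```

An empty `differences` dictionary means the categories assigned in 3.0 are still the ones in the Unicode version bundled with the interpreter.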
http://unicode.org