Early on, it was obvious that the Unicode standard was going to outgrow the 65,536 code points that are available to a sixteen-bit encoding. To solve this, the Surrogates Area was defined, allowing roughly another million code points to be defined while still maintaining Unicode's 16-bit identity.
Unicode's Surrogates Area code block reserves the 2048 code points from U+D800 to U+DFFF.
Hangul Syllables <-- Surrogates Area --> Private Use Area
The Surrogates Area consists of 1,024 high-half surrogates and 1,024 low-half surrogates, which are interpreted as pairs to access 1,024 * 1,024 = 1,048,576 (2^20) code points. 44,944 of these have been assigned as of Unicode version 3.2.
The high-surrogates are assigned to the range U+D800 to U+DBFF, and the low-surrogates to the range U+DC00 to U+DFFF. The high-surrogate is always first and the low-surrogate is always second. Surrogates have no meaning except as part of a pair.
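The pairing rule can be checked mechanically. The following is a minimal Python sketch, not taken from the Unicode standard: the function names are illustrative, and the input is assumed to be a sequence of 16-bit code units given as integers.

```python
def is_high_surrogate(unit: int) -> bool:
    """True for code units in the high-surrogate range U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    """True for code units in the low-surrogate range U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def is_well_formed(units) -> bool:
    """Check that every surrogate appears in a high-then-low pair."""
    units = list(units)
    i = 0
    while i < len(units):
        if is_high_surrogate(units[i]):
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not is_low_surrogate(units[i + 1]):
                return False
            i += 2
        elif is_low_surrogate(units[i]):
            # A low surrogate with no preceding high surrogate is unpaired.
            return False
        else:
            # Any other code unit stands on its own.
            i += 1
    return True
```

Under these assumptions, is_well_formed([0x0041, 0xD834, 0xDD1E]) is true, while the reversed pair [0xDD1E, 0xD834] and a lone [0xD834] are both rejected.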
The Unicode code value S represented by a surrogate pair is calculated by:
S = (H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000
where H and L are the high and low surrogates respectively. The reverse mapping for a value S is:
H = (S - 0x10000) / 0x400 + 0xD800
L = (S - 0x10000) % 0x400 + 0xDC00
where / is integer division (the remainder is discarded) and % is the remainder. This allows surrogate pairs to specify the code range 0x10000 to 0x10FFFF.
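As an illustration, the two mappings can be written directly in Python. This is a sketch under the assumption that surrogates and code values are plain integers; the names decode_pair and encode_pair are made up for this example.

```python
def decode_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into the code value they represent."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000


def encode_pair(code_value: int) -> tuple:
    """Split a code value in 0x10000..0x10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_value <= 0x10FFFF:
        raise ValueError("code value is outside the range reachable by surrogates")
    offset = code_value - 0x10000
    high = offset // 0x400 + 0xD800  # integer division: the upper 10 bits
    low = offset % 0x400 + 0xDC00    # remainder: the lower 10 bits
    return high, low
```

For example, encode_pair(0x1D11E) gives (0xD834, 0xDD1E), and decode_pair(0xD834, 0xDD1E) gives back 0x1D11E. The extremes of the range map to (0xD800, 0xDC00) for 0x10000 and (0xDBFF, 0xDFFF) for 0x10FFFF.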
High-surrogates U+DB80 to U+DBFF are reserved for private use, which allows for a total of 131,068 private use characters representable by surrogate pairs. The total is four fewer than the 131,072 (128 * 1,024) you might expect, as four code points (U+FFFFE, U+FFFFF, U+10FFFE and U+10FFFF) have been reserved as non-characters. These are in addition to the 6,400 characters in the Private Use Area of the Basic Multilingual Plane.
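The private use count can be reproduced by enumerating the pairs. The sketch below is illustrative only: the constant and function names are assumptions of this example, and it simply applies the pair formula to every private use high-surrogate.

```python
HIGH_PRIVATE_USE = range(0xDB80, 0xDC00)  # U+DB80..U+DBFF, 128 values
LOW_SURROGATES = range(0xDC00, 0xE000)    # U+DC00..U+DFFF, 1,024 values
NONCHARACTERS = {0xFFFFE, 0xFFFFF, 0x10FFFE, 0x10FFFF}
BMP_PRIVATE_USE_COUNT = 6400              # Private Use Area of the BMP


def decode_pair(high: int, low: int) -> int:
    """Code value represented by a (high, low) surrogate pair."""
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000


def supplementary_private_use() -> list:
    """All private use code values reachable through surrogate pairs."""
    values = []
    for high in HIGH_PRIVATE_USE:
        for low in LOW_SURROGATES:
            value = decode_pair(high, low)
            if value not in NONCHARACTERS:  # skip the four non-characters
                values.append(value)
    return values


if __name__ == "__main__":
    values = supplementary_private_use()
    print(len(values))                                 # 131068
    print(f"U+{min(values):X} to U+{max(values):X}")   # U+F0000 to U+10FFFD
    print(len(values) + BMP_PRIVATE_USE_COUNT)         # 137468
```

The enumeration starts at U+F0000 because the first private use high-surrogate, U+DB80, paired with U+DC00 decodes to 0x380 * 0x400 + 0x10000 = 0xF0000.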
The code block which in Unicode 3.2 was called Surrogates Area is now (as of Unicode 4.0) three code blocks:
- U+D800 to U+DB7F High Surrogates
- U+DB80 to U+DBFF High Private Use Surrogates
- U+DC00 to U+DFFF Low Surrogates
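A lookup over these three ranges might be sketched as follows. The function name surrogate_block is hypothetical; the ranges and block names are the ones listed above.

```python
from typing import Optional


def surrogate_block(code_unit: int) -> Optional[str]:
    """Return the name of the surrogate block containing code_unit, or None."""
    if 0xD800 <= code_unit <= 0xDB7F:
        return "High Surrogates"
    if 0xDB80 <= code_unit <= 0xDBFF:
        return "High Private Use Surrogates"
    if 0xDC00 <= code_unit <= 0xDFFF:
        return "Low Surrogates"
    return None  # not a surrogate at all
```

For instance, surrogate_block(0xD834) returns "High Surrogates", surrogate_block(0xDB80) returns "High Private Use Surrogates", and surrogate_block(0x0041) returns None.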
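The code blocks currently defined within the reach of surrogate pairs (up to Unicode 5.1, the latest version listed here) are given below. Each entry shows the code point range, the block name, the number of assigned code points out of the block size, and the Unicode version(s) in which they were assigned, with per-version counts in parentheses: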
U+10000 to U+1007F Linear B Syllabary 88/128 4.0
U+10080 to U+100FF Linear B Ideograms 123/128 4.0
U+10100 to U+1013F Aegean Numbers 57/64 4.0
U+10140 to U+1018F Ancient Greek Numbers 75/80 4.1
U+10190 to U+101CF Ancient Symbols 12/64 5.1
U+101D0 to U+101FF Phaistos Disc 46/48 5.1
U+10200 to U+1027F vacant 0/128
U+10280 to U+1029F Lycian 29/32 5.1
U+102A0 to U+102DF Carian 49/64 5.1
U+102E0 to U+102FF vacant 0/32
U+10300 to U+1032F Old Italic 35/48 3.1
U+10330 to U+1034F Gothic 27/32 3.1
U+10350 to U+1037F vacant 0/48
U+10380 to U+1039F Ugaritic 31/32 4.0
U+103A0 to U+103DF Old Persian 50/64 4.1
U+103E0 to U+103FF vacant 0/32
U+10400 to U+1044F Deseret 80/80 3.1(76) 4.0(4)
U+10450 to U+1047F Shavian 48/48 4.0
U+10480 to U+104AF Osmanya 40/48 4.0
U+104B0 to U+107FF vacant 0/848
U+10800 to U+1083F Cypriot Syllabary 55/64 4.0
U+10840 to U+108FF vacant 0/192
U+10900 to U+1091F Phoenician 27/32 5.0
U+10920 to U+1093F Lydian 27/32 5.1
U+10940 to U+109FF vacant 0/192
U+10A00 to U+10A5F Kharoshthi 65/96 4.1
U+10A60 to U+11FFF vacant 0/5536
U+12000 to U+123FF Cuneiform 879/1024 5.0
U+12400 to U+1247F Cuneiform Numbers and Punctuation 103/128 5.0
U+12480 to U+1CFFF vacant 0/43904
U+1D000 to U+1D0FF Byzantine Musical Symbols 246/256 3.1
U+1D100 to U+1D1FF Musical Symbols 220/256 3.1(219) 5.1(1)
U+1D200 to U+1D24F Ancient Greek Musical Notation 70/80 4.1
U+1D250 to U+1D2FF vacant 0/176
U+1D300 to U+1D35F Tai Xuan Jing Symbols 87/96 4.0
U+1D360 to U+1D37F Counting Rod Numerals 18/32 5.0
U+1D380 to U+1D3FF vacant 0/128
U+1D400 to U+1D7FF Mathematical Alphanumeric Symbols 996/1024 3.1(991) 4.0(1) 4.1(2) 5.0(2)
U+1D800 to U+1EFFF vacant 0/6144
U+1F000 to U+1F02F Mahjong Tiles 44/48 5.1
U+1F030 to U+1F09F Domino Tiles 100/112 5.1
U+1F0A0 to U+1FFFF vacant 0/3936
U+20000 to U+2A6DF CJK Unified Ideographs Extension B 42711/42720 3.1
U+2A6E0 to U+2F7FF vacant 0/20768
U+2F800 to U+2FA1F CJK Compatibility Ideographs Supplement 542/544 3.1
U+2FA20 to U+DFFFF vacant 0/722400
U+E0000 to U+E007F Tags 97/128 3.1
U+E0080 to U+E00FF vacant 0/128
U+E0100 to U+E01EF Variation Selectors Supplement 240/240 4.0
U+E01F0 to U+EFFFF vacant 0/65040
U+F0000 to U+FFFFF Supplementary Private Use Area A 65534/65536 2.0
U+100000 to U+10FFFF Supplementary Private Use Area B 65534/65536 2.0