Early on, it was obvious that the Unicode standard was going to outgrow the 65,536 code points that are available to a 16-bit encoding. To solve this, the Surrogates Area was defined, allowing about another million code points to be defined while still maintaining Unicode's 16-bit identity.

Unicode's Surrogates Area code block reserves the 2048 code points from U+D800 to U+DFFF.

Hangul Syllables <-- Surrogates Area --> Private Use Area

The Surrogates Area consists of 1,024 high-half surrogates and 1,024 low-half surrogates, which are interpreted as pairs to access 1,024 × 1,024 = 1,048,576 (2^20) code points. 44,944 of these have been assigned as of Unicode version 3.2.

The high-surrogates are assigned to the range U+D800 to U+DBFF, and the low-surrogates to the range U+DC00 to U+DFFF. The high-surrogate is always first and the low-surrogate is always second. Surrogates have no meaning except as part of a pair.
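The pairing rule can be checked mechanically. The following is a minimal sketch, assuming the text is already available as a list of 16-bit code units; the function name is_well_formed is made up for this illustration and is not part of any standard API.

```python
def is_well_formed(units):
    """Return True if every surrogate in `units` is part of a high+low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2  # skip both halves of the pair
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate before it is unpaired.
            return False
        else:
            i += 1  # ordinary 16-bit code unit
    return True


print(is_well_formed([0x0041, 0xD834, 0xDD1E]))  # True: 'A' then one pair
print(is_well_formed([0xDD1E, 0xD834]))          # False: halves reversed
```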

The Unicode code value represented by a surrogate pair is calculated by

(H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000

where H and L are the high and low surrogates respectively. The reverse mapping for a value S is

H = (S - 0x10000) / 0x400 + 0xD800
L = (S - 0x10000) % 0x400 + 0xDC00

where / is integer division (the remainder is discarded) and % is the remainder. This allows surrogates to specify the code range 0x10000 to 0x10FFFF.
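As a rough illustration, the two mappings can be written out in Python. This is only a sketch: the function names from_surrogates and to_surrogates are invented here, and the range checks are an added assumption rather than something the formulas themselves require.

```python
def from_surrogates(h, l):
    """Combine a high surrogate h and a low surrogate l into a code value."""
    if not (0xD800 <= h <= 0xDBFF and 0xDC00 <= l <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return (h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000


def to_surrogates(s):
    """Split a code value in 0x10000..0x10FFFF into a (high, low) pair."""
    if not (0x10000 <= s <= 0x10FFFF):
        raise ValueError("value is outside the range reachable by surrogates")
    offset = s - 0x10000
    h = offset // 0x400 + 0xD800  # integer division: the top 10 bits
    l = offset % 0x400 + 0xDC00   # remainder: the bottom 10 bits
    return h, l


# U+1D11E lies in the Musical Symbols block.
print([hex(u) for u in to_surrogates(0x1D11E)])  # ['0xd834', '0xdd1e']
print(hex(from_surrogates(0xD834, 0xDD1E)))      # 0x1d11e
```

The lowest pair (0xD800, 0xDC00) gives 0x10000 and the highest pair (0xDBFF, 0xDFFF) gives 0x10FFFF, matching the range stated above.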

High-surrogates U+DB80 to U+DBFF are reserved for private use, which allows for a total of 131,068 private use characters representable by surrogate pairs. The total is 4 fewer than the 131,072 (128 × 1,024) you might expect, as four code points (U+FFFFE, U+FFFFF, U+10FFFE and U+10FFFF) have been reserved as non-characters. These are in addition to the 6,400 characters in the Private Use Area of the Basic Multilingual Plane.
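The private use total can be reproduced with a few lines of arithmetic. The sketch below reuses the pair-to-code-value formula; the helper name pair_to_code_value and the variable names are chosen only for this example.

```python
def pair_to_code_value(h, l):
    # Same formula as for any surrogate pair.
    return (h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000


# Code values reachable through the private use high surrogates U+DB80..U+DBFF.
first = pair_to_code_value(0xDB80, 0xDC00)
last = pair_to_code_value(0xDBFF, 0xDFFF)

# The four reserved non-characters listed in the text.
non_characters = {0xFFFFE, 0xFFFFF, 0x10FFFE, 0x10FFFF}

reachable = last - first + 1
usable = reachable - len(non_characters)

print(hex(first), hex(last))  # 0xf0000 0x10ffff
print(reachable, usable)      # 131072 131068
```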

The code block which in Unicode 3.2 was called Surrogates Area is now (as of Unicode 4.0) three code blocks:

  • U+D800 to U+DB7F   High Surrogates
  • U+DB80 to U+DBFF   High Private Use Surrogates
  • U+DC00 to U+DFFF   Low Surrogates
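For illustration, a 16-bit value can be sorted into these three blocks with simple range tests. This is a sketch only; the function name surrogate_block is hypothetical, and returning None for non-surrogates is an arbitrary choice made here.

```python
def surrogate_block(unit):
    """Name the Unicode 4.0 surrogate block containing `unit`, or None."""
    if 0xD800 <= unit <= 0xDB7F:
        return "High Surrogates"
    if 0xDB80 <= unit <= 0xDBFF:
        return "High Private Use Surrogates"
    if 0xDC00 <= unit <= 0xDFFF:
        return "Low Surrogates"
    return None  # not a surrogate at all


print(surrogate_block(0xD834))  # High Surrogates
print(surrogate_block(0xDB80))  # High Private Use Surrogates
print(surrogate_block(0xDD1E))  # Low Surrogates
```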

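The code blocks currently defined within the reach of surrogate pairs are listed below. Each line gives the code point range, the block name, the number of assigned code points out of the block's total, and the Unicode version in which the characters were added (where characters arrived over several versions, the count for each version follows in parentheses):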

U+10000 to U+1007F   Linear B Syllabary   88/128   4.0
U+10080 to U+100FF   Linear B Ideograms   123/128   4.0
U+10100 to U+1013F   Aegean Numbers   57/64   4.0
U+10140 to U+1018F   Ancient Greek Numbers   75/80   4.1
U+10190 to U+101CF   Ancient Symbols   12/64   5.1
U+101D0 to U+101FF   Phaistos Disc   46/48   5.1
U+10200 to U+1027F   vacant   0/128  
U+10280 to U+1029F   Lycian   29/32   5.1
U+102A0 to U+102DF   Carian   49/64   5.1
U+102E0 to U+102FF   vacant   0/32  
U+10300 to U+1032F   Old Italic   35/48   3.1
U+10330 to U+1034F   Gothic   27/32   3.1
U+10350 to U+1037F   vacant   0/48  
U+10380 to U+1039F   Ugaritic   31/32   4.0
U+103A0 to U+103DF   Old Persian   50/64   4.1
U+103E0 to U+103FF   vacant   0/32  
U+10400 to U+1044F   Deseret   80/80   3.1(76)   4.0(4)  
U+10450 to U+1047F   Shavian   48/48   4.0
U+10480 to U+104AF   Osmanya   40/48   4.0
U+104B0 to U+107FF   vacant   0/848  
U+10800 to U+1083F   Cypriot Syllabary   55/64   4.0
U+10840 to U+108FF   vacant   0/192  
U+10900 to U+1091F   Phoenician   27/32   5.0
U+10920 to U+1093F   Lydian   27/32   5.1
U+10940 to U+109FF   vacant   0/192  
U+10A00 to U+10A5F   Kharoshthi   65/96   4.1
U+10A60 to U+11FFF   vacant   0/5536  
U+12000 to U+123FF   Cuneiform   879/1024   5.0
U+12400 to U+1247F   Cuneiform Numbers and Punctuation   103/128   5.0
U+12480 to U+1CFFF   vacant   0/43904  
U+1D000 to U+1D0FF   Byzantine Musical Symbols   246/256   3.1
U+1D100 to U+1D1FF   Musical Symbols   220/256   3.1(219)   5.1(1)  
U+1D200 to U+1D24F   Ancient Greek Musical Notation   70/80   4.1
U+1D250 to U+1D2FF   vacant   0/176  
U+1D300 to U+1D35F   Tai Xuan Jing Symbols   87/96   4.0
U+1D360 to U+1D37F   Counting Rod Numerals   18/32   5.0
U+1D380 to U+1D3FF   vacant   0/128  
U+1D400 to U+1D7FF   Mathematical Alphanumeric Symbols   996/1024   3.1(991)   4.0(1)   4.1(2)   5.0(2)  
U+1D800 to U+1EFFF   vacant   0/6144  
U+1F000 to U+1F02F   Mahjong Tiles   44/48   5.1
U+1F030 to U+1F09F   Domino Tiles   100/112   5.1
U+1F0A0 to U+1FFFF   vacant   0/3936  
U+20000 to U+2A6DF   CJK Unified Ideographs Extension B   42711/42720   3.1
U+2A6E0 to U+2F7FF   vacant   0/20768  
U+2F800 to U+2FA1F   CJK Compatibility Ideographs Supplement   542/544   3.1
U+2FA20 to U+DFFFF   vacant   0/722400  
U+E0000 to U+E007F   Tags   97/128   3.1
U+E0080 to U+E00FF   vacant   0/128  
U+E0100 to U+E01EF   Variation Selectors Supplement   240/240   4.0
U+E01F0 to U+EFFFF   vacant   0/65040  
U+F0000 to U+FFFFF   Supplementary Private Use Area A   65536/65536   2.0
U+100000 to U+10FFFF   Supplementary Private Use Area B   65536/65536   2.0
