Specials (idea) by avjewe - Everything2.com

Specials is the name of a range of characters in the Unicode character encoding standard.

The Specials code block contains code values that are neither control characters nor graphic characters, but are provided to facilitate current software practices. Of the 16 reserved code points, only 5 have been allocated as of Unicode 3.2.

Byte Order Mark (BOM)
The special character code U+FFFE is guaranteed never to be a valid Unicode character. It is used in conjunction with U+FEFF zero width no break space (in the Arabic Presentation Forms B block) to identify character set and byte order. By convention, a zero width no break space is often placed at the beginning of a Unicode text file, where it neither adds semantics nor alters the display.

When Unicode is stored as 16-bit integers (UTF-16), the concept of byte order rears its ugly head. If your Unicode file begins with the 16-bit value FFFE, you most likely have a valid Unicode file in the reverse byte order from what your machine expects. Similarly, if an unknown file begins with the bytes FF FE or FE FF, you're probably looking at a UTF-16 text file. If the file starts with EF BB BF, you're probably looking at a UTF-8 encoded Unicode file (EF BB BF is the UTF-8 encoding of U+FEFF). Files starting with 00 00 FE FF or FF FE 00 00 are probably UTF-32. The latter also begins with FF FE, so the four-byte patterns need to be checked before the two-byte ones.
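
The byte patterns described above can be checked mechanically. What follows is a minimal sketch in Python, not a definitive detector: the function name sniff_bom and the table _BOMS are made up for this example, while the BOM constants come from Python's standard codecs module. The result is only a guess, in the same "probably" sense as the text above.

```python
import codecs

# Longest signatures first: the UTF-32 LE mark (FF FE 00 00) begins with
# the UTF-16 LE mark (FF FE), so the order of this table matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding_guess, bom_length), or (None, 0) if no mark is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    encoding, skip = sniff_bom(sample)
    print(encoding, sample[skip:].decode(encoding))
```

One known weakness of this approach: a UTF-16 little-endian file whose first character after the mark is U+0000 looks exactly like UTF-32 little-endian, so the answer should be treated as a hint rather than proof.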

Interlinear Annotation
In some applications, there is annotating text that is related to a string of annotated text. In these cases, some operations need to ignore the annotations, and others want to include them. To this end, Unicode provides three markup characters: an anchor, a separator and a terminator. To specify out-of-band data this way, the text stream stores:

interlinear annotation anchor
The Annotated Text
interlinear annotation separator
The Annotating Text
interlinear annotation terminator

Multiple occurrences of interlinear annotation separator are allowed, which would then delimit the annotating text into application specific sections. Annotations may be nested.
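
An operation that needs to ignore the annotations can walk the stream and keep only the annotated text. The sketch below is one way to do that in Python, assuming well-formed input; the function name strip_annotations is invented for this example, and silently dropping stray separators or terminators is a choice made here, not something the standard prescribes.

```python
ANCHOR = "\ufff9"      # interlinear annotation anchor
SEPARATOR = "\ufffa"   # interlinear annotation separator
TERMINATOR = "\ufffb"  # interlinear annotation terminator


def strip_annotations(text: str) -> str:
    """Return text with the markup and all annotating text removed."""
    out = []
    stack = []   # one flag per open annotation: True once a separator was seen
    hidden = 0   # how many open annotations are in their annotating part
    for ch in text:
        if ch == ANCHOR:
            stack.append(False)
        elif ch == SEPARATOR:
            # Only the first separator of an annotation changes state;
            # later ones just split the annotating text into sections.
            if stack and not stack[-1]:
                stack[-1] = True
                hidden += 1
        elif ch == TERMINATOR:
            if stack and stack.pop():
                hidden -= 1
        elif hidden == 0:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    ruby = ANCHOR + "漢字" + SEPARATOR + "かんじ" + TERMINATOR + "を読む"
    print(strip_annotations(ruby))
```

The stack is what makes nesting work: a character is kept only when no enclosing annotation is currently in its annotating part.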

Replacement Characters
U+FFFC object replacement character is used as an insertion point for objects located within a stream of text. All information about the object is kept outside of the character stream. This character simply provides an anchor to assure correct placement of the object within the text stream.

U+FFFD replacement character is a catchall for characters that cannot otherwise be encoded in terms of known Unicode values.
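
A common place to meet U+FFFD in practice is when decoding bytes that are not valid in the expected encoding. The sketch below uses Python's built-in errors="replace" handler for bytes.decode; the helper name decode_lossy is invented for this example.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def decode_lossy(data: bytes, encoding: str = "utf-8"):
    """Decode data, substituting U+FFFD for anything that cannot be decoded.

    Returns the decoded text and the number of U+FFFD characters in it.
    """
    text = data.decode(encoding, errors="replace")
    return text, text.count(REPLACEMENT_CHARACTER)


if __name__ == "__main__":
    # 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
    text, replaced = decode_lossy(b"caf\xe9 au lait")
    print(repr(text), replaced)
```

The substitution is lossy: once a byte has become U+FFFD, the original value cannot be recovered from the text.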

Non-characters
As described above, U+FFFE will never be assigned and is reserved for use in determining byte order.

U+FFFF will also never be a valid Unicode character, and is suitable for use as an error code or other non-character value.
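
Because neither value should ever appear in interchanged text, a validator can flag them. The sketch below is a small Python example covering only the two non-characters in this block (Unicode defines other non-characters elsewhere, which this example ignores); the names NONCHARACTERS and find_noncharacters are invented for the example.

```python
# The two non-characters in the Specials block.
NONCHARACTERS = {"\ufffe", "\uffff"}


def find_noncharacters(text: str):
    """Return (index, code) pairs for each U+FFFE or U+FFFF found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in NONCHARACTERS
    ]


if __name__ == "__main__":
    for index, code in find_noncharacters("ab\uffffc\ufffe"):
        print(index, code)
```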

Unicode's Specials code block reserves the 16 code points from U+FFF0 to U+FFFF, of which 5 are currently assigned.

Halfwidth and Fullwidth Forms <-- Specials --> Linear B Syllabary

Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 1
Unicode 2.1 : 1
Unicode 3.0 : 3

Number of characters in each General Category :

Symbol, Other  So :  2
Other, Format  Cf :  3

All the characters in this code block are in bidirectional category Other Neutral ON
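
The category counts above can be cross-checked against the character database that ships with Python. This is a sketch using the standard unicodedata module; the names ASSIGNED and summarize are invented for the example, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from collections import Counter

# U+FFF9 through U+FFFD, the five assigned code points in the block.
ASSIGNED = range(0xFFF9, 0xFFFE)


def summarize():
    """Return (general category counts, set of bidirectional categories)."""
    categories = Counter(unicodedata.category(chr(cp)) for cp in ASSIGNED)
    bidi = {unicodedata.bidirectional(chr(cp)) for cp in ASSIGNED}
    return dict(categories), bidi


if __name__ == "__main__":
    categories, bidi = summarize()
    print(categories, bidi)
    for cp in ASSIGNED:
        print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```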

The columns below should be interpreted as :

The Unicode code for the character
The character in question
The Unicode name for the character
The Unicode General Category for the character
The Unicode version when this character was added

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

Specials

Interlinear annotation
Used internally for Japanese Ruby (furigana), etc.

U+FFF9 ￹ interlinear annotation anchor Cf 3.0: * marks start of annotated text
U+FFFA ￺ interlinear annotation separator Cf 3.0: * marks start of annotating character(s)
U+FFFB ￻ interlinear annotation terminator Cf 3.0: * marks end of annotation block

Replacement characters

U+FFFC ￼ object replacement character So 2.1: * used as placeholder in text for an otherwise unspecified object
U+FFFD � replacement character So 1.1: * used to replace an incoming character whose value is unknown or unrepresentable in Unicode; * compare the use of 001A as a control character to indicate the substitute function

http://unicode.org
Some prose may have been lifted verbatim from unicode.org,
as is permitted by their terms of use at http://www.unicode.org/copyright.html
