A Unicode Technical Report (UTR) may contain informative material, normative specifications, or both. Each UTR may specify a base version of the Unicode Standard. In that case, conformance to the UTR requires conformance to that version of the standard or higher.
There are two specially distinguished types of approved Unicode Technical Reports that are given more authoritative status by the Unicode Consortium.
A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published as a separate document. Note that conformance to a version of the Unicode Standard includes conformance to its Unicode Standard Annexes. The version number of a UAX document corresponds to the version number of the Unicode Standard at the last point that the UAX document was updated.
A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS. Each UTS specifies a base version of the Unicode Standard.
A Draft Unicode Technical Report (DUTR) has the basic structure and content required, but is not yet ready for final approval.
A Proposed Draft Unicode Technical Report (PDUTR) is in the earliest stages of development.
Last modified November 25, 1992
Proposals for the Burmese, Khmer and Ethiopian code blocks
Superseded: Included in Unicode 1.1
Last modified 1992
Proposals for the Sinhala, Tibetan and Mongolian code blocks
Superseded: Included in Unicode 1.1
Last modified 1993
Proposes a number of additional, less widely used code blocks.
Superseded: Included in Unicode 2.0
Describes Version 1.1 of the Unicode Standard. Never available online.
Describes how to handle combining diacritical marks. Never available online.
Superseded: Included in Unicode 1.1
Last modified May 8, 2002
This report presents the specification of a compression scheme for Unicode, along with a sample implementation.
Last modified March 23, 2001
Superseded: Included in Unicode 3.1 as the Tags code block (U+E0000 to U+E007F).
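As a rough illustration of how the Tags block is used, the sketch below builds a language tag sequence: U+E0001 LANGUAGE TAG followed by tag characters, each of which is the corresponding printable ASCII character offset by 0xE0000. The helper names `language_tag` and `strip_tags` are hypothetical and chosen for this example only.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def language_tag(tag: str) -> str:
    """Hypothetical helper: U+E0001 followed by the tag-character form of an ASCII tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("tag must consist of printable ASCII characters")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in tag)


def strip_tags(text: str) -> str:
    """Hypothetical helper: drop every character in the Tags block (U+E0000..U+E007F)."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = language_tag("en") + "hello"
print(strip_tags(tagged))  # hello
```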
Last modified November 21, 1999
This report documents the Unicode Standard, Version 2.1.
Last modified March 27, 2002
Describes the specification for positioning characters in text that mixes left-to-right scripts with right-to-left scripts such as Arabic or Hebrew.
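One early step of the bidirectional algorithm is choosing a paragraph's base direction from its first strong character (rules P2 and P3). The sketch below approximates that step with Python's `unicodedata.bidirectional`, which returns a character's bidirectional class. The function name is hypothetical, and the sketch does not skip text inside directional isolates, which later versions of the algorithm require.

```python
import unicodedata


def paragraph_embedding_level(paragraph: str) -> int:
    """Approximate P2-P3: 0 for a left-to-right paragraph, 1 for right-to-left."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0  # first strong character is left-to-right
        if bidi_class in ("R", "AL"):
            return 1  # first strong character is Hebrew-type (R) or Arabic-type (AL)
    return 0  # no strong character found: default to left-to-right


print(paragraph_embedding_level("123 \u05D0\u05D1"))  # 1: digits and spaces are not strong
```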
Last modified July 16, 2002
Provides the specification of the Unicode Collation Algorithm, which defines how to compare two Unicode strings while remaining conformant to the requirements of The Unicode Standard, Version 3.0.
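The key idea behind the collation algorithm is multi-level comparison: base letters are compared first, accents only break ties, and case only breaks remaining ties. The toy key below illustrates that layering using canonical decomposition; it is a hypothetical sketch, not the Unicode Collation Algorithm, and it does not use the algorithm's collation element table.

```python
import unicodedata


def toy_sort_key(s: str) -> tuple:
    """Toy three-level key: (base letters, accents, case). Not the real UCA."""
    decomposed = unicodedata.normalize("NFD", s)
    bases = [c for c in decomposed if not unicodedata.combining(c)]
    primary = "".join(bases).casefold()                 # level 1: letters, ignoring case
    secondary = tuple(ord(c) for c in decomposed
                      if unicodedata.combining(c))      # level 2: combining marks
    tertiary = tuple(c.isupper() for c in bases)        # level 3: lowercase before uppercase
    return (primary, secondary, tertiary)


words = ["R\u00e9sum\u00e9", "resume", "Resume", "r\u00e9sum\u00e9"]
print(sorted(words))                    # code point order: uppercase first
print(sorted(words, key=toy_sort_key))  # resume, Resume, résumé, Résumé
```

Plain code point order puts every uppercase word before every lowercase one, while the layered key keeps words with the same letters together.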
Last modified March 15, 2001
This report presents the specification of an informative property for Unicode characters that is useful when interoperating with East Asian legacy character sets. The property classifies characters by width, for example halfwidth versus fullwidth.
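Python exposes this property as `unicodedata.east_asian_width`, which returns values such as `Na` (narrow), `H` (halfwidth), `W` (wide), and `F` (fullwidth). A common use is estimating how many columns text occupies in a fixed-width display; the sketch below assumes Wide and Fullwidth characters take two columns and everything else one. The function name is hypothetical, Ambiguous (`A`) characters are treated as narrow, and combining marks and control characters are not special-cased.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Rough column count: W and F count as 2, every other character as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


# "A" (Na), fullwidth "Ａ" (F), halfwidth katakana "ｶ" (H), CJK "中" (W)
print(display_columns("A\uFF21\uFF76\u4E2D"))  # 6
```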
Withdrawn
Last modified March 27, 2002
This document describes guidelines for handling the different characters used to represent CRLF and other newline conventions on different platforms.
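A typical application of these guidelines is converting every newline function to a single platform convention. The sketch below treats CR LF, CR, LF, and NEL (U+0085) as the newline functions and maps each to one target string, matching CR LF first so that it becomes a single newline. It deliberately leaves LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) alone, since they carry an explicit meaning; that choice, and the function name, are assumptions of this example.

```python
import re

# CR LF must come first in the alternation so it is consumed as one newline.
_NEWLINE_FUNCTION = re.compile("\r\n|[\r\n\u0085]")


def normalize_newlines(text: str, target: str = "\n") -> str:
    """Replace every CR LF, CR, LF, or NEL with the target sequence."""
    return _NEWLINE_FUNCTION.sub(target, text)


print(repr(normalize_newlines("a\r\nb\rc\u0085d")))  # 'a\nb\nc\nd'
```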
Last modified March 15, 2002
This report presents the specification of line breaking properties for Unicode characters.
Last modified March 26, 2002
This document describes specifications for four normalization forms of Unicode text. With these forms, equivalent text (canonical or compatibility) has an identical binary representation. When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
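Python's `unicodedata.normalize` implements the four forms (NFC, NFD, NFKC, NFKD). The sketch below compares a precomposed "é" with "e" plus a combining acute accent (canonically equivalent) and the "ﬁ" ligature with "fi" (compatibility equivalent only). The helper name `equivalent` is hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equivalent(a: str, b: str, form: str) -> bool:
    """True if a and b have identical code point sequences after normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00E9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e + U+0301 COMBINING ACUTE ACCENT
ligature = "\uFB01"     # U+FB01 LATIN SMALL LIGATURE FI

print(composed == decomposed)  # False: different code points before normalization
for form in FORMS:
    print(form, equivalent(composed, decomposed, form), equivalent(ligature, "fi", form))
# NFC True False
# NFD True False
# NFKC True True
# NFKD True True
```

Canonical equivalents match under every form, while the compatibility equivalent matches only under the K forms.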
Unicode Technical Report #16 : UTF-EBCDIC
Last modified April 16, 2002
This document presents the specifications of UTF-EBCDIC - EBCDIC Friendly Unicode (or UCS) Transformation Format.
Last modified August 31, 2000
This document clarifies a number of the terms used to describe character encodings, and where the different forms of Unicode fit in. It elaborates the Internet Architecture Board (IAB) three-layer text stream definitions into a five-layer structure.
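The layers of this model can be seen by following one supplementary character through Python's codecs: the coded character set assigns it an integer code point, a character encoding form turns that into code units (two 16-bit units in UTF-16), and a character encoding scheme serializes the units as bytes in a particular byte order. The helper `utf16_code_units` is hypothetical.

```python
def utf16_code_units(s: str) -> list[int]:
    """Encoding form: split text into 16-bit UTF-16 code units."""
    data = s.encode("utf-16-be")  # big-endian so each pair of bytes reads as one unit
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


clef = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF
print(hex(ord(clef)))                            # code point: 0x1d11e
print([hex(u) for u in utf16_code_units(clef)])  # UTF-16 code units: ['0xd834', '0xdd1e']
print(clef.encode("utf-16-be").hex())            # UTF-16BE scheme: d834dd1e
print(clef.encode("utf-16-le").hex())            # UTF-16LE scheme: 34d81edd
print(clef.encode("utf-8").hex())                # UTF-8: f09d849e
```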
Last modified April 21, 2001
This document describes guidelines for how to adapt regular expression engines to use Unicode.
Unicode Standard Annex #19 : UTF-32
Last modified March 27, 2002
This document specifies a Unicode transformation format that serializes a Unicode code point (from U+0000 to U+10FFFF) as a sequence of four bytes.
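Because every code point fits in 32 bits, a UTF-32 encoder is little more than writing each code point as a four-byte integer. The sketch below uses big-endian byte order (an assumption; UTF-32 also has a little-endian scheme) and rejects surrogate code points, following current Unicode conformance rules that exclude them from UTF-32. It checks itself against Python's built-in `utf-32-be` codec; the function name is hypothetical.

```python
def utf32be_encode(text: str) -> bytes:
    """Serialize each code point as four big-endian bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)  # Python strings never exceed U+10FFFF
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate, not a Unicode scalar value")
        out += cp.to_bytes(4, "big")
    return bytes(out)


sample = "A\U0001D11E"
print(utf32be_encode(sample).hex())  # 0000004100001d11e with a leading 0: 000000410001d11e
print(utf32be_encode(sample) == sample.encode("utf-32-be"))  # True
```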
Last modified February 18, 2002
Also known as W3C Note 18 February 2002
This document contains guidelines on the use of the Unicode Standard in conjunction with markup languages such as XML.
This is a Technical Report published jointly by the Unicode Technical Committee and by the W3C Internationalization Working Group/Interest Group (W3C Members only) in the context of the W3C Internationalization Activity.
Unicode Standard Annex #21 : Case Mappings
Last modified March 26, 2001
This document presents requirements for default case operations: case conversion, case detection, and caseless matching. These are the default definitions to be used in the absence of tailoring for particular languages and environments.
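Python's string methods follow the untailored default case operations: `str.upper` applies full case mappings (so "ß" becomes "SS"), and `str.casefold` applies default case folding for caseless matching. The helper `caseless_match` is hypothetical. Being untailored, these defaults do not handle language-specific rules such as Turkish dotted and dotless i.

```python
def caseless_match(a: str, b: str) -> bool:
    """Default caseless matching: compare case-folded strings."""
    return a.casefold() == b.casefold()


word = "Stra\u00dfe"  # "Straße"
print(word.upper())                          # STRASSE: one character maps to two
print(word.lower() == "STRASSE".lower())     # False: lowercasing alone is not enough
print(caseless_match(word, "STRASSE"))       # True
```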
Last modified December 1, 2000
This document specifies an XML format for the interchange of mapping data for character encodings. It provides a complete description for such mappings in terms of a defined mapping to and from Unicode, and a description of alias tables for the interchange of mapping table names.
Proposed Draft Unicode Technical Report #23 : Character Properties
Last modified April 23, 2002
This report presents a survey of the character properties defined in the Unicode Standard as well as guidelines to their usage.
Unicode Technical Report #24 : Script Names
Last modified April 1, 2002
This document provides an assignment of script names to all Unicode code points. This information is useful in mechanisms such as regular expressions, where it produces much better results than simple matches on block names.
See also ISO 15924
Last modified May 8, 2002
Starting with version 3.2, Unicode includes virtually all of the standard characters used in mathematics. This set supports a variety of math applications on computers, including document presentation languages like TeX, math markup languages like MathML, computer algebra languages like OpenMath, internal representations of mathematics in systems like Mathematica and MathCAD, computer programs, and plain text. This technical report describes the Unicode mathematics character groups and gives some of their default math properties.
Last modified April 16, 2002
This document specifies an 8-bit Compatibility Encoding Scheme for UTF-16 (CESU) that is intended for internal use within systems processing Unicode in order to provide an ASCII-compatible 8-bit encoding that is similar to UTF-8 but preserves UTF-16 binary collation. It is neither intended nor recommended as an encoding for open information exchange. The Unicode Consortium does not encourage the use of CESU-8, but does recognize the existence of data in this encoding and supplies this technical report to clearly define the format and to distinguish it from UTF-8. This encoding does not replace or amend the definition of UTF-8.
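The difference from UTF-8 shows up only for supplementary characters: CESU-8 first splits them into a UTF-16 surrogate pair and then encodes each surrogate as a three-byte sequence, giving six bytes instead of four. The sketch below relies on Python's `surrogatepass` error handler to produce those three-byte sequences; the function name is hypothetical. The last lines show why byte order matches UTF-16: U+1D11E sorts before U+FF5E in UTF-16 and CESU-8, but after it in UTF-8.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # BMP code points use the same 1-3 byte sequences as UTF-8
            out += ch.encode("utf-8", "surrogatepass")
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)   # high (leading) surrogate
            low = 0xDC00 + (cp & 0x3FF)  # low (trailing) surrogate
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


clef, tilde = "\U0001D11E", "\uFF5E"
print(cesu8_encode(clef).hex())                        # eda0b4edb49e
print(cesu8_encode(clef) < cesu8_encode(tilde))        # True, as in UTF-16
print(clef.encode("utf-8") < tilde.encode("utf-8"))    # False in UTF-8
```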
Unicode Standard Annex #27 : Unicode 3.1
Last modified May 16, 2001
This document defines Version 3.1 of the Unicode Standard. It overrides certain features of Unicode 3.0.1, and adds a large number of coded characters.
Unicode Standard Annex #28 : Unicode 3.2
Last modified March 27, 2002
This document defines Version 3.2 of the Unicode Standard.
Draft Unicode Technical Report #29 : Text Boundaries
Last modified August 9, 2002
This document describes guidelines for determining default boundaries between certain significant text elements: grapheme clusters (“user characters”), words, and sentences. For line-break boundaries, see Unicode Standard Annex #14 : Line Breaking Properties.
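As a much simplified picture of grapheme clusters, the sketch below attaches each combining mark (general category M*) to the preceding character, so "e" followed by U+0301 counts as one user character. This is an assumption-laden approximation, not the report's boundary rules: it ignores CR LF, Hangul syllable sequences, and every rule added later. The function name is hypothetical.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Approximate grapheme clusters: a base character plus following marks."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # extend the current cluster with the mark
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


word = "cafe\u0301"
print(len(word), len(simple_clusters(word)))  # 5 code points, 4 clusters
```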
Proposed Draft Unicode Technical Report #30 : Character Foldings
Last modified May 8, 2002
This report identifies a set of character foldings, in other words, operations that map similar characters to a common target. Such operations are used to ignore certain distinctions between similar characters. The report also provides an algorithm for applying these operations to searching.
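As one example of folding applied to search, the sketch below combines case folding, compatibility decomposition, and removal of combining marks, so that "RÉSUMÉ", "résumé", and "resume" all fold to the same string. This particular combination of foldings, and the helper names, are assumptions of the example rather than the report's algorithm. Because folding can change string length, the helper reports only whether a match exists, not its position in the original text.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Fold case, compatibility variants, and accents to a common form."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack once both are folded."""
    return fold_for_search(needle) in fold_for_search(haystack)


print(fold_for_search("R\u00c9SUM\u00c9"))             # resume
print(folded_contains("My R\u00e9sum\u00e9", "RESUME"))  # True
```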
http://unicode.org