Among the Character Properties defined for each character in the Unicode standard is the character's General Category. Category names are always two characters: the first is an uppercase letter giving the major class (letter, number, punctuation and the like) and the second is a lowercase letter giving a subclass within that class. In each class, the subclass "other" merely collects the remaining characters, which generally have little in common besides membership in the same major class.
Zs, Zl and Zp are considered format characters, but their membership in the Z (separator) class takes precedence over their membership in the Cf class, because each character can have only one General Category.
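As an illustration of the two-letter scheme, the following sketch uses Python's standard unicodedata module to look up a character's General Category and split it into major class and subclass. The MAJOR_CLASSES table and the describe helper are names invented for this example, and the results reflect whatever Unicode version the Python interpreter ships with.

```python
import unicodedata

# Hypothetical lookup table (not part of any library): first letter of the
# category name mapped to the major class it stands for.
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "Z": "Separator",
    "C": "Other",
    "P": "Punctuation",
    "S": "Symbol",
}


def describe(ch):
    """Return (category, major class name, subclass letter) for one character."""
    cat = unicodedata.category(ch)  # always a two-character string such as "Lu"
    return cat, MAJOR_CLASSES[cat[0]], cat[1]


print(describe("A"))       # an uppercase letter
print(describe("\u2028"))  # LINE SEPARATOR: reported in the Z class only
print(describe("\u200e"))  # LEFT-TO-RIGHT MARK: an ordinary Cf format character
```

Because the lookup returns exactly one category per character, the line separator is reported as Zl and never as Cf, which is the precedence rule described above.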
A common use of General Category is to assist in determining word breaks and line breaks in text formatting.

In the list below, each category is followed by the number of code points assigned to that category as of Unicode 3.2.
Lu Letter, Uppercase (1185)
Ll Letter, Lowercase (1350)
Lt Letter, Titlecase (31)
Lm Letter, Modifier (48)
Lo Letter, Other (87343)
Mn Mark, Non-Spacing (518)
Mc Mark, Spacing Combining (125)
Me Mark, Enclosing (10)
Nd Number, Decimal Digit (248)
Nl Number, Letter (53)
No Number, Other (235)
Zs Separator, Space (18)
Zl Separator, Line (1)
Zp Separator, Paragraph (1)
Cc Other, Control (65)
Cf Other, Format (131)
Cs Other, Surrogate (2048)
Co Other, Private Use (137468)
Cn Other, Not Assigned (0)
Pc Punctuation, Connector (11)
Pd Punctuation, Dash (18)
Ps Punctuation, Open (64)
Pe Punctuation, Close (63)
Pi Punctuation, Initial Quote (6) (may behave like Ps or Pe depending on usage)
Pf Punctuation, Final Quote (4) (may behave like Ps or Pe depending on usage)
Po Punctuation, Other (195)
Sm Symbol, Math (899)
Sc Symbol, Currency (34)
Sk Symbol, Modifier (69)
So Symbol, Other (2496)
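
A tally like the one above can be approximated with a short script. The sketch below walks every code point and counts it by General Category using Python's unicodedata module. The category_counts function is a name made up for this example. By default it uses the Unicode version bundled with the interpreter, so most totals will be larger than the Unicode 3.2 figures listed here. Passing unicodedata.ucd_3_2_0, which exposes the Unicode 3.2 database, should come closer, though exact agreement with the list is not guaranteed. In particular, Python reports Cn for every unassigned code point, whereas the list gives 0 for Cn.

```python
import sys
import unicodedata
from collections import Counter


def category_counts(db=unicodedata):
    """Count code points per General Category over the whole code space.

    `db` is either the unicodedata module itself or unicodedata.ucd_3_2_0;
    both provide a category() method with the same signature.
    """
    return Counter(db.category(chr(cp)) for cp in range(sys.maxunicode + 1))


current = category_counts()
legacy = category_counts(unicodedata.ucd_3_2_0)

print("Unicode version in use:", unicodedata.unidata_version)
for cat in ("Lu", "Ll", "Zs", "Zl", "Zp", "Cs", "Co"):
    print(cat, "current:", current[cat], "3.2:", legacy[cat])
```

Categories whose membership is fixed by the architecture of the code space, such as Cs and Co, give the same totals in every version, while letter and symbol categories grow as new characters are added.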