General Punctuation is the name of a range of characters defined by the Unicode standard.

See hyphen, quotation marks and space for further discussion of Unicode punctuation.

General Punctuation combines punctuation characters and character-like elements used to achieve certain text layout effects. Some of these punctuation characters can be used with many different scripts. More punctuation can be found in the Basic Latin and Latin-1 Supplement code blocks.

Punctuation principally used with a specific script is found in the block corresponding to that script, such as U+061B  ؛  Arabic semicolon.

There are several forms of the apostrophe encoded in the Unicode standard.

U+2027  ‧  hyphenation point is a raised dot used to indicate correct word breaking (like the hyphens in dic-tion-ar-ies). Do not confuse it with U+00B7  ·  middle dot (Latin-1 Supplement), which sometimes looks identical but has a variety of other meanings.
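
As a small illustration, the Python sketch below marks syllable breaks with U+2027 and shows that the standard library reports a different name for it than for U+00B7. The helper name mark_breaks and the hyphen-delimited input convention are assumptions made for this example only.

```python
import unicodedata

HYPHENATION_POINT = "\u2027"
MIDDLE_DOT = "\u00b7"

def mark_breaks(syllabified: str) -> str:
    """Replace ASCII hyphens in a pre-syllabified word with U+2027."""
    return syllabified.replace("-", HYPHENATION_POINT)

word = mark_breaks("dic-tion-ar-ies")
print(word)

# The two dots may look alike, but they are distinct characters.
print(unicodedata.name(HYPHENATION_POINT))  # HYPHENATION POINT
print(unicodedata.name(MIDDLE_DOT))         # MIDDLE DOT
```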

U+2044  ⁄  fraction slash is used between digits to form numeric fractions. The exact appearance of such fractions is determined by higher-level formatting software.
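
A minimal Python sketch of composing a fraction with U+2044 follows. The function name make_fraction is hypothetical. The comparison relies on the compatibility decomposition of U+00BE (vulgar fraction three quarters), which uses the same fraction slash. How the result is drawn still depends on the renderer.

```python
import unicodedata

FRACTION_SLASH = "\u2044"

def make_fraction(numerator: int, denominator: int) -> str:
    """Join two integers with U+2044 so a renderer may build a fraction."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

three_quarters = make_fraction(3, 4)

# NFKC replaces the precomposed U+00BE with digit, fraction slash, digit.
decomposed = unicodedata.normalize("NFKC", "\u00be")
print(three_quarters == decomposed)  # True
```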

U+203E  ‾  overline is the above-the-line counterpart of U+005F  _  low line (Basic Latin). It is a spacing character, not to be confused with U+0305   ̅  combining overline (Combining Diacritical Marks). A sequence of overscores or underscores should connect in an unbroken line, which distinguishes them from U+0304   ̄  combining macron (Combining Diacritical Marks), which does not connect horizontally.
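
The difference between the spacing overline and the combining marks can be checked with Python's unicodedata module. This is only a sketch: it inspects character properties and says nothing about how a given font draws the marks.

```python
import unicodedata

OVERLINE = "\u203e"            # spacing character
COMBINING_OVERLINE = "\u0305"  # attaches to the preceding character
COMBINING_MACRON = "\u0304"    # attaches too, but does not join neighbours

for ch in (OVERLINE, COMBINING_OVERLINE, COMBINING_MACRON):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # Po for the spacing form, Mn for marks
        unicodedata.combining(ch),  # canonical combining class
    )

# Overlining a word with the combining form: one mark after each letter.
overlined = "".join(letter + COMBINING_OVERLINE for letter in "abc")
```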


As of Unicode 5.2, the General Punctuation code block spans the 112 code points from U+2000 to U+206F, of which 107 are assigned.
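
The block size can be verified with simple arithmetic, and the number of assigned code points can be counted with Python's unicodedata module, as sketched below. Note the assumption: the count reflects whichever Unicode version is bundled with the Python build (unicodedata.unidata_version), so it may differ from the figure given above for Unicode 5.2.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2000, 0x206F

block_size = BLOCK_END - BLOCK_START + 1

# Category "Cn" marks code points that are unassigned in the Unicode
# version bundled with this Python build.
assigned = [
    cp for cp in range(BLOCK_START, BLOCK_END + 1)
    if unicodedata.category(chr(cp)) != "Cn"
]

print(block_size)  # 112
print(unicodedata.unidata_version, len(assigned))
```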

Greek Extended <-- General Punctuation --> Superscripts and Subscripts
For additional general punctuation characters see also Basic Latin, Latin-1 Supplement, Supplemental Punctuation and CJK Symbols and Punctuation.

Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 76
Unicode 3.0 : 7
Unicode 3.2 : 12
Unicode 4.0 : 2
Unicode 4.1 : 9
Unicode 5.1 : 1

Number of characters in each General Category :

Punctuation, Connector      Pc :  3
Punctuation, Dash           Pd :  6
Punctuation, Open           Ps :  3
Punctuation, Close          Pe :  1
Punctuation, Initial quote  Pi :  5
Punctuation, Final quote    Pf :  3
Punctuation, Other          Po : 48
Symbol, Math                Sm :  2
Separator, Space            Zs : 13
Separator, Line             Zl :  1
Separator, Paragraph        Zp :  1
Other, Format               Cf : 21
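
A tally like the one above can be reproduced with a short Python sketch. It assumes the Unicode data bundled with the Python build, which may be newer than 5.2, so some counts can differ from the table.

```python
import unicodedata
from collections import Counter

# Count the General Category of every code point in U+2000..U+206F.
category_counts = Counter(
    unicodedata.category(chr(cp)) for cp in range(0x2000, 0x2070)
)
category_counts.pop("Cn", None)  # drop unassigned code points

for category, count in sorted(category_counts.items()):
    print(f"{category} : {count:2d}")
```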

Number of characters in each Bidirectional Category :

Left To Right                 L :  1
Left To Right Embedding     LRE :  1
Left To Right Override      LRO :  1
Right To Left                 R :  1
Right To Left Embedding     RLE :  1
Right To Left Override      RLO :  1
Pop Directional Format      PDF :  1
European Number Terminator   ET :  5
Common Number Separator      CS :  2
Boundary Neutral             BN : 14
Paragraph Separator           B :  1
Whitespace                   WS : 13
Other Neutral                ON : 65

The columns below should be interpreted as :

  1. The Unicode code point of the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode General Category for the character
  5. The Unicode Bidirectional Category for the character
  6. The Unicode version when this character was added
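
The six columns can be read mechanically. The following Python sketch shows one way to split an entry line into its fields. The Entry type and the parse_entry function are hypothetical names introduced for this example, and the sketch assumes every entry line carries all six columns.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    code_point: int
    char: str
    name: str
    general_category: str
    bidi_category: str
    version: str

def parse_entry(line: str) -> Entry:
    """Split one 'U+XXXX  <char>  <name> <Gc> <Bidi> <version>' line."""
    code, rest = line.split(None, 1)
    code_point = int(code[2:], 16)  # drop the "U+" prefix
    # Derive the character from the code point instead of trusting the
    # text, because spaces and format characters are invisible.
    char = chr(code_point)
    # The last three fields are fixed; the name may contain spaces,
    # so split from the right.
    head, general_category, bidi_category, version = rest.rsplit(None, 3)
    name = head.replace(char, "", 1).strip()
    return Entry(code_point, char, name, general_category, bidi_category, version)

entry = parse_entry("U+2013   \u2013   en dash Pd ON 1.1")
print(entry.name, entry.general_category, entry.bidi_category, entry.version)
```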

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

 

General Punctuation

     Spaces

U+2000       en quad Zs WS 1.1
U+2001       em quad Zs WS 1.1
aka mutton quad
U+2002       en space Zs WS 1.1
html &ensp;
sgml &ensp;
aka nut
* half an em
U+2003       em space Zs WS 1.1
html &emsp;
sgml &emsp;
aka mutton
* nominally, a space equal to the type size in points
* may scale by the condensation factor of a font
U+2004       three per em space Zs WS 1.1
sgml &emsp13;
aka thick space
U+2005       four per em space Zs WS 1.1
sgml &emsp14;
aka mid space
U+2006       six per em space Zs WS 1.1
* in computer typography sometimes equated to thin space
U+2007       figure space Zs WS 1.1
sgml &numsp;
* space equal to tabular width of a font
* this is equivalent to the digit width of fonts with fixed-width digits
U+2008       punctuation space Zs WS 1.1
sgml &puncsp;
* space equal to narrow punctuation of a font
U+2009       thin space Zs WS 1.1
html &thinsp;
sgml &ThinSpace; &thinsp;
* a fifth of an em (or sometimes a sixth)
ref U+202F       narrow no break space (General Punctuation)
U+200A       hair space Zs WS 1.1
sgml &hairsp;
* thinner than a thin space
* in traditional typography, the thinnest space available
U+200B   ​   zero width space Cf BN 1.1
sgml &NegativeMediumSpace; &NegativeThickSpace; &NegativeThinSpace; &NegativeVeryThinSpace; &ZeroWidthSpace;
* commonly abbreviated ZWSP
* this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
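
As an illustration of the line break use of ZWSP described above, the Python sketch below inserts U+200B after each delimiter in a long string. The function name allow_breaks_after and the choice of "/" as the delimiter are assumptions for this example. Whether a break is actually taken is up to the rendering software.

```python
ZWSP = "\u200b"

def allow_breaks_after(text: str, delimiters: str = "/") -> str:
    """Insert U+200B after each delimiter to offer a line break opportunity."""
    return "".join(ch + ZWSP if ch in delimiters else ch for ch in text)

original = "usr/share/unicode/data"
path = allow_breaks_after(original)
print(len(original), len(path))  # 22 25

# ZWSP is a format character (Cf), so Python does not treat it as whitespace.
print(ZWSP.isspace())            # False
print(len(path.split()))         # 1
```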

     Format characters

U+200C   ‌   zero width non joiner Cf BN 1.1
html &zwnj;
sgml &zwnj;
* commonly abbreviated ZWNJ
U+200D   ‍   zero width joiner Cf BN 1.1
html &zwj;
sgml &zwj;
* commonly abbreviated ZWJ
U+200E   ‎   left to right mark Cf L 1.1
html &lrm;
sgml &lrm;
* commonly abbreviated LRM
U+200F   ‏   right to left mark Cf R 1.1
html &rlm;
sgml &rlm;
* commonly abbreviated RLM

     Dashes

U+2010   ‐   hyphen Pd ON 1.1
sgml &dash; &hyphen;
ref U+002D   -   hyphen minus (Basic Latin)
ref U+00AD   ­   soft hyphen (Latin-1 Supplement)
U+2011   ‑   non breaking hyphen Pd ON 1.1
ref U+002D   -   hyphen minus (Basic Latin)
ref U+00AD   ­   soft hyphen (Latin-1 Supplement)
U+2012   ‒   figure dash Pd ON 1.1
U+2013   –   en dash Pd ON 1.1
html &ndash;
sgml &ndash;
U+2014   —   em dash Pd ON 1.1
html &mdash;
sgml &mdash;
* may be used in pairs to offset parenthetical text
ref U+30FC   ー   Katakana Hiragana prolonged sound mark (Katakana)
U+2015   ―   horizontal bar Pd ON 1.1
sgml &horbar;
aka quotation dash
* long dash introducing quoted text

     General punctuation

U+2016   ‖   double vertical line Po ON 1.1
sgml &Verbar; &Vert;
* used in pairs to indicate norm of a matrix
ref U+20E6   ⃦   combining double vertical stroke overlay (Combining Diacritical Marks for Symbols)
ref U+2225   ∥   parallel to (Mathematical Operators)
U+2017   ‗   double low line Po ON 1.1
* this is a spacing character
ref U+005F   _   low line (Basic Latin)
ref U+0333   ̳   combining double low line (Combining Diacritical Marks)
U+2018   ‘   left single quotation mark Pi ON 1.1
html &lsquo;
sgml &OpenCurlyQuote; &lsquo; &rsquor;
aka single turned comma quotation mark
* this is the preferred character (as opposed to 201B)
ref U+0027   '   apostrophe (Basic Latin)
ref U+02BB   ʻ   modifier letter turned comma (Spacing Modifier Letters)
ref U+275B   ❛   heavy single turned comma quotation mark ornament (Dingbats)
U+2019   ’   right single quotation mark Pf ON 1.1
html &rsquo;
sgml &CloseCurlyQuote; &rsquo;
aka single comma quotation mark
* this is the preferred character to use for apostrophe
ref U+0027   '   apostrophe (Basic Latin)
ref U+02BC   ʼ   modifier letter apostrophe (Spacing Modifier Letters)
ref U+275C   ❜   heavy single comma quotation mark ornament (Dingbats)
U+201A   ‚   single low 9 quotation mark Ps ON 1.1
html &sbquo;
sgml &lsquor; &sbquo;
aka low single comma quotation mark
* used as opening single quotation mark in some languages
U+201B   ‛   single high reversed 9 quotation mark Pi ON 1.1
sgml &rdquor;
aka single reversed comma quotation mark
* has same semantic as 2018, but differs in appearance
ref U+02BD   ʽ   modifier letter reversed comma (Spacing Modifier Letters)
U+201C   “   left double quotation mark Pi ON 1.1
html &ldquo;
sgml &OpenCurlyDoubleQuote; &ldquo; &rdquor;
aka double turned comma quotation mark
* this is the preferred character (as opposed to 201F)
ref U+0022   "   quotation mark (Basic Latin)
ref U+275D   ❝   heavy double turned comma quotation mark ornament (Dingbats)
ref U+301D   〝   reversed double prime quotation mark (CJK Symbols and Punctuation)
U+201D   ”   right double quotation mark Pf ON 1.1
html &rdquo;
sgml &CloseCurlyDoubleQuote; &rdquo;
aka double comma quotation mark
ref U+0022   "   quotation mark (Basic Latin)
ref U+2033   ″   double prime (General Punctuation)
ref U+275E   ❞   heavy double comma quotation mark ornament (Dingbats)
ref U+301E   〞   double prime quotation mark (CJK Symbols and Punctuation)
U+201E   „   double low 9 quotation mark Ps ON 1.1
html &bdquo;
sgml &bdquo; &ldquor;
aka low double comma quotation mark
* used as opening double quotation mark in some languages
ref U+301F   〟   low double prime quotation mark (CJK Symbols and Punctuation)
U+201F   ‟   double high reversed 9 quotation mark Pi ON 1.1
sgml &rsquor;
aka double reversed comma quotation mark
* has same semantic as 201C, but differs in appearance
U+2020   †   dagger Po ON 1.1
html &dagger;
sgml &dagger;
aka obelisk, obelus, long cross
U+2021   ‡   double dagger Po ON 1.1
html &Dagger;
sgml &Dagger; &ddagger;
aka diesis, double obelisk
U+2022   •   bullet Po ON 1.1
html &bull;
sgml &bull; &bullet;
aka black small circle
ref U+00B7   ·   middle dot (Latin-1 Supplement)
ref U+2024   ․   one dot leader (General Punctuation)
ref U+2219   ∙   bullet operator (Mathematical Operators)
ref U+25D8   ◘   inverse bullet (Geometric Shapes)
ref U+25E6   ◦   white bullet (Geometric Shapes)
U+2023   ‣   triangular bullet Po ON 1.1
ref U+220E   ∎   end of proof (Mathematical Operators)
ref U+25B8   ▸   black right pointing small triangle (Geometric Shapes)
U+2024   ․   one dot leader Po ON 1.1
* also used as an Armenian semicolon (mijaket)
ref U+00B7   ·   middle dot (Latin-1 Supplement)
ref U+2022   •   bullet (General Punctuation)
ref U+2219   ∙   bullet operator (Mathematical Operators)
U+2025   ‥   two dot leader Po ON 1.1
sgml &nldr;
U+2026   …   horizontal ellipsis Po ON 1.1
html &hellip;
sgml &hellip; &mldr;
aka three dot leader
ref U+22EE   ⋮   vertical ellipsis (Mathematical Operators)
ref U+FE19   ︙   presentation form for vertical horizontal ellipsis (Vertical Forms)
U+2027   ‧   hyphenation point Po ON 1.1
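
The entry for U+2019 above notes that it is the preferred character for the apostrophe. A rough Python sketch of applying that preference to ASCII text follows. The heuristic (an apostrophe with a word character on both sides) and the function name typographic_apostrophes are assumptions of this example, and the heuristic deliberately leaves quoted words and leading or trailing apostrophes untouched.

```python
import re

RIGHT_SINGLE_QUOTE = "\u2019"

# Heuristic: an ASCII apostrophe with a word character on both sides.
_INNER_APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")

def typographic_apostrophes(text: str) -> str:
    """Replace word-internal U+0027 with U+2019."""
    return _INNER_APOSTROPHE.sub(RIGHT_SINGLE_QUOTE, text)

converted = typographic_apostrophes("Don't confuse 'quoted' text with it's")
print(converted)
```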

     Format characters

U+2028       line separator Zl WS 1.1
* may be used to represent this semantic unambiguously
U+2029       paragraph separator Zp B 1.1
* may be used to represent this semantic unambiguously
U+202A   ‪   left to right embedding Cf LRE 1.1
* commonly abbreviated LRE
U+202B   ‫   right to left embedding Cf RLE 1.1
* commonly abbreviated RLE
U+202C   ‬   pop directional formatting Cf PDF 1.1
* commonly abbreviated PDF
U+202D   ‭   left to right override Cf LRO 1.1
* commonly abbreviated LRO
U+202E   ‮   right to left override Cf RLO 1.1
* commonly abbreviated RLO
U+202F       narrow no break space Zs CS 3.0
* commonly abbreviated NNBSP
* a narrow form of a no-break space, typically the width of a thin space or a mid space
ref U+00A0       no break space (Latin-1 Supplement)
ref U+2005       four per em space (General Punctuation)
ref U+2009       thin space (General Punctuation)
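
Because the embedding and override controls listed above are invisible, it can be useful to locate them in a string. The Python sketch below reports their positions and abbreviations. The function name find_bidi_controls is hypothetical, and the sketch covers only the five controls U+202A to U+202E from this section.

```python
from typing import List, Tuple

BIDI_CONTROLS = {
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
}

def find_bidi_controls(text: str) -> List[Tuple[int, str]]:
    """Return (index, abbreviation) for each embedding or override control."""
    return [
        (index, BIDI_CONTROLS[ch])
        for index, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

found = find_bidi_controls("abc\u202edef\u202c")
print(found)  # [(3, 'RLO'), (7, 'PDF')]
```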

     General punctuation

U+2030   ‰   per mille sign Po ET 1.1
html &permil;
sgml &permil;
aka permille, per thousand
* used, for example, in measures of blood alcohol content, salinity, etc.
ref U+0025   %   percent sign (Basic Latin)
ref U+0609   ؉   Arabic indic per mille sign (Arabic)
U+2031   ‱   per ten thousand sign Po ET 1.1
sgml &pertenk;
aka permyriad
* percent of a percent, rarely used
ref U+0025   %   percent sign (Basic Latin)
ref U+060A   ؊   Arabic indic per ten thousand sign (Arabic)
U+2032   ′   prime Po ET 1.1
html &prime;
sgml &prime; &vprime;
aka minutes, feet
ref U+0027   '   apostrophe (Basic Latin)
ref U+00B4   ´   acute accent (Latin-1 Supplement)
ref U+02B9   ʹ   modifier letter prime (Spacing Modifier Letters)
U+2033   ″   double prime Po ET 1.1
html &Prime;
sgml &Prime;
aka seconds, inches
ref U+0022   "   quotation mark (Basic Latin)
ref U+02BA   ʺ   modifier letter double prime (Spacing Modifier Letters)
ref U+201D   ”   right double quotation mark (General Punctuation)
ref U+3003   〃   ditto mark (CJK Symbols and Punctuation)
ref U+301E   〞   double prime quotation mark (CJK Symbols and Punctuation)
U+2034   ‴   triple prime Po ET 1.1
sgml &tprime;
aka lines (old measure, 1/12 of an inch)
U+2035   ‵   reversed prime Po ON 1.1
sgml &backprime; &bprime;
ref U+0060   `   grave accent (Basic Latin)
U+2036   ‶   reversed double prime Po ON 1.1
ref U+301D   〝   reversed double prime quotation mark (CJK Symbols and Punctuation)
U+2037   ‷   reversed triple prime Po ON 1.1
U+2038   ‸   caret Po ON 1.1
sgml &caret;
ref U+2303   ⌃   up arrowhead (Miscellaneous Technical)
ref U+A788   ꞈ   modifier letter low circumflex accent (Latin Extended D)
U+2039   ‹   single left pointing angle quotation mark Pi ON 1.1
html &lsaquo;
sgml &lsaquo;
aka left pointing single guillemet
* usually opening, sometimes closing
ref U+003C   <   less than sign (Basic Latin)
ref U+2329   〈   left pointing angle bracket (Miscellaneous Technical)
ref U+3008   〈   left angle bracket (CJK Symbols and Punctuation)
U+203A   ›   single right pointing angle quotation mark Pf ON 1.1
html &rsaquo;
sgml &rsaquo;
aka right pointing single guillemet
* usually closing, sometimes opening
ref U+003E   >   greater than sign (Basic Latin)
ref U+232A   〉   right pointing angle bracket (Miscellaneous Technical)
ref U+3009   〉   right angle bracket (CJK Symbols and Punctuation)
U+203B   ※   reference mark Po ON 1.1
aka Japanese kome
aka Urdu paragraph separator
ref U+0FBF   ྿   Tibetan ku ru kha bzhi mig can (Tibetan)
ref U+200AD   𠂭   CJK Ideograph 200AD (CJK Unified Ideographs Extension B)

     Double punctuation for vertical text

U+203C   ‼   double exclamation mark Po ON 1.1
ref U+0021   !   exclamation mark (Basic Latin)

     General punctuation

U+203D   ‽   interrobang Po ON 1.1
ref U+0021   !   exclamation mark (Basic Latin)
ref U+003F   ?   question mark (Basic Latin)
ref U+2E18   ⸘   inverted interrobang (Supplemental Punctuation)
U+203E   ‾   overline Po ON 1.1
html &oline;
sgml &oline;
aka spacing overscore
U+203F   ‿   undertie Pc ON 1.1
aka Greek enotikon
ref U+2323   ⌣   smile (Miscellaneous Technical)
U+2040   ⁀   character tie Pc ON 1.1
aka z notation sequence concatenation
ref U+2322   ⌢   frown (Miscellaneous Technical)
U+2041   ⁁   caret insertion point Po ON 1.1
sgml &caret;
* proofreader's mark: insert here
ref U+22CC   ⋌   right semidirect product (Mathematical Operators)
U+2042   ⁂   asterism Po ON 1.1
U+2043   ⁃   hyphen bullet Po ON 1.1
sgml &hybull;
U+2044   ⁄   fraction slash Sm CS 1.1
html &frasl;
sgml &frasl;
aka solidus (in typography)
* for composing arbitrary fractions
ref U+002F   /   solidus (Basic Latin)
ref U+2215   ∕   division slash (Mathematical Operators)
U+2045   ⁅   left square bracket with quill Ps ON 1.1
U+2046   ⁆   right square bracket with quill Pe ON 1.1

     Double punctuation for vertical text

U+2047   ⁇   double question mark Po ON 3.2
U+2048   ⁈   question exclamation mark Po ON 3.0
U+2049   ⁉   exclamation question mark Po ON 3.0

     General punctuation

U+204A   ⁊   tironian sign et Po ON 3.0
* Irish Gaelic, Old English, ...
ref U+0026   &   ampersand (Basic Latin)
U+204B   ⁋   reversed pilcrow sign Po ON 3.0
ref U+00B6   ¶   pilcrow sign (Latin-1 Supplement)
U+204C   ⁌   black leftwards bullet Po ON 3.0
U+204D   ⁍   black rightwards bullet Po ON 3.0
U+204E   ⁎   low asterisk Po ON 3.2
ref U+002A   *   asterisk (Basic Latin)
ref U+0359   ͙   combining asterisk below (Combining Diacritical Marks)
U+204F   ⁏   reversed semicolon Po ON 3.2
sgml &bsemi;
ref U+003B   ;   semicolon (Basic Latin)
U+2050   ⁐   close up Po ON 3.2
* editing mark
U+2051   ⁑   two asterisks aligned vertically Po ON 3.2
U+2052   ⁒   commercial minus sign Sm ON 3.2
aka abzüglich (German), med avdrag av (Swedish), piska (Swedish, "whip")
* a common glyph variant and fallback representation looks like ./.
* may also be used as a dingbat to indicate correctness
* used in Finno-Ugric Phonetic Alphabet to indicate a related borrowed form with different sound
ref U+0025   %   percent sign (Basic Latin)
ref U+066A   ٪   Arabic percent sign (Arabic)
U+2053   ⁓   swung dash Po ON 4.0
ref U+007E   ~   tilde (Basic Latin)
U+2054   ⁔   inverted undertie Pc ON 4.0
U+2055   ⁕   flower punctuation mark Po ON 4.1
aka phul, puspika
* used as a punctuation mark with Syloti Nagri, Bengali and other Indic scripts
ref U+274B   ❋   heavy eight teardrop spoked propeller asterisk (Dingbats)

     Archaic punctuation

U+2056   ⁖   three dot punctuation Po ON 4.1

     General punctuation

U+2057   ⁗   quadruple prime Po ON 3.2
sgml &qprime;

     Archaic punctuation

U+2058   ⁘   four dot punctuation Po ON 4.1
U+2059   ⁙   five dot punctuation Po ON 4.1
aka Greek pentonkion
aka quincunx
ref U+2684   ⚄   die face 5 (Miscellaneous Symbols)
U+205A   ⁚   two dot punctuation Po ON 4.1
* historically used to indicate the end of a sentence or change of speaker
* extends from baseline to cap height
ref U+FE30   ︰   presentation form for vertical two dot leader (CJK Compatibility Forms)
ref U+1015B   𐅛   Greek acrophonic epidaurean two (Ancient Greek Numbers)
U+205B   ⁛   four dot mark Po ON 4.1
* used by scribes in the margin as highlighter mark
* this is centered on the line, but extends beyond top and bottom of the line
U+205C   ⁜   dotted cross Po ON 4.1
* used by scribes in the margin as highlighter mark
U+205D   ⁝   tricolon Po ON 4.1
aka Epidaurean acrophonic symbol three
ref U+22EE   ⋮   vertical ellipsis (Mathematical Operators)
ref U+2AF6   ⫶   triple colon operator (Supplemental Mathematical Operators)
ref U+FE19   ︙   presentation form for vertical horizontal ellipsis (Vertical Forms)
U+205E   ⁞   vertical four dots Po ON 4.1
* used in dictionaries to indicate legal but undesirable word break
* glyph extends the whole height of the line

     Space

U+205F       medium mathematical space Zs WS 3.2
sgml &MediumSpace;
* abbreviated MMSP
* four-eighteenths of an em
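
The nominal widths quoted for the spaces in this block can be worked out as fractions of an em, taking one em as equal to the type size as stated for U+2003. The Python sketch below does the arithmetic for a 12 point type size. The table name, the function name width_in_points and the 12 point default are assumptions of this example, and real fonts may use different widths.

```python
from fractions import Fraction

# Nominal widths in ems, taken from the notes in the listing.
NOMINAL_WIDTH_EM = {
    "en space": Fraction(1, 2),
    "three per em space": Fraction(1, 3),
    "four per em space": Fraction(1, 4),
    "six per em space": Fraction(1, 6),
    "thin space": Fraction(1, 5),  # sometimes a sixth
    "medium mathematical space": Fraction(4, 18),
}

def width_in_points(name: str, type_size_pt: int = 12) -> Fraction:
    """Nominal width of the named space at the given type size."""
    return NOMINAL_WIDTH_EM[name] * type_size_pt

mmsp = width_in_points("medium mathematical space")
print(NOMINAL_WIDTH_EM["medium mathematical space"])  # 2/9
print(mmsp)                                           # 8/3
```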

     Format character

U+2060   ⁠   word joiner Cf BN 3.2
sgml &NoBreak;
* commonly abbreviated WJ
* a zero width non-breaking space (only)
* intended for disambiguation of functions for byte order mark
ref U+FEFF      zero width no break space (Arabic Presentation Forms B)

     Invisible operators

U+2061   ⁡   function application Cf BN 3.2
sgml &ApplyFunction;
* contiguity operator indicating application of a function
U+2062   ⁢   invisible times Cf BN 3.2
sgml &InvisibleTimes;
* contiguity operator indicating multiplication
U+2063   ⁣   invisible separator Cf BN 3.2
sgml &InvisibleComma; &ic;
aka invisible comma
* contiguity operator indicating that adjacent mathematical symbols form a list, e.g. when no visible comma is used between multiple indices
U+2064   ⁤   invisible plus Cf BN 5.1
* contiguity operator indicating addition
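
To make the invisible operators easier to see, the Python sketch below builds an expression that contains them and then spells them out with visible symbols. The visible replacements ("*", "," and "+") and the function name spell_out are choices made only for this example, not a mapping defined by Unicode.

```python
FUNCTION_APPLICATION = "\u2061"
INVISIBLE_TIMES = "\u2062"
INVISIBLE_SEPARATOR = "\u2063"
INVISIBLE_PLUS = "\u2064"

# Hypothetical visible spellings, chosen only for this example.
VISIBLE = str.maketrans({
    FUNCTION_APPLICATION: "",
    INVISIBLE_TIMES: "*",
    INVISIBLE_SEPARATOR: ",",
    INVISIBLE_PLUS: "+",
})

def spell_out(expression: str) -> str:
    """Replace invisible operators with visible stand-ins."""
    return expression.translate(VISIBLE)

expression = f"2{INVISIBLE_TIMES}f{FUNCTION_APPLICATION}(x)"
print(len("2f(x)"), len(expression))  # 5 7
print(spell_out(expression))          # 2*f(x)
```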

     Deprecated
Use of these characters is strongly discouraged.

U+206A      inhibit symmetric swapping Cf BN 1.1
U+206B      activate symmetric swapping Cf BN 1.1
U+206C      inhibit Arabic form shaping Cf BN 1.1
U+206D      activate Arabic form shaping Cf BN 1.1
U+206E      national digit shapes Cf BN 1.1
U+206F      nominal digit shapes Cf BN 1.1

http://unicode.org
Some prose may have been lifted verbatim from unicode.org,
as is permitted by their terms of use at http://www.unicode.org/copyright.html
