General Punctuation is the name of a range of characters defined by the Unicode standard.
See hyphen, quotation marks and space for further discussion of Unicode punctuation.
General Punctuation combines punctuation characters and character-like elements used to achieve certain text layout effects. Some punctuation characters can be used with many different scripts. More punctuation can be found in the Basic Latin and Latin-1 Supplement code blocks.
Punctuation principally used with a specific script is found in the block corresponding to that script, such as U+061B ؛ Arabic semicolon.
There are several forms of the apostrophe encoded in the Unicode standard.
U+2027 ‧ hyphenation point is a raised dot used to indicate correct word breaking (like the hyphens in dic-tion-ar-ies). It should not be confused with U+00B7 · middle dot (Latin-1 Supplement), which has a variety of semantics but is sometimes identical in appearance.
U+2044 ⁄ fraction slash is used between digits to form numeric fractions. The exact appearance of such fractions is determined by higher level formatting software.
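As a minimal sketch of how plain text might carry such a fraction, the Python snippet below joins two integers with U+2044. The helper name `make_fraction` is hypothetical, and whether the result is displayed as a stacked or slanted fraction depends entirely on the font and the rendering software.

```python
FRACTION_SLASH = "\u2044"  # U+2044 fraction slash


def make_fraction(numerator: int, denominator: int) -> str:
    """Join two integers with the fraction slash (hypothetical helper)."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


# The digits 3 and 4 separated by U+2044; the visual result varies by renderer.
print(make_fraction(3, 4))
```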
U+203E ‾ overline is the above-the-line counterpart of U+005F _ low line (Basic Latin). It is a spacing character, not to be confused with U+0305 ̅ combining overline (Combining Diacritical Marks). A sequence of overlines or low lines should connect in an unbroken line, which distinguishes them from U+0304 ̄ combining macron (Combining Diacritical Marks), which does not connect horizontally.
As of Unicode 5.2, the General Punctuation code block reserves the 112 code points from U+2000 to U+206F, of which 107 are currently assigned.
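The block size can be checked with a short Python sketch using the standard `unicodedata` module. The figure of 107 assigned characters refers to Unicode 5.2; the module reflects whichever Unicode version ships with the interpreter, so a newer Python may report a higher count. Treating General Category "Cn" as "unassigned" is an assumption of this sketch.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2000, 0x206F  # General Punctuation

block_size = BLOCK_END - BLOCK_START + 1

# A code point counts as assigned here if its General Category is not "Cn".
assigned = [
    cp for cp in range(BLOCK_START, BLOCK_END + 1)
    if unicodedata.category(chr(cp)) != "Cn"
]

print(block_size)                   # 112
print(len(assigned))                # depends on the bundled Unicode version
print(unicodedata.unidata_version)  # the Unicode version used by this Python
```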
Preceding block: Greek Extended. Following block: Superscripts and Subscripts.
For additional general punctuation characters see also Basic Latin, Latin-1 Supplement, Supplemental Punctuation and CJK Symbols and Punctuation.
Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 76
Unicode 3.0 : 7
Unicode 3.2 : 12
Unicode 4.0 : 2
Unicode 4.1 : 9
Unicode 5.1 : 1
Number of characters in each General Category :
Punctuation, Connector Pc : 3
Punctuation, Dash Pd : 6
Punctuation, Open Ps : 3
Punctuation, Close Pe : 1
Punctuation, Initial quote Pi : 5
Punctuation, Final quote Pf : 3
Punctuation, Other Po : 48
Symbol, Math Sm : 2
Separator, Space Zs : 13
Separator, Line Zl : 1
Separator, Paragraph Zp : 1
Other, Format Cf : 21
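A tally like the one above can be approximated in Python with `unicodedata.category`. This is only a sketch: the counts it prints come from the Unicode version bundled with the interpreter, so categories that gained characters after Unicode 5.2 may differ from the list above.

```python
import unicodedata
from collections import Counter

# Count the General Category of every code point in U+2000..U+206F.
categories = Counter(
    unicodedata.category(chr(cp)) for cp in range(0x2000, 0x2070)
)
categories.pop("Cn", None)  # drop unassigned code points

for category, count in sorted(categories.items()):
    print(category, count)
```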
Number of characters in each Bidirectional Category :
Left To Right L : 1
Left To Right Embedding LRE : 1
Left To Right Override LRO : 1
Right To Left R : 1
Right To Left Embedding RLE : 1
Right To Left Override RLO : 1
Pop Directional Format PDF : 1
European Number Terminator ET : 5
Common Number Separator CS : 2
Boundary Neutral BN : 14
Paragraph Separator B : 1
Whitespace WS : 13
Other Neutral ON : 65
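The bidirectional classes can be tallied the same way with `unicodedata.bidirectional`. As a sketch under the same caveat, the result reflects the interpreter's Unicode data, which may include classes introduced after Unicode 5.2.

```python
import unicodedata
from collections import Counter

# Count the Bidirectional Category of each assigned code point in the block.
bidi_classes = Counter(
    unicodedata.bidirectional(chr(cp))
    for cp in range(0x2000, 0x2070)
    if unicodedata.category(chr(cp)) != "Cn"  # skip unassigned code points
)

for bidi_class, count in sorted(bidi_classes.items()):
    print(bidi_class, count)
```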
The columns below should be interpreted as :
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode Bidirectional Category for the character
- The Unicode version when this character was added
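For illustration, the first five columns can be reproduced for any code point with Python's `unicodedata` module. The helper `format_row` is hypothetical. It omits the version column because `unicodedata` does not provide the version in which a character was added, and it lowercases the whole name, so proper nouns such as "Arabic" lose their capital.

```python
import unicodedata


def format_row(code_point: int) -> str:
    """Build one listing row without the version column (hypothetical helper)."""
    char = chr(code_point)
    # Lowercase the name and replace hyphens with spaces.
    name = unicodedata.name(char).lower().replace("-", " ")
    return " ".join([
        f"U+{code_point:04X}",
        char,
        name,
        unicodedata.category(char),
        unicodedata.bidirectional(char),
    ])


print(format_row(0x2014))
```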
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
General Punctuation
Spaces
- U+2000 en quad Zs WS 1.1
- U+2001 em quad Zs WS 1.1
- aka mutton quad
- U+2002 en space Zs WS 1.1
- html &ensp;
- sgml  
- aka nut
- * half an em
- U+2003 em space Zs WS 1.1
- html &emsp;
- sgml  
- aka mutton
- * nominally, a space equal to the type size in points
- * may scale by the condensation factor of a font
- U+2004 three per em space Zs WS 1.1
- sgml  
- aka thick space
- U+2005 four per em space Zs WS 1.1
- sgml  
- aka mid space
- U+2006 six per em space Zs WS 1.1
- * in computer typography sometimes equated to thin space
- U+2007 figure space Zs WS 1.1
- sgml  
- * space equal to tabular width of a font
- * this is equivalent to the digit width of fonts with fixed-width digits
- U+2008 punctuation space Zs WS 1.1
- sgml  
- * space equal to narrow punctuation of a font
- U+2009 thin space Zs WS 1.1
- html &thinsp;
- sgml    
- * a fifth of an em (or sometimes a sixth)
- ref U+202F narrow no break space (General Punctuation)
- U+200A hair space Zs WS 1.1
- sgml  
- * thinner than a thin space
- * in traditional typography, the thinnest space available
- U+200B zero width space Cf BN 1.1
- sgml ​ ​ ​ ​ ​
- * commonly abbreviated ZWSP
- * this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
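The difference between the spaces above and the zero width space shows up in ordinary string handling. The Python sketch below, using made-up sample text, illustrates that `str.split()` treats U+2002 as whitespace but not U+200B, so an invisible word boundary has to be split on explicitly.

```python
ZWSP = "\u200b"      # U+200B zero width space
EN_SPACE = "\u2002"  # U+2002 en space

text = f"alpha{ZWSP}beta{EN_SPACE}gamma"

# str.split() with no arguments splits on whitespace only.
words = text.split()

# ZWSP has General Category Cf, so isspace() reports False for it.
print(EN_SPACE.isspace(), ZWSP.isspace())  # True False

# Splitting explicitly on ZWSP recovers the invisible word boundary.
parts = words[0].split(ZWSP)
print(parts)  # ['alpha', 'beta']
```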
Format characters
- U+200C zero width non joiner Cf BN 1.1
- html &zwnj;
- sgml ‌
- * commonly abbreviated ZWNJ
- U+200D zero width joiner Cf BN 1.1
- html &zwj;
- sgml ‍
- * commonly abbreviated ZWJ
- U+200E left to right mark Cf L 1.1
- html &lrm;
- sgml ‎
- * commonly abbreviated LRM
- U+200F right to left mark Cf R 1.1
- html &rlm;
- sgml ‏
- * commonly abbreviated RLM
Dashes
- U+2010 ‐ hyphen Pd ON 1.1
- sgml ‐ ‐
- ref U+002D - hyphen minus (Basic Latin)
- ref U+00AD soft hyphen (Latin-1 Supplement)
- U+2011 ‑ non breaking hyphen Pd ON 1.1
- ref U+002D - hyphen minus (Basic Latin)
- ref U+00AD soft hyphen (Latin-1 Supplement)
- U+2012 ‒ figure dash Pd ON 1.1
- U+2013 – en dash Pd ON 1.1
- html &ndash;
- sgml –
- U+2014 — em dash Pd ON 1.1
- html &mdash;
- sgml —
- * may be used in pairs to offset parenthetical text
- ref U+30FC ー Katakana Hiragana prolonged sound mark (Katakana)
- U+2015 ― horizontal bar Pd ON 1.1
- sgml ―
- aka quotation dash
- * long dash introducing quoted text
General punctuation
- U+2016 ‖ double vertical line Po ON 1.1
- sgml ‖ ‖
- * used in pairs to indicate norm of a matrix
- ref U+20E6 ⃦ combining double vertical stroke overlay (Combining Diacritical Marks for Symbols)
- ref U+2225 ∥ parallel to (Mathematical Operators)
- U+2017 ‗ double low line Po ON 1.1
- * this is a spacing character
- ref U+005F _ low line (Basic Latin)
- ref U+0333 ̳ combining double low line (Combining Diacritical Marks)
- U+2018 ‘ left single quotation mark Pi ON 1.1
- html &lsquo;
- sgml ‘ ‘ ’
- aka single turned comma quotation mark
- * this is the preferred character (as opposed to 201B)
- ref U+0027 ' apostrophe (Basic Latin)
- ref U+02BB ʻ modifier letter turned comma (Spacing Modifier Letters)
- ref U+275B ❛ heavy single turned comma quotation mark ornament (Dingbats)
- U+2019 ’ right single quotation mark Pf ON 1.1
- html &rsquo;
- sgml ’ ’
- aka single comma quotation mark
- * this is the preferred character to use for apostrophe
- ref U+0027 ' apostrophe (Basic Latin)
- ref U+02BC ʼ modifier letter apostrophe (Spacing Modifier Letters)
- ref U+275C ❜ heavy single comma quotation mark ornament (Dingbats)
- U+201A ‚ single low 9 quotation mark Ps ON 1.1
- html &sbquo;
- sgml ‚ ‚
- aka low single comma quotation mark
- * used as opening single quotation mark in some languages
- U+201B ‛ single high reversed 9 quotation mark Pi ON 1.1
- sgml ”
- aka single reversed comma quotation mark
- * has same semantic as 2018, but differs in appearance
- ref U+02BD ʽ modifier letter reversed comma (Spacing Modifier Letters)
- U+201C “ left double quotation mark Pi ON 1.1
- html &ldquo;
- sgml “ “ ”
- aka double turned comma quotation mark
- * this is the preferred character (as opposed to 201F)
- ref U+0022 " quotation mark (Basic Latin)
- ref U+275D ❝ heavy double turned comma quotation mark ornament (Dingbats)
- ref U+301D 〝 reversed double prime quotation mark (CJK Symbols and Punctuation)
- U+201D ” right double quotation mark Pf ON 1.1
- html &rdquo;
- sgml ” ”
- aka double comma quotation mark
- ref U+0022 " quotation mark (Basic Latin)
- ref U+2033 ″ double prime (General Punctuation)
- ref U+275E ❞ heavy double comma quotation mark ornament (Dingbats)
- ref U+301E 〞 double prime quotation mark (CJK Symbols and Punctuation)
- U+201E „ double low 9 quotation mark Ps ON 1.1
- html &bdquo;
- sgml „ „
- aka low double comma quotation mark
- * used as opening double quotation mark in some languages
- ref U+301F 〟 low double prime quotation mark (CJK Symbols and Punctuation)
- U+201F ‟ double high reversed 9 quotation mark Pi ON 1.1
- sgml ’
- aka double reversed comma quotation mark
- * has same semantic as 201C, but differs in appearance
- U+2020 † dagger Po ON 1.1
- html &dagger;
- sgml †
- aka obelisk, obelus, long cross
- U+2021 ‡ double dagger Po ON 1.1
- html &Dagger;
- sgml ‡ ‡
- aka diesis, double obelisk
- U+2022 • bullet Po ON 1.1
- html &bull;
- sgml • •
- aka black small circle
- ref U+00B7 · middle dot (Latin-1 Supplement)
- ref U+2024 ․ one dot leader (General Punctuation)
- ref U+2219 ∙ bullet operator (Mathematical Operators)
- ref U+25D8 ◘ inverse bullet (Geometric Shapes)
- ref U+25E6 ◦ white bullet (Geometric Shapes)
- U+2023 ‣ triangular bullet Po ON 1.1
- ref U+220E ∎ end of proof (Mathematical Operators)
- ref U+25B8 ▸ black right pointing small triangle (Geometric Shapes)
- U+2024 ․ one dot leader Po ON 1.1
- * also used as an Armenian semicolon (mijaket)
- ref U+00B7 · middle dot (Latin-1 Supplement)
- ref U+2022 • bullet (General Punctuation)
- ref U+2219 ∙ bullet operator (Mathematical Operators)
- U+2025 ‥ two dot leader Po ON 1.1
- sgml ‥
- U+2026 … horizontal ellipsis Po ON 1.1
- html &hellip;
- sgml … …
- aka three dot leader
- ref U+22EE ⋮ vertical ellipsis (Mathematical Operators)
- ref U+FE19 ︙ presentation form for vertical horizontal ellipsis (Vertical Forms)
- U+2027 ‧ hyphenation point Po ON 1.1
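The note on U+2019 above says it is the preferred character for the apostrophe. A minimal Python sketch of applying that preference is shown below. The helper `curl_apostrophes` and its rule (only replace U+0027 when it sits between two word characters) are assumptions made for illustration. Real typographic conversion needs more context than this.

```python
import re

RIGHT_SINGLE_QUOTE = "\u2019"  # U+2019 right single quotation mark


def curl_apostrophes(text: str) -> str:
    """Replace U+0027 between two word characters with U+2019."""
    return re.sub(r"(?<=\w)'(?=\w)", RIGHT_SINGLE_QUOTE, text)


print(curl_apostrophes("don't"))
```

Quotation marks at the edges of a word are deliberately left alone, because choosing between U+2018 and U+2019 there depends on the surrounding text.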
Format characters
- U+2028 line separator Zl WS 1.1
- * may be used to represent this semantic unambiguously
- U+2029 paragraph separator Zp B 1.1
- * may be used to represent this semantic unambiguously
- U+202A left to right embedding Cf LRE 1.1
- * commonly abbreviated LRE
- U+202B right to left embedding Cf RLE 1.1
- * commonly abbreviated RLE
- U+202C pop directional formatting Cf PDF 1.1
- * commonly abbreviated PDF
- U+202D left to right override Cf LRO 1.1
- * commonly abbreviated LRO
- U+202E right to left override Cf RLO 1.1
- * commonly abbreviated RLO
- U+202F narrow no break space Zs CS 3.0
- * commonly abbreviated NNBSP
- * a narrow form of a no-break space, typically the width of a thin space or a mid space
- ref U+00A0 no break space (Latin-1 Supplement)
- ref U+2005 four per em space (General Punctuation)
- ref U+2009 thin space (General Punctuation)
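The line separator and paragraph separator entries above can be seen at work in Python, whose `str.splitlines()` recognises both characters as line boundaries. The sample text is made up for this sketch.

```python
LINE_SEPARATOR = "\u2028"       # U+2028 line separator
PARAGRAPH_SEPARATOR = "\u2029"  # U+2029 paragraph separator

text = f"first line{LINE_SEPARATOR}second line{PARAGRAPH_SEPARATOR}new paragraph"

# splitlines() breaks on both separators.
lines = text.splitlines()
print(lines)  # ['first line', 'second line', 'new paragraph']

# Splitting on U+2029 alone keeps the line break inside the first paragraph.
paragraphs = text.split(PARAGRAPH_SEPARATOR)
print(len(paragraphs))  # 2
```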
General punctuation
- U+2030 ‰ per mille sign Po ET 1.1
- html &permil;
- sgml ‰
- aka permille, per thousand
- * used, for example, in measures of blood alcohol content, salinity, etc.
- ref U+0025 % percent sign (Basic Latin)
- ref U+0609 ؉ Arabic indic per mille sign (Arabic)
- U+2031 ‱ per ten thousand sign Po ET 1.1
- sgml ‱
- aka permyriad
- * percent of a percent, rarely used
- ref U+0025 % percent sign (Basic Latin)
- ref U+060A ؊ Arabic indic per ten thousand sign (Arabic)
- U+2032 ′ prime Po ET 1.1
- html &prime;
- sgml ′ &vprime;
- aka minutes, feet
- ref U+0027 ' apostrophe (Basic Latin)
- ref U+00B4 ´ acute accent (Latin-1 Supplement)
- ref U+02B9 ʹ modifier letter prime (Spacing Modifier Letters)
- U+2033 ″ double prime Po ET 1.1
- html &Prime;
- sgml ″
- aka seconds, inches
- ref U+0022 " quotation mark (Basic Latin)
- ref U+02BA ʺ modifier letter double prime (Spacing Modifier Letters)
- ref U+201D ” right double quotation mark (General Punctuation)
- ref U+3003 〃 ditto mark (CJK Symbols and Punctuation)
- ref U+301E 〞 double prime quotation mark (CJK Symbols and Punctuation)
- U+2034 ‴ triple prime Po ET 1.1
- sgml ‴
- aka lines (old measure, 1/12 of an inch)
- U+2035 ‵ reversed prime Po ON 1.1
- sgml ‵ ‵
- ref U+0060 ` grave accent (Basic Latin)
- U+2036 ‶ reversed double prime Po ON 1.1
- ref U+301D 〝 reversed double prime quotation mark (CJK Symbols and Punctuation)
- U+2037 ‷ reversed triple prime Po ON 1.1
- U+2038 ‸ caret Po ON 1.1
- sgml ⁁
- ref U+2303 ⌃ up arrowhead (Miscellaneous Technical)
- ref U+A788 ꞈ modifier letter low circumflex accent (Latin Extended D)
- U+2039 ‹ single left pointing angle quotation mark Pi ON 1.1
- html &lsaquo;
- sgml ‹
- aka left pointing single guillemet
- * usually opening, sometimes closing
- ref U+003C < less than sign (Basic Latin)
- ref U+2329 〈 left pointing angle bracket (Miscellaneous Technical)
- ref U+3008 〈 left angle bracket (CJK Symbols and Punctuation)
- U+203A › single right pointing angle quotation mark Pf ON 1.1
- html &rsaquo;
- sgml ›
- aka right pointing single guillemet
- * usually closing, sometimes opening
- ref U+003E > greater than sign (Basic Latin)
- ref U+232A 〉 right pointing angle bracket (Miscellaneous Technical)
- ref U+3009 〉 right angle bracket (CJK Symbols and Punctuation)
- U+203B ※ reference mark Po ON 1.1
- aka Japanese kome
- aka Urdu paragraph separator
- ref U+0FBF ྿ Tibetan ku ru kha bzhi mig can (Tibetan)
- ref U+200AD 𠂭 CJK Ideograph 200AD (CJK Unified Ideographs Extension B)
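The per mille and per ten thousand signs listed above express a ratio multiplied by 1000 and by 10000 respectively, the latter being a "percent of a percent". The Python sketch below formats a plain ratio with each sign. The helper names are hypothetical, and the `:g` format is simply one convenient way to trim trailing zeros.

```python
PER_MILLE = "\u2030"         # U+2030 per mille sign
PER_TEN_THOUSAND = "\u2031"  # U+2031 per ten thousand sign


def as_per_mille(ratio: float) -> str:
    """Format a ratio such as 0.0125 in parts per thousand."""
    return f"{ratio * 1000:g}{PER_MILLE}"


def as_per_ten_thousand(ratio: float) -> str:
    """Format a ratio such as 0.0125 in parts per ten thousand."""
    return f"{ratio * 10000:g}{PER_TEN_THOUSAND}"


print(as_per_mille(0.0125))         # 12.5 followed by the per mille sign
print(as_per_ten_thousand(0.0125))  # 125 followed by the per ten thousand sign
```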
Double punctuation for vertical text
- U+203C ‼ double exclamation mark Po ON 1.1
- ref U+0021 ! exclamation mark (Basic Latin)
General punctuation
- U+203D ‽ interrobang Po ON 1.1
- ref U+0021 ! exclamation mark (Basic Latin)
- ref U+003F ? question mark (Basic Latin)
- ref U+2E18 ⸘ inverted interrobang (Supplemental Punctuation)
- U+203E ‾ overline Po ON 1.1
- html &oline;
- sgml ‾
- aka spacing overscore
- U+203F ‿ undertie Pc ON 1.1
- aka Greek enotikon
- ref U+2323 ⌣ smile (Miscellaneous Technical)
- U+2040 ⁀ character tie Pc ON 1.1
- aka z notation sequence concatenation
- ref U+2322 ⌢ frown (Miscellaneous Technical)
- U+2041 ⁁ caret insertion point Po ON 1.1
- sgml ⁁
- * proofreader's mark: insert here
- ref U+22CC ⋌ right semidirect product (Mathematical Operators)
- U+2042 ⁂ asterism Po ON 1.1
- U+2043 ⁃ hyphen bullet Po ON 1.1
- sgml ⁃
- U+2044 ⁄ fraction slash Sm CS 1.1
- html &frasl;
- sgml ⁄
- aka solidus (in typography)
- * for composing arbitrary fractions
- ref U+002F / solidus (Basic Latin)
- ref U+2215 ∕ division slash (Mathematical Operators)
- U+2045 ⁅ left square bracket with quill Ps ON 1.1
- U+2046 ⁆ right square bracket with quill Pe ON 1.1
Double punctuation for vertical text
- U+2047 ⁇ double question mark Po ON 3.2
- U+2048 ⁈ question exclamation mark Po ON 3.0
- U+2049 ⁉ exclamation question mark Po ON 3.0
General punctuation
- U+204A ⁊ tironian sign et Po ON 3.0
- * Irish Gaelic, Old English, ...
- ref U+0026 & ampersand (Basic Latin)
- U+204B ⁋ reversed pilcrow sign Po ON 3.0
- ref U+00B6 ¶ pilcrow sign (Latin-1 Supplement)
- U+204C ⁌ black leftwards bullet Po ON 3.0
- U+204D ⁍ black rightwards bullet Po ON 3.0
- U+204E ⁎ low asterisk Po ON 3.2
- ref U+002A * asterisk (Basic Latin)
- ref U+0359 ͙ combining asterisk below (Combining Diacritical Marks)
- U+204F ⁏ reversed semicolon Po ON 3.2
- sgml ⁏
- ref U+003B ; semicolon (Basic Latin)
- U+2050 ⁐ close up Po ON 3.2
- * editing mark
- U+2051 ⁑ two asterisks aligned vertically Po ON 3.2
- U+2052 ⁒ commercial minus sign Sm ON 3.2
- aka abzüglich (German), med avdrag av (Swedish), piska (Swedish, "whip")
- * a common glyph variant and fallback representation looks like ./.
- * may also be used as a dingbat to indicate correctness
- * used in Finno-Ugric Phonetic Alphabet to indicate a related borrowed form with different sound
- ref U+0025 % percent sign (Basic Latin)
- ref U+066A ٪ Arabic percent sign (Arabic)
- U+2053 ⁓ swung dash Po ON 4.0
- ref U+007E ~ tilde (Basic Latin)
- U+2054 ⁔ inverted undertie Pc ON 4.0
- U+2055 ⁕ flower punctuation mark Po ON 4.1
- aka phul, puspika
- * used as a punctuation mark with Syloti Nagri, Bengali and other Indic scripts
- ref U+274B ❋ heavy eight teardrop spoked propeller asterisk (Dingbats)
Archaic punctuation
- U+2056 ⁖ three dot punctuation Po ON 4.1
General punctuation
- U+2057 ⁗ quadruple prime Po ON 3.2
- sgml ⁗
Archaic punctuation
- U+2058 ⁘ four dot punctuation Po ON 4.1
- U+2059 ⁙ five dot punctuation Po ON 4.1
- aka Greek pentonkion
- aka quincunx
- ref U+2684 ⚄ die face 5 (Miscellaneous Symbols)
- U+205A ⁚ two dot punctuation Po ON 4.1
- * historically used to indicate the end of a sentence or change of speaker
- * extends from baseline to cap height
- ref U+FE30 ︰ presentation form for vertical two dot leader (CJK Compatibility Forms)
- ref U+1015B 𐅛 Greek acrophonic epidaurean two (Ancient Greek Numbers)
- U+205B ⁛ four dot mark Po ON 4.1
- * used by scribes in the margin as highlighter mark
- * this is centered on the line, but extends beyond top and bottom of the line
- U+205C ⁜ dotted cross Po ON 4.1
- * used by scribes in the margin as highlighter mark
- U+205D ⁝ tricolon Po ON 4.1
- aka epidaurean acrophonic symbol three
- ref U+22EE ⋮ vertical ellipsis (Mathematical Operators)
- ref U+2AF6 ⫶ triple colon operator (Supplemental Mathematical Operators)
- ref U+FE19 ︙ presentation form for vertical horizontal ellipsis (Vertical Forms)
- U+205E ⁞ vertical four dots Po ON 4.1
- * used in dictionaries to indicate legal but undesirable word break
- * glyph extends the whole height of the line
Space
- U+205F medium mathematical space Zs WS 3.2
- sgml  
- * abbreviated MMSP
- * four-eighteenths of an em
Format character
- U+2060 word joiner Cf BN 3.2
- sgml ⁠
- * commonly abbreviated WJ
- * a zero width non-breaking space (only)
- * intended for disambiguation of functions for byte order mark
- ref U+FEFF zero width no break space (Arabic Presentation Forms B)
Invisible operators
- U+2061 function application Cf BN 3.2
- sgml ⁡
- * contiguity operator indicating application of a function
- U+2062 invisible times Cf BN 3.2
- sgml ⁢
- * contiguity operator indicating multiplication
- U+2063 invisible separator Cf BN 3.2
- sgml ⁣ ⁣
- aka invisible comma
- * contiguity operator indicating that adjacent mathematical symbols form a list, e.g. when no visible comma is used between multiple indices
- U+2064 invisible plus Cf BN 5.1
- * contiguity operator indicating addition
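Because the invisible operators have no glyph, it can help to make them visible when inspecting text. The Python sketch below substitutes a visible stand-in for each one. The stand-ins and the helper name `reveal_invisible_operators` are arbitrary choices for this example, not anything defined by Unicode.

```python
# Visible stand-ins chosen for this sketch only.
INVISIBLE_OPERATORS = {
    "\u2061": "<apply>",  # U+2061 function application
    "\u2062": "*",        # U+2062 invisible times
    "\u2063": ",",        # U+2063 invisible separator
    "\u2064": "+",        # U+2064 invisible plus
}

_TABLE = str.maketrans(INVISIBLE_OPERATORS)


def reveal_invisible_operators(text: str) -> str:
    """Replace each invisible operator with its visible stand-in."""
    return text.translate(_TABLE)


print(reveal_invisible_operators("2\u2062x"))  # 2*x
```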
Deprecated
Use of these characters is strongly discouraged.
- U+206A inhibit symmetric swapping Cf BN 1.1
- U+206B activate symmetric swapping Cf BN 1.1
- U+206C inhibit Arabic form shaping Cf BN 1.1
- U+206D activate Arabic form shaping Cf BN 1.1
- U+206E national digit shapes Cf BN 1.1
- U+206F nominal digit shapes Cf BN 1.1
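Since use of these characters is discouraged, a text pipeline might want to report them. The Python sketch below scans a string for code points in U+206A through U+206F. The helper `find_deprecated` is hypothetical, and what to do with a match (warn, strip or reject) is left to the caller.

```python
DEPRECATED = range(0x206A, 0x2070)  # U+206A through U+206F


def find_deprecated(text: str) -> list[tuple[int, str]]:
    """Return (index, code point label) for each deprecated character found."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) in DEPRECATED
    ]


print(find_deprecated("ab\u206fc"))  # [(2, 'U+206F')]
```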
http://unicode.org
Some prose may have been lifted verbatim from unicode.org,
as is permitted by their terms of use at http://www.unicode.org/copyright.html