
The Unicode standard encodes eighteen different space characters, differing in width and layout behavior.

The most commonly used space character is U+0020 space. Another favorite is its non-breaking counterpart, U+00A0 no break space. These two characters have the same width but behave differently for line breaking. U+00A0 no break space also behaves like a numeric separator for the purposes of bidirectional layout (see Bidirectional Behavior). In ideographic text, U+3000 ideographic space is commonly used because its width matches that of the ideographs (i.e., it is a fullwidth character).
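The line-breaking difference between U+0020 and U+00A0 can be observed with a small sketch. It relies on Python's standard textwrap module, which is not an implementation of the Unicode line breaking rules; it only breaks at ASCII whitespace, which is enough to show the contrast. The helper name wrap_narrow and the sample text are made up for illustration.

```python
import textwrap

NBSP = "\u00a0"  # U+00A0 no break space


def wrap_narrow(text, width=4):
    """Wrap text to a narrow column without splitting over-long chunks."""
    # break_long_words=False keeps a chunk wider than the column in one piece,
    # so a chunk glued together by U+00A0 stays on a single line.
    return textwrap.wrap(text, width=width, break_long_words=False)


# U+0020 offers a break opportunity between "10" and "km".
breaking = wrap_narrow("10 km away")

# U+00A0 does not, so "10" and "km" stay together.
non_breaking = wrap_narrow("10" + NBSP + "km away")

print(breaking)
print(non_breaking)
```

Both strings look identical when rendered, since the two characters have the same width; only the break opportunities differ.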

The main difference among other space characters is their width.
U+2000 to U+2006 are standard quad widths used in typography.
U+2007 figure space has the same width as a digit.
U+2008 punctuation space has the same width as a period.
The fixed-width space characters U+2000 to U+200A are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. When they are used, they typically do not expand during justification, except for U+2009 thin space which sometimes does.
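As a rough illustration of the fixed widths, the sketch below turns the nominal em fractions given in this writeup's character annotations (half an em, a fifth of an em, four-eighteenths of an em, and the "N per em" names) into point sizes. The table name NOMINAL_EM_FRACTION and the function nominal_width are hypothetical, and real fonts are free to deviate from these nominal values.

```python
from fractions import Fraction

# Nominal widths as fractions of an em, taken from the annotations in this
# writeup. Actual glyph advances are font-dependent.
NOMINAL_EM_FRACTION = {
    0x2002: Fraction(1, 2),   # en space: half an em
    0x2003: Fraction(1, 1),   # em space: equal to the type size
    0x2004: Fraction(1, 3),   # three per em space
    0x2005: Fraction(1, 4),   # four per em space
    0x2006: Fraction(1, 6),   # six per em space
    0x2009: Fraction(1, 5),   # thin space: a fifth (or sometimes a sixth)
    0x205F: Fraction(4, 18),  # medium mathematical space
}


def nominal_width(code_point, type_size_pt):
    """Return the nominal width in points for a given type size."""
    return NOMINAL_EM_FRACTION[code_point] * type_size_pt


print(nominal_width(0x2002, 12))  # en space at 12 pt
print(nominal_width(0x205F, 18))  # medium mathematical space at 18 pt
```

Exact fractions are used instead of floats so that thirds and sixths of an em do not pick up rounding noise.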

Space characters with special behavior in word or line breaking are described in Line and Word Breaking and Layout Controls.

The use of U+FEFF zero width no break space as a spacing character was deprecated in Unicode 3.2. The character U+2060 word joiner should be used instead, allowing U+FEFF to be used exclusively for its most common role as a Byte Order Mark (BOM).
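One way to apply this recommendation to existing text is sketched below. The function name modernize_feff is hypothetical, and the sketch assumes that a U+FEFF at the very start of the text is a byte order mark to be dropped, while any later occurrence was meant as a zero width no break space. Whether that assumption holds depends on where the text came from.

```python
BOM = "\ufeff"          # U+FEFF zero width no break space / byte order mark
WORD_JOINER = "\u2060"  # U+2060 word joiner


def modernize_feff(text):
    """Drop one leading U+FEFF and replace the rest with U+2060."""
    # Assumption: a leading U+FEFF is an encoding signature, not content.
    if text.startswith(BOM):
        text = text[1:]
    # Any remaining U+FEFF is treated as the deprecated spacing usage.
    return text.replace(BOM, WORD_JOINER)


cleaned = modernize_feff("\ufeffnon\ufeffbreaking")
print([f"U+{ord(ch):04X}" for ch in cleaned if not ch.isascii()])
```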

Note that the list below includes every Unicode character with the General Category Zs (Separator, Space).
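The Zs characters can also be enumerated programmatically. The following is a minimal sketch using Python's standard unicodedata module; the function name space_separators is made up for illustration. The result reflects whichever Unicode version the Python build bundles, so it need not match this list exactly: for instance, recent versions of the standard report U+180E as Cf rather than Zs.

```python
import sys
import unicodedata


def space_separators():
    """Return every code point whose General Category is Zs."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


zs = space_separators()
print("Unicode data version:", unicodedata.unidata_version)
for cp in zs:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```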


As of version 4.0, the Unicode standard has 26 semantically distinct variants of the space character. They are enumerated below, grouped by code block.

Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 22
Unicode 3.0 : 3
Unicode 3.2 : 1

Number of characters in each General Category :

Separator, Space      Zs : 18
Separator, Line       Zl :  1
Separator, Paragraph  Zp :  1
Other, Control        Cc :  6

Number of characters in each Bidirectional Category :

Common Number Separator   CS :  1
Paragraph Separator        B :  4
Segment Separator          S :  2
Whitespace                WS : 19
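The tallies above can be recomputed with a short sketch. The list LISTED holds the 26 code points enumerated in this writeup, and the helper tally is hypothetical. Since Python's unicodedata module follows a later Unicode version than 4.0, some per-class counts may differ from the figures above; the script prints the data version so the comparison is explicit.

```python
from collections import Counter
import unicodedata

# The 26 code points enumerated in this writeup.
LISTED = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
    0x1680, 0x180E, *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F,
    0x205F, 0x3000,
]


def tally(prop):
    """Count the listed characters by the value of a property function."""
    return Counter(prop(chr(cp)) for cp in LISTED)


by_category = tally(unicodedata.category)
by_bidi = tally(unicodedata.bidirectional)

print("Unicode data version:", unicodedata.unidata_version)
print("General Category:", dict(by_category))
print("Bidirectional Category:", dict(by_bidi))
```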

The columns below should be interpreted as :

  1. The Unicode code for the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode General Category for the character
  5. The Unicode Bidirectional Category for the character
  6. The Unicode version when this character was added
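For readers who want to process the entries mechanically, the six columns can be pulled apart with a regular expression. This is only a sketch: the names Entry, ENTRY_RE, and parse_entry are invented here, and the pattern assumes the exact layout used in this writeup (a two-letter General Category, a one- to three-letter Bidirectional Category, and a version such as 1.1).

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    code_point: int
    glyph: str
    name: str
    general_category: str
    bidi_category: str
    version: str


# Columns: code, (glyph), name, General Category, Bidi Category, version.
ENTRY_RE = re.compile(
    r"^U\+(?P<code>[0-9A-F]{4,6}) \((?P<glyph>.*?)\) (?P<name>.+) "
    r"(?P<gc>[A-Z][a-z]) (?P<bidi>[A-Z]{1,3}) (?P<version>\d+\.\d+)$"
)


def parse_entry(line):
    """Parse one entry line, or return None for annotation lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(
        int(m["code"], 16), m["glyph"], m["name"],
        m["gc"], m["bidi"], m["version"],
    )


print(parse_entry("U+0020 ( ) space Zs WS 1.1"))
print(parse_entry("aka nbsp"))
```

Annotation lines (those starting with aka, ref, html, sgml, or an asterisk) do not match the pattern and come back as None, which makes it easy to skip them.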

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

 

Basic Latin

     

U+0009 ( ) character tabulation Cc S 1.1
sgml &Tab;
aka horizontal tabulation (ht), tab
U+000A ( ) line feed Cc B 1.1
sgml &NewLine;
aka line feed (lf)
aka new line (nl), end of line (eol)
U+000B ( ) line tabulation Cc S 1.1
U+000C ( ) form feed Cc WS 1.1
aka form feed (ff)
U+000D ( ) carriage return Cc B 1.1
aka carriage return (cr)

     ASCII punctuation and symbols

U+0020 ( ) space Zs WS 1.1
* sometimes considered a control code
* other space characters: 2000-200A
ref U+00A0       no break space (Latin-1 Supplement)
ref U+200B   ​   zero width space (General Punctuation)
ref U+2060   ⁠   word joiner (General Punctuation)
ref U+3000       ideographic space (CJK Symbols and Punctuation)
ref U+FEFF      zero width no break space (Arabic Presentation Forms B)

 

Latin-1 Supplement

     C1 controls

U+0085 () next line Cc B 1.1
aka next line (nel)

     Latin-1 punctuation and symbols

U+00A0 ( ) no break space Zs CS 1.1
html &nbsp;
sgml &nbsp;
aka nbsp
ref U+0020     space (Basic Latin)
ref U+2007       figure space (General Punctuation)
ref U+202F       narrow no break space (General Punctuation)
ref U+2060   ⁠   word joiner (General Punctuation)
ref U+FEFF      zero width no break space (Arabic Presentation Forms B)

 

Ogham

     Punctuation

U+1680 () Ogham space mark Zs WS 3.0
* glyph is blank in "stemless" style fonts

 

Mongolian

     Format controls

U+180E () Mongolian vowel separator Zs WS 3.0
aka mvs

 

General Punctuation

     Spaces

U+2000 ( ) en quad Zs WS 1.1
U+2001 () em quad Zs WS 1.1
aka mutton quad
U+2002 () en space Zs WS 1.1
html &ensp;
sgml &ensp;
aka nut
* half an em
U+2003 () em space Zs WS 1.1
html &emsp;
sgml &emsp;
aka mutton
* nominally, a space equal to the type size in points
* may scale by the condensation factor of a font
U+2004 () three per em space Zs WS 1.1
sgml &emsp13;
aka thick space
U+2005 () four per em space Zs WS 1.1
sgml &emsp14;
aka mid space
U+2006 () six per em space Zs WS 1.1
* in computer typography sometimes equated to thin space
U+2007 () figure space Zs WS 1.1
sgml &numsp;
* space equal to tabular width of a font
* this is equivalent to the digit width of fonts with fixed-width digits
U+2008 () punctuation space Zs WS 1.1
sgml &puncsp;
* space equal to narrow punctuation of a font
U+2009 () thin space Zs WS 1.1
html &thinsp;
sgml &thinsp;
* a fifth of an em (or sometimes a sixth)
U+200A () hair space Zs WS 1.1
sgml &hairsp;
* thinner than a thin space
* in traditional typography, the thinnest space available

     Formatting characters

U+2028 () line separator Zl WS 1.1
* may be used to represent this semantic unambiguously
U+2029 () paragraph separator Zp B 1.1
* may be used to represent this semantic unambiguously
U+202F () narrow no break space Zs WS 3.0
aka nnbsp
ref U+00A0       no break space (Latin-1 Supplement)

     Space

U+205F () medium mathematical space Zs WS 3.2
sgml &MediumSpace;
aka mmsp
* four-eighteenths of an em

 

CJK Symbols and Punctuation

     CJK symbols and punctuation

U+3000 ( ) ideographic space Zs WS 1.1
ref U+0020     space (Basic Latin)

http://unicode.org