Return to space (idea)

The [Unicode] standard encodes [eighteen] different space characters, differing in width and [layout] [behavior].

The most commonly used space [character] is U+0020 [space]. Another big [favorite] is its [non-breaking] counterpart U+00A0 [no break space]. These two characters have the same [width], but behave differently for line breaking. [no break space] behaves like a [numeric separator] for the purposes of [bidirectional] layout (see [Bidirectional Behavior]). In [ideographic] text, U+3000 [ideographic space] is commonly used because its width matches that of the [ideograph|ideographs] (i.e. it is a [fullwidth] character).
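
The breaking difference between U+0020 and U+00A0 can be seen in a small sketch using Python's standard textwrap module. This is only an illustration: textwrap is a simple wrapper that breaks at ASCII whitespace, not an implementation of the Unicode line breaking rules, and the sample string and width are made up for the example.

```python
import textwrap

# Illustrative sample text; the only difference is the first space character.
with_space = "10 km away"
with_nbsp = "10\u00a0km away"  # U+00A0 NO-BREAK SPACE between "10" and "km"

# break_long_words=False keeps textwrap from splitting inside a chunk
# that is wider than the requested width.
plain = textwrap.wrap(with_space, width=4, break_long_words=False)
glued = textwrap.wrap(with_nbsp, width=4, break_long_words=False)

print(plain)  # the ordinary space is a break opportunity
print(glued)  # "10" and "km" stay on the same line
```

Here the ordinary space lets "10" and "km" land on separate lines, while the no-break space keeps them together even though the result overflows the width.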

The main difference among other space characters is their width.
U+2000 to U+2006 are standard [quad] widths used in [typography].
U+2007 [figure space] has the same width as a digit.
U+2008 [punctuation space] has the same width as a period.
The fixed-width space characters U+2000 to U+200A are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. When they are used, they typically do not expand during justification, except for U+2009 [thin space] which sometimes does.

Space characters with special behavior in word or line breaking are described in [Line and Word Breaking] and [Layout Controls].

The use of U+FEFF [zero width no break space] as a spacing character was [deprecated] in [Unicode 3.2]. The character U+2060 [word joiner] should be used instead, leaving U+FEFF to be used exclusively in its most common role as a [Byte Order Mark] ([BOM]).
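
A minimal sketch of how this split in roles might be handled when reading text in Python. The helper name normalize_feff is hypothetical, and replacing every interior U+FEFF with U+2060 is an assumption about what a given application wants, not something the standard requires.

```python
import codecs

def normalize_feff(text: str) -> str:
    """Drop a leading BOM, then map any remaining U+FEFF to U+2060.

    Hypothetical helper: treats an initial U+FEFF as a byte order mark
    and any later U+FEFF as a legacy zero width no-break space.
    """
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.replace("\ufeff", "\u2060")

# Sample bytes: a UTF-8 BOM followed by "a", a legacy U+FEFF, and "b".
data = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")

raw = data.decode("utf-8")          # keeps the BOM as a leading U+FEFF
no_bom = data.decode("utf-8-sig")   # the codec strips the leading BOM only
cleaned = normalize_feff(raw)

print([f"U+{ord(c):04X}" for c in cleaned])
```

The "utf-8-sig" codec removes only the initial mark, so interior occurrences still need a separate decision such as the replacement above.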

Note that the list below contains every Unicode character with the [General Category] [Zs] ([Space Separator]), together with the line, paragraph and control characters that act as whitespace.


As of version [Unicode 4.0|4.0], the [Unicode] standard has [26] semantically distinct variants of the space character. They are enumerated below, grouped by [code block].

Number of characters added in each version of the Unicode standard :
[Unicode 1.1] : 22
[Unicode 3.0] : 3
[Unicode 3.2] : 1

Number of characters in each [General Category] :

Separator, Space      Zs : 18
Separator, Line       Zl :  1
Separator, Paragraph  Zp :  1
Other, Control        Cc :  6

Number of characters in each [Bidirectional Category] :

Common Number Separator   CS :  1
Paragraph Separator        B :  4
Segment Separator          S :  2
Whitespace                WS : 19

The columns below should be interpreted as :

  1. The [Unicode] code for the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode [General Category] for the character
  5. The Unicode [Bidirectional Category] for the character
  6. The Unicode version when this character was added
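
As a sketch of how the first five columns can be reproduced, the following uses Python's unicodedata module. The function name describe is hypothetical, and the sixth column (version added) is omitted because unicodedata does not expose it. Control characters have no name in that module, so a placeholder is used, and names come back in the standard's upper-case form.

```python
import unicodedata

def describe(cp: int) -> str:
    """Format one row: code, character, name, General Category, Bidi Category."""
    ch = chr(cp)
    # Control characters such as U+0009 have no formal name in unicodedata,
    # so fall back to a placeholder instead of raising ValueError.
    name = unicodedata.name(ch, "<control>")
    category = unicodedata.category(ch)
    bidi = unicodedata.bidirectional(ch)
    return f"U+{cp:04X} ({ch}) {name} {category} {bidi}"

for cp in (0x0020, 0x00A0, 0x2003, 0x3000):
    print(describe(cp))
```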

If the characters below show up poorly, or not at all, see [Unicode Support] for possible solutions.

 

[Basic Latin]

     

U+0009 ( ) [character tabulation] Cc S 1.1
sgml &Tab;
aka horizontal tabulation (ht), tab
U+000A ( ) [line feed] Cc B 1.1
sgml &NewLine;
aka line feed (lf)
aka new line (nl), end of line (eol)
U+000B ( ) [line tabulation] Cc S 1.1
U+000C ( ) [form feed] Cc WS 1.1
aka form feed (ff)
U+000D ( ) [carriage return] Cc B 1.1
aka carriage return (cr)

     ASCII punctuation and symbols

U+0020 ( ) [space] Zs WS 1.1
* sometimes considered a control code
* other space characters: 2000-200A
ref U+00A0       [no break space] ([Latin-1 Supplement])
ref U+200B   ​   [zero width space] ([General Punctuation])
ref U+2060   ⁠   [word joiner] ([General Punctuation])
ref U+3000       [ideographic space] ([CJK Symbols and Punctuation])
ref U+FEFF      [zero width no break space] ([Arabic Presentation Forms B])

 

[Latin-1 Supplement]

     C1 controls

U+0085 () [next line] Cc B 1.1
aka next line (nel)

     Latin-1 punctuation and symbols

U+00A0 ( ) [no break space] Zs CS 1.1
html &nbsp;
sgml &nbsp;
aka nbsp
ref U+0020     [space] ([Basic Latin])
ref U+2007       [figure space] ([General Punctuation])
ref U+202F       [narrow no break space] ([General Punctuation])
ref U+2060   ⁠   [word joiner] ([General Punctuation])
ref U+FEFF      [zero width no break space] ([Arabic Presentation Forms B])

 

[Ogham]

     Punctuation

U+1680 () [Ogham space mark] Zs WS 3.0
* glyph is blank in "stemless" style fonts

 

[Mongolian]

     Format controls

U+180E () [Mongolian vowel separator] Zs WS 3.0
aka mvs

 

[General Punctuation]

     Spaces

U+2000 ( ) [en quad] Zs WS 1.1
U+2001 () [em quad] Zs WS 1.1
aka mutton quad
U+2002 () [en space] Zs WS 1.1
html &ensp;
sgml &ensp;
aka nut
* half an em
U+2003 () [em space] Zs WS 1.1
html &emsp;
sgml &emsp;
aka mutton
* nominally, a space equal to the type size in points
* may scale by the condensation factor of a font
U+2004 () [three per em space] Zs WS 1.1
sgml &emsp13;
aka thick space
U+2005 () [four per em space] Zs WS 1.1
sgml &emsp14;
aka mid space
U+2006 () [six per em space] Zs WS 1.1
* in computer typography sometimes equated to thin space
U+2007 () [figure space] Zs WS 1.1
sgml &numsp;
* space equal to tabular width of a font
* this is equivalent to the digit width of fonts with fixed-width digits
U+2008 () [punctuation space] Zs WS 1.1
sgml &puncsp;
* space equal to narrow punctuation of a font
U+2009 () [thin space] Zs WS 1.1
html &thinsp;
sgml &thinsp;
* a fifth of an em (or sometimes a sixth)
U+200A () [hair space] Zs WS 1.1
sgml &hairsp;
* thinner than a thin space
* in traditional typography, the thinnest space available

     Formatting characters

U+2028 () [line separator] Zl WS 1.1
* may be used to represent this semantic unambiguously
U+2029 () [paragraph separator] Zp B 1.1
* may be used to represent this semantic unambiguously
U+202F () [narrow no break space] Zs WS 3.0
aka nnbsp
ref U+00A0       [no break space] ([Latin-1 Supplement])

     Space

U+205F () [medium mathematical space] Zs WS 3.2
sgml &MediumSpace;
aka mmsp
* four-eighteenths of an em

 

[CJK Symbols and Punctuation]

     CJK symbols and punctuation

U+3000 ( ) [ideographic space] Zs WS 1.1
ref U+0020     [space] ([Basic Latin])

http://unicode.org