The very first Unicode code block, Basic Latin, encodes 7-bit ASCII, also known as ANSI X3.4 and ISO/IEC 646:1991-IRV, and identical to the first 128 characters of ISO 8859-1. Only a small fraction of the languages written in the Latin script can be written entirely with the characters in the Basic Latin code block, but its letters (a-z and A-Z) form the core of all Latin scripts.
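
As a rough check of the equivalence described above, the following Python sketch decodes the same 128 byte values with three codecs and compares the results. The variable names are illustrative only.

```python
# The 128 byte values 0x00..0x7F.
raw = bytes(range(128))

# Decode the same bytes as ASCII, ISO 8859-1 and UTF-8.
as_ascii = raw.decode("ascii")
as_latin1 = raw.decode("iso-8859-1")
as_utf8 = raw.decode("utf-8")

# All three codecs agree on this range, and every byte maps to the
# code point with the same numeric value.
same_text = as_ascii == as_latin1 == as_utf8
same_numbers = [ord(ch) for ch in as_ascii] == list(raw)
print(same_text, same_numbers)  # True True
```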

The Latin script was derived from the Greek script. Today it is used to write a wide variety of languages all over the world. In the process of adapting it to other languages, numerous extensions have been devised. The most common is the addition of diacritical marks. Digraphs, inverted or reversed forms, and outright new characters have also been used to extend the Latin script.
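
To illustrate how a diacritical mark extends a Basic Latin letter, here is a small Python sketch that splits an accented letter into its base letter and a combining mark using canonical decomposition (NFD). The sample word is an arbitrary choice.

```python
import unicodedata

word = "caf\u00e9"  # ends in U+00E9, Latin small letter E with acute

# NFD separates the base letter from its combining mark.
decomposed = unicodedata.normalize("NFD", word)
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(code_points)  # ['U+0063', 'U+0061', 'U+0066', 'U+0065', 'U+0301']

# Dropping the combining marks leaves only Basic Latin letters.
base_letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
print(base_letters)  # cafe
```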

Characters for more complex Latin scripts can be found in Latin-1 Supplement, Latin Extended A, Latin Extended B, IPA Extensions and Latin Extended Additional, with additional help for more obscure cases in Letterlike Symbols, Currency Symbols, Alphabetic Presentation Forms, Miscellaneous Symbols, Enclosed Alphanumerics, Halfwidth and Fullwidth Forms and Combining Diacritical Marks.


Unicode's Basic Latin code block reserves the 128 code points from U+0000 to U+007F, of which all 128 are currently assigned.
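
A membership test for this range can be sketched in Python as below. The function name in_basic_latin is a hypothetical helper, not part of any library; str.isascii() is a built-in (Python 3.7 and later) that applies the same range test to a whole string.

```python
def in_basic_latin(ch: str) -> bool:
    """Return True if the single character ch lies in U+0000..U+007F."""
    return 0x0000 <= ord(ch) <= 0x007F

print(in_basic_latin("A"))       # True
print(in_basic_latin("\u00e9"))  # False: U+00E9 is in Latin-1 Supplement
print("plain text".isascii())    # True
```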

Basic Latin --> Latin-1 Supplement

All the characters in this code block were added in Unicode 1.1.

Number of characters in each General Category :

Letter, Uppercase       Lu : 26
Letter, Lowercase       Ll : 26
Number, Decimal Digit   Nd : 10
Punctuation, Connector  Pc :  1
Punctuation, Dash       Pd :  1
Punctuation, Open       Ps :  3
Punctuation, Close      Pe :  3
Punctuation, Other      Po : 15
Symbol, Math            Sm :  6
Symbol, Currency        Sc :  1
Symbol, Modifier        Sk :  2
Separator, Space        Zs :  1
Other, Control          Cc : 33
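
The tally above can be reproduced, at least approximately, with Python's unicodedata module. This is a sketch that relies on whichever version of the Unicode Character Database ships with the interpreter; the General Category values for this block are not expected to differ between versions.

```python
import unicodedata
from collections import Counter

# Count the General Category of every code point in U+0000..U+007F.
category_counts = Counter(unicodedata.category(chr(cp)) for cp in range(0x80))

for category, count in sorted(category_counts.items()):
    print(f"{category} : {count:2d}")
```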

Number of characters in each Bidirectional Category :

Left To Right                 L : 52
European Number              EN : 10
European Number Separator    ES :  2
European Number Terminator   ET :  3
Common Number Separator      CS :  4
Boundary Neutral             BN : 24
Paragraph Separator           B :  5
Segment Separator             S :  3
Whitespace                   WS :  2
Other Neutral                ON : 23
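
The bidirectional counts can be checked in a similar spirit with unicodedata.bidirectional. This is a sketch only: the result depends on the Unicode Character Database version bundled with the Python interpreter, and some of these values were different in very old Unicode versions.

```python
import unicodedata
from collections import Counter

# Count the Bidirectional Category of every code point in U+0000..U+007F.
bidi_counts = Counter(unicodedata.bidirectional(chr(cp)) for cp in range(0x80))

for bidi_class, count in bidi_counts.most_common():
    print(f"{bidi_class:>2} : {count:2d}")
```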

The columns below should be interpreted as :

  1. The Unicode code point of the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode General Category for the character
  5. The Unicode Bidirectional Category for the character
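
A row in this five-column layout can be generated along the following lines. The function chart_row is a hypothetical helper. Python returns character names in upper case, and it has no names for the control characters, so a placeholder is substituted for them here; the chart below uses descriptive aliases instead.

```python
import unicodedata

def chart_row(ch: str) -> str:
    """Format one character as: code point, character, name, category, bidi."""
    code_point = f"U+{ord(ch):04X}"
    # Control characters have no name in the database, so supply a default.
    name = unicodedata.name(ch, "<no name>")
    category = unicodedata.category(ch)
    bidi = unicodedata.bidirectional(ch)
    return f"{code_point}   {ch}   {name} {category} {bidi}"

for ch in "A#9":
    print(chart_row(ch))
# U+0041   A   LATIN CAPITAL LETTER A Lu L
# U+0023   #   NUMBER SIGN Po ET
# U+0039   9   DIGIT NINE Nd EN
```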

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

 

Basic Latin

     C0 controls

U+0000   �   null Cc BN
U+0001      start of heading Cc BN
U+0002      start of text Cc BN
U+0003      end of text Cc BN
U+0004      end of transmission Cc BN
U+0005      enquiry Cc BN
U+0006      acknowledge Cc BN
U+0007      bell Cc BN
U+0008      backspace Cc BN
U+0009     character tabulation Cc S
sgml &Tab;
aka horizontal tabulation (ht), tab
U+000A     line feed Cc B
sgml &NewLine;
aka line feed (lf)
aka new line (nl), end of line (eol)
U+000B     line tabulation Cc S
aka vertical tabulation (vt)
U+000C     form feed Cc WS
aka form feed (ff)
U+000D     carriage return Cc B
aka carriage return (cr)
U+000E      shift out Cc BN
* known as LOCKING-SHIFT ONE in 8-bit environments
U+000F      shift in Cc BN
* known as LOCKING-SHIFT ZERO in 8-bit environments
U+0010      data link escape Cc BN
U+0011      device control one Cc BN
U+0012      device control two Cc BN
U+0013      device control three Cc BN
U+0014      device control four Cc BN
U+0015      negative acknowledge Cc BN
U+0016      synchronous idle Cc BN
U+0017      end of transmission block Cc BN
U+0018      cancel Cc BN
U+0019      end of medium Cc BN
U+001A      substitute Cc BN
ref U+FFFD   �   replacement character (Specials)
U+001B      escape Cc BN
U+001C      information separator four Cc B
aka file separator (fs)
U+001D      information separator three Cc B
aka group separator (gs)
U+001E      information separator two Cc B
aka record separator (rs)
U+001F      information separator one Cc S
aka unit separator (us)
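
Several of the control characters above have short escape sequences in programming languages. As an illustration, this Python sketch pairs the escapes that Python's string literals support with the code points listed in the chart; the dictionary is just a convenient way to hold the pairs.

```python
# Python string escapes and the code points they are expected to produce.
escapes = {
    "\a": 0x07,    # bell
    "\b": 0x08,    # backspace
    "\t": 0x09,    # character tabulation (ht)
    "\n": 0x0A,    # line feed (lf)
    "\v": 0x0B,    # line tabulation (vt)
    "\f": 0x0C,    # form feed (ff)
    "\r": 0x0D,    # carriage return (cr)
    "\x1b": 0x1B,  # escape has no letter escape in Python, so a hex escape is used
}

all_match = all(ord(ch) == code_point for ch, code_point in escapes.items())
print(all_match)  # True
```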

     ASCII punctuation and symbols
Based on ISO/IEC 646.

U+0020     space Zs WS
* sometimes considered a control code
* other space characters: 2000-200A
ref U+00A0       no break space (Latin-1 Supplement)
ref U+200B   ​   zero width space (General Punctuation)
ref U+2060   ⁠   word joiner (General Punctuation)
ref U+3000       ideographic space (CJK Symbols and Punctuation)
ref U+FEFF      zero width no break space (Arabic Presentation Forms B)
U+0021   !   exclamation mark Po ON
sgml &excl;
aka factorial
aka bang
ref U+00A1   ¡   inverted exclamation mark (Latin-1 Supplement)
ref U+01C3   ǃ   Latin letter retroflex click (Latin Extended B)
ref U+203C   ‼   double exclamation mark (General Punctuation)
ref U+203D   ‽   interrobang (General Punctuation)
ref U+2762   ❢   heavy exclamation mark ornament (Dingbats)
U+0022   "   quotation mark Po ON
html &quot;
sgml &quot;
* neutral (vertical), used as opening or closing quotation mark
* preferred characters in English for paired quotation marks are 201C & 201D
ref U+02BA   ʺ   modifier letter double prime (Spacing Modifier Letters)
ref U+030B   ̋   combining double acute accent (Combining Diacritical Marks)
ref U+030E   ̎   combining double vertical line above (Combining Diacritical Marks)
ref U+2033   ″   double prime (General Punctuation)
ref U+3003   〃   ditto mark (CJK Symbols and Punctuation)
U+0023   #   number sign Po ET
sgml &num;
aka pound sign, hash, crosshatch, octothorpe
ref U+2114   ℔   l b bar symbol (Letterlike Symbols)
ref U+266F   ♯   music sharp sign (Miscellaneous Symbols)
U+0024   $   dollar sign Sc ET
sgml &dollar;
aka milreis, escudo
* Dollar
* glyph may have one or two vertical bars
* other currency symbol characters: 20A0-20B5
ref U+00A4   ¤   currency sign (Latin-1 Supplement)
U+0025   %   percent sign Po ET
sgml &percnt;
ref U+066A   ٪   Arabic percent sign (Arabic)
ref U+2030   ‰   per mille sign (General Punctuation)
ref U+2031   ‱   per ten thousand sign (General Punctuation)
ref U+2052   ⁒   commercial minus sign (General Punctuation)
U+0026   &   ampersand Po ON
html &amp;
sgml &amp;
ref U+204A   ⁊   tironian sign et (General Punctuation)
ref U+214B   ⅋   turned ampersand (Letterlike Symbols)
U+0027   '   apostrophe Po ON
sgml &apos;
aka apostrophe-quote (1.0)
aka APL quote
* neutral (vertical) glyph with mixed usage
* 2019 is preferred for apostrophe
* preferred characters in English for paired quotation marks are 2018 & 2019
ref U+02B9   ʹ   modifier letter prime (Spacing Modifier Letters)
ref U+02BC   ʼ   modifier letter apostrophe (Spacing Modifier Letters)
ref U+02C8   ˈ   modifier letter vertical line (Spacing Modifier Letters)
ref U+0301   ́   combining acute accent (Combining Diacritical Marks)
ref U+2032   ′   prime (General Punctuation)
ref U+A78C   ꞌ   Latin small letter saltillo (Latin Extended D)
U+0028   (   left parenthesis Ps ON
sgml &lpar;
aka opening parenthesis (1.0)
U+0029   )   right parenthesis Pe ON
sgml &rpar;
aka closing parenthesis (1.0)
* see discussion on semantics of paired bracketing characters
U+002A   *   asterisk Po ON
sgml &ast;
aka star (on phone keypads)
ref U+066D   ٭   Arabic five pointed star (Arabic)
ref U+204E   ⁎   low asterisk (General Punctuation)
ref U+2217   ∗   asterisk operator (Mathematical Operators)
ref U+26B9   ⚹   sextile (Miscellaneous Symbols)
ref U+2731   ✱   heavy asterisk (Dingbats)
U+002B   +   plus sign Sm ES
sgml &plus;
U+002C   ,   comma Po CS
sgml &comma;
aka decimal separator
ref U+060C   ،   Arabic comma (Arabic)
ref U+201A   ‚   single low 9 quotation mark (General Punctuation)
ref U+3001   、   ideographic comma (CJK Symbols and Punctuation)
U+002D   -   hyphen minus Pd ES
sgml &hyphen;
aka hyphen or minus sign
* used for either hyphen or minus sign
ref U+2010   ‐   hyphen (General Punctuation)
ref U+2011   ‑   non breaking hyphen (General Punctuation)
ref U+2012   ‒   figure dash (General Punctuation)
ref U+2013   –   en dash (General Punctuation)
ref U+2212   −   minus sign (Mathematical Operators)
ref U+10191   𐆑   Roman uncia sign (Ancient Symbols)
U+002E   .   full stop Po CS
sgml &period;
aka period, dot, decimal point
* may be rendered as a raised decimal point in old style numbers
ref U+06D4   ۔   Arabic full stop (Arabic)
ref U+3002   。   ideographic full stop (CJK Symbols and Punctuation)
U+002F   /   solidus Po CS
sgml &sol;
aka slash, virgule
ref U+01C0   ǀ   Latin letter dental click (Latin Extended B)
ref U+0338   ̸   combining long solidus overlay (Combining Diacritical Marks)
ref U+2044   ⁄   fraction slash (General Punctuation)
ref U+2215   ∕   division slash (Mathematical Operators)
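
The ref entries in this section point at characters that look like their ASCII counterparts but are distinct code points. Where an application wants plain ASCII, one possible approach is to fold them explicitly. The sketch below does this for a few of the dash-like characters cross-referenced under U+002D; the table DASH_LIKE and the function fold_dashes are hypothetical names, and whether such lossy folding is appropriate depends on the application.

```python
# Map some of the dash-like characters cross-referenced above to U+002D.
DASH_LIKE = {
    0x2010: "-",  # hyphen
    0x2011: "-",  # non breaking hyphen
    0x2012: "-",  # figure dash
    0x2013: "-",  # en dash
    0x2212: "-",  # minus sign
}

def fold_dashes(text: str) -> str:
    """Replace the dash-like characters in DASH_LIKE with hyphen minus."""
    return text.translate(DASH_LIKE)

sample = "3 \u2212 2 and pages 10\u201312"
print(fold_dashes(sample))  # 3 - 2 and pages 10-12
```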

     ASCII digits

U+0030   0   digit zero Nd EN
U+0031   1   digit one Nd EN
U+0032   2   digit two Nd EN
U+0033   3   digit three Nd EN
U+0034   4   digit four Nd EN
U+0035   5   digit five Nd EN
U+0036   6   digit six Nd EN
U+0037   7   digit seven Nd EN
U+0038   8   digit eight Nd EN
U+0039   9   digit nine Nd EN
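
Because the ten digits occupy consecutive code points starting at U+0030, the numeric value of a digit is its code point minus 0x30. A minimal Python sketch, with digit_value as a hypothetical helper name:

```python
import unicodedata

def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit from its code point."""
    if not "0" <= ch <= "9":
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) - 0x30

# The arithmetic agrees with the Unicode database and with int().
agrees = all(
    digit_value(ch) == unicodedata.digit(ch) == int(ch) for ch in "0123456789"
)
print(digit_value("7"), agrees)  # 7 True
```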

     ASCII punctuation and symbols

U+003A   :   colon Po CS
sgml &colon;
ref U+0589   ։   Armenian full stop (Armenian)
ref U+05C3   ׃   Hebrew punctuation sof pasuq (Hebrew)
ref U+2236   ∶   ratio (Mathematical Operators)
ref U+A789   ꞉   modifier letter colon (Latin Extended D)
U+003B   ;   semicolon Po ON
sgml &semi;
* this, and not 037E, is the preferred character for 'Greek question mark'
ref U+037E   ;   Greek question mark (Greek and Coptic)
ref U+061B   ؛   Arabic semicolon (Arabic)
ref U+204F   ⁏   reversed semicolon (General Punctuation)
U+003C   <   less than sign Sm ON
html &lt;
sgml &lt;
ref U+2039   ‹   single left pointing angle quotation mark (General Punctuation)
ref U+2329   〈   left pointing angle bracket (Miscellaneous Technical)
ref U+27E8   ⟨   mathematical left angle bracket (Miscellaneous Mathematical Symbols A)
ref U+3008   〈   left angle bracket (CJK Symbols and Punctuation)
U+003D   =   equals sign Sm ON
sgml &equals;
* other related characters: 2241-2263
ref U+2260   ≠   not equal to (Mathematical Operators)
ref U+2261   ≡   identical to (Mathematical Operators)
ref U+A78A   ꞊   modifier letter short equals sign (Latin Extended D)
ref U+10190   𐆐   Roman sextans sign (Ancient Symbols)
U+003E   >   greater than sign Sm ON
html &gt;
sgml &gt;
ref U+203A   ›   single right pointing angle quotation mark (General Punctuation)
ref U+232A   〉   right pointing angle bracket (Miscellaneous Technical)
ref U+27E9   ⟩   mathematical right angle bracket (Miscellaneous Mathematical Symbols A)
ref U+3009   〉   right angle bracket (CJK Symbols and Punctuation)
U+003F   ?   question mark Po ON
sgml &quest;
ref U+00BF   ¿   inverted question mark (Latin-1 Supplement)
ref U+037E   ;   Greek question mark (Greek and Coptic)
ref U+061F   ؟   Arabic question mark (Arabic)
ref U+203D   ‽   interrobang (General Punctuation)
ref U+2048   ⁈   question exclamation mark (General Punctuation)
ref U+2049   ⁉   exclamation question mark (General Punctuation)
U+0040   @   commercial at Po ON
sgml &commat;
aka at sign
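
The html and sgml lines in this chart give named entities for the characters. As an illustration of how such names are used in practice, Python's standard html module can convert in both directions. This is only a sketch of that module's behaviour; html.unescape follows the HTML5 list of named character references.

```python
import html

# Escaping replaces the characters that are special in HTML markup.
print(html.escape("a < b & c > d"))  # a &lt; b &amp; c &gt; d
print(html.escape('say "hi"'))       # say &quot;hi&quot;

# Unescaping resolves named references back to Basic Latin characters.
decoded = html.unescape("&lt;&equals;&gt;&quest;&commat;")
print(decoded)  # <=>?@
```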

     Uppercase Latin alphabet

U+0041   A   Latin capital letter A Lu L
U+0042   B   Latin capital letter B Lu L
ref U+212C   ℬ   script capital b (Letterlike Symbols)
U+0043   C   Latin capital letter C Lu L
ref U+2102   ℂ   double struck capital c (Letterlike Symbols)
ref U+212D   ℭ   black letter capital c (Letterlike Symbols)
U+0044   D   Latin capital letter D Lu L
U+0045   E   Latin capital letter E Lu L
ref U+2107   ℇ   euler constant (Letterlike Symbols)
ref U+2130   ℰ   script capital e (Letterlike Symbols)
U+0046   F   Latin capital letter F Lu L
ref U+2131   ℱ   script capital f (Letterlike Symbols)
ref U+2132   Ⅎ   turned capital f (Letterlike Symbols)
U+0047   G   Latin capital letter G Lu L
U+0048   H   Latin capital letter H Lu L
ref U+210B   ℋ   script capital h (Letterlike Symbols)
ref U+210C   ℌ   black letter capital h (Letterlike Symbols)
ref U+210D   ℍ   double struck capital h (Letterlike Symbols)
U+0049   I   Latin capital letter I Lu L
* Turkish and Azerbaijani use 0131 for lowercase
ref U+0130   İ   Latin capital letter I with dot above (Latin Extended A)
ref U+0406   І   Cyrillic capital letter byelorussian ukrainian i (Cyrillic)
ref U+04C0   Ӏ   Cyrillic letter palochka (Cyrillic)
ref U+2110   ℐ   script capital i (Letterlike Symbols)
ref U+2111   ℑ   black letter capital i (Letterlike Symbols)
ref U+2160   Ⅰ   Roman numeral one (Number Forms)
U+004A   J   Latin capital letter J Lu L
U+004B   K   Latin capital letter K Lu L
ref U+212A   K   kelvin sign (Letterlike Symbols)
U+004C   L   Latin capital letter L Lu L
ref U+2112   ℒ   script capital l (Letterlike Symbols)
U+004D   M   Latin capital letter M Lu L
ref U+2133   ℳ   script capital m (Letterlike Symbols)
U+004E   N   Latin capital letter N Lu L
ref U+2115   ℕ   double struck capital n (Letterlike Symbols)
U+004F   O   Latin capital letter O Lu L
U+0050   P   Latin capital letter P Lu L
ref U+2119   ℙ   double struck capital p (Letterlike Symbols)
U+0051   Q   Latin capital letter Q Lu L
ref U+211A   ℚ   double struck capital q (Letterlike Symbols)
U+0052   R   Latin capital letter R Lu L
ref U+211B   ℛ   script capital r (Letterlike Symbols)
ref U+211C   ℜ   black letter capital r (Letterlike Symbols)
ref U+211D   ℝ   double struck capital r (Letterlike Symbols)
U+0053   S   Latin capital letter S Lu L
U+0054   T   Latin capital letter T Lu L
U+0055   U   Latin capital letter U Lu L
U+0056   V   Latin capital letter V Lu L
U+0057   W   Latin capital letter W Lu L
U+0058   X   Latin capital letter X Lu L
U+0059   Y   Latin capital letter Y Lu L
U+005A   Z   Latin capital letter Z Lu L
ref U+2124   ℤ   double struck capital z (Letterlike Symbols)
ref U+2128   ℨ   black letter capital z (Letterlike Symbols)
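
The note under U+0049 about Turkish and Azerbaijani matters in code because default case conversion is not language-aware. The Python sketch below shows the default behaviour and one possible workaround; turkish_lower is a hypothetical helper written for this example, not a library function.

```python
capital_i = "I"              # U+0049
dotted_capital_i = "\u0130"  # Latin capital letter I with dot above
dotless_i = "\u0131"         # Latin small letter dotless i

# Default (language-neutral) case mapping.
print(capital_i.lower() == "i")                # True
print(dotless_i.upper() == "I")                # True
print(dotted_capital_i.lower() == "i\u0307")   # True: i plus combining dot above

def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish/Azerbaijani pairing of I and i."""
    # Handle the two special capitals first, then fall back to the default.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()

print(turkish_lower("IRMAK") == "\u0131rmak")  # True
```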

     ASCII punctuation and symbols

U+005B   [   left square bracket Ps ON
sgml &lbrack; &lsqb;
aka opening square bracket (1.0)
* other bracket characters: 27E6-27EB, 2983-2998, 3008-301B
U+005C   \   reverse solidus Po ON
sgml &bsol; &sbsol;
aka backslash
ref U+20E5   ⃥   combining reverse solidus overlay (Combining Diacritical Marks for Symbols)
ref U+2216   ∖   set minus (Mathematical Operators)
U+005D   ]   right square bracket Pe ON
sgml &rbrack; &rsqb;
aka closing square bracket (1.0)
U+005E   ^   circumflex accent Sk ON
sgml &Hat; &circ;
* this is a spacing character
ref U+02C4   ˄   modifier letter up arrowhead (Spacing Modifier Letters)
ref U+02C6   ˆ   modifier letter circumflex accent (Spacing Modifier Letters)
ref U+0302   ̂   combining circumflex accent (Combining Diacritical Marks)
ref U+2038   ‸   caret (General Punctuation)
ref U+2303   ⌃   up arrowhead (Miscellaneous Technical)
U+005F   _   low line Pc ON
sgml &lowbar;
aka spacing underscore (1.0)
* this is a spacing character
ref U+02CD   ˍ   modifier letter low macron (Spacing Modifier Letters)
ref U+0331   ̱   combining macron below (Combining Diacritical Marks)
ref U+0332   ̲   combining low line (Combining Diacritical Marks)
ref U+2017   ‗   double low line (General Punctuation)
U+0060   `   grave accent Sk ON
sgml &grave;
* this is a spacing character
ref U+02CB   ˋ   modifier letter grave accent (Spacing Modifier Letters)
ref U+0300   ̀   combining grave accent (Combining Diacritical Marks)
ref U+2035   ‵   reversed prime (General Punctuation)
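
The notes saying "this is a spacing character" distinguish U+005E, U+005F and U+0060 from the combining marks they are cross-referenced with. A short Python sketch of the difference, using the circumflex as the example:

```python
import unicodedata

spacing = "e^"         # e followed by U+005E circumflex accent
combining = "e\u0302"  # e followed by U+0302 combining circumflex accent

# A canonical combining class of 0 means the character does not combine.
print(unicodedata.combining("^"))       # 0
print(unicodedata.combining("\u0302"))  # 230

# NFC composes the combining sequence but leaves the spacing one alone.
spacing_nfc = unicodedata.normalize("NFC", spacing)
combining_nfc = unicodedata.normalize("NFC", combining)
print(spacing_nfc == "e^", combining_nfc == "\u00ea")  # True True
```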

     Lowercase Latin alphabet

U+0061   a   Latin small letter A Ll L
U+0062   b   Latin small letter B Ll L
U+0063   c   Latin small letter C Ll L
U+0064   d   Latin small letter D Ll L
U+0065   e   Latin small letter E Ll L
ref U+212E   ℮   estimated symbol (Letterlike Symbols)
ref U+212F   ℯ   script small e (Letterlike Symbols)
U+0066   f   Latin small letter F Ll L
U+0067   g   Latin small letter G Ll L
ref U+0261   ɡ   Latin small letter script g (IPA Extensions)
ref U+210A   ℊ   script small g (Letterlike Symbols)
U+0068   h   Latin small letter H Ll L
ref U+04BB   һ   Cyrillic small letter shha (Cyrillic)
ref U+210E   ℎ   planck constant (Letterlike Symbols)
U+0069   i   Latin small letter I Ll L
* Turkish and Azerbaijani use 0130 for uppercase
ref U+0131   ı   Latin small letter dotless i (Latin Extended A)
ref U+1D6A4   𝚤   mathematical italic small dotless i (Mathematical Alphanumeric Symbols)
U+006A   j   Latin small letter J Ll L
sgml &jmath;
ref U+0237   ȷ   Latin small letter dotless j (Latin Extended B)
ref U+1D6A5   𝚥   mathematical italic small dotless j (Mathematical Alphanumeric Symbols)
U+006B   k   Latin small letter K Ll L
U+006C   l   Latin small letter L Ll L
ref U+2113   ℓ   script small l (Letterlike Symbols)
ref U+1D4C1   𝓁   mathematical script small l (Mathematical Alphanumeric Symbols)
U+006D   m   Latin small letter M Ll L
U+006E   n   Latin small letter N Ll L
ref U+207F   ⁿ   superscript Latin small letter N (Superscripts and Subscripts)
U+006F   o   Latin small letter O Ll L
ref U+2134   ℴ   script small o (Letterlike Symbols)
U+0070   p   Latin small letter P Ll L
U+0071   q   Latin small letter Q Ll L
U+0072   r   Latin small letter R Ll L
U+0073   s   Latin small letter S Ll L
U+0074   t   Latin small letter T Ll L
U+0075   u   Latin small letter U Ll L
U+0076   v   Latin small letter V Ll L
U+0077   w   Latin small letter W Ll L
U+0078   x   Latin small letter X Ll L
U+0079   y   Latin small letter Y Ll L
U+007A   z   Latin small letter Z Ll L
ref U+01B6   ƶ   Latin small letter Z with stroke (Latin Extended B)
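
Some of the characters cross-referenced under the lowercase letters are compatibility variants of those letters, while others are independent characters. One way to tell them apart, sketched here in Python, is to apply compatibility normalization (NFKC) and see which ones fold to a Basic Latin letter. The two lists are a hand-picked selection from the ref entries above.

```python
import unicodedata

# Cross-referenced characters that carry a compatibility decomposition.
variants = {
    "\u212f": "e",  # script small e
    "\u210e": "h",  # planck constant
    "\u2113": "l",  # script small l
    "\u207f": "n",  # superscript Latin small letter n
}
folded = {ch: unicodedata.normalize("NFKC", ch) for ch in variants}
print(folded == variants)  # True

# Cross-referenced characters that stay as they are under NFKC.
independent = ["\u0261", "\u212e", "\u04bb", "\u01b6"]
unchanged = all(unicodedata.normalize("NFKC", ch) == ch for ch in independent)
print(unchanged)  # True
```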

     ASCII punctuation and symbols

U+007B   {   left curly bracket Ps ON
sgml &lbrace; &lcub;
aka opening curly bracket (1.0)
aka left brace
U+007C   |   vertical line Sm ON
sgml &verbar; &vert;
aka vertical bar
* used in pairs to indicate absolute value
ref U+01C0   ǀ   Latin letter dental click (Latin Extended B)
ref U+05C0   ׀   Hebrew punctuation paseq (Hebrew)
ref U+2223   ∣   divides (Mathematical Operators)
ref U+2758   ❘   light vertical bar (Dingbats)
U+007D   }   right curly bracket Pe ON
sgml &rbrace; &rcub;
aka closing curly bracket (1.0)
aka right brace
U+007E   ~   tilde Sm ON
* this is a spacing character
ref U+02DC   ˜   small tilde (Spacing Modifier Letters)
ref U+0303   ̃   combining tilde (Combining Diacritical Marks)
ref U+2053   ⁓   swung dash (General Punctuation)
ref U+223C   ∼   tilde operator (Mathematical Operators)
ref U+FF5E   ~   fullwidth tilde (Halfwidth and Fullwidth Forms)

     Control character

U+007F      delete Cc BN

http://unicode.org
Some prose may have been lifted verbatim from unicode.org,
as is permitted by their terms of use at http://www.unicode.org/copyright.html
